Ticket #992 (closed defect: fixed)
FAM errors when syncing repository with rsync
Reported by: https://www.google.com/accounts/o8/id?id=AItOawmAKncTPlNx89GjuQyu9LtzyN37kpWeSCw
Owned by: https://www.google.com/accounts/o8/id?id=AItOawnSjgovXZr-_V3vGkvMSR0pc5LDykRc1Nc
Priority: minor
Milestone: Bcfg2 1.2.1 Release (Bugfix)
Component: bcfg2-server
Version: 1.0
Keywords: (none)
Cc: [email protected]…, [email protected]…
Description
We operate multiple bcfg2 servers all sharing the same repository. To do this, I edit the repository at a central location and periodically copy the entire repository to all servers using rsync. Whenever the sync runs, I get a lot of these errors on the bcfg2 servers (all for different files and directories):
bcfg2-server[5473]: error in handling of gamin event for default
Traceback (most recent call last):
  File "/usr/lib/pymodules/python2.6/Bcfg2/Server/FileMonitor.py", line 60, in handle_one_event
    self.handles[event.requestID].HandleEvent(event)
  File "/usr/lib/pymodules/python2.6/Bcfg2/Server/Plugin.py", line 866, in HandleEvent
    self.entries[ident].handle_event(event)
KeyError: '/etc/default'
The bcfg2 server also doesn't see any of the synced changes and requires a restart to apply the new configuration (the previous one continues to work). I suppose rsync is somehow locking the files while it's figuring out whether they need updating.
The server is 1.2.0pre1-3~testing1~maverick2+adc4130. The rsync command I use on each server is:
rsync -ar --exclude=.git --exclude=Packages/cache --exclude=Ldap/credentials.txt --exclude=.gitignore --delete rsync://master-server/bcfg2/ /var/lib/bcfg2
Is there a "supported" way of getting the repo to a number of servers without having to restart them all the time?
Attachments
Change History
comment:1 Changed 12 years ago by https://www.google.com/accounts/o8/id?id=AItOawmAKncTPlNx89GjuQyu9LtzyN37kpWeSCw
- Component changed from bcfg2-client to bcfg2-server
comment:2 Changed 12 years ago by https://www.google.com/accounts/o8/id?id=AItOawmAKncTPlNx89GjuQyu9LtzyN37kpWeSCw
Correction: running rsync twice does not work in all cases. Adding new directories (e.g. for file templates in TGenshi) results in
bcfg2-server[30724]: Failed to bind entry: Path /path/to/file
on the server and
The following entries are not handled by any tool: Path:None:/path/to/file
on the client. These entries work fine after the server has been restarted.
I wonder if it would be possible to implement a "reload" action for the server that forces it to re-read all files in the repository.
comment:3 Changed 12 years ago by https://www.google.com/accounts/o8/id?id=AItOawmAKncTPlNx89GjuQyu9LtzyN37kpWeSCw
Here is another (somewhat simpler) way to reproduce the problem:
cp -r /var/lib/bcfg2 /var/lib/bcfg2_modified
echo "foobar" >> /var/lib/bcfg2_modified/TGenshi/etc/motd.tail/template.newtxt
mkdir /var/lib/bcfg2_original
mv /var/lib/bcfg2/* /var/lib/bcfg2_original/ && mv /var/lib/bcfg2_modified/* /var/lib/bcfg2/
Clients connecting after that will still receive the unaltered motd.tail.
comment:5 Changed 12 years ago by solj
- Milestone changed from Bcfg2 1.2.0 Release to Bcfg2 1.2.1 Release (Bugfix)
comment:6 in reply to: ↑ description Changed 11 years ago by solj
Replying to https://www.google.com/accounts/o8/id?id=AItOawmAKncTPlNx89GjuQyu9LtzyN37kpWeSCw:
Is there a "supported" way of getting the repo to a number of servers without having to restart them all the time?
Nothing to do with the rest of the ticket, but the well-supported way of distributing the repo to multiple servers is to use some sort of VCS. I have not experienced any of the issues detailed in this ticket while using either svn or git for the repository backend.
comment:7 Changed 11 years ago by https://www.google.com/accounts/o8/id?id=AItOawnSjgovXZr-_V3vGkvMSR0pc5LDykRc1Nc
- Owner changed from desai to https://www.google.com/accounts/o8/id?id=AItOawnSjgovXZr-_V3vGkvMSR0pc5LDykRc1Nc
- Status changed from new to accepted
comment:8 Changed 11 years ago by https://www.google.com/accounts/o8/id?id=AItOawnSjgovXZr-_V3vGkvMSR0pc5LDykRc1Nc
- Status changed from accepted to closed
- Resolution set to fixed
I got bit by this earlier this week, and I use SVN, not rsync. I'm not sure what exactly causes it, but it seems like the FAM doesn't send the "created" event, just "changed" events.
Anyhow, DirectoryBacked plugins were already forgiving of this behavior, so I've made GroupSpool plugins (Cfg, TGenshi, etc.) forgiving as well. If a "changed" event is received on a file the plugin doesn't already know about, it will warn and process it as a "created."
Fixed in https://github.com/Bcfg2/bcfg2/commit/1781ba4ec2ec749d9a5773adf202322381fd8bff
I have some additional insight:
On 1.2.0pre3-1~testing1~natty1+4f76272 I monitored what goes on in the area where the error is thrown.
First I prevented the KeyError from occurring by doing this in line 929 of Plugin.py:
Within the same method, I inserted a debug message (see below).
I observed that the changes from rsync are only recognized by bcfg2-server after rsync has run a second time (even though the changed files are already present after the first run and nothing changes between the two runs).
Here is my debug output for the first run:
and for the second run:
So for some reason, the first run has an additional call to HandleEvent, yet still fails to recognize the change.
The KeyError mentioned in my original message is apparently unrelated: rsync generates "changed" events for directories, which HandleEvent() did not expect. The above "patch" fixes the KeyError spam.
Unfortunately, I'm at a loss as to why it takes two rsync runs to recognize a change.