Subversion master undergoing emergency maintenance
Posted on: 2014-12-03 17:52:32+00:00
The primary master machine that hosts the Apache Software Foundation's subversion repositories is currently undergoing some emergency maintenance due to disk errors.
We do not currently have an ETA on when this will be fixed.
In the meantime, there will be no access to commit to SVN.
The read-only mirror at svn.eu.apache.org is still working.
UPDATE: 18:30 UTC, 3 December 2014
The machine that hosts the SVN master suffered root filesystem corruption. This corruption led to a severe degradation of the SVN service, and to repair the issue the service was taken down.
This filesystem is separate from the filesystem that hosts the SVN repositories. We expect no data loss from this issue. (And we have multiple copies of this data available to us.)
We'll be keeping this blog post updated with more details as they become available.
UPDATE: 21:30 UTC, 3 December 2014
We've removed the master from DNS rotation, so read-only access remains available everywhere.
Commits to SVN remain disabled while we work on restoring the service.
UPDATE: 04:45 UTC, 4 December 2014
The service remains offline while we work on moving the service to a new host. During the work to resolve the failed disks on eris (the previous host) it became apparent that it would not be the best use of our time to keep working on this (and we had frankly lost faith in the disks).
We are now several hours into this move. The data has been synchronised to the new host, and now we are working on porting the configuration of the old host into puppet and making it fit the new setup on which it will be run. We don't currently have an exact time when we think it will be finished, but we are hopeful it will be during Thursday 4th December 2014.
We'd like to apologise for the downtime, but we are taking actions that we feel are in the best interests of a key piece of foundation infrastructure. As always you can come and find us in the Hipchat channel #asfinfra - https://www.hipchat.com/gdAiIcNyE if you have any questions.
UPDATE: 11:18 UTC, 4 December 2014
We are performing sanity checks on the new puppetized configuration. For historical reasons, our svn system has relied on specially crafted versions of svn, which we are attempting to replace with canonical release versions, so that a new host is easier to set up should we experience another major outage. This entails a lot of rewriting of scripts, but we expect most of this to have been done now, pending a full system check.
Once all this is done, we will be performing authorization checks to make sure everything is as it should be, and when satisfied, we will reopen the svn repo for committers.
The ETA is still uncertain, but remains a hopeful "today" (Thursday, December 4th).
UPDATE: 16:15 UTC, 4 December 2014
We are nearly there. We are currently putting the finishing touches to the config, and we will begin closed testing within the infrastructure group very soon. Assuming this goes well we will aim to open the service as soon as possible after this.
Any delay will come from ensuring that no data could be lost as a result of restarting the service. Data security and provenance are our utmost concern.
More news to follow in the next couple of hours hopefully.
UPDATE: 03:01 UTC, 5 December 2014 [FINAL UPDATE]
Well. As of 5 minutes ago the main subversion service was restored. Only one repository is currently unavailable: the dist repository used by projects to stage dev and release outputs. This will be fixed ASAP.
If you spot any issues with the service, in the first instance please hop onto HipChat and chat to us - https://www.hipchat.com/gdAiIcNyE. Or you can use the usual email address firstname.lastname@example.org if you prefer that.
This outage has forced us to review the setup of the primary subversion host and as a result of this we have made many changes to bring it in line with our current practice and standards. This involved re-engineering quite a lot of things that had accumulated over the years, and like many a good onion the more layers we peeled back the more we sobbed.
We are happy to report that this host is now completely managed with puppet, and is delivering metrics to our instance of Circonus very happily.
Once again, thank you for your patience, and we hope that the service feels a lot more sprightly on its new host.
On behalf of the Apache Infrastructure Team