Unplanned downtime today

January 5, 2012

This morning at around 6:00 am UTC, Bitbucket began failing with intermittent 500 errors, which continued for several hours.

Our investigation shows that the root cause was our syslog server crashing. With it down, the syslog queues on all of our other servers filled up, and those servers became unresponsive.

We’re currently re-examining our syslog configuration, particularly on-disk queuing, to ensure this single point of failure is avoided. We’re also adding new monitoring to detect a similar situation if it recurs.
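
To illustrate the failure mode in general terms, here is a minimal sketch in Python of a bounded log queue that drops records when it is full instead of blocking the application. It is not our syslog configuration: the in-memory queue merely stands in for a syslog forwarding queue, and the names `DroppingQueueHandler` and `build_logger` are hypothetical, chosen for this example. Dropping is only one possible policy; spilling to disk, as mentioned above, is another.

```python
import logging
import logging.handlers
import queue


class DroppingQueueHandler(logging.handlers.QueueHandler):
    """Hypothetical handler: counts and drops records when the queue is full."""

    def __init__(self, log_queue):
        super().__init__(log_queue)
        self.dropped = 0  # number of records discarded because the queue was full

    def enqueue(self, record):
        # Never wait for free space: a stalled consumer must not stall the caller.
        try:
            self.queue.put_nowait(record)
        except queue.Full:
            self.dropped += 1


def build_logger(name, capacity):
    """Return (logger, handler, queue) wired to a bounded in-memory queue."""
    log_queue = queue.Queue(maxsize=capacity)
    handler = DroppingQueueHandler(log_queue)
    logger = logging.getLogger(name)
    logger.setLevel(logging.INFO)
    logger.propagate = False  # keep the example's records out of the root logger
    logger.addHandler(handler)
    return logger, handler, log_queue


if __name__ == "__main__":
    # Nothing consumes the queue here, which simulates a crashed log server.
    logger, handler, log_queue = build_logger("example", capacity=2)
    for i in range(5):
        logger.info("request %d handled", i)
    print("queued:", log_queue.qsize(), "dropped:", handler.dropped)
```

The `dropped` counter is also the kind of value a monitoring check could watch: a steadily rising count would suggest that the log consumer has stopped draining the queue.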

We’re very sorry for the inconvenience this downtime caused. This and other service-related updates can be found on our status site, status.bitbucket.org.

  • Christian Grobmeier

    Glad you are back and hey, shit happens. Take it easy.

  • Anonymous

    Yeah, glad you’re back. Ironically, even the status page was down during the downtime.

  • Bitencode

    Note that your blog and (more importantly) status.bitbucket.org were also down (giving 503 errors) during that time, so status.bitbucket.org wasn’t very useful. The only way I could find out anything about Bitbucket this morning was Twitter… :(
    (but I’m glad you’re back)

    • http://www.bitbucket.org Justen Stepka

      Thanks for letting us know that.

      We’re looking to move our status site from our virtualized setup on EC2 to dedicated hardware to improve the service.

  • Anonymous

    Thanks for the continued updates via Twitter.  They were (somewhat) reassuring :)

  • Anonymous

    Hello,

    doh!  Really annoying when websites go down at 1am.

    Glad you got it sorted.

    cheers,

    I was thinking of two improvements when the site was down:
    1) backups
    2) mirror system

    1) Backups. How can we make backups of all our data hosted on Bitbucket? There is some data that we currently cannot back up except by manually scraping the website. Issues are the main thing still lacking a backup option. A ‘download all’ link would be great.

    2) Mirror system.  A read-only mirror system could have allowed people to read their data – and sort of continue on with their work.

  • http://erhanabay.com Erhan

    I wrote a little PHP console application to back up all your repositories.

    http://erhanabay.com/2012/01/06/bitbucket-repo-sync-application/

    You can keep them in sync with a cron job.

  • Lidstromso
  • http://DevMentor.org Rajinder Yadav

    We grow from failure and from discovering our weaknesses. I’m hoping for a better fail-safe redundant setup in the future =) … I still can’t seem to reach your site.

    • http://DevMentor.org Rajinder Yadav

      It seems my personal git path doesn’t work anymore; however, going through the regular way seems to be working, but slowly.