The Inner Guts of Bitbucket

Posted on August 11, 2014

Recently our teammate and Bitbucket engineer Erik van Zijst had the opportunity to present at EuroPython 2014 in Berlin. Check out this video of his session on the Inner Guts of Bitbucket for a detailed overview of our current architecture at every layer, from Gunicorn and Django to Celery, HAProxy, and NFS.

In addition to the inside scoop on Bitbucket’s inner workings, this video covers some war stories and shows how we too sometimes have to learn things the hard way.


  • Clive
    Posted August 12, 2014 at 7:36 am | Permalink

    “Sometimes we too have to learn the hard way”…? What you mean even brilliant, world-class and modest geniuses also have to learn the hard way?!

    • Radek
      Posted March 9, 2015 at 9:21 pm | Permalink

      I think he means you learn from your mistakes

  • Posted August 16, 2014 at 5:42 am | Permalink

    Thanks for the presentation. Very interesting to see how Bitbucket works. What’s the reason for running real hardware instead of virtual machines?

    • Anonymous
      Posted March 10, 2015 at 2:20 am | Permalink

The only reason I see behind that is that they avoid the virtualization overhead to make things as fast as possible. But that is just as far as I can see.

      • Posted March 10, 2015 at 6:09 am | Permalink

I’m aware of the virtualisation overhead. But on the other hand, virtualisation makes managing the infrastructure much easier.

        • Anonymous
          Posted March 10, 2015 at 6:16 am | Permalink

There are times when you have to choose one over the other: “right tool for the job”. That’s what I tried to suggest with my thought.

          • Posted March 10, 2015 at 11:08 am | Permalink

That’s right! And since getting to know Docker I have another point of view. You can run Docker on bare metal. That’s a middle ground: it’s dynamic and still faster than regular virtualisation.

          • Erik van Zijst
            Posted March 10, 2015 at 11:15 pm | Permalink

            Bitbucket was founded in 2008 on EC2 and EBS. The switch to bare metal was made in 2010 when it joined Atlassian.

At Atlassian we managed (and continue to manage) a lot of our own hardware, and moving Bitbucket onto it (without virtualization) gave us a huge performance boost over EC2.

            Back then we only ran 4 machines, so managing things was a lot easier.

These events all predate Docker, which we now use throughout Atlassian for all kinds of things, and I could see Bitbucket going virtualized again at some point in the future.

          • Posted March 11, 2015 at 12:05 am | Permalink

Thanks for the insights. I started VersionEye in 2012 on Heroku, and after only a couple of months everything slowed down and it became really expensive. In 2013 I moved it to bare metal for half the price, and performance improved dramatically. Today (2015) it’s running on EC2 because of the Amazon Activate Program. At a certain size it makes sense to own your hardware.

  • Jardel Weyrich
    Posted February 26, 2015 at 7:45 pm | Permalink

    Great presentation Erik! I like the simplicity of your architecture 😉

    Your cache solution to minimize the impact of bcrypt-ing passwords all the time seems reasonable, although IMHO it does not tackle the underlying problem: the password shouldn’t be used to authenticate multiple times in a short period, especially when verifying it is an expensive operation. You partially solve that with API rate-limiting (which I think you already have); a decorator backed by NoSQL would do. Sites rely on cookies & sessions, and REST APIs can too. But there’s another (similar) approach: once an account is authenticated, you could hand the client a time-limited authentication token based on HMAC. From that point on, the client can rely solely on this token for any operation that requires authentication, until it expires. You could implement a renewal process as well. Anyway, if a server gets compromised, the intruder can always modify the authentication API to log/save plaintext passwords. Can’t fight that.
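    To make the token idea concrete, here is a minimal sketch of a time-limited HMAC token in Python. This is purely illustrative, not Bitbucket’s actual implementation; the names (`issue_token`, `verify_token`, `SECRET_KEY`) and the `username:expiry:signature` layout are assumptions for the example:

```python
import hashlib
import hmac
import time
from typing import Optional

# Hypothetical server-side secret; in practice this would be stored
# securely and rotated, never sent to clients.
SECRET_KEY = b"server-side-secret"

def issue_token(username: str, ttl_seconds: int = 3600) -> str:
    """Issue a token valid for ttl_seconds: 'username:expiry:signature'."""
    expires = int(time.time()) + ttl_seconds
    payload = f"{username}:{expires}"
    sig = hmac.new(SECRET_KEY, payload.encode(), hashlib.sha256).hexdigest()
    return f"{payload}:{sig}"

def verify_token(token: str) -> Optional[str]:
    """Return the username if the token is authentic and unexpired, else None."""
    try:
        username, expires, sig = token.rsplit(":", 2)
    except ValueError:
        return None  # malformed token
    payload = f"{username}:{expires}"
    expected = hmac.new(SECRET_KEY, payload.encode(), hashlib.sha256).hexdigest()
    # Constant-time comparison to avoid timing side channels.
    if not hmac.compare_digest(sig, expected):
        return None
    if int(expires) < time.time():
        return None  # expired
    return username
```

    Verifying a token costs one SHA-256 HMAC instead of a bcrypt round, which is the whole point: the expensive password check happens once, at token issuance.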

    • Erik van Zijst
      Posted March 10, 2015 at 11:03 pm | Permalink

      > Once an account is authenticated, you could hand the client a time-limited authentication token based on HMAC.

      For sure!

      We’re actually rather behind the times with our reliance on basic auth for things like the API. We’re working on adding OAuth 2 based token authentication (we currently offer OAuth 1, which can be a little cumbersome for clients to use) as a cheaper and safer alternative to password-based authentication.