By pkaeding on October 30, 2012
At Bitbucket (and throughout Atlassian) we are constantly dogfooding our own products. This helps us flesh out requirements and find bugs. So, naturally, we host the Bitbucket code in Bitbucket as well. We use pull requests to review team members’ code before merging it in and deploying. And, like many of you, we have been wanting inline comments on pull requests for a long time.
As we developed the inline comment feature for pull requests and commits, we discovered a couple of problems while dogfooding. The first was related to the comment drift algorithm, and the second related to performance. Keep reading to learn how we solved these problems!
When you are doing code reviews, the ability to leave comments on a particular line is not a new feature. Atlassian Crucible has had inline comments in reviews since 2007, and I’m sure there have been other examples before that as well.
The lack of inline comments was the primary complaint from other Atlassian developers when they moved their projects to Bitbucket. Many others held off on moving their projects to Bitbucket for this reason as well. We at Bitbucket felt the pain of not having inline comments every day. But we also had a pretty good idea of how we wanted it to work, from using Crucible reviews for years. We knew what we liked about Crucible, and what we wanted to do differently.
We thought a lot about Stash, which you can think of as Bitbucket for the Enterprise, offering Git repository management behind the firewall. It is a great option for larger teams that have outgrown cloud code hosting services like Bitbucket, and want to bring their repositories inside of their enterprise infrastructure. While Bitbucket and Stash serve slightly different audiences, we wanted to be sure you could move between the two products without a steep learning curve. The new UI was one area that we wanted to make consistent, but we also wanted to be sure commenting on pull requests worked consistently.
To that end, Nic Venegas and I traveled to Sydney to work closely with the Stash developers for a few weeks. This trip helped us immensely in learning the intricacies of each other’s products and achieving a level of consistency between the two. Working with the Stash team naturally made us think about the problems we were trying to solve from a different angle, and helped us build a more robust product.
For example, here are some specific things we aligned on:
- Tabbed view on the pull request (there are some differences on exactly what is in each tab)
- Participants list (though Stash calls it ‘Reviewers’)
- ‘Approve’ functionality
- Comment drift algorithm (see the next section for more details)
Probably the most complex part about inline comments on pull requests was handling ‘drift’ correctly. Drift (as we call it; I don’t know if there is a standardized word for it) is when additional commits amend a pull request after an initial comment is left on a line. This might change the line number of the commented-on line, and, of course, even if the line number changes, the comment should still appear in the right place.
Other times, new changes might change or remove a commented-on line, so the commented-on line no longer exists in the diff (at least, not as it was when the comment was made). We refer to these comments as ‘eclipsed’, since the new changes cover the line they were anchored to. We still show those comments in the Activity tab, with the diff as it was before the comment was eclipsed.
If keeping all this drift stuff straight sounds complicated, that’s because it is. We caught a few bugs that annoyed us EVERY DAY while we used the feature we were developing. One particularly nasty bug caused the drift to be calculated incorrectly when the destination branch was merged back into the feature branch (or fork) after a comment was made. This is a pretty common thing to do, especially if conflicts arise in the pull request. The bug was tricky to resolve because we needed to rethink the way we were performing the diffs used to calculate drift. Instead of just relying on the diffs of the commits made, we needed to take the diff of the merges that would be applied if you were to merge the pull request. Big thanks to the Stash team for helping us figure out the cause of, and solution to, this problem. We caught this bug because we were bitten by it while using the feature we were developing.
The solution to the drift problem can be explained using the diagram above. If you aren’t interested in the nitty-gritty details, you can just skip ahead to the next section.
The first case to consider is when the source branch advances, which means that more changes are committed to the source branch of the pull request, and the pull request is edited to include these changes. Consider the diagram on the left. The original version of the pull request is to merge commit D into the main branch (which includes commits A, B, and C). The merge originally shown was M1. Now, commit E has been added to the pull request, and merge M2 is what should be shown.
The easy case, which we call the ‘fast-forward’ case, is when the comments are on files that are not touched by commit E. Since there is no further change to these files, their line numbers are unchanged, so we just need to update the comment objects in our database to confirm that they are relevant to the new revision anchors (E and C, in this case).
However, when the files commented on are touched by commit E, things get trickier. For comments on removed lines (colored red on the pull request view) and context lines (lines that were not changed at all in the pull request, but are shown to give the reviewer context), we can just fast-forward them, since these lines are coming from the main branch version of the file, not the new version described in the pull request. On the other hand, for comments on ‘added’ lines (colored green on the pull request view), we need to consider the ‘meta-diff’, or the diff between merges M1 and M2. For each hunk in this meta-diff, if it is before the line commented on, the system will drift the comment by the net total of how many lines were added or removed in that hunk. If the hunk overlaps the commented line, then the comment is eclipsed. If the hunk is after the commented line, then it has no effect on the comment.
The next case to consider is when the destination branch advances (the right side of the diagram above). Suppose that now a new commit, F, was pushed to the destination branch. We can still fast-forward any comments on files that were not touched by commit F. The meta-diff between merges M3 and M4 is still used to calculate drift on comments on added lines. However, now we need to consider the diff represented by commit F to calculate the drift for comments on removed and context lines.
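The per-hunk rule for comments on added lines can be sketched roughly as follows. This is a minimal illustration, not Bitbucket's actual code: the `Hunk` type and the `drift_line` function are hypothetical names, and the sketch assumes each meta-diff hunk has already been stripped of context lines and is expressed in the line numbering of the old merge (M1).

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Hunk:
    """One hunk of the meta-diff (hypothetical type, old-merge line numbering)."""
    old_start: int  # first old line replaced; for a pure insertion, the line it lands before
    old_count: int  # number of old lines the hunk replaces (0 = pure insertion)
    new_count: int  # number of lines the hunk puts in their place


def drift_line(line: int, hunks: List[Hunk]) -> Optional[int]:
    """Return the comment's new line number, or None if the comment is eclipsed."""
    offset = 0
    for hunk in hunks:
        old_end = hunk.old_start + hunk.old_count  # exclusive end of the replaced range
        if old_end <= line:
            # Hunk lies entirely before the commented line: shift by its net size.
            offset += hunk.new_count - hunk.old_count
        elif hunk.old_start <= line:
            # Hunk overlaps the commented line: the comment is eclipsed.
            return None
        # Otherwise the hunk is after the commented line and has no effect.
    return line + offset
```

For example, with one hunk that inserts three lines before line 2 and another that replaces lines 10 and 11 with a single line, a comment on line 5 drifts to line 8, a comment on line 10 is eclipsed, and a comment on line 12 drifts to line 14.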
The bug that we encountered originally occurred in a case like the one pictured on the left; we were using the new commit’s diff instead of the meta-diff. This worked fine, until we merged the default branch back into our feature branch and updated the Pull Request (as you would to resolve conflicts, for example). This would be a case of the source branch advancing, but the diff from commit E wouldn’t properly express the drift.
We are constantly striving to improve the speed and performance of pull requests (and all of Bitbucket). We made significant progress on this while we were still dogfooding the feature internally. Our dogfooding server mirrors our production environment, but it is not nearly as powerful. So, code that will be slow in production will be REALLY slow there. As a result, we feel the pain of a slow site, and we work to improve it, long before the code ever makes it to production.
With all of the various diff comparisons we needed to do to render a pull request (especially on the activity tab, if there were a lot of eclipsed comments), there was a lot of work for the code to do. We ran into a few performance issues while we were developing this feature, and some of these issues were improved by code changes, while others were solved by adding caching. Some other improvements are still in the works, and should hit production in the next few days.
- Reduced the number of SQL queries needed to render the page by making sure we used Django’s select_related function wherever appropriate.
- Simplified logic used to build up the activity tab events that were displayed when commits were added to the pull request. Before we simplified this logic, it generated many more diffs than were necessary, so fixing that improved the performance immensely.
- Added template caching to activity tab items. This improvement adds to the complexity of invalidating the cache when appropriate, but it can make the page load quickly.
- Separated the slow-loading part (the main diff) from the quickly loading parts, like general comments. This gets something onto your screen quickly, even if the time to get all the content onto your screen is unchanged.
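To illustrate the first item: Django’s select_related follows a foreign key with a SQL JOIN, so related objects arrive in the same query instead of one extra query per row. The sketch below shows the same effect with Python’s standard sqlite3 module rather than the Django ORM, so that it runs on its own. The `author` and `comment` tables and the `CountingDB` helper are made up for the example and are not Bitbucket’s schema.

```python
import sqlite3


class CountingDB:
    """In-memory SQLite database that counts the queries it runs (illustrative)."""

    def __init__(self):
        self.conn = sqlite3.connect(":memory:")
        self.conn.execute("CREATE TABLE author (id INTEGER PRIMARY KEY, name TEXT)")
        self.conn.execute(
            "CREATE TABLE comment (id INTEGER PRIMARY KEY, author_id INTEGER, body TEXT)"
        )
        self.conn.executemany(
            "INSERT INTO author VALUES (?, ?)", [(1, "alice"), (2, "bob")]
        )
        self.conn.executemany(
            "INSERT INTO comment VALUES (?, ?, ?)",
            [(1, 1, "nit"), (2, 2, "looks good"), (3, 1, "typo")],
        )
        self.count = 0  # only queries issued through query() are counted

    def query(self, sql, params=()):
        self.count += 1
        return self.conn.execute(sql, params).fetchall()


def comments_one_by_one(db):
    """One query for the comments, then one more per comment for its author."""
    rows = db.query("SELECT author_id, body FROM comment ORDER BY id")
    return [
        (body, db.query("SELECT name FROM author WHERE id = ?", (author_id,))[0][0])
        for author_id, body in rows
    ]


def comments_joined(db):
    """A single JOIN fetches each comment together with its author."""
    return db.query(
        "SELECT c.body, a.name FROM comment c "
        "JOIN author a ON a.id = c.author_id ORDER BY c.id"
    )
```

Both functions return the same rows, but with three comments the first issues four queries and the second issues one. In Django terms, assuming a hypothetical `Comment` model with an `author` foreign key, the second form corresponds to `Comment.objects.select_related("author")`.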
With all this dogfooding, it might seem like we were just trying to solve our own problem, since on our team, we constantly use pull requests to review all code changes. Our team policy is that at least two team members need to ‘approve’ a pull request before we merge the changes in. That seems to work pretty well for us, but your team might be different. We wanted to make our solution useful to as many teams as possible. Many smaller teams find pull requests too heavy-weight, and prefer to just review commits instead. So, we made sure to include inline comments on commits as well. Hopefully this allows enough flexibility in the tool that your team can find a system that works.
So, check it out! Try browsing a few pull requests that are out there now, and get a feel for them. Create a pull request the next time you want to merge your code into your team’s main branch, and get a few extra pairs of eyes on it. I’m sure you will be able to find problems sooner, and fix them more easily!
By Jeff Park on October 25, 2012
Cool projects are popping up on Bitbucket all the time, and we thought it’d be great to share them with you from time to time in our new Who’s On Bitbucket? series. We kick it off with the Adafruit Learning System Raspberry Pi WebIDE.
Adafruit, the NYC-based open-source hardware company led by Ladyada, recently released their open-source Raspberry Pi WebIDE alpha. Striving to be the easiest way to develop code on your Raspberry Pi, the WebIDE has many features to help program your device. Find your repositories neatly listed within the IDE and quickly access your source code, or talk directly to your Raspberry Pi with the built-in Terminal button. Learn more about using the WebIDE.
To get up and running, head on over to Adafruit and follow the installation and setup instructions. Any code changes you make will be synced to your Bitbucket account. Adafruit chose Bitbucket over other repository hosting sites due to our free unlimited private repositories.
For those of you who don’t know, the Raspberry Pi is a $25 highly hackable Linux computer designed to help teach kids about computers. Its popularity among the maker community is following the same wave as the Arduino. As of this post’s publishing, there are 655 WebIDE/Raspberry Pi projects on Bitbucket.
Share your Raspberry Pi projects with us in the comments below!
By Justen Stepka, Product Manager on October 9, 2012
It’s a big day here at Bitbucket HQ. The Bitbucket team is unveiling a brand new, redesigned Bitbucket. Our goal for this huge release was to rethink and rebuild the Bitbucket web experience from the ground up. Today, we’re excited to introduce the new Bitbucket – faster, easier and more beautiful than ever.
Meet the New UI
Every page has been optimized for speed, clarity and discoverability. We’ve rethought the way repositories are presented, helping you find the most important information. We’ve streamlined the way you navigate the site, and we’ve brought the actions that you most often take front and center. Simply put, we’ve aimed to give you the ultimate experience with Git and Mercurial. Meet the new Bitbucket.
Restructured Repository Header
The new repository header makes it lightning fast to navigate source, commits and pull requests.
We removed clutter and grouped key commands like cloning and forking together. Moreover, we made actionable information such as the number of open pull requests available at a glance.
The newly designed repository landing page features the Activity Stream, keeping teams up-to-date on recent commits, pull requests and more.
The new meta-data panel to the right of the activity stream allows users to quickly filter and search through branches, tags, forks, and followers. Additional details such as the clone URL, repo type, and access level are easy to find.
From any repository page, the clone URL is instantly accessible for copy and paste. Whether you’re browsing source or commenting on a pull request, just click on the ‘Clone’ button and you’re set.
Mac users have the added bonus of configuring their repository locally on our free Git and Mercurial client, SourceTree, with just one click.
The source browser has been completely recreated. Filter your view by branches or tags to learn more about the features your developers are working on.
A new feature to the source browser is the ability to diff between any two commits. Simply select any two commits, and Bitbucket will render a unified diff with the option to expand context and explore even more source code.
You can even open up a side-by-side diff and view file changes IDE-style.
Light-weight Code Review
The re-imagined UI was just the beginning. A huge part of developing on Bitbucket is to make it easier for teams to collaborate around code with pull requests. Not only are pull requests redesigned, we have added some more horsepower to your code collaboration with in-line comments and merge approval. This gives you a light-weight code review process built right into your development workflow.
A huge request from our users was the need to comment inline (and in context) on code. There are now two options to provide feedback, ask questions, and have discussions around your code – comment on any line of code in a pull request or comment on individual commits.
When someone leaves a comment on a changeset or pull request you’re participating in, you’ll receive an email notification with the code comment and a link to respond.
Participants and Approvers
Anyone who has commented on a pull request will show up as a participant. As you conduct your review and submit changes, you can “approve” the pull request to signal to your team that you’re happy with what you see. Bitbucket will then display a check next to your avatar letting everyone know that you like what you see.
Compare View and Pull Requests have been designed to complement each other. Quickly compare two branches or a fork and submit a pull request.
“And one more thing…”
There are dozens of other small improvements all over Bitbucket that make it even better to use. These include:
- Markdown support for any place that you can leave comments, such as pull requests or issues
- Default repository avatars for repositories that have programming languages set
- Quick filters throughout the site
- Faster site performance – the user dashboard is now up to five times faster
- Simpler administration and account pages
Lastly, thank you for being a Bitbucket user
We’ve agonized over every detail of the new Bitbucket, and we know that you’re going to love it. Please let us know what you think, and help us spread the word. Our never ending goal is to make Bitbucket even better. Thanks to everyone who uses Bitbucket and to those who provide feedback.
If you haven’t checked us out lately or are new to our service, Bitbucket has had a year of record growth. In the last year, we’ve added Git support, introduced Bitbucket Teams, and tightly integrated with JIRA. Sign up and check it out! As always, we offer free unlimited private repositories for up to 5 collaborators.
Cheers, The Bitbucketeers
By Jeff Park on October 2, 2012
In August, we launched the Academic refer-a-friend campaign to spread the news to universities worldwide that Bitbucket offers free unlimited private repositories for academic students and teachers. Initially, we gave out Bitbucket t-shirts for every 5 friends invited, but the response was so strong that we ran out of t-shirts in a week and a half! So we changed the prize for the month of September: every 5 friends you referred earned you an entry in a drawing for a $50 Apple Gift Card.
We are happy to announce that a winner has been chosen for the month of September:
David Hartveld from Delft, The Netherlands!
Congratulations David! We will be contacting you through the email address you provided in the next couple of days to send you your prize.
Thanks to all who referred friends. This campaign will continue to run for the month of October, so keep on referring!
By Charles on September 27, 2012
Bitbucket will be unavailable for up to one hour starting Sunday, September 30, 2012 at 01:00 GMT.
During this maintenance window we will upgrade from PostgreSQL 9.0.x to 9.2.1, which allows Bitbucket to take advantage of several performance improvements. As part of this migration, we are now utilizing Bucardo to replicate between major versions of Postgres, which saves us downtime over previous migrations.
Thanks for your patience as we work to increase Bitbucket’s performance and reliability.
By Jeff Park on September 24, 2012
The JIRA DVCS Connector enables small teams to use a robust issue tracker with Bitbucket. Today we’re excited to announce a new release of the connector on the Marketplace and OnDemand, making it easier than ever to get your small team running more efficiently with powerful development tools.
What is the DVCS Connector?
The JIRA DVCS Connector integrates your Git and Mercurial repositories with JIRA, so you can link every commit to a bug or development task.
- Track commits in JIRA from all or a subset of your repositories, both public and private
- Automatically discover newly created repositories and link commits to JIRA issues
- Push your changesets to JIRA by referencing issue keys in your commit messages
Commit & Take Action
Switching between applications or even just between browser tabs can lead to distraction and lost time, so we’ve added Smart Commits for Bitbucket in the JIRA DVCS Connector. Smart commits let you update your team on progress and take action on JIRA issues, all from a single commit. Simply enter the issue key and your desired action such as closing an issue.
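As an illustration, a Smart Commit message pairs an issue key with a `#command`, for example `JRA-123 #close Fixed the null check`. The sketch below shows how such a message could be picked apart. It is a simplified, hypothetical parser for illustration only, not the connector’s own implementation, and the regular expressions are assumptions about the message shape rather than the documented grammar.

```python
import re
from typing import List, Tuple

# Assumed shapes: keys look like "JRA-123", commands look like "#close some text".
ISSUE_KEY = re.compile(r"\b[A-Z][A-Z0-9]+-\d+\b")
COMMAND = re.compile(r"#([A-Za-z][\w-]*)[ \t]*([^#\n]*)")


def parse_smart_commit(message: str) -> Tuple[List[str], List[Tuple[str, str]]]:
    """Return (issue keys, [(command, argument text), ...]) found in a commit message."""
    keys = ISSUE_KEY.findall(message)
    commands = [(name.lower(), args.strip()) for name, args in COMMAND.findall(message)]
    return keys, commands
```

With this sketch, `parse_smart_commit("JRA-123 #close Fixed the null check")` yields the key `JRA-123` and the command `close` with the argument text `Fixed the null check`.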
For more details on using Smart Commits, please see the User’s guide to Smart Commits.
Set Teams Up To JIRA With A Single Step
Syncing new JIRA users with your Bitbucket Teams is now done automatically, based on the new users’ email address. Once you connect Bitbucket to JIRA, you can add new users to both applications in a single step. Automatically send an email inviting the new developer to join your team on Bitbucket.
Go Git with JIRA
A streamlined JIRA + Bitbucket experience is only a click away. Download it for free today!
By Justen Stepka, Product Manager on September 19, 2012
Earlier today, at 2am San Francisco time, Bitbucket experienced about three hours of 500 error page responses for users attempting to access the user newsfeed and repository overview pages. The outage was caused by a kernel panic on our Redis server, which is responsible for pages that display recent events related to a user. We are very sorry for the inconvenience this outage has caused.
After rebooting the Redis server, the index that Redis uses to serve the newsfeed content was found to be corrupt, which caused certain pages on Bitbucket to fail. For users accessing pages deeper into the site, such as pull requests, commit views, wikis and issues, the site continued to work as expected. During this time Git and Mercurial access continued to work over both HTTP and SSH. After identifying the cause of the problem, we turned off the newsfeed for all of Bitbucket, bringing an end to the 500 errors.
With the newsfeed temporarily disabled, we began investigating the corruption problem and discovered a forum post with instructions and a repair tool to fix the corrupted index. We then used the instructions to repair the index and restore full service to Bitbucket.
During this outage we have identified areas for improvement and are implementing changes to the way we manage the operations of Bitbucket:
- Improve our escalation procedures so that the response times are faster during non-office hours
- Update the Bitbucket codebase so we do not have the dashboard and repo overview fail when Redis becomes unavailable
- Increase the number of tests that status.bitbucket.org performs to trigger our automatic phone alert system
We are very sorry for the inconvenience this outage has caused.
By Justen Stepka, Product Manager on September 18, 2012
Want more than 5 private collaborators for free? All you need to do is get your friends on Bitbucket. For each friend you invite who joins Bitbucket, we’ll upgrade your free 5 user personal account by 1 additional user. You can do this up to 3 times for the chance to have 8 private collaborators – all for free.
From your dashboard, send an invitation to anyone by simply entering their email address. When your email invitation is accepted, your 5 user free plan will be upgraded with an additional user.
What about team accounts?
The Bitbucket invitations feature is only available for individual accounts. Team accounts of course are still free for up to 5 users, and $1 / user after that with unlimited private repositories.
By Jesper Noehr on August 24, 2012
I wanted to take a moment to talk about some infrastructure changes we’ve made on Bitbucket lately, and apologize for some flakiness this week.
Over the years, the architecture behind Bitbucket has changed significantly. On day 1, we ran Apache with mod_wsgi on EC2, but today our stack looks completely different. And this week, we made yet another major change.
So what did we do?
Bitbucket is already segregated into smaller parts; we have a pool of Django workers, mercurial workers, and git workers. However, up until today, everything has run on every machine. This is really just a leftover from the early days where we had a handful of machines on EC2. It makes a lot of sense to split up your service, designating a set of machines to handle specific things. This makes it easier to measure, profile & improve.
Using clever routing and inspection, we’ve shortened request paths all over the place. This is good news for everyone: you have fewer hops to get your data, and we have fewer moving parts when things act funny. Simpler is always better. This also means we can re-route traffic when necessary, and easily provision new workers and stick them in the pool.
Over the past few years, our offering has grown beyond a handful of virtualized machines to many racks of expensive hardware. We have automated a lot, most importantly deployments. It simply became unmanageable with the sheer number of machines we would need to SSH into, developing carpal tunnel along the way. So it made a lot of sense to simplify where we could, increasing reliability, measurability and transparency.
I’ll get a picture for you guys soon. The rest of this post is rather technical, and unless you’re interested in that sort of thing, you can skip right to the end.
Technical details (nerd porn)
Previously, when you cloned a git or mercurial repository, the request went something like this:
Let me explain: First you hit one of our load balancers, running HAProxy. HAProxy proxies you through to Django. Why Django? Because we need to authenticate and authorize your request. That’s all done there. We then make use of a feature in nginx called “X-Accel-Redirect”. It’s a special header you can return in your response, and it tells nginx “go look here”. So if you did an X-Accel-Redirect to a local file, nginx would serve that file. However, if you do an X-Accel-Redirect to a location, nginx will replay the entire request as it came in, at a new location. This is very handy, as we let Django authenticate the request, and pass it on to our mercurial workers. That way, they don’t need to even have knowledge of Bitbucket, and can just be mercurial workhorses.
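The application side of that hand-off can be sketched as a tiny WSGI app. This is only an illustration: the permission table, the `is_authorized` helper and the `/internal/hg` location prefix are invented for the example (nginx would need a matching internal location that proxies to the mercurial workers), and Bitbucket’s real check lived inside Django.

```python
# Hypothetical permission table: (user, repository path) pairs that may be accessed.
ALLOWED = {("alice", "/alice/project")}


def is_authorized(user, repo_path):
    """Stand-in for the real permission check (an assumption for this sketch)."""
    return (user, repo_path) in ALLOWED


def auth_app(environ, start_response):
    """Check the request, then hand it back to nginx via X-Accel-Redirect."""
    user = environ.get("REMOTE_USER", "")
    repo_path = environ.get("PATH_INFO", "")
    if not is_authorized(user, repo_path):
        start_response("403 Forbidden", [("Content-Type", "text/plain")])
        return [b"forbidden\n"]
    # No body is needed: nginx sees the header and replays the original
    # request at the named internal location.
    start_response("200 OK", [("X-Accel-Redirect", "/internal/hg" + repo_path)])
    return [b""]
```

The application never streams repository data itself. It only answers yes or no, and nginx does the rest.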
And so they have been. But this introduces a dependency, namely Django. It’d be a lot better to get rid of that, and get to your destination as early as you can.
What we’ve done is develop a small WSGI library, called singlemalt (get it?). It’s a thin middleware that authenticates, and provides hooks for authorization for each individual thing. We plugged this under the hood of hgweb, gitweb, etc. That gives us transparent authentication, and enough flexibility to reuse the library throughout different services. It’s nothing special, but for us, it was worth the investment. This helps keep things simple and consistent. We also took the liberty of improving health checks across the services, too.
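The general shape of such a middleware might look like the sketch below. Everything here is an assumption made for illustration: the class name, the `authenticate` and `authorize` hook signatures, and the use of HTTP Basic credentials are not taken from singlemalt itself.

```python
import base64


class AuthMiddleware:
    """Wrap a WSGI app so it only runs for authenticated, authorized requests."""

    def __init__(self, app, authenticate, authorize):
        self.app = app
        self.authenticate = authenticate  # hook: (username, password) -> user or None
        self.authorize = authorize        # hook: (user, environ) -> bool

    def __call__(self, environ, start_response):
        credentials = self._credentials(environ)
        user = self.authenticate(*credentials) if credentials else None
        if user is None:
            start_response("401 Unauthorized", [
                ("WWW-Authenticate", 'Basic realm="repositories"'),
                ("Content-Type", "text/plain"),
            ])
            return [b"authentication required\n"]
        if not self.authorize(user, environ):
            start_response("403 Forbidden", [("Content-Type", "text/plain")])
            return [b"forbidden\n"]
        environ["REMOTE_USER"] = user  # the wrapped app can see who is calling
        return self.app(environ, start_response)

    @staticmethod
    def _credentials(environ):
        """Extract (username, password) from an HTTP Basic header, or None."""
        scheme, _, encoded = environ.get("HTTP_AUTHORIZATION", "").partition(" ")
        if scheme.lower() != "basic":
            return None
        try:
            decoded = base64.b64decode(encoded).decode("utf-8")
        except (ValueError, UnicodeDecodeError):
            return None
        username, _, password = decoded.partition(":")
        return username, password
```

Because the wrapped application only ever sees requests that passed both hooks, the same wrapper could sit in front of hgweb, gitweb or any other WSGI service, with a different `authorize` hook for each.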
The new request chain looks like this:
Yay! No more Django dependency, and you get to talk straight to hgweb. We did this by using the ACL feature of HAProxy–they’re akin to rewrite rules. We look at incoming requests, and based on various headers (like User-Agent), we determine who you should talk to. This lets us entirely bypass parts that aren’t strictly necessary, freeing them up to serve normal web page requests.
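The routing decision itself lives in HAProxy configuration, but its logic can be modelled in a few lines of Python. Treat this as a rough sketch: the User-Agent prefixes are assumptions about what Mercurial and Git clients send, `servers-git` is a made-up backend name, and only `servers-ssl` and `servers-hg` are names that come from our setup.

```python
def choose_backend(headers):
    """Pick a backend pool from request headers, mimicking header-based ACLs."""
    agent = headers.get("User-Agent", "").lower()
    if agent.startswith("mercurial/"):
        return "servers-hg"   # Mercurial clients go straight to hgweb
    if agent.startswith("git/"):
        return "servers-git"  # hypothetical name for the gitweb pool
    return "servers-ssl"      # everything else is a normal web page request
```

A browser request falls through to `servers-ssl`, while a request whose User-Agent starts with `mercurial/` is routed to `servers-hg` without ever touching Django.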
This is the result
This graph shows sessions per second, per backend. Backends are pools of remote workers HAProxy will forward requests to.
The red line, servers-ssl, was the backend that served all requests, including Django, hgweb and gitweb. After we deployed the new routing, look at how the traffic first dropped significantly, and a new light green line appears. That one shows sustained sessions to the new servers-hg backend.
As a side note, look at the difference in mercurial traffic vs. web page traffic! Mercurial sure is chatty.
Shortly thereafter, we began re-routing gitweb traffic as well, causing a further drop of the red line, and the introduction of a new, purple line.
Mind you, we have several load balancers, so this only represents what a single one puts through.
Having made these changes, we have increased our fault tolerance across the board. Eliminating dependencies such as Django in the request chain means that if all the Django workers are busy (or down), you can still interact with your repositories. Or if someone decides to bombard our sshd, Django will still serve requests.
This was a huge rollout! Along the way, we tripped a few times, but in retrospect, it helped us identify problem areas immediately, and we appreciate the patience & understanding of those affected at the time.
One thing that helped us immensely, especially with retaining our sanity, was to break this up into smaller bits that we could roll out individually. We used branches and pull requests for this.
PS: HAProxy is a seriously neat piece of software, and Willy Tarreau is a boss for creating it.
By Jeff Park on August 20, 2012
Did you say FREE for academic users?
You bet. Bitbucket offers free unlimited public and private repositories for academic users and teams.
How do I get in on this?
Easy. Sign up for a Bitbucket account using your academic email address. We’ll do the rest. We’ve expanded the automatic academic account upgrade process to include all high-level educational domains such as .edu, ac.il, edu.sg, etc.
I’m a student. How is Bitbucket going to help me?
Good question. Here are a few ways:
- Learn DVCS - Git and Mercurial are used by professional dev teams worldwide. Differentiate yourself with Bitbucket.
- Collaborate with classmates and teachers - Easily share code with your peers and professors for quick feedback and easy grading.
- Keep every project, forever - Having a history of your work is a great way to build an online portfolio to showcase to future employers!
Refer Classmates. Win A Shirt.
So go ahead and invite your friends – we’ve added a referral link to academic users’ Bitbucket dashboard. Invite 5 of your school friends, and we’ll send you a free Bitbucket t-shirt!