By Kelvin Yap on February 19, 2016
<This is a cross-post from Atlassian Blogs>
Atlassian is dedicated to unleashing the potential in software teams. We want to help you work smarter and faster. This is the reason we keep adding new features to Bitbucket – branch permissions, merge checks, smart commits, smart mirroring and many more.
Last year, we were working on solving another big problem for our users: tracking large files in a Git repository. Git large file support (LFS) provides the ability to store really big files where they’re needed, not just where they fit. We want to make Git right for everyone and that’s why we decided to collaborate with GitHub on building a standard for large file support.
Why Git LFS?
It’s a known issue that Git doesn’t play nicely with large files, and it’s not just developers who struggle with large files and version control. Native Git’s limitations make it challenging for team members like designers, tech writers, sys admins, and build engineers to work closely with developers.
They often need to store their assets in stand-alone systems or cloud storage providers because of Git’s historic inability to track version history for large assets (schematics, graphics, or other media files). Co-locating non-code assets with the code itself means the large assets can be updated quickly, easily versioned, and become a natural part of the deployment pipeline. As a design team incorporates feedback on their interface assets, iconography, and images, they can deploy those changes in a fully-built version of the product and see the results.
Git LFS also allows people like designers to version the assets they share with their team, just like all the other assets in a repository. This means developers can find the latest revision of an asset they need to use without needing to track down the right version of a file or interpreting what “menubar_v2_final_v3_final_final.psd” means. (Sounds almost too good to be true, doesn’t it?)
For mobile software teams and game development teams, Git LFS relieves the pain of working with the ever-larger image, texture, and video assets that cater to the ever-improving resolution on mobile devices. It also means tests and builds don’t need to rely on an external step to succeed. And if you’re building a video- or audio-intensive app, you can keep those stored in Git LFS too.
Git LFS in Bitbucket
Naturally, we want to do everything we can to make Git more accessible to all team members. And that’s reflected in the changes we’re making to Bitbucket, our Git repository manager, available as a cloud service or on-premises server. Any software team using Git LFS can now view every version of project assets through the Bitbucket interface, and for select media types we’re making it easy to compare the latest updated version using our visual diff tools.
Bitbucket Server implemented Git LFS in v4.3, and we’ll be releasing support for LFS in SourceTree (our desktop client for working with Git) soon. We’ve also made it really easy to use LFS in Bitbucket Server so you can get started quickly. But Atlassian’s LFS journey is far from complete. We’re bringing LFS to Bitbucket Cloud soon.
Git for everyone!
We’ll continue collaborating with GitHub on the LFS protocol to extend the capabilities of Git for all software teams using features like file locking to prevent change conflicts, better handling of vector and layered files, and more. In Bitbucket, we’re making it easier to work with media files and our design teams are helping us bring a great UI experience to every member of your software team like designers, tech writers, marketing, etc. – even if they don’t write code.
If you haven’t used Git yet, check out our collection of Git tutorials and articles. Or if you’re ready to get started, sign up for a Bitbucket Server account. If you’re already using Bitbucket Server, enable Git LFS for your team and let us know what we can do better.
By Roger Barnes on February 9, 2016
<This is a cross-post from Atlassian Blogs>
Are you on one of those teams that finds all kinds of ways to stretch the limits of its development tools? If you’re at a big company, working on big projects stored in big repositories – possibly repos that are shared with teammates across multiple continents – the answer is probably “yes”.
Using Git at massive scale can be so inefficient that it poisons your team’s productivity. So I want to bring you up to speed on the antidote we’ve developed. It’s called smart mirroring, and it’s now available in Bitbucket Data Center.
Does your team need smart mirroring?
Some teams do, some don’t. But the teams who do need it have a few things in common.
For starters, we’re talking about teams with hundreds, if not thousands, of developers. A few thousand coders will tax any repository server, even if that server is sitting just a few feet away from you. And as important as the performance of Bitbucket itself is, there are other factors in play.
We’ve noticed that an increasing number of large development teams using Git are geographically distributed, with little or no control over the network performance between themselves and their Bitbucket instance. These teams suffer from high latency and every operation they perform competes for limited bandwidth. In addition, these same teams often need to work with large repositories for a variety of reasons (sometimes even good reasons!).
All these factors conspire to rob developers of valuable time, making them wait long periods – often hours – to clone a large repository from across the globe. It can get so bad that people have resorted to sending portable drives around the world via mail. That kinda sucks.
If this sounds like you, smart mirroring will help.
How smart mirroring improves Git performance
Bitbucket Data Center has always been able to run multiple application nodes in a local cluster to help serve all those users and build bots that demand performance and availability. Smart mirroring takes the performance improvements a step further for Git read operations in a way that’s tailored for distributed teams working with large repositories.
It works by setting up one or more active mirror servers to operate with read-only copies of repositories in remote locations, automatically kept up-to-date from the primary Bitbucket instance. A mirror can host all of your primary instance’s repositories, or just a subset. Mirror servers delegate user authorization and authentication to the primary server, so no additional user management is required. And you can connect as many mirrors to your Bitbucket Data Center instance as you need at no additional cost.
Aside from dramatically improved Git performance, developers are automatically presented with alternate clone locations in the Bitbucket interface, so administrators don’t have to provide extra training. Once set up, the mirrors are fully self-serve.
Predicting performance improvements
The performance gains you can expect vary with network bandwidth and repository size. What it basically comes down to is how slow remote cloning is for you today. In a simple test, we saw that a 5GB repository took over an hour to clone between San Francisco and Sydney. But with smart mirroring, that time was cut to just a few minutes.
We heard from one customer where a remote user had a clone that took 9 hours. (9 f’ing hours!) They could expect a more substantial performance increase – basically, a whole working day given back to each developer who clones a large repo.
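To get a rough feel for how bandwidth and repository size interact, here is a back-of-the-envelope sketch. It only divides size by throughput, so it ignores latency, protocol overhead, and server-side pack generation. The bandwidth figures are assumptions chosen for illustration, not measurements from the test above.

```python
def estimated_clone_seconds(repo_size_gb: float, bandwidth_mbit_per_s: float) -> float:
    """Lower-bound transfer time: repository size divided by link throughput."""
    if bandwidth_mbit_per_s <= 0:
        raise ValueError("bandwidth must be positive")
    size_megabits = repo_size_gb * 1000 * 8  # decimal GB -> megabits
    return size_megabits / bandwidth_mbit_per_s


# Hypothetical links; neither figure comes from the test described above.
slow_link = estimated_clone_seconds(5, 10)    # long-haul link, 10 Mbit/s assumed
fast_link = estimated_clone_seconds(5, 1000)  # nearby mirror, 1 Gbit/s assumed
print(f"{slow_link / 60:.0f} min vs {fast_link:.0f} s")
```

Plugging in your own repository size and the throughput you actually observe to the primary instance gives a first guess at how much a nearby mirror could help.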
Imagine: the mobile team in Bangalore, that web team in London, the secret project team working from a lab in Thailand… all able to reap the benefits of Git, without suffering from the tyranny of distance.
Whether you’re just adopting Git or already a guru, your distributed teams should be able to make cool stuff without unnecessary delays. If you’re ready to talk to a real live human about smart mirroring and how it can help your team, get in touch with one of our customer advocates using our handy contact form. If you’ve temporarily lost your ability to speak due to banging your head against your desk while wondering if that 8GB clone is ever going to complete (or you’re busy hand-delivering a hard copy of it to your teammate in Poland), you can get more information about our Data Center offerings online.
Interested in learning more? Join Roger Barnes, Senior Bitbucket Product Manager, on March 3rd at 11:00am PST, CET, and AEST to learn:
- Benefits of using Smart Mirroring in Bitbucket Data Center 4.2.0 or higher
- How Smart Mirroring works: configuring a mirror server to an upstream instance
- Steps for setting up a mirror to improve Git clone speeds for distributed teams
- Troubleshooting tips for any issues you might encounter with installing a mirror
- How to point your continuous integration server (Bamboo, Jenkins,…) at a mirror
Register for Webinar
Did you find this post helpful? Please share it on your social network of choice and help your fellow Git users end their performance woes!
By Abhin Chhabra on January 26, 2016
Sometimes, developers mess up. We don’t mean to do it, but sometimes, our code is exposed to situations we didn’t anticipate. The best we can do when that happens is to fix our mistakes and learn from them. Some of us even try to find ways to make sure this doesn’t happen again. How? Document our mistakes and make sure developers can discover them in the future.
Over time, these learnings grow into an unmanageable collection of documents. The effectiveness of these documents decreases because no developer writes code with multiple pages full of caveats open in a separate window. We’ve heard from our customers that it would be great if there was an easy way to scan and access a checklist that is available during development and pull request reviews.
Many projects have solved this problem by having a CONTRIBUTING.md file in their repository, which not only mentions how to contribute to that repository but also lists the things to look out for. A good CONTRIBUTING.md file serves as a helpful checklist during development. It would be even better if we could somehow enforce that during the review process.
So, during an internal hackathon, I built an add-on that attempts to do exactly that. The “pull request guidelines” add-on parses the CONTRIBUTING.md file in your repository and summarizes it in the form of an easy-to-scan checklist. It then makes the checklist accessible in the pull request itself, so both the author and the reviewer can easily go over the list of guidelines. The “pull request guidelines” add-on also makes it super easy to turn these guidelines into actionable tasks.
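For a sense of what turning a CONTRIBUTING.md file into a checklist could look like, here is a minimal sketch. It is not the add-on’s actual parser, which isn’t shown in this post. It simply assumes that every Markdown list item is a guideline, and the function name and sample file are made up for illustration.

```python
import re

# Matches Markdown bullets ("- item", "* item", "+ item") and numbered items ("1. item").
LIST_ITEM = re.compile(r"^\s*(?:[-*+]|\d+\.)\s+(.*\S)\s*$")


def extract_checklist(markdown_text: str) -> list[str]:
    """Return the text of each list item, in document order."""
    items = []
    for line in markdown_text.splitlines():
        match = LIST_ITEM.match(line)
        if match:
            items.append(match.group(1))
    return items


# Hypothetical CONTRIBUTING.md content.
sample = """# Contributing

Before opening a pull request:

- Run the test suite
- Update the changelog
1. Ask for a review
"""
for item in extract_checklist(sample):
    print(f"[ ] {item}")
```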
The add-on was built using Bitbucket Connect. If you like what you see and have a cool idea, you can make something like this too. Go check out this tutorial and let loose your imagination.
This add-on was a result of a couple of weeks of work, so it’s not quite done yet. But that shouldn’t stop you from trying it out. Check out the add-on page to learn more, and if you just want to install the add-on, click the button below:
Install pull request guidelines
By Roger Barnes on January 21, 2016
Strong innovators have always aspired to be faster. Fast development cycles lead to more innovation and lower costs according to a recent survey on innovation by Boston Consulting Group. We at Atlassian are committed to helping teams deliver software at speed. Last September, we announced critical new capabilities to enable teams to do just that: build faster with Bitbucket. We’re excited to announce that these features are now available:
- Smart Mirroring to improve clone performance for distributed teams, available in Bitbucket Data Center
- Git LFS support to allow collaboration on all file types of any size, available in Bitbucket Server and Data Center
- Projects for organizing multiple repositories, available in Bitbucket Cloud, Server and Data Center
Many software teams using Git can build up large repositories over time due to a large amount of historical information, use of monolithic repositories, storage of large files, or a combination of the three. Developers working from remote locations need to wait hours when cloning, which is a big drain on productivity. Smart Mirroring can drastically improve read (clone, fetch, pull) performance for distributed teams working with large repositories by making them available from a nearby server. As an example, in one of our own internal tests, we have seen clone times get 25X faster for 5GB repositories between San Francisco and Sydney.
A mirror server is simple to configure, easy to maintain, and automatically uses existing authentication mechanisms. Unlike some other solutions available in the market, you don’t need to install or configure a whole new instance in order to create a mirror server, or mirror the repositories one at a time. Administrators will love how simple it is to host a mirror server, and developers will appreciate how the vastly improved clone and fetch times speed up their workflow.
Interested in learning more about Smart Mirroring? Register for the webinar, “Speed up distributed development with Smart Mirroring for Bitbucket Data Center” happening on March 3rd.
Git Large File Storage (LFS)
Modern software teams at their core consist of not just developers but designers, QA engineers, writers, and more. These teams track assets such as graphics, videos, and other binary files that are inherently large. Git’s original performance goals for distributed version control weren’t optimized for tracking large binary files, making it unsuitable for storing large assets. With the addition of Git LFS support, software teams can track all the assets they produce together in one single place and be productive at the same time. Large files are kept in parallel storage, and lightweight references are stored in your Git repository making your repositories smaller and faster.
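The “lightweight references” are small text files called pointers. The sketch below builds one following the pointer layout in the public Git LFS specification (a version line, a SHA-256 object ID, and the size in bytes). The `lfs_pointer` helper and the zero-filled stand-in asset are illustrative assumptions. In practice the Git LFS client writes these files for you.

```python
import hashlib


def lfs_pointer(content: bytes) -> str:
    """Build the text of a Git LFS pointer for the given file content."""
    oid = hashlib.sha256(content).hexdigest()
    return (
        "version https://git-lfs.github.com/spec/v1\n"
        f"oid sha256:{oid}\n"
        f"size {len(content)}\n"
    )


asset = b"\x00" * (5 * 1024 * 1024)  # stand-in for a 5 MiB binary asset
pointer = lfs_pointer(asset)
print(pointer)
print(f"{len(asset)} bytes of content -> {len(pointer)} bytes in the repository")
```

The repository history only ever stores the pointer, which is why clones stay small even as the assets themselves grow.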
As organizations grow, team sizes get bigger, and more and more repositories get added. It gets progressively harder to find the repository you’re looking for. Projects make it easier for teams to organize their repositories and become more productive with Bitbucket Cloud. This feature is already available in Bitbucket Server and Data Center. We also took this opportunity to refresh our UI in Bitbucket Cloud and make it easier for you to find what you’re looking for.
Get started with Bitbucket
With the addition of Smart Mirroring, Git LFS, and Projects, Bitbucket is now better suited than ever for professional teams. Organizations of all sizes – from large enterprises such as Verizon and Nordstrom to small startups like Pinger and Kaazing – are using Bitbucket today, and we’ve heard from many that they’ll be using these new features in the coming weeks.
“Many of our customers have distributed teams that have experienced pain around storing large binary files and cloning performance. Smart Mirroring and Git LFS are two huge game changers that will boost productivity for our clients using Bitbucket around the globe. We are excited to roll it out to all our customers.” – Zubin Irani, Chief Executive Officer, cPrime
“Our developers are spread all over the world, and Bitbucket helps them remain aligned as they build powerful solutions for our customers. We are very excited about Smart Mirroring in Bitbucket which will not only improve multi-site clone performance but will also increase developer productivity of distributed teams.” – Kurt Chase, Director of Release Engineering, Splunk
Our JIRA Cloud customers picked Bitbucket as their #1 Git solution. More than 1 in 3 Fortune 500 companies trust Bitbucket and are using it every day to innovate faster. If you’re new to Git, head over to “Getting Git Right.”
Or, if you’ve already made a decision to switch to Git, click the link below!
Try Bitbucket Free
By Amber Frauenholtz on December 29, 2015
File Viewer for Bitbucket Cloud is the winner of the Codegeist 2015 Atlassian hackathon, in the category Best Bitbucket add-on.
This guest post is written by Alexander Kuznetsov, one of the developers of File Viewer for Bitbucket Cloud and co-founder of StiltSoft, an Atlassian Verified vendor and Atlassian Expert. Alexander has seven years’ experience as a software developer, five of which have been developing add-ons for Atlassian platforms. He was also the runner-up of Codegeist 2012 for the Awesome Graphs for Bitbucket Server (Stash) add-on.
With millions of developers on Bitbucket Cloud there is a huge demand for add-ons providing additional functionality. Earlier this year our team introduced Awesome Graphs for Bitbucket Cloud. Then later in October, we decided to participate in Atlassian Codegeist 2015 with the idea that Bitbucket Cloud users would appreciate the capability to view files of various formats directly on Bitbucket pages. That’s how we built File Viewer for Bitbucket Cloud.
File Viewer for Bitbucket Cloud
This add-on allows you to view 3D and 2D models, maps, tables, and PDF files that are a part of your repositories right in Bitbucket without downloading them.
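It’s a pack of viewers covering PDF documents, 3D and 2D models (STL Viewer and Autodesk Viewer), CSV and TSV tables (Table Viewer), and GeoJSON maps (Map Viewer).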
File Viewer adds a new button on the panel of the core Bitbucket viewer, the one you see when you click a file in the Source tab. That button is used to switch from the default view to seeing a file in the add-on viewer.
View PDF documents
While viewing files with the *.pdf extension, you can see how many pages there are in a PDF document. Pages are displayed one at a time and you can navigate between them.
View 3D and 2D models in STL and Autodesk Viewers
There are two viewers for 3D models – STL Viewer and Autodesk Viewer. The latter can be used to view 2D models as well. STL Viewer works with *.stl extension files and renders them as 3D models that you can spin and zoom. This viewer is opened when you select the ‘View as 3D model’ option.
Autodesk Viewer supports over 30 file formats. Using this viewer you can visualize and interact with 2D and 3D design data. To open it, select the ‘View in Autodesk Viewer’ option.
View CSV and TSV documents
Table Viewer presents CSV and TSV files as tables with header and sorting capability.
Map Viewer displays files with the *.geojson extension as maps that you can zoom and interact with (for example, by clicking on them).
You can try File Viewer for Bitbucket Cloud by installing the add-on from the Find new add-ons section in your Bitbucket Cloud personal account settings. File Viewer doesn’t require any configuration. Once installed, you can start using it right away. Navigate to the Source section on the left-hand sidebar in your repository, locate a file you would like to view and select the viewer option in the ‘Default File Viewer’ menu.
We’d love to hear from you. If you have feature requests or feedback you would like to share, please contact us or post your ideas at the File Viewer forum.
By Jim Redmond on December 3, 2015
All IPs have been moved, and the old IPs are no longer handling traffic.
Thanks to you, Bitbucket is outgrowing its old network infrastructure. We’re going to make some upgrades that should make Bitbucket faster, more reliable, and ready for further growth.
What are we doing?
We’re changing our A records in DNS starting at 00:00 UTC on Tuesday, December 15, 2015. There will not be any downtime for this migration, and most people will not have to do anything differently because of this migration.
Why are we doing this?
Our new IP address space, along with some underlying network improvements, should make response times noticeably faster for about a third of our users. Just as important, these changes make it easier for us to improve upstream network connectivity and load balancing, and to perform other infrastructure projects in the near future.
How will this affect you?
Most users will not have to do anything special for this migration. Your DNS servers should pick up the new IPs within a few minutes of the migration, and your systems should start using the new IPs right away. We’ll keep the old IPs running for a few weeks afterwards just in case, though.
If you control inbound or outbound access with a firewall, though, then you may need to update your configuration. Please whitelist these new IPs now; you should be able to remove the old IPs after the migration is complete.
New destination IP addresses for bitbucket.org will be:
New source IP addresses for hooks will be:
Our server’s SSH key is not changing, so most SSH clients will continue to work without interruption. However, a small number of users may see a warning similar to this when they push or pull over SSH:
Warning: the RSA host key for ‘bitbucket.org’ differs from the key for the IP address ‘18.104.22.168’
The warning message will also tell you which lines in your ~/.ssh/known_hosts need to change. Open that file in your favorite editor, remove or comment out those lines, then retry your push or pull.
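If you would rather script that edit, the following sketch comments out specific lines of a known_hosts file given their 1-based line numbers. The helper name and the sample entries are hypothetical, and the addresses are documentation-range placeholders rather than Bitbucket IPs. Work on a copy of the file first.

```python
def comment_out_lines(known_hosts_text: str, line_numbers: set[int]) -> str:
    """Prefix the given 1-based lines with '#' so SSH ignores them."""
    updated = []
    for number, line in enumerate(known_hosts_text.splitlines(), start=1):
        if number in line_numbers and not line.startswith("#"):
            line = "# " + line
        updated.append(line)
    return "\n".join(updated) + "\n"


# Hypothetical file content; the keys are truncated placeholders.
before = (
    "bitbucket.org ssh-rsa AAAA...example\n"
    "192.0.2.10 ssh-rsa AAAA...stale\n"
)
after = comment_out_lines(before, {2})
print(after)
```

An alternative is `ssh-keygen -R <host-or-ip>`, which removes all keys recorded for that host from known_hosts.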
As we update records, we’ll post updates on our @BitbucketStatus Twitter account and our status site so you can follow along. We’ll also keep our knowledge base up-to-date with any future changes to the lists of IPs.
Thanks for your patience as we work to increase Bitbucket’s speed and reliability. Please contact us at email@example.com if you have any questions.
By Amber Frauenholtz on December 2, 2015
Does this workflow sound familiar? Commit, trigger a build, switch to your continuous integration tool, check the status, configure your deployment environment, execute complex scripts, switch back to Bitbucket, start again… you get the idea.
What if all this context switching could be a thing of the past? How much time would you gain back from being able to view build information and even deploy without leaving Bitbucket? We think it’s time you find out.
We’re working with industry leaders, including Amazon, Microsoft, DigitalOcean, and more to close the loop on your development workflow with new build status and deployment integrations on Bitbucket.
Since the launch of in Bitbucket, we’ve been hard at work making Bamboo integration a reality. Today your wait is over. To view Bamboo builds inside your commits, branches, and pull requests simply. Support for Bamboo Server will be available in a few weeks.
Are you using a different continuous integration tool in your workflow? Don’t fret – the community has been busy as well. Users of Wercker can also see build status while they work. Buildkite has you covered as well with automatic build status integration for all builds on Bitbucket repositories. Don’t see your favorite tool listed here? You can build your own integration using our documentation.
Deploy from Bitbucket
Now that you know your builds are passing, it’s time to deploy your work. In the past, getting code from your team repository to your staging or production environments required executing scripts or configuration of complicated deployment plans – all outside of Bitbucket.
Thanks to Bitbucket Connect, you can now deploy your code from Bitbucket to several leading cloud services including Amazon Web Services (AWS) CodeDeploy, Microsoft Azure App Service, and DigitalOcean. Bitbucket’s Connect architecture takes this a level beyond a simple “click to deploy” button. The ability of add-ons to add features into the user interface means you can configure your deployment environments without leaving Bitbucket. These cloud services have worked with us to build add-ons to make your life easier. Workflow simplification for the win!
Amazon, Microsoft, and DigitalOcean see value from deploying directly from Bitbucket, and we hope you do, too.
Ready to get started?
From commit to ship, you can now complete your workflow without ever leaving Bitbucket. Get started by heading to your CI tool of choice and installing a deployment add-on. Now you can spend less time switching between tools and more time doing what you love – coding.
Get started with an add-on today!
By Dan Tao on November 18, 2015
Many of you have been asking for better support for continuous integration in Bitbucket Cloud. Every time you trigger a build, whether by pushing commits or creating a pull request, you have to log in to your build server to see if it passed or failed. For many of you, we know it’s been a major hassle that you’ve had no way to see the build status right within the UI – until now.
Starting today, the build status API is available with updates to the UI providing at-a-glance feedback on commits, branches, and pull requests in Bitbucket Cloud. Now, you’ll be able to know when your build is passing and when it’s safe to merge changes, saving you precious time to do what you do best: coding.
When viewing the commits in your repository, you can clearly see which of those commits have been checked out and tested by your CI tool of choice. If the tests all pass, you see a green checkmark, or else we display a red warning indicator.
For a more detailed view about the status of a commit, the commit view provides a listing of the passed or failed builds (if you have multiple builds), and passed or failed tests for each build. This saves you precious time as you don’t have to browse through log files of your CI tool trying to find why the build failed.
An arguably even more useful application of Bitbucket’s build status API is for pull requests. If you use pull requests to do code reviews (like we do), you know that one of the first questions you always ask as a reviewer is “Do the tests still pass?” This question is now easily answered by looking for a successful build status indicator in the pull request view.
You can also see the build status at a branch level which is great for distributed teams. Make sure your builds have passed before you merge your changes to the master branch.
We’re working on integrating with other CI tools using the build status API. In the meantime, if you want to use build status now, the best way is to write a simple script to post the results of your builds to the Bitbucket API. Most importantly, if you want to build an integration for Bitbucket Cloud with the CI tool of your choice, get started by taking a look at our documentation. We’re excited to see all the integrations you build in the next few weeks.
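As a starting point for such a script, here is a sketch that prepares a build status request without sending it. The endpoint path and the payload fields (`state`, `key`, `url`) reflect our understanding of the Bitbucket Cloud 2.0 API and should be checked against the documentation. The repository, commit hash, and CI URL are hypothetical, and authentication is left out entirely.

```python
import json
import urllib.request


def build_status_request(owner, repo_slug, commit_hash, state, key, build_url):
    """Prepare (but do not send) a POST that reports a build result for a commit."""
    endpoint = (
        "https://api.bitbucket.org/2.0/repositories/"
        f"{owner}/{repo_slug}/commit/{commit_hash}/statuses/build"
    )
    payload = {
        "state": state,    # assumed values: INPROGRESS, SUCCESSFUL, FAILED
        "key": key,        # identifier for this build plan
        "url": build_url,  # link back to the build in the CI tool
    }
    return urllib.request.Request(
        endpoint,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Hypothetical repository, commit, and CI URL.
request = build_status_request(
    "example-team", "example-repo", "0123abc", "SUCCESSFUL",
    "nightly-tests", "https://ci.example.com/builds/42",
)
print(request.get_method(), request.full_url)
```

To actually send it, you would add credentials to the request and pass it to `urllib.request.urlopen`, typically as the last step of your build.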
Happy coding and shipping!
By Kelvin Yap on October 21, 2015
[This is a cross-post from the Atlassian Developer’s Blog. This post is written by Stefan Saasen.]
Many users have embraced Git for its flexibility as a distributed version control system. In particular, Git’s branching and merging model provides powerful ways to decentralize development workflows. While this flexibility works for the majority of use cases, some aren’t handled so elegantly. One of these use cases is the use of Git with large, monolithic repositories, or monorepos. This article explores issues when dealing with monorepos using Git and offers tips to mitigate them.
What is a monorepo?
Definitions vary, but we define a monorepo as follows:
- The repository contains more than one logical project (e.g. an iOS client and a web-application)
- These projects are most likely unrelated, loosely connected, or can be connected by other means (e.g. via dependency management tools)
- The repository is large in many ways:
- Number of commits
- Number of branches and/or tags
- Number of files tracked
- Size of content tracked (as measured by looking at the .git directory of the repository)
Facebook has one such example of a monorepo:
With thousands of commits a week across hundreds of thousands of files, Facebook’s main source repository is enormous—many times larger than even the Linux kernel, which checked in at 17 million lines of code and 44,000 files in 2013.
And while conducting performance tests, the test repository Facebook used was as follows:
- 4 million commits
- Linear history
- ~1.3 million files
- The size of the .git directory was roughly 15GB
- The size of the index file was 191MB
There are many conceptual challenges when managing unrelated projects in a monorepo in Git.
First, Git tracks the state of the whole tree in every single commit made. This is fine for single or related projects but becomes unwieldy for a repository containing many unrelated projects. Simply put, commits in unrelated parts of the tree affect the subtree that is relevant to a developer. This issue is pronounced at scale with large numbers of commits advancing the history of the tree. As the branch tip is changing all the time, frequent merging or rebasing locally is required to push changes.
In Git, a tag is a named alias for a particular commit, referring to the whole tree. But usefulness of tags diminishes in the context of a monorepo. Ask yourself this: if you’re working on a web application that is continuously deployed in a monorepo, what relevance does the release tag for the versioned iOS client have?
Alongside these conceptual challenges are numerous performance issues that can affect a monorepo setup.
Number of commits
Managing unrelated projects in a single repository at scale can prove troublesome at the commit level. Over time this can lead to a large number of commits with a significant rate of growth (Facebook cites “thousands of commits a week”). This becomes especially troublesome as Git uses a directed acyclic graph (DAG) to represent the history of a project. With a large number of commits any command that walks the graph could become slow as the history deepens.
Some examples of this include investigating a repository’s history via git log or annotating changes on a file by using git blame. With git blame, if your repository has a large number of commits, Git has to walk a lot of unrelated commits in order to calculate the blame information. Other examples would be answering any kind of reachability question (e.g. is commit A reachable from commit B). Add together many unrelated modules found in a monorepo and the performance issues compound.
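To make the reachability question concrete, here is a toy sketch of a breadth-first walk over parent links. It is not how Git implements the check internally. The commit names and the `parents` mapping are made up for illustration. The point is that the walk may have to visit every ancestor, so its cost grows with the depth of the history.

```python
from collections import deque


def is_reachable(parents: dict[str, list[str]], start: str, target: str) -> bool:
    """Return True if `target` can be reached from `start` by following parent links."""
    seen = {start}
    queue = deque([start])
    while queue:
        commit = queue.popleft()
        if commit == target:
            return True
        for parent in parents.get(commit, []):
            if parent not in seen:
                seen.add(parent)
                queue.append(parent)
    return False


# Toy history: D is a merge of B and C, which both descend from A.
history = {"A": [], "B": ["A"], "C": ["A"], "D": ["B", "C"]}
print(is_reachable(history, "D", "A"))
print(is_reachable(history, "B", "C"))
```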
Number of refs
A large number of refs (i.e. branches or tags) in your monorepo affects performance in many ways.
Ref advertisements contain every ref in your monorepo. As ref advertisements are the first phase in any remote Git operation, this affects operations like git clone, git fetch, or git push. With a large number of refs, performance takes a hit when performing these operations. You can see the ref advertisement by using git ls-remote with a repository URL. For example:
git ls-remote git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git
will list all the references in the Linux Kernel repository.
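To see how large a ref advertisement is, you can tally that output by kind. The sketch below assumes the usual `git ls-remote` line shape of an object ID, a tab, and a ref name. The object IDs in the sample are made up. Note that peeled tag entries ending in `^{}` are counted as tags here.

```python
from collections import Counter


def count_refs(ls_remote_output: str) -> Counter:
    """Tally advertised refs by kind from `git ls-remote` style output."""
    counts = Counter()
    for line in ls_remote_output.splitlines():
        if not line.strip():
            continue
        _object_id, ref_name = line.split("\t", 1)
        if ref_name.startswith("refs/heads/"):
            counts["branches"] += 1
        elif ref_name.startswith("refs/tags/"):
            counts["tags"] += 1
        else:
            counts["other"] += 1  # e.g. HEAD
    return counts


# Made-up object IDs, shaped like real output: "<object id>\t<ref name>".
sample_output = (
    "1111111111111111111111111111111111111111\tHEAD\n"
    "1111111111111111111111111111111111111111\trefs/heads/master\n"
    "2222222222222222222222222222222222222222\trefs/tags/v1.0\n"
    "3333333333333333333333333333333333333333\trefs/tags/v1.0^{}\n"
)
ref_counts = count_refs(sample_output)
print(ref_counts)
```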
If refs are loosely stored, listing branches can be slow. After a git gc, refs are packed in a single file and even listing over 20,000 refs is fast (~0.06 seconds).
Any operation that needs to traverse a repository’s commit history and consider each ref (e.g. git branch --contains SHA1) will be slow in a monorepo. In a repository with 21708 refs, listing the refs that contain an old commit (that is reachable from almost all refs) took:
User time (seconds): 146.44*
*This will vary depending on buffer caches and the underlying storage layer.
Number of files tracked
The index or directory cache (.git/index) tracks every file in your repository. Git uses this index to determine whether a file has changed by calling lstat(2) on every single file and comparing file modification information with the information contained in the index.
Thus the number of files tracked impacts the performance* of many operations:
- git status could be slow (stats every single file, index file will be large)
- git commit could be slow as well (also stats every single file)
*This will vary depending on buffer caches and the underlying storage layer, and is only noticeable when there are a large number of files, in the realm of tens or hundreds of thousands.
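The stat-and-compare approach can be illustrated with a small sketch. It is a simplified stand-in rather than Git’s index format: it caches only size and modification time per path, and the file names are throwaway examples created in a temporary directory. Each check costs one system call per tracked file, which is why the cost scales with the number of files.

```python
import os
import tempfile


def snapshot(paths):
    """Record (size, mtime_ns) per path, loosely analogous to an index entry."""
    entries = {}
    for path in paths:
        st = os.lstat(path)
        entries[path] = (st.st_size, st.st_mtime_ns)
    return entries


def changed_paths(cached):
    """Re-stat every tracked path and report those whose metadata differs."""
    changed = []
    for path, (size, mtime_ns) in cached.items():
        st = os.lstat(path)  # one system call per tracked file
        if (st.st_size, st.st_mtime_ns) != (size, mtime_ns):
            changed.append(path)
    return changed


with tempfile.TemporaryDirectory() as workdir:
    paths = []
    for name in ("a.txt", "b.txt"):
        path = os.path.join(workdir, name)
        with open(path, "w") as handle:
            handle.write("original\n")
        paths.append(path)

    cached = snapshot(paths)
    with open(paths[1], "w") as handle:
        handle.write("a longer replacement line\n")  # size changes, so detection is reliable

    result = [os.path.basename(p) for p in changed_paths(cached)]
    print(result)
```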
Large files in a single subtree/project affect the performance of the entire repository. For example, large media assets added to an iOS client project in a monorepo are cloned despite a developer (or build agent) working on an unrelated project.
Whether it’s the number of files, how often they’re changed, or how large they are, these issues in combination have an increased impact on performance:
- Switching between branches/tags, which is most useful in a subtree context (e.g. the subtree I’m working on), still updates the entire tree. This process can be slow due to the number of files affected, or it requires a workaround. Using git checkout ref-28642-31335 -- templates, for example, updates the ./templates directory to match the given branch but without updating HEAD, which has the side effect of marking the updated files as modified in the index.
- Cloning and fetching slow down and are resource intensive on the server, as all information is condensed into a packfile before transfer.
- Garbage collection is slow and by default triggered on a push (if garbage collection is necessary).
- Resource usage is high for every operation that involves the (re-)creation of a packfile, e.g. git upload-pack, git gc.
What about Bitbucket?
Monolithic repositories are a challenge for any Git repository management tool due to the design goals that Git follows, and Bitbucket is no different. More importantly, monolithic repositories pose challenges that need a solution on both the server and client (user) side.
The following table presents these challenges:
While it would be great if Git would support the special use case that monolithic repositories tend to be, Git’s design goals that made it hugely successful and popular are sometimes at odds with the desire to use it in a way it wasn’t designed for. The good news for the vast majority of teams is that really, truly large monolithic repositories tend to be the exception rather than the rule, so as interesting as this post hopefully is, it most likely won’t apply to a situation that you are facing.
That said, there are a range of mitigation strategies that can help when working with large repositories. For repositories with long histories or large binary assets, my colleague Nicola Paolucci describes a few workarounds.
If your repository has refs in the tens of thousands, you should consider removing refs you don’t need anymore. The DAG retains the history of how changes evolved, and merge commits point to their parents, so work conducted on branches can be traced even if the branch doesn’t exist anymore.
In a branch-based workflow, the number of long-lived branches you want to retain should be small. Don’t be afraid to delete a short-lived feature branch after a merge.
Consider removing all branches that have been merged into a main branch like master or production. Tracing the history of how changes have evolved is still possible, as long as a commit is reachable from your main branch and you have merged your branch with a merge commit. The default merge commit message often contains the branch name, allowing you to retain this information if necessary.
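As an illustration of that cleanup, the output of git branch --merged can be filtered down to the branches that are safe candidates for deletion. This is only a sketch: the helper names and the PROTECTED set are assumptions, with master and production taken from the example above.

```python
import subprocess

# Branches that should never be offered for deletion (an assumption;
# adjust to whatever your main branches are called).
PROTECTED = {"master", "production"}


def deletable_branches(branch_output: str, protected=PROTECTED) -> list:
    """Pick deletion candidates from `git branch --merged <main>` output."""
    names = []
    for line in branch_output.splitlines():
        name = line.strip()
        if not name:
            continue
        # "* " marks the current branch, "+ " one checked out in
        # another worktree; leave both alone.
        if name.startswith(("* ", "+ ")):
            continue
        if name in protected:
            continue
        names.append(name)
    return names


def merged_into(main: str = "master") -> list:
    """List local branches already merged into `main` (needs git on PATH)."""
    result = subprocess.run(
        ["git", "branch", "--merged", main],
        check=True, capture_output=True, text=True,
    )
    return deletable_branches(result.stdout)
```

Each name returned can then be removed with git branch -d, which itself refuses to delete a branch that is not fully merged.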
Handling large numbers of files
If your repository has a large number of files (in the tens to hundreds of thousands), using fast local storage with plenty of memory that can be used as a buffer cache can help. Going further would require more significant changes to the client, similar, for example, to the changes that Facebook implemented for Mercurial.
Their approach used file system notifications to record file changes instead of iterating over all files to check whether any of them changed. A similar approach (also using watchman) has been discussed for Git but has not eventuated yet.
Use Git LFS
For projects that include large files like videos or graphics, Git LFS is one option for integrating your large binary files while limiting the impact on overall performance. My colleague Steve Streeting is an active contributor to the LFS project and recently wrote about the project.
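Before adopting Git LFS, it can help to see which files in a working tree are actually large. The sketch below walks a directory and reports files above a size threshold. The helper names are hypothetical and the 50 MB default is an arbitrary assumption, not a Git LFS recommendation.

```python
import os


def large_files(root: str, min_bytes: int = 50 * 1024 * 1024) -> list:
    """Return (relative_path, size) for files of at least min_bytes, largest first."""
    found = []
    for dirpath, dirnames, filenames in os.walk(root):
        # Skip repository metadata.
        dirnames[:] = [d for d in dirnames if d != ".git"]
        for name in filenames:
            path = os.path.join(dirpath, name)
            size = os.lstat(path).st_size
            if size >= min_bytes:
                found.append((os.path.relpath(path, root), size))
    return sorted(found, key=lambda item: item[1], reverse=True)


def candidate_extensions(files: list) -> list:
    """Collect the distinct, lower-cased extensions of the reported files."""
    extensions = set()
    for path, _size in files:
        ext = os.path.splitext(path)[1].lower()
        if ext:
            extensions.add(ext)
    return sorted(extensions)
```

The resulting extensions are natural candidates for patterns passed to git lfs track (for example "*.psd"), which records the pattern in .gitattributes.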
Identify boundaries and split your repository
The most radical workaround is splitting your monorepo into smaller, more focused Git repositories. Try moving away from tracking every change in a single repository and instead identify component boundaries, perhaps by identifying modules or components that have a similar release cycle. A good litmus test for clear subcomponents is the use of tags in a repository, and whether they make sense for other parts of the source tree.
While it would be great if Git supported monorepos elegantly, the concept of a monorepo is slightly at odds with what makes Git hugely successful and popular in the first place. However, that doesn’t mean you should give up on the capabilities of Git because you have a monorepo – in most cases there are workable solutions to any issues that arise.
By Amber Frauenholtz on October 8, 2015
This guest post is written by Mike Neumegen, co-founder of CloudCannon. Mike’s passionate about startups, discovering new technologies, and web design.
Bitbucket provides developers with great workflows for collaborating on software projects. So why can’t we have these workflows when building websites for non-developers?
What if you could deploy websites straight from Bitbucket? What if you could build static websites and have the power of a full blown CMS? What if non-developers could update content and have changes pushed back to your Bitbucket repository?
The new CloudCannon Bitbucket add-on makes all of this possible.
What is CloudCannon?
CloudCannon helps agencies and enterprises build websites for non-developers. Developers build a static or Jekyll site and push it to a Bitbucket repository. CloudCannon synchronizes the files and deploys the site live. Non-developers log in and update the content inline. All changes are synced between CloudCannon and Bitbucket.
Netflix, Engine Yard, agencies, and freelancers use CloudCannon to rapidly deploy websites for marketing teams and clients.
Introducing the Atlassian Connect add-on
The CloudCannon Atlassian Connect add-on allows you to work on websites without leaving Bitbucket. Deploy your static/Jekyll website directly from your repository and have non-developers update content in seconds.
To get started, navigate to the Bitbucket add-on directory and install CloudCannon.
How it works for developers
Once you install the add-on, visit one of your repositories and a CloudCannon option will be available in the sidebar. Selecting this option will allow you to create sites from this repository. If you already have CloudCannon sites attached to this repository they will be visible here.
After adding a site, your files are cloned from your selected branch. A few seconds later, you have a live website. CloudCannon automatically compiles, optimizes, and deploys your site to a CDN. Any changes you make on that branch appear on CloudCannon. As a developer you can update locally through Git or using the CloudCannon code editor. Changes made on CloudCannon push back to Bitbucket.
How it works for non-developers
Non-developers update content inline without the need to understand Git or the underlying files. CloudCannon abstracts all of that away with a clean and easy-to-use interface. Just add class="editable" to any element to allow the non-developer to edit it inline. They can also update metadata and create blog posts through a simple interface.
Get started with CloudCannon and Bitbucket
The CloudCannon Atlassian Connect add-on enables new workflows for the entire team. Developers can build sites locally and deploy them directly from Bitbucket. Non-developers push changes seamlessly by updating content visually.
If you need help setting up your first site, have a read of our get started tutorial, get in touch at firstname.lastname@example.org, or post a comment below.