Git Gud: Version Control Best Practices

Posted on April 8, 2024 by Gift Kenneth in Data science

This article was first published on Appsilon | Enterprise R Shiny Dashboards, and kindly contributed to python-bloggers.

Git best practices are essential for developers looking to manage their projects efficiently. In this article, we’ll dive into the key techniques that can transform your version control workflow, ensuring you leverage Git to its full potential for improved productivity and collaboration.

We already introduced Git and explained why version control is so important in our blog post, Version Control for Pharma: A Comparison of Gitflow and Trunk-based Development, but how do we get the most out of it?

Use Case Scenario

Depending on the project you are working on, there will be different approaches for implementing Git and maintaining the solution in the long run.

Minor projects and POVs don’t need much, although a feature-branch approach still makes it easy to track changes even in such simple cases. Complex solutions running live in production, on the other hand, require a more sophisticated approach.

The Complex Case

Suppose you have an application running in a production environment with no version control managing it. Any developer can simply go to where the code lives, edit something, click deploy, and the app’s behavior changes in some way. Sounds like a nightmare, doesn’t it? Let’s hope there is at least a separate test environment where changes can be tested before they are deployed to production.

But if that environment doesn’t use version control either, does anyone know where and when changes occurred? If you’re the developer introducing these changes, will you remember a month from now whether you clicked ‘deploy’ after making them? Hopefully, we all agree this is not a well-designed development environment, which is exactly why version control best practices are worth discussing.

What Are You Losing When Not Using Version Control?

Let’s start with the drawbacks of working without a version control system. When a project doesn’t track the changes being made, developers cannot be sure where and when any change was introduced. In practice, this makes it very difficult to roll back to a previous state of the project if newly implemented changes break something. It may also be impossible to know who made the latest changes, so in a large team spread across an organization, there is no one to ask why a certain change was made.

Assuming a project uses several environments (dev, test, prod), when something breaks, fixes might be applied directly to production to save time. Without version control, the code base can then drift apart between environments, and only manual inspection can reveal the differences.

To sum up, these are the main reasons why not using version control is bad for your project:

Difficult to track changes
Rollback process is complicated
Deleted files may be lost
No clean and consistent way of implementing changes
No clear accountability for the changes made

With these drawbacks in mind, let’s look at how a complex project should be handled in an ideal situation.

The Ideal World

If you are working on a complex solution like the one described above, enabling a version control system should be the first thing you do when creating your project. If the project already exists but doesn’t use Git, adopt it right away! You have everything to gain and nothing to lose!

Branching

A project that goes through the full development cycle should ideally use dev, test and prod branches (the naming, as well as the number of such branches, can of course vary depending on needs and approach), mirrored by three environments with the same names. When building new features, adjusting the code, or bumping dependency versions, all developers should branch off from dev. Depending on the chosen approach, they then merge back into dev or into another branch, e.g. a release branch.

Merging

Merge requests (also called pull requests) are a big part of a developer’s daily work. You open one when work on a new feature is finished and the feature should be incorporated into the target branch. A great thing about Git-based workflows is that multiple merge requests can be open at the same time, coming from multiple developers or even from the same one.

The best practice is to have at least one person responsible for reviewing the changes and to prevent the author of the pull request from merging the code without the reviewer’s approval. Ideally, more checks are incorporated, such as unit tests or code quality checks that run automatically when a merge request is opened and must finish successfully before the changes can be merged.
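As a rough illustration, the review rule can be expressed as a small policy check. This is a minimal sketch, not how GitHub or GitLab implement it. The MergeRequest class, the can_merge function and the check names are hypothetical. A real platform enforces the same idea through its branch protection settings.

```python
from dataclasses import dataclass, field


@dataclass
class MergeRequest:
    """Hypothetical, simplified model of a merge (pull) request."""
    author: str
    approvals: set[str] = field(default_factory=set)
    # Automated check name (e.g. "unit-tests") mapped to whether it passed.
    checks: dict[str, bool] = field(default_factory=dict)


def can_merge(mr: MergeRequest, required_approvals: int = 1) -> tuple[bool, list[str]]:
    """Return (allowed, reasons); reasons explains why merging is blocked."""
    reasons = []
    # The author's own approval does not count as a review.
    reviewers = mr.approvals - {mr.author}
    if len(reviewers) < required_approvals:
        reasons.append(
            f"needs {required_approvals} approval(s) from someone other than {mr.author}"
        )
    failed = sorted(name for name, passed in mr.checks.items() if not passed)
    if failed:
        reasons.append("failing checks: " + ", ".join(failed))
    return (not reasons, reasons)


# Usage: the author approved their own request and the linter failed.
mr = MergeRequest(author="alice", approvals={"alice"},
                  checks={"unit-tests": True, "lint": False})
print(can_merge(mr))
```

Keeping the reasons as a list, rather than a bare yes/no, mirrors what review tools show: the author can see every blocker at once.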

The Cycle

When all the new features have been merged into the dev branch, it is time to merge them into the test branch and start actually testing the new version of the code or the app. This can be done internally by the development team or with the help of a group of users who try out the test software and provide feedback.

If the feedback shows that additional features, fixes or improvements are needed, the development cycle starts over. The responsible developers branch off from dev again and open new pull requests, which leads back to the testing phase. Once testing is finished and the software is ready for the outside world, the changes are merged into the prod branch and released. A good practice is to mark each release with either a release branch or a tag.
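To make the cycle concrete, here is a toy in-memory model of it. It is only a sketch under simplifying assumptions. Commits are plain strings, and branches are lists rather than a commit graph. The Repo class and its methods are hypothetical names, not a Git API. The model shows features branching off dev, code moving forward one environment at a time, and a release tag pointing at prod.

```python
# Toy model of the dev -> test -> prod cycle (hypothetical names, not a Git API).
# Branches are lists of commit messages; real Git tracks a commit graph.
PROMOTION_ORDER = ["dev", "test", "prod"]


class Repo:
    def __init__(self) -> None:
        self.branches: dict[str, list[str]] = {name: [] for name in PROMOTION_ORDER}
        self.tags: dict[str, str] = {}

    def branch_off(self, new: str, source: str = "dev") -> None:
        """Start a new branch as a copy of `source` (dev by default)."""
        self.branches[new] = list(self.branches[source])

    def commit(self, branch: str, message: str) -> None:
        self.branches[branch].append(message)

    def merge(self, source: str, target: str) -> None:
        """Add the commits from `source` that `target` does not have yet."""
        missing = [c for c in self.branches[source] if c not in self.branches[target]]
        self.branches[target].extend(missing)

    def promote(self, source: str, target: str) -> None:
        """Move code exactly one step forward: dev -> test or test -> prod."""
        if PROMOTION_ORDER.index(target) != PROMOTION_ORDER.index(source) + 1:
            raise ValueError(f"cannot promote {source} -> {target}")
        self.merge(source, target)

    def tag_release(self, version: str) -> None:
        """Point a release tag at the latest commit on prod."""
        self.tags[version] = self.branches["prod"][-1]


repo = Repo()
repo.branch_off("feat/new-greeting")
repo.commit("feat/new-greeting", "feat: new greeting")
repo.merge("feat/new-greeting", "dev")  # after the merge request is approved
repo.promote("dev", "test")             # testing and feedback happen here
repo.promote("test", "prod")            # ready for the outside world
repo.tag_release("v1.0.0")
print(repo.branches["prod"], repo.tags)
```

Refusing to skip a step (dev straight to prod) is the property the branch setup is meant to guarantee. In practice, protected branches and merge requests enforce it, not application code.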

Figure: Software Development Lifecycle Flowchart. The stages of software development run from gathering requirements, through developing code, testing software and gathering feedback, to deploying to production. A feedback loop runs from the testing phase back to gathering feedback.

Git Best Practices

Hopefully, you are already convinced to use version control and have an overview of how the process should look. Some smaller parts of the whole picture are still very important and should be considered on a daily basis. Let’s have a look at some best practices!

Basic Commands to Use

If you struggle to remember all the crucial Git commands, here is a go-to list:

config – get or set repository-specific or global options, such as user name and email
init – create an empty Git repository or reinitialize an existing one
clone – copy the desired Git repository into a chosen (new) directory
checkout – switch branches or restore files in the working tree (for the latter, git restore is the advisable alternative)
stash – stash away the changes in a dirty working directory so they can be reapplied later
add – stage all or specific files to be committed
commit – record the staged changes in the repository
status – show the state of the working tree (which files are changed or staged, and how far the branch is ahead of or behind its upstream)
push – update the remote repository with the commits made locally
pull – fetch changes from the remote repository (new branches, new commits, etc.) and integrate them into the current branch
merge – join two or more development histories together, e.g. incorporate changes from another branch into yours
rebase – move your branch onto a different base commit (it then looks as if you had created your branch from that commit)

The order of this list is deliberate: it roughly follows the order in which you would generally run these commands.
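You can try the first part of that sequence safely with the sketch below. It drives the git CLI from Python against throwaway directories that are deleted afterwards. It assumes git is installed and on the PATH. A local bare repository stands in for the remote server. The branch feat/hello, the file hello.txt and the identity values are made up for the example.

```python
import os
import shutil
import subprocess
import tempfile


def git(*args: str, cwd: str) -> str:
    """Run one git command in `cwd` and return its stdout (raises on failure)."""
    result = subprocess.run(
        ["git", *args], cwd=cwd, check=True, capture_output=True, text=True
    )
    return result.stdout


def demo_workflow(base: str) -> str:
    """Walk through the first commands of the list against a local 'remote'."""
    remote = os.path.join(base, "remote.git")
    work = os.path.join(base, "work")
    git("init", "--bare", remote, cwd=base)              # init: bare repo plays the server
    git("clone", remote, work, cwd=base)                 # clone: copy it into a new directory
    git("config", "user.name", "Demo Dev", cwd=work)     # config: repository-specific identity
    git("config", "user.email", "dev@example.com", cwd=work)
    git("checkout", "-b", "feat/hello", cwd=work)        # checkout: create and switch branch
    with open(os.path.join(work, "hello.txt"), "w", encoding="utf-8") as f:
        f.write("Hi Appsilon!\n")
    git("add", "hello.txt", cwd=work)                    # add: stage the file
    git("commit", "-m", "feat: add greeting", cwd=work)  # commit: record the change
    print(git("status", "--short", "--branch", cwd=work))  # status: inspect the tree
    git("push", "-u", "origin", "feat/hello", cwd=work)  # push: publish the branch
    # Confirm that the branch now exists on the 'remote'.
    return git("ls-remote", "--heads", remote, cwd=work)


if __name__ == "__main__" and shutil.which("git"):
    with tempfile.TemporaryDirectory() as tmp:
        print(demo_workflow(tmp))
```

Running it prints the short status and the remote’s branch list. Pull, merge and rebase come into play once several people push to the same remote.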

Ways of Working

Branching Effectively

When developing new changes, create a new branch meant solely for that feature, fix, or other adjustment. Name it so that it immediately tells you (and others) what it does. Prefixes help: features go in feat/, fixes go in fix/. fix/broken-icon-header-module is a great branch name because its purpose is obvious at a glance. Also, a team should agree on the branching strategy best suited to each project, and stick to it.
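A naming convention is easiest to keep when it is checked automatically, for example in CI or in a local hook. The sketch below assumes a hypothetical convention of <prefix>/<kebab-case-description> and an example prefix list. Adapt both to whatever your team agrees on.

```python
import re

# Hypothetical team convention: <prefix>/<kebab-case-description>.
ALLOWED_PREFIXES = ("feat", "fix", "chore", "docs", "test")
BRANCH_NAME = re.compile(r"^(?P<prefix>[a-z]+)/(?P<desc>[a-z0-9]+(?:-[a-z0-9]+)*)$")


def check_branch_name(name: str) -> list[str]:
    """Return a list of problems with `name`; an empty list means it looks fine."""
    match = BRANCH_NAME.match(name)
    if not match:
        return ["expected '<prefix>/<kebab-case-description>', "
                "e.g. 'fix/broken-icon-header-module'"]
    if match["prefix"] not in ALLOWED_PREFIXES:
        return [f"unknown prefix {match['prefix']!r}; use one of {', '.join(ALLOWED_PREFIXES)}"]
    return []


for candidate in ["fix/broken-icon-header-module", "my_branch", "hotfix/login"]:
    print(candidate, check_branch_name(candidate) or "OK")
```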

Pulling Changes & Resolving Conflicts

While you develop a new feature, other features may already be merged into the repository. Take this into account and pull changes from the main branch often to keep your own branch up to date. By the time you open a pull request, the feature branch should already be in sync with the main one.

Conflicts in particular files can appear when changes from the main branch are pulled into the feature branch. They should always be resolved locally, and the resolution should be committed only after it has been tested. This helps avoid unexpected and unintentional breakage in the code.

Figure: Version Control Conflict Illustration. The dev branch has a file with the greeting ‘Hi Appsilon!’, while the feat/new-greeting branch has changed the same file to say ‘Good Morning Appsilon!’. Attempting to merge these changes results in a conflict.
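Why does this conflict happen? Git compares both branches against their common ancestor. A line changed on only one side is taken automatically, while a line changed differently on both sides is a conflict. The toy three-way merge below illustrates that rule only for files whose lines are edited in place; Git’s real merge also handles inserted and deleted lines. It assumes a hypothetical ancestor line ‘Hello Appsilon!’ that both branches edited.

```python
def merge_lines(base: list[str], ours: list[str], theirs: list[str],
                ours_name: str, theirs_name: str) -> tuple[list[str], bool]:
    """Toy three-way merge for files whose lines were only edited in place.

    Returns the merged lines and whether a conflict occurred.
    """
    if not len(base) == len(ours) == len(theirs):
        raise ValueError("this toy merge needs the same number of lines on all sides")
    merged: list[str] = []
    conflict = False
    for b, o, t in zip(base, ours, theirs):
        if o == t:        # both sides agree (possibly neither changed the line)
            merged.append(o)
        elif o == b:      # only 'theirs' changed the line: take it
            merged.append(t)
        elif t == b:      # only 'ours' changed the line: keep it
            merged.append(o)
        else:             # both changed it differently: write Git-style conflict markers
            conflict = True
            merged += [f"<<<<<<< {ours_name}", o, "=======", t, f">>>>>>> {theirs_name}"]
    return merged, conflict


# Pulling dev into the feature branch; the ancestor line is hypothetical.
base = ["Hello Appsilon!"]
feature = ["Good Morning Appsilon!"]
dev = ["Hi Appsilon!"]
merged, conflict = merge_lines(base, feature, dev, "feat/new-greeting", "dev")
print("\n".join(merged))
```

Suppose dev had left the ancestor line untouched. The same function would then simply take the feature branch’s version, which is why a conflict requires changes on both sides.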

Conventional Commits

When developers commit changes, they must provide a message describing them. Conventional commits make it easier to keep track of changes without even looking at the modified files. With a leading type such as feat:, fix: or test:, the messages immediately make sense to anyone going through a project’s history.
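Conventional commit headers follow the pattern <type>(<scope>)!: <subject>. The scope and the ! (marking a breaking change) are optional. One way to enforce the format locally is a commit-msg hook, sketched below in Python. Git passes the hook the path of the file holding the proposed message and aborts the commit on a non-zero exit status. The list of allowed types is an assumption borrowed from a common convention, since the Conventional Commits specification itself only defines feat and fix.

```python
#!/usr/bin/env python3
"""Sketch of a commit-msg hook that checks for a Conventional Commits header."""
import re
import sys
from typing import Optional

# Assumed type list (a common convention); the spec itself only defines feat and fix.
TYPES = ("build", "chore", "ci", "docs", "feat", "fix", "perf",
         "refactor", "revert", "style", "test")
HEADER = re.compile(r"^(?P<type>[a-z]+)(?:\((?P<scope>[^()]+)\))?(?P<breaking>!)?: \S")


def first_line(message: str) -> str:
    """Return the first non-empty line that is not a Git comment (#...)."""
    for line in message.splitlines():
        if line.strip() and not line.startswith("#"):
            return line
    return ""


def validate(message: str) -> Optional[str]:
    """Return an error message, or None if the header looks conventional."""
    header = first_line(message)
    match = HEADER.match(header)
    if not match:
        return f"{header!r} does not match '<type>(<scope>)!: <subject>'"
    if match["type"] not in TYPES:
        return f"unknown type {match['type']!r}; use one of {', '.join(TYPES)}"
    return None


# Git calls the hook with one argument: the path of the file holding the message.
if __name__ == "__main__" and len(sys.argv) == 2:
    with open(sys.argv[1], encoding="utf-8") as f:
        error = validate(f.read())
    if error:
        print(f"commit-msg: {error}", file=sys.stderr)
        sys.exit(1)  # a non-zero exit aborts the commit
```

To try it, save the file as .git/hooks/commit-msg and make it executable.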

There are more advantages to this approach. If the repository is configured accordingly, many things can be automated, e.g. generating CHANGELOGs, triggering build and publishing processes, or determining a semantic version bump. It also makes it easier for others to contribute to your projects, since they can explore a more structured commit history, which matters especially in open-source projects.
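The semantic version bump follows directly from the commit types. Under the Conventional Commits specification, fix maps to a PATCH release and feat to a MINOR release. A breaking change (a ! after the type or a BREAKING CHANGE: footer) maps to a MAJOR release. Below is a simplified sketch of that mapping. Release tools do the same with more care. For instance, this version assumes other types trigger no release and only does a substring check for the footer.

```python
import re
from typing import Optional

# Conventional Commits header: type, optional (scope), optional "!" for a breaking change.
HEADER = re.compile(r"^(?P<type>[a-z]+)(?:\([^()]+\))?(?P<breaking>!)?: ")
LEVELS = ("patch", "minor", "major")


def bump_level(messages: list[str]) -> Optional[str]:
    """Return the largest bump implied by the commit messages, or None."""
    best = -1
    for message in messages:
        match = HEADER.match(message)
        if not match:
            continue  # non-conventional messages are ignored in this sketch
        if match["breaking"] or "BREAKING CHANGE:" in message:
            level = 2
        elif match["type"] == "feat":
            level = 1
        elif match["type"] == "fix":
            level = 0
        else:
            level = -1  # docs, test, chore, ... trigger no release here
        best = max(best, level)
    return LEVELS[best] if best >= 0 else None


def next_version(current: str, messages: list[str]) -> str:
    """Apply the bump to a MAJOR.MINOR.PATCH version string."""
    major, minor, patch = (int(part) for part in current.split("."))
    level = bump_level(messages)
    if level == "major":
        return f"{major + 1}.0.0"
    if level == "minor":
        return f"{major}.{minor + 1}.0"
    if level == "patch":
        return f"{major}.{minor}.{patch + 1}"
    return current


history = [
    "docs: fix typo in README",
    "fix(ui): broken icon in header module",
    "feat: new greeting",
]
print(next_version("1.4.2", history))  # → 1.5.0
```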

Protecting Branches

As mentioned earlier, pull requests are very important, as they provide a (hopefully) unbiased review of recent changes. Therefore, no new code should be merged into the main codebase without one. In practice, this means that all main branches (dev, test, prod) should be protected: a developer should not be able to push directly to any of them.

This minimizes the risk of pushing bad code into production and keeps the branch life cycle intact.
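Branch protection itself is configured on your Git platform; GitHub, GitLab and others offer it in the repository settings. As an extra local safety net, a pre-push hook can refuse pushes to the main branches before they leave your machine. This sketch assumes the protected branches are dev, test and prod. Git runs pre-push with the remote name and URL as arguments and writes one line per ref to its standard input.

```python
#!/usr/bin/env python3
"""Sketch of a client-side pre-push hook that refuses direct pushes to main branches.

Git feeds one line per ref on stdin: "<local ref> <local sha> <remote ref> <remote sha>".
A non-zero exit aborts the push. Hooks are local and can be skipped with
--no-verify, so this complements platform branch protection rather than replacing it.
"""
import sys

PROTECTED = {"refs/heads/dev", "refs/heads/test", "refs/heads/prod"}


def blocked_refs(stdin_lines: list[str]) -> list[str]:
    """Return the protected remote refs that this push would update."""
    blocked = []
    for line in stdin_lines:
        parts = line.split()
        if len(parts) == 4 and parts[2] in PROTECTED:
            blocked.append(parts[2])
    return blocked


# Git passes two arguments to pre-push: the remote name and its URL.
if __name__ == "__main__" and len(sys.argv) == 3:
    refs = blocked_refs(sys.stdin.read().splitlines())
    if refs:
        print("pre-push: open a pull request instead of pushing to "
              + ", ".join(refs), file=sys.stderr)
        sys.exit(1)
```

To try it, save the file as .git/hooks/pre-push and make it executable.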

Figure: ‘Roll Safe’ meme with the text ‘You don’t need to submit a pull request if you push to master.’ Please don’t!

Ignoring Files

It is crucial to remember that not every file should be committed and pushed to the repository. This includes credentials such as API keys (beware!), environment variables, sometimes private notebooks, and other files that are sensitive or simply not needed in the remote repository. Another common example is system-dependent artifacts: macOS’s .DS_Store files, for instance, are very often pushed to the main repository by accident. To keep such files off the server, list them in a .gitignore file, which tells Git which untracked files to ignore. This makes it easy to exclude specific files, specific file extensions, or even all files in specific folders. Note that .gitignore has no effect on files that are already tracked; those must first be removed from the index, e.g. with git rm --cached.
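The sketch below shows how .gitignore patterns behave for the cases mentioned above. It uses hypothetical patterns and a deliberately simplified matcher built on Python’s fnmatch. Real .gitignore rules are richer (negation with !, patterns anchored by a slash, **). To see which rule actually applies to a file in your repository, use git check-ignore -v <path>.

```python
import fnmatch
from pathlib import PurePosixPath

# Hypothetical .gitignore patterns for the cases described above.
PATTERNS = [
    ".env",       # a specific file, e.g. environment variables or credentials
    "*.ipynb",    # a file extension, e.g. private notebooks
    ".DS_Store",  # a system-dependent artifact created by macOS
    "secrets/",   # every file inside a folder
]


def is_ignored(path: str, patterns: list[str] = PATTERNS) -> bool:
    """Very simplified approximation of .gitignore matching for a file path.

    Only handles slash-free patterns and 'folder/' patterns; real Git also
    supports negation (!), anchoring with a leading slash, ** and more.
    """
    parts = PurePosixPath(path).parts
    for pattern in patterns:
        if pattern.endswith("/"):
            # 'folder/' matches a directory of that name at any depth.
            if any(fnmatch.fnmatchcase(p, pattern[:-1]) for p in parts[:-1]):
                return True
        elif any(fnmatch.fnmatchcase(p, pattern) for p in parts):
            # A slash-free pattern matches a file or directory name at any depth.
            return True
    return False


for path in ["app.R", ".env", "notebooks/analysis.ipynb", "www/.DS_Store", "secrets/api_key.txt"]:
    print(f"{path}: {'ignored' if is_ignored(path) else 'not ignored'}")
```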

Git Hooks

When working on a complex project, you’ll often be juggling many things at once, and small problems in recent changes can slip through: trailing whitespace, commented-out code or leftover debug statements. This is where Git hooks come in handy! They are just scripts, but Git runs them automatically whenever a specific event occurs in the repository, e.g. right before a commit is created or before changes are pushed.

For example, they can warn developers that their commit message is missing or doesn’t follow the agreed format, or that they forgot to bump the package version. Git hooks can run before or after an event (pre- and post- hooks) and are highly customizable, allowing developers to automate almost any step of their workflow. However, hooks are local to a given Git repository and are not tracked by version control. That’s why frameworks such as pre-commit were built on top of them, making it easy to manage and maintain multi-language pre-commit hooks.
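Here is a sketch of a hand-written pre-commit hook in Python. It catches two common slips: trailing whitespace and leftover debug calls. It only inspects lines added in the staged diff (from git diff --cached), and its parsing of the diff’s file headers is simplified. The list of debug calls is a hypothetical example covering Python’s breakpoint() and pdb.set_trace() and R’s browser().

```python
#!/usr/bin/env python3
"""Sketch of a pre-commit hook: flag trailing whitespace and leftover debug calls."""
import re
import subprocess
import sys

# Hypothetical debug calls to catch: Python's breakpoint()/pdb.set_trace(), R's browser().
DEBUG_CALL = re.compile(r"\b(breakpoint\(\)|pdb\.set_trace\(\)|browser\(\))")


def find_issues(diff: str) -> list[str]:
    """Scan a unified diff and report problems on added lines only."""
    issues: list[str] = []
    current_file = "?"
    for line in diff.splitlines():
        if line.startswith("+++ "):
            # File header, e.g. "+++ b/app.py"; strip the "b/" prefix.
            current_file = line[6:] if line.startswith("+++ b/") else line[4:]
        elif line.startswith("+"):
            added = line[1:]
            if added != added.rstrip():
                issues.append(f"{current_file}: trailing whitespace in {added!r}")
            if DEBUG_CALL.search(added):
                issues.append(f"{current_file}: debug statement {added.strip()!r}")
    return issues


def main() -> int:
    """Check the staged changes; a non-zero return value should abort the commit."""
    staged = subprocess.run(
        ["git", "diff", "--cached", "--unified=0", "--no-color"],
        capture_output=True, text=True, check=True,
    ).stdout
    issues = find_issues(staged)
    for issue in issues:
        print(f"pre-commit: {issue}", file=sys.stderr)
    return 1 if issues else 0


# Installed as .git/hooks/pre-commit (and made executable), the file would end with:
# sys.exit(main())
```

Frameworks such as pre-commit ship ready-made hooks for checks like these. They are usually easier to share across a team than copying scripts into every clone.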

Summary

This article’s goal was to convince you that version control is a great tool that solves many problems and can be applied to anything from single-developer projects to complex solutions serving many users. On top of that, we wanted to show how to get the most out of Git.

These commands and best practices should help you on a daily basis, even if your project doesn’t fit the workflow we showed in this example. Just remember to keep your work clean and tidy – name your branches in a self-explanatory way, commit often and in small batches, use conventional commit messages, prepare your merge requests the way you would like to receive them, and explore the many options different platforms give you!

The post appeared first on appsilon.com/blog/.
