Git and the hammer - nail dilemma
About code quality, automation and git hooks
Imagine you are in the middle of writing some awesome code for a new project of yours. You know that code on its own isn't everything, because it also has to be maintainable. And what better way to ensure correct and maintainable code than tests, a coding style, some static analysis and whatnot? So in addition to your awesome code you have decided to use a certain coding style. And you are writing tests for your features. And in addition to your tests you also run static analysis to find possible pitfalls even faster.
But that is a lot of stuff to remember. So to make it easier, you start to write a build-script that you can call with different parameters so that it runs the appropriate commands. For example, build test will run your test-suite, build analyse will run the static analysis, and build checkstyle will ensure that your chosen coding style is used.
That makes everything a lot easier. Now you do not need to remember the exact invocation of each tool, you only need to remember three different parameters. And to make it even easier you introduce a fourth command that runs all three commands so now after a change and before building the new release of your awesome tool the only thing you need to remember is to run build complete.
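To make that a bit more concrete, here is a minimal sketch of what such a build-script could look like. The three tool commands are placeholders (they only echo a message), and the TEST_CMD, ANALYSE_CMD and CHECKSTYLE_CMD variables are names I made up for this example; swap in whatever your project actually uses.

```shell
#!/bin/sh
# Hypothetical "build" wrapper. The commands below are placeholders;
# override them via the environment to point at your real tools.
TEST_CMD="${TEST_CMD:-echo running tests}"
ANALYSE_CMD="${ANALYSE_CMD:-echo running static analysis}"
CHECKSTYLE_CMD="${CHECKSTYLE_CMD:-echo checking coding style}"

build() {
    case "$1" in
        test)       $TEST_CMD ;;
        analyse)    $ANALYSE_CMD ;;
        checkstyle) $CHECKSTYLE_CMD ;;
        complete)
            # Run all three checks and stop at the first one that fails.
            build checkstyle && build analyse && build test
            ;;
        *)
            echo "usage: build {test|analyse|checkstyle|complete}" >&2
            return 2
            ;;
    esac
}
```

To use it as a standalone script, save it as an executable file named build and add build "$@" as the last line. The order of the checks inside complete is my choice here, not a requirement.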
This is great as we are developers and we are usually rather lazy. So now I only need to remember one command. Cool!
Let's introduce the next level of awesome! Let's listen to amazing people and use this thing everyone says is absolutely necessary. Let's use this Version Control thingy...
These days almost the only choice we have is Git. Even though it is already 15 years old, it is the most widely used and supported one. So now we can commit atomic units of our code, roll back to previous commits, or develop different features in parallel. We commit our code to our local repository and when we are ready we can push that code to a remote repository like GitLab or GitHub.
But at one point we realize that we still have to remember that we need to run build complete before committing or pushing code. Is there not a way to automate that?
There sure is! At one point in time we might stumble upon these things called "git hooks" that allow us to run tasks on certain actions. So like on every commit or before pushing. How cool would it be to hook up our build complete to be run on every commit!
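As a rough sketch, hooking the build-script into the commit could look like the following. It assumes the build command from above is available on your PATH; BUILD_CMD is a hypothetical override that I only added so the example can be tried without a real build-script.

```shell
#!/bin/sh
# Sketch of .git/hooks/pre-commit (the file must be executable).
# Git aborts the commit when this script exits with a non-zero status.
# Assumption: the wrapper script is on PATH as "build".

pre_commit() {
    if ! ${BUILD_CMD:-build} complete; then
        echo "pre-commit: 'build complete' failed, commit aborted" >&2
        return 1
    fi
}

# Run the check only when Git executes this file as the pre-commit hook.
case "$0" in
    */pre-commit) pre_commit || exit 1 ;;
esac
```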
Yes! That sounds awesome! But there are some tradeoffs. Let me show you some pitfalls and then you go off and decide for yourself.
So you want to hook your build-script into the commit-hook. But how long does your build-script take to run? Currently it's only a few seconds, but depending on the size of your project it might also take minutes. So suddenly the amazing advantage that git offers us (fast commits) turns into tedious waiting time. Instead of committing small chunks often, the frequency of commits will go down because we don't want to wait for the build-process before committing.
And nothing is more annoying than trying to commit some code just to be yelled at that the tests fail. So we fix that. Next try. Oh! There are some code style issues! OK. Let's fix them as well. Just to then learn that the static analysis has found an issue. At this point we have lost all the momentum that git would allow us.
And perhaps we are in a hurry, as we just want to commit the current code because we need to immediately fix something in a different part of the code. So instead of just committing the current state to save it for later and quickly checking out a new bugfix-branch, we first have to fix all those things now that we wanted to fix later.
You'll go through that once. Perhaps twice. And then you will learn about the --no-verify switch for git commit. It will skip your hooks, so that your awesome automation will not run. And sooner or later your muscle memory will type the -n in every commit.
OK. Then perhaps adding that build-script to the commit-hook was not that wise. But what about the pre-push hook? Every time I push stuff to the server I can check it. Can't I? Sure! That kind of makes sense. At least as long as you only push code when you have completely finished it. In corporate environments, though, you should not only commit often, you should also push often. Like at least at the end of the day, so that someone else can take over should you become sick or need to take on something else the next day. And then you probably have to push code that is not yet polished. And that code fails some of the checks in the build-script. And then you remember that --no-verify...
So now that we know of git hooks we can solve everything with them, can't we? That somehow sounds like "When you have a hammer, suddenly everything looks like a nail".
So let's have a look at why we introduced these build-scripts in the first place again.
We want to make sure that code that shows up in our main branch is according to our coding styles, passes all tests, and doesn't show issues in static analysis.
That is a great idea! But do we need to ensure that on every commit? Yes, we do, when we always commit to our main branch! But wait for a second! Wasn't one of the advantages of git that branching is easy and fast? So when we use a dedicated branch for our new development (I'm not talking about the lifetime of such a branch here!) we can do whatever we want in that branch and only have to ensure that our expectations are met when that branch is merged into the main branch. So that would mean we can add our build-script to a pre-merge hook and everything is awesome!
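Yes! The only drawback is that for a long time there was only a post-merge hook. No pre-merge... Newer Git versions (2.24 and later) do offer a pre-merge-commit hook, but it does not run for fast-forward merges and it can be skipped with --no-verify just like the other hooks. Alternatively we could do a merge, run the tests and then revert the merge using a post-merge hook. Sounds rather... messy...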
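Or we create our own git alias. Like git config alias.mergeIntoMain '!currentBranch=$(git rev-parse --abbrev-ref HEAD) && build complete && git switch main && git merge "$currentBranch"' (see the Git book for more info on aliases). The single quotes matter: they keep your shell from expanding the command substitution and the variable at the moment you define the alias, so both are evaluated when the alias actually runs. So now every time we merge something into the main branch using this command the build-command is executed, and should it fail the merge is not done.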
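Written out as a shell function, the steps of that alias are a bit easier to follow. This is only a sketch under a few assumptions: the function name merge_into_main and the BUILD_CMD override are made up for this example, the main branch is called main, and you are on a branch rather than a detached HEAD.

```shell
#!/bin/sh
# Sketch of the alias as a shell function (names are illustrative).

merge_into_main() {
    # Remember which branch we are on before switching away from it.
    current_branch=$(git rev-parse --abbrev-ref HEAD) || return 1

    # Run the full build first; if it fails, nothing is switched or merged.
    ${BUILD_CMD:-build} complete || return 1

    # Only now switch to main and merge the remembered branch.
    git switch main && git merge "$current_branch"
}
```

Because every step is chained with a failure check, a failing build leaves you on your feature branch with the main branch untouched.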
And suddenly we are using a branch-based merge workflow.
Merge or Pull requests
But as soon as you want to collaborate on your code you cannot expect others to not use the -n switch. So you cannot be sure they actually fulfilled your requirements. Unless you actually do the merge using your git alias. And then you will either have to fix their issues yourself or send them a list of issues so that they fix them.
Or you start to use Continuous Integration, automatically executing your build-script on the server. You can use different CI tools for that. GitHub Actions, GitLab CI, Travis CI and CircleCI are just a few SaaS solutions that allow executing scripts on the server on every push.
Why is that different from using the push hook? Well. The push hook executes the script before pushing the code to the server. So you will have to wait for the script to finish successfully before continuing with whatever you wanted to do. The CI setup will work on the server after you pushed the code. So you can continue with whatever you wanted to do while the build-script is executed asynchronously on the server.
And depending on the server you use, after pushing code you can open a Merge Request or Pull Request against your main branch so that you can do the merge on the server. And if you have Continuous Integration set up on your repository, these systems can be configured to only allow merging when the associated build-script executed successfully. So you can actually be sure that the code adheres to your standards before it is merged into the main branch. Without having to run anything on your machine. Without the possibility to forget to run a script. And most importantly without having to tweak or wait for git hooks.
So now we can manage code without having to use hooks at all. So what are they there for in the first place?
Well. That is not easily answered as it depends on the hooks. Personally I would say the client-side hooks (pre-commit, prepare-commit-msg, commit-msg) should be used to make sure that the metadata for a git commit makes sense and adheres to your ideas. As the CI pipeline takes care of the code-part, the only other thing that we can influence is the git metadata, mainly the commit-message. For more information on good commit-messages head over to Chris Beams' article "How to Write a Git Commit Message".
There are other hooks as well. applypatch-msg and pre- and post-applypatch are mainly for workflows that rely on patch-distribution via email (that is a completely different blog-post...). And pre-rebase, post-rewrite, post-checkout or the already mentioned post-merge hook can be used for checking whether an action is allowed or for cleanup afterward.
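A commit-msg hook for that purpose could look roughly like this. It is a sketch that checks three of the rules from that article (a subject line of at most 50 characters, no trailing period, a blank line between subject and body). The function name check_commit_msg is made up, and the sketch assumes Git's default "#" comment character.

```shell
#!/bin/sh
# Sketch of .git/hooks/commit-msg (the file must be executable).
# Git passes the path of the file holding the proposed commit message
# as the first argument.

check_commit_msg() {
    msg_file=$1
    # Skip comment lines from Git's message template (assumes "#").
    subject=$(grep -v '^#' "$msg_file" | sed -n '1p')
    second=$(grep -v '^#' "$msg_file" | sed -n '2p')

    if [ -z "$subject" ]; then
        echo "commit-msg: subject line is empty" >&2
        return 1
    fi
    if [ "${#subject}" -gt 50 ]; then
        echo "commit-msg: subject line is longer than 50 characters" >&2
        return 1
    fi
    case "$subject" in
        *.) echo "commit-msg: subject line ends with a period" >&2
            return 1 ;;
    esac
    if [ -n "$second" ]; then
        echo "commit-msg: second line must be blank" >&2
        return 1
    fi
}

# Run the check only when Git executes this file as the commit-msg hook.
case "$0" in
    */commit-msg) check_commit_msg "$1" || exit 1 ;;
esac
```

Unlike the build-script, this check finishes instantly, so there is little temptation to reach for --no-verify.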
And then there are some server-side hooks that usually only make sense on a git server. So when you have deployed your own git server, pre-receive, update and post-receive might be of interest to you. But most git-users will never even see them.
So in essence I would create build-scripts to allow automation but I would always let a CI-System handle the automation. And using a Merge-based workflow I would only merge code into the main-branch after the CI-System ran the build-scripts successfully.
That way I (or any other contributor) can always run the build-tools myself the same way the CI-System does it. But I can also commit whenever and whatever I want without first having to get the build-scripts to run successfully. So it's not necessary to skip the hook-execution during commit or push.
The only thing I usually use hooks for is making sure that the git-metadata (i.e. the commit-message) adheres to the criteria I deem sensible and important.
Tags: Development, Workflow, git