Enrico Rossi

Opensint repo rebuild: git rebase and cherry-pick

Opensint is one of the most git-complicated projects I’ve worked on. It was originally developed with the Arch revision control system (aka tla) and was later migrated to git with the git-arch tool.

Its history has more than 10 branches, 3 releases and other sub-branches used for testing, all combined with merges in a very unclean way. Moreover, the very first commits include the product’s datasheet, which may or may not be freely redistributable. For these reasons I had to create a separate repository to make the source available to the public, in which all the commits between releases were squashed together and cleaned of unnecessary code and other material.

That was the situation until today. I decided to play with git and find a way to rebuild the entire opensint archive from the beginning: removing all sensitive data, cleaning it up, and rearranging the patches without making the code buggy (at least not more than it was before :) ).

This is a summary of what I did. Before you start, back up everything: trust me, I failed to apply patches and messed up the archive at least a dozen times.

First, I created a brand-new branch, newroot, that starts from an empty commit. Point HEAD at the new branch, which does not exist yet:

git symbolic-ref HEAD refs/heads/newroot

then remove every file except the .git folder, both from the index and from the working tree:

git rm --cached -r .
git clean -f -d

and create an empty commit as the first commit of the new branch:

git commit --allow-empty -m 'init'
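git checkout --orphan newroot is a single-command alternative to the symbolic-ref step: it also starts a new branch with no history, and the index and working tree are then emptied in the same way. The sketch below runs the whole sequence in a throwaway repository created with mktemp; the README file and its commit are hypothetical stand-ins for the real opensint history.

```sh
#!/bin/sh
# Sketch: create a branch "newroot" whose first commit has an empty tree.
# Runs in a throwaway repository; the README commit is a hypothetical
# stand-in for the original history.
set -eu

repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master
git config user.name "Demo User"
git config user.email "demo@example.com"

# Stand-in for the original history: one commit with one file.
echo "old content" > README
git add README
git commit -q -m "original first commit"

# Start a branch with no history (same effect as the symbolic-ref step).
git checkout -q --orphan newroot
# The index and working tree still hold the old files: drop them.
git rm -q -r --cached .
git clean -q -f -d
# First commit of the new history, with an empty tree.
git commit -q --allow-empty -m 'init'

git log --oneline newroot
```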

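Next, import the first two original commits and squash them into one: with --no-commit, cherry-pick applies both to the index without committing. Then remove the unwanted files (the plain rm comes first because git rm on its own refuses to delete a file that is staged but not yet in HEAD) and commit the result on the new branch: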

git cherry-pick --no-commit $(git rev-list --reverse master | head -2)
rm doc/unwanted_docs.*
git rm doc/unwanted_docs.*
git commit
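The import step can be tried out in isolation. In this sketch the two original commits and the files main.c, helper.c and doc/unwanted_docs.pdf are hypothetical stand-ins; git rm -f is used in place of the rm plus git rm pair, since -f lets git rm remove a file that is staged but not in HEAD.

```sh
#!/bin/sh
# Sketch: squash the first two original commits onto an empty root commit,
# leaving out the unwanted documents. All file names are hypothetical.
set -eu

repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master
git config user.name "Demo User"
git config user.email "demo@example.com"

# Original history: commit 1 adds code and a datasheet, commit 2 adds more code.
mkdir doc
echo "main code" > main.c
echo "datasheet" > doc/unwanted_docs.pdf
git add main.c doc
git commit -q -m "first import"
echo "helper code" > helper.c
git add helper.c
git commit -q -m "second import"

# New history starting from an empty commit.
git checkout -q --orphan newroot
git rm -q -r --cached .
git clean -q -f -d
git commit -q --allow-empty -m 'init'

# Apply both original commits to the index without committing them...
git cherry-pick --no-commit $(git rev-list --reverse master | head -n 2)
# ...drop the files that must not be published (-f: staged but not in HEAD)...
git rm -q -f doc/unwanted_docs.*
# ...and record everything as a single commit.
git commit -q -m "first import (cleaned)"

git ls-tree -r --name-only newroot
```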

Now we move the first branch, 1x. This is easy because it is a linear branch, with no splits or merges:

git rebase --onto newroot newroot 1x

This replays the 1x patches, one by one, on top of the newroot branch. Since newroot shares no history with 1x, all of the 1x commits are replayed, including the first two, which cannot be applied because we already imported them, modified, into newroot. Run

git rebase --skip

twice to skip these two patches, and the 1x branch will end up on top of the newroot branch. The situation before was like this:

o-o-o-o 1x
     \-o-o-... master & other branches, tags

and after the rebase it became like this:

o-o-o-o-o 1x
     \- newroot 
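A variant that avoids the two skips, assuming 1x shares exactly its first two commits with master as in the diagram above, is to pass the second original commit as the upstream argument of rebase --onto, so only the commits after it are replayed. This also avoids depending on rebase stopping on the already-imported commits: with the merge-based rebase backend of recent git versions, a commit that only adds files may apply cleanly and silently bring back a file that was removed from newroot. The history and file names in the sketch are hypothetical.

```sh
#!/bin/sh
# Sketch: move branch 1x onto newroot, replaying only the commits made
# after the two already squashed into newroot. Names are hypothetical.
set -eu

repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master
git config user.name "Demo User"
git config user.email "demo@example.com"

# Original history: two shared commits, then two commits on branch 1x.
echo "main code" > main.c
git add main.c
git commit -q -m "first import"
echo "helper code" > helper.c
git add helper.c
git commit -q -m "second import"
git checkout -q -b 1x
echo "fix 1" > fix1.c
git add fix1.c
git commit -q -m "1x: fix 1"
echo "fix 2" > fix2.c
git add fix2.c
git commit -q -m "1x: fix 2"

# Rebuilt root: empty commit plus the two shared commits squashed together.
git checkout -q --orphan newroot
git rm -q -r --cached .
git clean -q -f -d
git commit -q --allow-empty -m 'init'
git cherry-pick --no-commit $(git rev-list --reverse master | head -n 2)
git commit -q -m "first import (squashed)"

# The second original commit is the last one already contained in newroot.
second=$(git rev-list --reverse master | sed -n 2p)
# Replay only second..1x on top of newroot: nothing needs to be skipped.
git rebase -q --onto newroot "$second" 1x

git log --oneline 1x
```

In the real repository this amounts to git rebase --onto newroot <second original commit> 1x; the rev-list and sed pipeline is just one way to find that commit.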

Now, following the old master branch, I encountered something like this (1, 2 and 3 are patches, M is a merge):

... o-o-1-2----M-o ...
       \----3-/

If the merge M has no conflicts, then we can linearize the history by applying the patches in the order 1, 2, 3, or 3, 1, 2, or by squashing all three into a single patch:

a) ... -o-1-2-3'-o- ...
b) ... -o-3-1'-2'-o- ...
c) ... -o-(1+2+3)-o- ...

Either way, some intermediate snapshots differ from the original ones (3' is not the same as 3, and 1' and 2' are not the same as 1 and 2), but if all the patches apply without conflicts, the code ends up identical to the original once all of them have been applied.
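The claim above can be checked on a toy history. This sketch builds the shape o-1-2-M with 3 on a side branch (each patch adds a different, hypothetical file, so nothing conflicts), replays it as (a), (b) and (c) with cherry-pick, and compares each final tree with the tree of the original merge M.

```sh
#!/bin/sh
# Sketch: with non-conflicting patches, orders (a), (b) and (c) all end at
# the same tree as the original merge M. File and branch names are hypothetical.
set -eu

repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master
git config user.name "Demo User"
git config user.email "demo@example.com"

commit_file() {  # commit_file <file> <message>: add one new file and commit it
    echo "$2" > "$1"
    git add "$1"
    git commit -q -m "$2"
}

# Original shape: o-1-2 on master, 3 on a side branch, merged as M.
commit_file base.c "o"
git checkout -q -b side
commit_file three.c "patch 3"
git checkout -q master
commit_file one.c "patch 1"
commit_file two.c "patch 2"
git merge -q --no-edit side
p1=$(git rev-parse HEAD^1~1)    # patch 1
p2=$(git rev-parse HEAD^1)      # patch 2
p3=$(git rev-parse HEAD^2)      # patch 3
base=$(git rev-parse HEAD^2~1)  # o

# (a) 1, 2, 3 and (b) 3, 1, 2 as linear histories starting from o.
git checkout -q -b order-a "$base"
git cherry-pick "$p1" "$p2" "$p3" > /dev/null
git checkout -q -b order-b "$base"
git cherry-pick "$p3" "$p1" "$p2" > /dev/null
# (c) all three squashed into a single commit.
git checkout -q -b order-c "$base"
git cherry-pick --no-commit "$p1" "$p2" "$p3"
git commit -q -m "patches 1+2+3"

# Compare each final tree with the tree of the original merge M.
for b in order-a order-b order-c; do
    if [ "$(git rev-parse "$b^{tree}")" = "$(git rev-parse "master^{tree}")" ]; then
        echo "$b: same tree as M"
    else
        echo "$b: differs from M"
    fi
done
```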

Now for the complications. Option (a) is what a rebase tries to do automatically. If the merge M had conflicts, they must be resolved by hand again, whichever path you choose.

           /-tag: 2.5.7
... o-o-1-2----M-o-o ...
       \----3-/     \-tag: 2.5.9
             \- tag: 2.5.8

If (3) in the original is a release, say 2.5.8, the only way to keep the code of that release available is (b): there 3 is applied on top of the same code as in the original, while in (a) 3' also contains 1 and 2. We can use cherry-pick to apply the patches one by one and rebuild the tree as we want. It gets even more complicated if (2) in the original is also a release, such as 2.5.7: in (b), 2' contains 3 as well, so the only way to keep the original code at both (2) and (3) is to rebuild the tree with the same shape as the original, merge included.
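When the original shape has to be kept, the merge itself can be recreated, and the conflict resolution already recorded in M can be reused instead of being redone by hand. The sketch below, assuming the rebuilt parents have the same content as the original ones, cherry-picks 3 onto a side branch and 1, 2 onto the main line of a new root, checks that the release snapshots match, merges, and resolves each conflicted file by taking M's version with git checkout <commit> -- <path>. The tags rel-2 and rel-3 stand in for 2.5.7 and 2.5.8; all other names are hypothetical too.

```sh
#!/bin/sh
# Sketch: rebuild o-1-2-M (3 on a side branch) on a new root, keeping the
# snapshots at 2 and 3 and reusing the conflict resolution stored in M.
# All file, branch and tag names are hypothetical.
set -eu

repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master
git config user.name "Demo User"
git config user.email "demo@example.com"

save() {  # save <file> <content> <message>: write one file and commit it
    echo "$2" > "$1"
    git add "$1"
    git commit -q -m "$3"
}

# Original history: 1 and 3 both edit conf.txt, so M needed a manual fix.
save conf.txt "value=0" "o"
git checkout -q -b side
save conf.txt "value=3" "patch 3"
git tag rel-3
git checkout -q master
save conf.txt "value=1" "patch 1"
save two.c "two" "patch 2"
git tag rel-2
if ! git merge -q --no-edit side > /dev/null 2>&1; then
    echo "value=13" > conf.txt   # the resolution made by hand back then
    git add conf.txt
    git commit -q -m "M"
fi
orig_merge=$(git rev-parse HEAD)

# New root, as done for newroot, then o on top of it.
git checkout -q --orphan rebuilt
git rm -q -r --cached .
git clean -q -f -d
git commit -q --allow-empty -m 'init'
git cherry-pick rel-3~1 > /dev/null
new_base=$(git rev-parse HEAD)

# Same shape as the original: 3 on a side branch, 1 and 2 on the main line.
git checkout -q -b rebuilt-side "$new_base"
git cherry-pick rel-3 > /dev/null
git checkout -q rebuilt
git cherry-pick rel-2~1 rel-2 > /dev/null

# The release snapshots should be identical to the original ones.
git diff --quiet rel-2 rebuilt && echo "rel-2 snapshot preserved"
git diff --quiet rel-3 rebuilt-side && echo "rel-3 snapshot preserved"

# Recreate M; on conflict, take M's version of each conflicted file.
if ! git merge -q --no-edit rebuilt-side > /dev/null 2>&1; then
    git diff --name-only --diff-filter=U | while read -r f; do
        git checkout "$orig_merge" -- "$f"
    done
    git commit -q -m "M (rebuilt)"
fi
git diff --quiet "$orig_merge" rebuilt && echo "merge result matches M"
```

Taking M's version of a conflicted file is only right when that file is supposed to end up exactly as it was in M; if the cleanup changed it, it still has to be resolved by hand.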

To list all the hashes from A to B (note that hash_A itself is excluded, while hash_B is included), and then apply all these patches to the current branch, use:

git rev-list --reverse hash_A..hash_B
git cherry-pick $(git rev-list --reverse hash_A..hash_B)
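A small experiment makes the range semantics concrete: hash_A..hash_B selects the commits reachable from hash_B but not from hash_A, so hash_A^..hash_B is needed to include hash_A as well. git cherry-pick also accepts the range directly and applies it oldest first, which gives the same result as the rev-list form. The five commits and the target branches are hypothetical.

```sh
#!/bin/sh
# Sketch: which commits hash_A..hash_B selects, and two ways to apply them.
# The commits c1..c5 and the target branches are hypothetical.
set -eu

repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master
git config user.name "Demo User"
git config user.email "demo@example.com"

# Five commits c1..c5 on master, each adding its own file.
for i in 1 2 3 4 5; do
    echo "$i" > "file$i"
    git add "file$i"
    git commit -q -m "c$i"
done
hash_A=$(git rev-parse master~3)   # c2
hash_B=$(git rev-parse master~1)   # c4

# A..B: reachable from B but not from A, i.e. c3 and c4 (A excluded).
git log --format=%s --reverse "$hash_A..$hash_B"
# A^..B also includes A: c2, c3 and c4.
git log --format=%s --reverse "$hash_A^..$hash_B"

# Apply c3 and c4 on a branch starting at c1, using the rev-list form...
git checkout -q -b target master~4
git cherry-pick $(git rev-list --reverse "$hash_A..$hash_B") > /dev/null
# ...and the same range passed to cherry-pick directly.
git checkout -q -b target2 master~4
git cherry-pick "$hash_A..$hash_B" > /dev/null

git log --format=%s --reverse target
git log --format=%s --reverse target2
```

On a range that contains merge commits, cherry-pick stops on them unless told which parent to follow with -m, so this works best on linear stretches of history.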

With all of the above, I was able to rebuild the whole tree (see my git website): some branches with rebase, others with cherry-pick, and a few by merging branches by hand (again).

If you find this useful, please let me know.