For whatever reason, someone thought it would be a grand idea to have 250 web sites in one Subversion repository. That repository, now converted to git, weighs in at 2.5GB and is as bloated as can be.

Some of the data in there can just sit there and rot, but some of it is still quite useful. So, as we need to access one of those sites, here's the approach I've taken to pull a single site out into its own git repository while retaining its commit history.

First we make a copy of our working repository and then cut our ties to the remote so we don't risk pushing anything back that we don't intend to. Then we filter the history down to only the commits having to do with our single web site folder, let's call it taco.com. The result is a repository whose history contains just the commits that touched that folder, with the folder's contents moved up to the top level.

# git clone ssh://git@git.my.tld:1234/var/git/bloated-repository.git
# cd bloated-repository
# git remote rm origin
# git filter-branch --subdirectory-filter taco.com -- --all
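If you'd like to see what the filter does before pointing it at the real thing, here is a throwaway sketch you can run in a scratch directory. Everything in it is made up for illustration: the bloated-demo repository, the burrito.com folder, and the demo identity. It assumes git is installed and still ships filter-branch.

```shell
#!/bin/sh
# Throwaway demo: two "sites" in one repository, then filter down to one.
set -e

# Hypothetical identity so commits work on a machine with no git config.
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.invalid
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.invalid
# Newer git versions warn and pause before filter-branch runs; this skips that.
export FILTER_BRANCH_SQUELCH_WARNING=1

demo=$(mktemp -d)
cd "$demo"
git init -q bloated-demo
cd bloated-demo
# Force the branch name to master regardless of the local default.
git symbolic-ref HEAD refs/heads/master

mkdir taco.com burrito.com
echo "tacos" > taco.com/index.html
git add . && git commit -q -m "add taco.com"
echo "burritos" > burrito.com/index.html
git add . && git commit -q -m "add burrito.com"
echo "more tacos" >> taco.com/index.html
git add . && git commit -q -m "update taco.com"

# Keep only the history of taco.com and make it the new top level.
git filter-branch --subdirectory-filter taco.com -- --all >/dev/null 2>&1

commit_count=$(git rev-list --count HEAD)
root_files=$(git ls-files)
echo "commits left: $commit_count"
echo "files at root: $root_files"
```

Of the three commits, only the two that touched taco.com should survive, and index.html should now sit at the top of the repository rather than inside a taco.com folder.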

Now we can create a new repository and bring all those commits into it. The original folder is still 2.5GB because it holds all of that wonderful change data.

# cd ..
# mkdir taco.com
# cd taco.com
# git init
# git remote add origin ../bloated-repository
# git pull origin master
# git remote rm origin
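If you expect to do this for more than one site, the two steps can be wrapped in a small shell function. This is only a sketch, not a hardened tool: the name extract_site and its three arguments are my own invention, it assumes the branch is called master as in the commands above, and it works on a scratch clone so the source repository is left alone.

```shell
#!/bin/sh
# extract_site SOURCE_REPO FOLDER DEST
# Hypothetical helper: copies the history of FOLDER from SOURCE_REPO
# into a brand new repository at DEST.
extract_site() {
    src=$1
    folder=$2
    dest=$3
    scratch=$(mktemp -d) || return 1

    # Step 1: scratch clone, cut off from its remote, filtered to one folder.
    git clone -q "$src" "$scratch/work" || return 1
    (
        cd "$scratch/work" &&
        git remote rm origin &&
        FILTER_BRANCH_SQUELCH_WARNING=1 \
            git filter-branch --subdirectory-filter "$folder" -- --all \
            >/dev/null 2>&1
    ) || return 1

    # Step 2: fresh repository that pulls only the filtered commits.
    mkdir -p "$dest" || return 1
    (
        cd "$dest" &&
        git init -q &&
        git remote add origin "$scratch/work" &&
        git pull -q origin master &&
        git remote rm origin
    ) || return 1

    # The scratch clone is no longer needed.
    rm -rf "$scratch"
}
```

Calling it might look like `extract_site bloated-repository taco.com taco.com-only`, where taco.com-only is whatever new folder name you fancy. Note that a plain clone only checks out the source's current branch, so this assumes that branch is master.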

Et voilà! You have a new folder which contains only the commits to taco.com. You can now work in that folder and add a remote repository to push to like you normally would. In my case this new folder is a mere 300MB. Still rather large and in charge, but far better than 2.5GB.
