Wednesday, June 22, 2011

SVN hack: insert missing 'trunk' root directory

The Problem

One of my Sourceforge projects got its Subversion (SVN) repository created incorrectly. The root was missing the 'trunk', 'branches', and 'tags' subdirectories. All the content that should have been under 'trunk' was in the root.

I did not have time to fix this right away, so some development was done while still structured like this. Now I need to fix it (the maven release plugin requires the correct SVN setup). I wanted to find a way to create a 'trunk' directory and move all content into it.

I will not cover details of the fix of the Sourceforge-hosted repository. The instructions for that are given here.

This involves:
  1. Create a local backup of the Sourceforge SVN repository.
  2. Load the repository locally and fix it.
  3. Create the SVN repository image on an Sourceforge shell server.
  4. Replace it with the fixed image.
  5. Commit the new image.
The Sourceforge instructions are for steps 1, and 3 through 5, so I will document step 2 below. This will be the step of relevance for most people.

The Solution

The simplest option is to just create that new 'trunk' subdirectory, and then move all the files there. This solution to the problem is well documented here, along with the drawbacks. (A less detailed discussion of this approach was also found here.)

What I did not like about this method is that it is basically a copying of the files to a new location and corresponding removals from the original location, which I found inelegant. The complete histories of the files will end up split between the root directory and 'trunk' at the time of the move. I wanted a solution where if someone looked at the full history using my method, they would not see that it was ever structured incorrectly.

I found what I felt to be a cleaner (from my perspective) approach which was roughly:
  1. Create a dump of the original repository.
  2. Edit/filter the contents to prepend 'trunk' to all paths.
  3. Create a new repository and load the dump into it.
This approach is actually mentioned in the official SVN documentation. So this should not really be called a hack.

It took some experimenting to deal with problems related to specific details, but the overall approach worked. Here are the specific commands that did it, (omitting the output since that is voluminous). I abbreviate the path to the repositories base on the local file system as $SVN_REPO_BASE.

Perform the dump and rewrite the file paths
> svnadmin dump $SVN_REPO_BASE/oldrepo | sed "s/^Node\(-copyfrom\)\?-path: /Node\1-path: trunk\//" > filtered.dump

This command uses the standard "svnadmin dump" utility, but passes the data through 'sed' to process the contents before writing the actual the dump file details-1.

Create a new repository for the fixed content
> svnadmin create $SVN_REPO_BASE/newrepo
> svn mkdir -m "Create root-level directories" file://$SVN_REPO_BASE/newrepo/trunk file://$SVN_REPO_BASE/newrepo/branches file://$SVN_REPO_BASE/newrepo/tags

We create a bare new repository which will be loaded. I originally missed the 'svn mkdir' command, and that caused the upcoming "load" command to fail details-2. All of the files in the dump will be loaded in the 'trunk' directory, but nothing in that dump file itself actually creates that directory. So we have to do that by hand before loading.

Load the new repository
> svnadmin load $SVN_REPO_BASE/newrepo < filtered.dump

Now the new repository is ready to use. If it is intended to be used somewhere else (e.g. Sourceforge) then a regular dump can be made of THIS repository and loaded elsewhere.

Caveats

  • This approach probably works best for relatively young projects, since the dumps for active, mature projects are likely to be huge and cumbersome to process.

  • This post mentions a need to use 'svndumpfilter' before piping through 'sed', on the notion that it was necessary to handle binary data. I did not need that to make the load work, even though I did have binary files in the repository.

  • The "sed" patterns are designed to match only the metadata headers in the dump, but there is an unlikely chance that the content will be matched and altered. This error can only happen if any file content exactly matches a dump header.




Notes

details-1
The file paths are encoded in the "Node-path:" and "Node-copyfrom-path:" fields in the dump, so these are rewritten to have 'trunk' prepended. The final result is the processed dump file.

details-2
The error I was getting when trying the load without first the 'mkdir' was
 svnadmin: File not found: transaction '0-0', path 'trunk/pom.xml'
I found the explanation of this error here.

No comments:

Post a Comment