Monday, June 27, 2011

Shell Trick: cron script indirection

The Problem

When you add crontab entries, it is easy to forget that they will be executed in a different environment than your usual shell. None of the settings from your .bashrc will be available. The environment will not be empty, but it will be pretty bare. If your command depends on any of these settings, it will fail. The different environment also makes it tricky to test. You might end up with a long, complicated one-line command that is hard to maintain.

The Solution

One solution is to use the 'env' command to explicitly build the environment for your command. This can work, but it can make your entry very long and hard to read (and thus hard to test/debug/maintain).
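To give a feel for the 'env' approach, here is a minimal sketch. The variable values are placeholders, and the 'sh -c' payload only prints the settings instead of running a real application.

```shell
# Build the environment explicitly with 'env', then run the command.
# APP_HOME and SOME_LIB are placeholder settings; the 'sh -c' payload
# just prints them so the effect is visible.
env_demo_output=$(env APP_HOME=/usr/share/app-1.1 SOME_LIB=/usr/share/lib-2.3 \
    sh -c 'echo "$APP_HOME $SOME_LIB"')
echo "$env_demo_output"
```

In a crontab the whole 'env' invocation has to sit on one line after the schedule fields, which is where the length starts to hurt.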

My favored approach is to make all crontab entries simply invoke a script that defines all the settings and then invokes the desired command. I create the directory ${HOME}/shrawenv to hold all these scripts. (I chose this name to denote "for shell with raw environment".)

For example, I would have a crontab entry like
00 5 * * * sh /home/user/shrawenv/
Then I would define the '' script as

export APP_HOME=/usr/share/app-1.1
export SOME_LIB=/usr/share/lib-2.3
$APP_HOME/bin/projectApp arg1 arg2
This dummy example merely demonstrates how to set environment variables before invoking a command. It is a good idea to always use full paths in crontab because PATH may not be what you are used to.
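Since PATH is the setting most likely to surprise you, a wrapper script can also set it explicitly. The following is only a sketch: the PATH value and APP_HOME location are assumptions, and the 'sh -c' line is a stand-in for the real command that just reports what a child process would see.

```shell
#!/bin/sh
# Hypothetical shrawenv-style wrapper script (names and paths are placeholders).

# Set PATH explicitly so command lookup does not depend on cron's default.
PATH=/usr/local/bin:/usr/bin:/bin
export PATH

# A setting the job needs; placeholder value.
APP_HOME=/usr/share/app-1.1
export APP_HOME

# Stand-in for the real command: report what a child process would see.
wrapper_report=$(sh -c 'echo "APP_HOME=$APP_HOME PATH=$PATH"')
echo "$wrapper_report"
```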

In my crontab, all the commands are just invoking scripts from shrawenv. I want the crontab to list only a simple command and the time of execution. I find it much cleaner to have all the details of the command separately in the script file.

Note that shrawenv is useful not just for cron, but for other contexts where execution is not happening in the usual shell command line. I also use it to house scripts to be called by desktop launchers.

Because this is a script file, it can be edited and formatted for elegance. Using 'env' would force it all into one line, and may require some syntactic gymnastics.

If I have to make any adjustments to the logic, I do not edit the crontab. Instead I just modify the internals of the script as needed.

You do not have to wait for cron to run to see what the script does. You can simply execute the script. There is an explanation of how to simulate the cron environment here.
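A rough way to approximate the cron environment by hand is 'env -i', which starts the command with an empty environment plus whatever you list. This is a sketch under assumptions: the variables cron actually provides differ between systems, so the set passed through here is a guess.

```shell
# Approximate cron's sparse environment: 'env -i' starts from an empty
# environment, and only the variables listed here are passed through.
# The exact set cron provides varies by system; these values are assumptions.
cron_like_output=$(env -i HOME="$HOME" SHELL=/bin/sh PATH=/usr/bin:/bin \
    /bin/sh -c 'echo "PATH=$PATH EDITOR=${EDITOR:-unset}"')
echo "$cron_like_output"
```

To try out a wrapper script this way, the 'sh -c' payload would be replaced by 'sh' followed by the full path to the script.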

I keep my shrawenv under revision control. (I use Subversion locally.) If I took the approach of putting all the logic in the crontab, I would not have any history.

Wednesday, June 22, 2011

SVN hack: insert missing 'trunk' root directory

The Problem

One of my Sourceforge projects got its Subversion (SVN) repository created incorrectly. The root was missing the 'trunk', 'branches', and 'tags' subdirectories. All the content that should have been under 'trunk' was in the root.

I did not have time to fix this right away, so some development was done while the repository was still structured like this. Eventually I needed to fix it (the Maven release plugin requires the correct SVN setup). I wanted to find a way to create a 'trunk' directory and move all content into it.

I will not cover details of the fix of the Sourceforge-hosted repository. The instructions for that are given here.

The overall fix involves:
  1. Create a local backup of the Sourceforge SVN repository.
  2. Load the repository locally and fix it.
  3. Create the SVN repository image on a Sourceforge shell server.
  4. Replace it with the fixed image.
  5. Commit the new image.
The Sourceforge instructions cover steps 1 and 3 through 5, so I will document step 2 below. This will be the step of relevance for most people.

The Solution

The simplest option is to just create that new 'trunk' subdirectory, and then move all the files there. This solution to the problem is well documented here, along with the drawbacks. (A less detailed discussion of this approach was also found here.)

What I did not like about this method is that it is basically a copy of the files to a new location plus corresponding removals from the original location, which I found inelegant. The history of each file ends up split between the root directory and 'trunk' at the revision of the move. I wanted a solution where someone looking at the full history would not see that the repository was ever structured incorrectly.

I found what I felt to be a cleaner approach, which was roughly:
  1. Create a dump of the original repository.
  2. Edit/filter the contents to prepend 'trunk' to all paths.
  3. Create a new repository and load the dump into it.
This approach is actually mentioned in the official SVN documentation. So this should not really be called a hack.

It took some experimenting to deal with problems related to specific details, but the overall approach worked. Here are the specific commands that did it (omitting the output, since that is voluminous). I abbreviate the path to the repository base on the local file system as $SVN_REPO_BASE.

Perform the dump and rewrite the file paths
> svnadmin dump $SVN_REPO_BASE/oldrepo | sed "s/^Node\(-copyfrom\)\?-path: /Node\1-path: trunk\//" > filtered.dump

This command uses the standard "svnadmin dump" utility, but pipes the data through 'sed' to rewrite the contents before the dump file is written (details below).
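To see what the filter does, it can be applied to a few hand-written header lines. The sample lines below are made up, and the substitution is written with 'sed -E' (extended regular expressions), a variant I am assuming to be equivalent to the command above while avoiding the backslash-escaped group and '?'.

```shell
# Made-up sample of dump headers; a real dump contains many more fields.
sample_headers='Node-path: pom.xml
Node-kind: file
Node-copyfrom-path: src/Main.java'

# Same substitution as the dump command, written with -E so the group
# and '?' need no backslashes.
rewritten=$(printf '%s\n' "$sample_headers" |
    sed -E 's/^Node(-copyfrom)?-path: /Node\1-path: trunk\//')
echo "$rewritten"
```

Only the two path headers gain the 'trunk/' prefix; the "Node-kind:" line passes through unchanged.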

Create a new repository for the fixed content
> svnadmin create $SVN_REPO_BASE/newrepo
> svn mkdir -m "Create root-level directories" file://$SVN_REPO_BASE/newrepo/trunk file://$SVN_REPO_BASE/newrepo/branches file://$SVN_REPO_BASE/newrepo/tags

We create a bare new repository into which the dump will be loaded. I originally missed the 'svn mkdir' command, and that caused the upcoming "load" command to fail (details below). All of the files in the dump will be loaded into the 'trunk' directory, but nothing in the dump file itself actually creates that directory. So we have to do that by hand before loading.

Load the new repository
> svnadmin load $SVN_REPO_BASE/newrepo < filtered.dump

Now the new repository is ready to use. If it is intended to be used somewhere else (e.g. Sourceforge) then a regular dump can be made of THIS repository and loaded elsewhere.


  • This approach probably works best for relatively young projects, since the dumps for active, mature projects are likely to be huge and cumbersome to process.

  • This post mentions a need to use 'svndumpfilter' before piping through 'sed', on the notion that it was necessary to handle binary data. I did not need that to make the load work, even though I did have binary files in the repository.

  • The "sed" pattern is designed to match only the metadata headers in the dump, but there is a small chance that file content will be matched and altered as well. This can only happen if a line of file content begins exactly like one of those dump headers.
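One way to reduce the risk in the last point is to list every line the pattern would touch and skim the list for anything that is not a header. A sketch, using a small made-up file in place of a real dump:

```shell
# Made-up stand-in for a dump file; the fourth line mimics file content
# that happens to look like a header.
sample_dump=$(mktemp)
printf '%s\n' 'Node-path: pom.xml' 'Node-kind: file' '' \
    'Node-path: this line is file content' > "$sample_dump"

# -n prints line numbers, -a treats binary data as text.
matches=$(grep -n -a -E '^Node(-copyfrom)?-path: ' "$sample_dump")
echo "$matches"
rm -f "$sample_dump"
```

For a real repository, the same 'grep' would be run on the unfiltered output of "svnadmin dump".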


The file paths are encoded in the "Node-path:" and "Node-copyfrom-path:" fields in the dump, so these are rewritten to have 'trunk' prepended. The final result is the processed dump file.

The error I was getting when trying the load without first running the 'svn mkdir' was
 svnadmin: File not found: transaction '0-0', path 'trunk/pom.xml'
I found the explanation of this error here.