Friday, October 14, 2011

Shell Trick: Dynamic bashrc Setup

Anyone who works on Unix has run into the problem of setting up a consistent environment across multiple hosts. If you are a power user, you have likely developed a robust set of aliases, functions, variable settings, etc. that streamline your command-line effectiveness. However, when you create a new account on a different host, recreating that setup becomes a sizable task.

Besides just the volume of customizations, each different environment may require specific settings. For example, the path to your favorite editor may be different.

Overview

I have come up with a solution for configuring a bash (Bourne Again Shell) environment that allows common settings and definitions as well as customizations at the operating-system and host level. The modularity is achieved by hooking the framework minimally into .bashrc and then using separate files for the actual settings, so the proper ones can be selected in different environments. A graphical representation of this framework is shown in Figure 1.

Figure 1. (Diagram not reproduced: .bashrc hooks minimally into bashrc-common, which in turn sources the common, OS-specific, host-specific, and final setup files.)

Hooking the Framework into the Shell

The novice approach to customization is to put all the definitions in the .bashrc file. This quickly becomes unmanageable. Another downside is that the stock version of this file can differ between systems. Instead, this framework requires that only two lines be added at the end of the user's .bashrc file.
export ENVSETUP="${HOME}/envsetup"
[[ -f "${ENVSETUP}/bashrc-common" && -n "${PS1}" ]] && . "${ENVSETUP}/bashrc-common"
The first line declares where the framework is located; in this example, it is the envsetup directory in the user's home. The framework can be installed anywhere, as long as ENVSETUP is defined to be that location. The second line calls the framework after performing two checks: first, that the expected main file exists; second, that this is an interactive shell. A non-interactive shell (like one created for scp) does not need the customizations, and could actually fail because of output generated by the framework.

Driving the Framework

The entry point to the framework is the driver file bashrc-common. It is meant to contain only structural logic, with the actual settings in separate files. All customizations should go in those other files. The file is listed in full below, since understanding it is key to understanding the framework.
# A utility to do more env setup using the given file.
# Expects ENVSETUP to be properly defined.
SOURCE_LOG=""
sourceFrom () {
    local base_file="$1"; shift
    local source_file="${ENVSETUP}/${base_file}"
    if [ -f "${source_file}" ]; then
        # If an optional message was given, print it first.
        [[ $# -gt 0 ]] && { printf '%s\n' "$1"; shift; }

        . "${source_file}"
        export SOURCE_LOG="${SOURCE_LOG}:${base_file}"
        return 0
    else
        # If the file does not exist, no action is taken (silently).
        # Only the return value indicates whether action was done.
        return 1
    fi
}

# - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -

# Common aliases are defined externally.
sourceFrom bashrc-common-aliases

# Common Functions are defined externally.
sourceFrom bashrc-common-functions

# Platform-specific setup
printf "Bourne Again Shell "
os=$(uname -s)
case ${os} in
    Linux)
        sourceFrom bashrc-os-linux "on Linux.";;
    SunOS)
        sourceFrom bashrc-os-sunos "on SunOS.";;
    CYGWIN_NT-5.1)
        sourceFrom bashrc-os-win "on Windows XP.";;
    CYGWIN_NT-5.0)
        sourceFrom bashrc-os-win "on Windows 2000.";;
    CYGWIN_NT-4.0)
        sourceFrom bashrc-os-win "on Windows NT.";;
    *)
        printf "Unknown OS: %s\n" "${os}";;
esac

# host-specific setup
sourceFrom "bashrc-host-${HOSTNAME}" "Customization for host: ${HOSTNAME}"

# Final processing after OS-specific and host-specific setup.
sourceFrom bashrc-common-final

The first element is the definition of a shell function that assists in sourcing the sub-files; it is not necessary to detail it here. The main logic follows the dividing line.

Common Aliases and Functions

First, the driver reads the definitions of the aliases and the functions from separate files. There is no strict reason for these to be two separate files other than cleaner organization.
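As a purely hypothetical sketch (none of these entries are prescribed by the framework), the two files hold ordinary definitions such as:
# Possible entries in bashrc-common-aliases:
alias ll='ls -l'
alias rm='rm -i'

# Possible entry in bashrc-common-functions:
# Show PATH one entry per line.
showpath () {
    echo "${PATH}" | tr : '\n'
}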

Operating System Specific Settings

Next comes the critical logic of dynamically selecting configuration files depending on the operating system and the specific host. The magic of determining the OS is done with the uname -s command. Notice that while the output of this command identifies the OS, the file that actually gets sourced as a result is chosen by the user. I have mapped the various Windows operating systems to the single file bashrc-os-win (although a different message gets printed for each).

Note that this list of operating systems is incomplete; it reflects only those systems that I have tested this framework on. Other users would extend it to the systems they are using. If it encounters an unknown system, the output will display the exact string to add to the case statement.
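As an illustration of why the split matters, a bashrc-os-linux file might contain settings that would break elsewhere (hypothetical contents):
# Possible entries in bashrc-os-linux:
alias ls='ls --color=auto'   # GNU ls option; not available on SunOS
export EDITOR=/usr/bin/vim   # editor path on this platform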

Host Specific Settings

Although the OS determines one collection of settings, it may be necessary to customize the environment further for an individual host. This is done by the next step. It only requires that the settings be placed in a file named bashrc-host-HOSTNAME (where HOSTNAME is replaced by the value that environment variable will have on that system).
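For example (hypothetical contents, anticipating the JAVA_HOME discussion below), bashrc-host-king could hold paths that exist only on that machine:
# Possible entries in bashrc-host-king:
export JAVA_HOME=/opt/jdk1.6.0
export PATH="${PATH}:/opt/tools/bin"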

Final Common Settings

The final step is to source a file containing common settings that depend on earlier settings from the OS-specific or host-specific files. To understand this, consider this example.
export JDK_HOME=${JAVA_HOME}
This ensures that JDK_HOME has the same value as JAVA_HOME, but the latter will likely be defined in a host-specific file. Since we want these two variables to have the same value on all systems, this line belongs in a common section.

Output

You may have noticed several printf statements in the listing. Their purpose is to provide some minimal information about the environment while also enabling some basic debugging. Consider the output that I see on my main system:
Bourne Again Shell on Linux.
Customization for host: king
The first line always starts with "Bourne Again Shell" as an indication that the framework was triggered, and ends by naming the OS customization in use (or reporting that the OS could not be matched). The second line states which host-specific file will be used.

The usefulness of this output comes from the fact that the OS and the host are precisely the things that change from system to system; these two lines show how both were resolved.

Why Use this Framework

The power of this framework is realized when a fresh environment needs to be set up on a new host. That is done simply by copying all the files from an existing system to the new one, and then creating a new host-specific file from an existing one.

This is easy to do because of the following advantages of this framework:
  • All files are in a separate subdirectory. They are not intermixed with other, unrelated files in the user's home directory.
  • All the files are not-hidden. Searching through and debugging a set of hidden files can be inconvenient.
  • Exactly the same set of files (with the same content) is used on all systems.

How to Use this Framework

Implementing this yourself can be done in the following steps.
  1. Create a directory for all the setup files. (This will be referenced in the scripts as ENVSETUP.)
  2. Create the file bashrc-common and copy the contents from above.
  3. Add the two lines to the end of .bashrc as described above. (At this point, the bare framework is working.)
  4. Create the setup files for your operating system and host. You may need to customize bashrc-common if it does not recognize your operating system.
  5. Move all your customizations to the appropriate setup files.
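As a concrete picture (matching the Linux host king from the example output above), the setup directory would end up containing files along these lines:
> ls ${ENVSETUP}
bashrc-common          bashrc-common-final      bashrc-os-linux
bashrc-common-aliases  bashrc-common-functions  bashrc-host-king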

More Debugging Details

If it is unclear which files are being read, and in which order, the following command will show exactly what happened. (Note that the tr command is used to split the single-line value into multiple lines.)
echo $SOURCE_LOG | tr : \\n
On my main system, this produces the following. It displays the exact sequence.
bashrc-common-aliases
bashrc-common-functions
bashrc-os-linux
bashrc-host-king
bashrc-common-final

Friday, August 5, 2011

Shell Trick: removing all empty directories

Given a directory structure (a subtree) that contains files and directories, I sometimes want to quickly delete all empty directories. There is a simple one-line command that will do this.

There are also many incorrect solutions suggested in online discussions. Before presenting my script, let us examine some of those suggested approaches. They are not completely wrong; they just work only in special cases.

Consider the example described below (a sketch of the tree follows the next paragraph). The command will need to walk down the directory tree and search for directories. Any directories that are empty need to be removed.

We want to start examining at d1. Notice that we will find 3 empty subdirectories — d3, d6, and d7. If we remove those, we will create one newly empty directory — d5. Removing that will cause d4 to become empty. Then d4 can be removed and there will be no empty directories remaining.
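(The original figure is not reproduced here; the commands below rebuild a tree consistent with that description. The directory d2 and its file are assumptions, there to account for d1 remaining non-empty at the end.)
# d3, d6, and d7 start out empty; d5 holds only d6 and d7;
# d4 holds only d5; d2 holds a file, so d1 is never empty.
mkdir -p d1/d2 d1/d3 d1/d4/d5/d6 d1/d4/d5/d7
touch d1/d2/somefile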

I have found some suggested solutions but they have drawbacks:
  find d1 -type d -empty -exec rmdir {} \;
This cannot work because it traverses the tree top-down. It will find the first wave of empty directories, but not the ones that become empty only after their subdirectories are removed.
  find d1 -type d -empty -exec rmdir -p {} \;
Actually, this one almost works. The difference from the previous flavor is that rmdir -p, after removing an empty leaf directory, also tries to remove its newly empty parent directories. But it has the problem of not knowing where to stop: there is no bound at the root, so it could remove a directory at a level higher than the one we specified, as sketched below.
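To make the failure mode concrete, suppose the tree lives under /home/user (a hypothetical location) and find passes absolute paths to rmdir:
# With absolute paths, -p climbs without bound: removing d7 also
# removes the now-empty d5 and d4, and if d1 had been empty it would
# be removed too, with /home/user and /home attempted next.
rmdir -p /home/user/d1/d4/d5/d7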

The correct solution is
  find d1 -depth -type d -empty -exec rmdir {} \;
The important aspect is that we need to do a depth-first traversal of the sub-tree. This will allow us to handle the cases where previously non-empty directories become empty.
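On the example tree, the traversal order can be seen directly; -depth makes find report contents before the directories that hold them, so the output would be along these lines (sibling order may vary):
> find d1 -depth -type d
d1/d2
d1/d3
d1/d4/d5/d6
d1/d4/d5/d7
d1/d4/d5
d1/d4
d1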

So I came up with the following script and named it "rmEmpty.sh". Because the action can be done with a one-line command, encapsulating it in a script is not strictly necessary. But notice how many moving parts there are in the argument list to 'find'; putting it in a script saves quite some typing. Besides that convenience, it also adds a margin of safety, since the script validates each argument before acting on it.
#!/bin/bash
#-----------------------------------------------------------------------------
# Remove all empty directories under and including the given root(s).
#-----------------------------------------------------------------------------

# If no args given, then root at $PWD.
[ $# == "0" ] && set "$PWD"

# Process each arg as a root to examine.
for root in "$@"; do
if [ -d "${root}" ]; then
find "${root}" -depth -type d -empty -exec rmdir -v {} \;
else
printf "Not a directory: [%s]\n" "${root}"
fi
done
The action can now be invoked simply as
rmEmpty.sh d1
A couple of points:
  • The quoting of variables as shown is strictly necessary to handle directory names that may contain spaces.
  • The script is verbose about each removal so you can see every action explicitly. The "-v" can be removed from the call to 'rmdir' to reduce verbosity.
  • One curiosity of this script is that if you run it with no args, it starts processing from your current working directory, and it may end up deleting that very directory. This is not an error condition; your shell is simply left in a directory that no longer exists, so you can change to another existing directory and continue working.

Monday, June 27, 2011

Shell Trick: cron script indirection

The Problem

When you add crontab entries, it is easy to forget that they will be executed in a different environment than your usual shell. None of the settings from your .bashrc will be available. The environment will not be empty, but it will be pretty bare. If your command depends on any of those settings, it will fail. The different environment also makes it tricky to test. You might end up with a long, complicated one-line command that is hard to maintain.

The Solution

One solution is to use the 'env' command to explicitly build the environment for your command. This can work, but it can make your entry very long and hard to read (and thus hard to test/debug/maintain).

My favored approach is to make all crontab entries simply invoke a script that defines all the settings and then invokes the desired command. I create the directory ${HOME}/shrawenv to hold all these scripts. (I chose this name to denote "for shell with raw environment".)

For example, I would have a crontab entry like
00 5 * * * sh /home/user/shrawenv/update_project.sh
Then I would define the 'update_project.sh' script as
#!/bin/sh

export APP_HOME=/usr/share/app-1.1
export SOME_LIB=/usr/share/lib-2.3
$APP_HOME/bin/projectApp arg1 arg2
This dummy example merely demonstrates how to set environment variables before invoking a command. It is a good idea to always use full paths in crontab because PATH may not be what you are used to.
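If you are unsure what environment your cron actually provides, one common trick is to capture it once with a temporary entry and inspect the result (the output path here is hypothetical):
* * * * * env > /tmp/cron-env.txt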

In my crontab, all the commands just invoke scripts from shrawenv. I want it to list only a simple command and the time of execution. I find it much cleaner to have all the details of the command separately in the script file.

Note that shrawenv is useful not just for cron, but for other contexts where execution is not happening in the usual shell command line. I also use it to house scripts to be called by desktop launchers.

Advantages
Readability
Because this is a script file, it can be edited and formatted for elegance. Using 'env' would force it all onto one line, and may require some syntactic gymnastics.

Maintainability
If I have to make any adjustments to the logic, I do not edit the crontab. Instead I just modify the internals of the script as needed.

Testability
You do not have to wait for cron to run to see what the script does. You can simply execute the script. There is an explanation of how to simulate the cron environment here.
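That explanation is not reproduced here, but one common approximation (a sketch, not necessarily the method from that link) is to run the script under a stripped-down environment using env -i:
> env -i HOME=/home/user SHELL=/bin/sh sh /home/user/shrawenv/update_project.sh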

Archivability
I keep my shrawenv under revision control. (I use subversion locally.) If I took the approach of putting all the logic in the crontab, I would not have any history.

Wednesday, June 22, 2011

SVN hack: insert missing 'trunk' root directory

The Problem

One of my Sourceforge projects got its Subversion (SVN) repository created incorrectly. The root was missing the 'trunk', 'branches', and 'tags' subdirectories. All the content that should have been under 'trunk' was in the root.

I did not have time to fix this right away, so some development was done while still structured like this. Now I need to fix it (the maven release plugin requires the correct SVN setup). I wanted to find a way to create a 'trunk' directory and move all content into it.

I will not cover details of the fix of the Sourceforge-hosted repository. The instructions for that are given here.

This involves:
  1. Create a local backup of the Sourceforge SVN repository.
  2. Load the repository locally and fix it.
  3. Create the SVN repository image on a Sourceforge shell server.
  4. Replace it with the fixed image.
  5. Commit the new image.
The Sourceforge instructions cover steps 1 and 3 through 5, so I will document step 2 below. This is the step of relevance for most people.

The Solution

The simplest option is to just create that new 'trunk' subdirectory, and then move all the files there. This solution to the problem is well documented here, along with the drawbacks. (A less detailed discussion of this approach was also found here.)

What I did not like about this method is that it is basically a copy of the files to a new location with corresponding removals from the original location, which I found inelegant. The complete histories of the files end up split between the root directory and 'trunk' at the time of the move. I wanted a solution where someone looking at the full history would not see that the repository was ever structured incorrectly.

I found what I felt to be a cleaner approach, which was roughly:
  1. Create a dump of the original repository.
  2. Edit/filter the contents to prepend 'trunk' to all paths.
  3. Create a new repository and load the dump into it.
This approach is actually mentioned in the official SVN documentation. So this should not really be called a hack.

It took some experimenting to deal with problems related to specific details, but the overall approach worked. Here are the specific commands that did it (omitting the output, since that is voluminous). I abbreviate the base path to the repositories on the local file system as $SVN_REPO_BASE.

Perform the dump and rewrite the file paths
> svnadmin dump $SVN_REPO_BASE/oldrepo | sed "s/^Node\(-copyfrom\)\?-path: /Node\1-path: trunk\//" > filtered.dump

This command uses the standard "svnadmin dump" utility, but passes the data through 'sed' to process the contents before writing the actual dump file details-1.

Create a new repository for the fixed content
> svnadmin create $SVN_REPO_BASE/newrepo
> svn mkdir -m "Create root-level directories" file://$SVN_REPO_BASE/newrepo/trunk file://$SVN_REPO_BASE/newrepo/branches file://$SVN_REPO_BASE/newrepo/tags

We create a bare new repository that will then be loaded. I originally missed the 'svn mkdir' command, and that caused the upcoming "load" command to fail details-2. All of the files in the dump will be loaded into the 'trunk' directory, but nothing in the dump file itself actually creates that directory, so we have to create it by hand before loading.

Load the new repository
> svnadmin load $SVN_REPO_BASE/newrepo < filtered.dump

Now the new repository is ready to use. If it is intended to be used somewhere else (e.g. Sourceforge) then a regular dump can be made of THIS repository and loaded elsewhere.

Caveats

  • This approach probably works best for relatively young projects, since the dumps for active, mature projects are likely to be huge and cumbersome to process.

  • This post mentions a need to use 'svndumpfilter' before piping through 'sed', on the notion that it was necessary to handle binary data. I did not need that to make the load work, even though I did have binary files in the repository.

  • The "sed" patterns are designed to match only the metadata headers in the dump, but there is an unlikely chance that the content will be matched and altered. This error can only happen if any file content exactly matches a dump header.




Notes

details-1
The file paths are encoded in the "Node-path:" and "Node-copyfrom-path:" fields in the dump, so these are rewritten to have 'trunk' prepended. The final result is the processed dump file.
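As a quick check of the pattern (pom.xml is the root-level file that appears in the error under details-2; GNU sed is assumed for the \? quantifier):
> echo "Node-path: pom.xml" | sed "s/^Node\(-copyfrom\)\?-path: /Node\1-path: trunk\//"
Node-path: trunk/pom.xml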

details-2
The error I was getting when trying the load without first the 'mkdir' was
 svnadmin: File not found: transaction '0-0', path 'trunk/pom.xml'
I found the explanation of this error here.

Saturday, May 28, 2011

Shell Trick: Create a directory and change to it in one command

Often when you create a directory, you also intend to change to it to continue with other operations. This is usually done as two consecutive commands
> mkdir newdir
> cd newdir
But wouldn't this be easier as a single command? After all, the same directory name is used in both.

I created a bash shell function details-1 that does exactly this. I called it "mkcd", since it is basically a combination of "mkdir" and "cd". It is defined as
mkcd () {
    if [ $# -eq 1 ]; then
        local dir="$1"
        printf "mkcd %s\n" "${dir}"
        if [ -d "${dir}" ]; then
            cd "${dir}"
        else
            mkdir -p "${dir}"
            if [ $? -eq 0 ]; then
                cd "${dir}"
            fi
        fi
    else
        printf "Usage: mkcd <dir>\n"
    fi
}
This would be defined in a .bashrc environment initialization file.

This will skip the directory creation if the directory already exists. It actually uses "mkdir -p" so that any needed intermediate directory levels are also created. It also performs the "cd" only if the directory creation had no errors.

The operation can now be done as
> mkcd newdir


Notes
details-1
Note that this is a function and not a shell script. Doing it as a script will not work because that runs as a child process, so the directory change happens in a different process and you will remain in the same directory in which you started.
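To illustrate, suppose the same logic were saved as a hypothetical script mkcd.sh instead:
> ./mkcd.sh newdir   # the script's cd runs in a child process
> pwd                # still prints the directory you started in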