Posts Tagged ‘bash’

Split strings and join lists using Bash

Posted in General on March 10th, 2009 by Jacobo de Vera

This weekend I wrote some more Bash functions for my .bashrc file that I thought were worth sharing. The idea came after trying to do a similar thing while writing my addtopath function.

The first one is mysplit, which is little more than a Bash wrapper around a short Awk script. It takes the contents of a file, or a string from standard input (so that another program’s output can be piped to it), and splits it into separate lines, using a user-provided string as the delimiter. This delimiter is actually treated as an Awk-compatible regular expression, which makes the function even more powerful.

I change Awk’s field delimiter and then, for each line fed to the script, iterate through all the fields and print each one of them on a line of its own.

function mysplit
{
    DEFAULT_DELIMITER=" "

    delimiter=${1:-$DEFAULT_DELIMITER}
    inputfile=$2

    awk -F "$delimiter" \
    '{  for(i = 1; i <= NF; i++) {
            print $i
        }
    }' $inputfile

    return $?
}
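To illustrate what treating the delimiter as a regular expression buys us, here is a minimal, self-contained sketch. The name split_fields is hypothetical and only stands in for the Awk call inside mysplit; it reads from standard input only.

```bash
# Hypothetical stand-in for the Awk part of mysplit (stdin only).
split_fields() {
    awk -F "$1" '{ for (i = 1; i <= NF; i++) print $i }'
}

# Split on runs of digits: prints a, b and c on separate lines.
printf 'a1b22c\n' | split_fields '[0-9]+'

# Split on a comma with optional surrounding spaces:
# prints x, y and z on separate lines.
printf 'x , y,z\n' | split_fields ' *, *'
```

Because the delimiter is a regex, a single call can absorb irregular separators that would otherwise need a cleanup step first.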

Of course, if there is a mysplit, there has to be a myjoin, and here it is. This function does exactly the opposite of mysplit: it takes several lines from a file or from stdin and turns them into a single string, using a user-provided delimiter as “glue”.

Here I use an Awk/Sed combination (you have to love those two once you get to know them a bit). With Awk, I change ORS (the Output Record Separator) from the default newline character to whatever the user wants as a delimiter and then simply print everything. There are two problems with this: Awk also prints the delimiter after the last record and, unless the delimiter itself contains a newline, no newline is printed at the end. We need a trailing newline so that the output can be piped to other programs, as we will see in the examples below. Enter Sed, which I use to replace the last delimiter with a newline character. Problem solved:

function myjoin
{
    DEFAULT_DELIMITER=""

    delimiter=${1:-$DEFAULT_DELIMITER}
    inputfile=$2

    # Avoid treating this delimiter as a regex:
    delimiteresc=`echo "$delimiter" | addslashesforsed`

    awk -v ORS="$delimiter"  '{print}' $inputfile | \
        sed "s/$delimiteresc\$/\n/" # Remove the last delimiter and
                                    # add a new line char at the end

    return $?
}
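As an aside, the trailing-delimiter problem can also be sidestepped inside Awk itself, by printing the delimiter before every record except the first. This is only a sketch of an alternative, not the approach myjoin takes; the name join_lines is hypothetical, and note that Awk interprets backslash escapes in values passed with -v.

```bash
# Hypothetical alternative to myjoin: Awk only, reads from stdin.
join_lines() {
    awk -v sep="$1" '
        NR > 1 { printf "%s", sep }   # delimiter goes before records 2..N
               { printf "%s", $0 }    # the record itself, without a newline
        END    { print "" }           # a single newline at the very end
    '
}

printf '1\n2\n3\n4\n' | join_lines '*'   # prints 1*2*3*4
```

Since the delimiter never reaches Sed in this variant, there is nothing to escape, at the cost of the -v backslash caveat mentioned above.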

The delimiter, in this case, is a plain string, but we need to prevent Sed from treating it as a regex, and for that I use addslashesforsed:

function addslashesforsed
{
    filename=$1

    sed $filename \
        -e 's#\\#\\\\#g' \
        -e 's#/#\\/#g' \
        -e 's#\.#\\.#g' \
        -e 's#\*#\\*#g' \
        -e 's#\[#\\[#g'

    return $?
}
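The same escaping idea can be written as a single substitution with a bracket expression. The sketch below is an assumption-laden variant rather than a drop-in replacement: the name escape_for_sed is hypothetical, it reads from stdin only, and it additionally escapes ], ^ and $, which the function above leaves alone.

```bash
# Hypothetical variant of addslashesforsed (stdin only).
# The bracket expression lists the characters ] [ \ / . * ^ $ and
# the replacement \\& puts a backslash in front of each match.
escape_for_sed() {
    sed 's#[][\\/.*^$]#\\&#g'
}

printf '%s\n' 'a.b*c/d' | escape_for_sed   # prints a\.b\*c\/d

# With the dot escaped, only a literal trailing dot would be replaced,
# so this input comes back unchanged:
esc=$(printf '%s\n' '.' | escape_for_sed)
printf 'x.yz\n' | sed "s/$esc\$/X/"        # prints x.yz
```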

By combining these two functions (mysplit and myjoin) we can obtain a lot of extra functionality. I have coded a simple function that uses both of them to remove a specific element from a delimited list stored in a string. I have called this function myrmlistitems, which says a lot about my (lack of) function naming skills. I wrote it with my previous post’s addtopath function in mind, so that the step where the directory to be inserted in the PATH is first removed from it could be done in a more elegant (and, as I found out later, more correct) way.

function myrmlistitems
{
    DEFAULT_DELIMITER=""

    element=$1
    delimiter=${2:-$DEFAULT_DELIMITER}
    inputfile=$3

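    [ -z "$element" ] && return 1

    # -x matches whole lines only and -F takes the element as a
    # literal string, so regex characters in it (like .) are safe
    mysplit "$delimiter" $inputfile | grep -vxF -- "$element" \
        | myjoin "$delimiter"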
}
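One detail worth spelling out in the filtering step is the difference between matching the element as a regular expression and matching it as a literal string. The following self-contained sketch uses only tr, grep and paste in place of mysplit and myjoin, and the sample list is made up for illustration.

```bash
list='a.c:abc:x'

# Regex match: the dot matches any character, so "abc" is removed
# along with "a.c". Prints: x
printf '%s\n' "$list" | tr ':' '\n' | grep -v '^a.c$' | paste -sd: -

# Literal, whole-line match (-F -x): only the exact entry "a.c" is
# removed. Prints: abc:x
printf '%s\n' "$list" | tr ':' '\n' | grep -vxF -- 'a.c' | paste -sd: -
```

Directory names often contain dots, so the literal form is the safer choice when the list is a PATH.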

Finally, as trivial examples and just for fun, I use myjoin to help calculate the factorial of a number, myrmlistitems to get rid of one of the directories in the PATH, and mysplit to show the PATH contents on separate lines:

jdevera@aurora:~$ seq 4
1
2
3
4

jdevera@aurora:~$ seq 4 | myjoin '*'
1*2*3*4

jdevera@aurora:~$ seq 4 | myjoin '*' | bc
24

jdevera@aurora:~$ echo $PATH
/usr/local/bin:/usr/bin:/bin:/usr/bin/X11:/usr/games

jdevera@aurora:~$ echo $PATH | myrmlistitems /usr/local/bin :
/usr/bin:/bin:/usr/bin/X11:/usr/games

jdevera@aurora:~$ echo $PATH | mysplit ':'
/usr/local/bin
/usr/bin
/bin
/usr/bin/X11
/usr/games

Adding directories to the PATH in Bash

Posted in General on February 21st, 2009 by Jacobo de Vera

When I am working on a Linux machine for which I don’t have administrator rights and want to use a program that is not available, or a newer version of a program that is already installed, I just install it somewhere in my home directory and add the directory where it is stored to the PATH environment variable. If I want to override a version of the program that is already in the system, I have to prefix my program’s directory to the PATH, so that it is found before its system-wide counterpart. If I just want to add some functionality that was not available, I can either prefix or append the directory to the PATH. The usual way to do this is to add lines like these to the shell configuration file (.bashrc for my beloved Bash):

export PATH=/home/jdevera/mybinaries:$PATH
export PATH=$PATH:/home/jdevera/mybinaries

I find this a bit unpleasant to read, so I use these functions instead:

function addtopath
{
    directory=`echo "$1" | sed 's#/$##'`  # remove trailing slash
    where=$2

    # Add only existing directories
    if [ ! -d "$directory" ]; then
        return 1
    fi

    # If the directory is already in the path, remove it so that
    # it can be inserted in the desired position without
    # polluting $PATH with duplicates. grep -xF compares whole
    # entries literally, and paste glues the lines back together.
    newpath=`echo "$PATH" | tr ':' '\n' | \
             grep -vxF -- "$directory" | \
             paste -sd: -`

    if [ "$where" = "beg" ]; then    # Prefix to $PATH
        export PATH="$directory:$newpath"
    elif [ "$where" = "end" ]; then  # Append to $PATH
        export PATH="$newpath:$directory"
    else
        return 1
    fi

    return 0
}

# Convenience wrappers for addtopath
#
function pathappend  { addtopath "$1" end; return $?; }
function pathprepend { addtopath "$1" beg; return $?; }

The main function, addtopath, is smarter than the previous option (exporting the new value):

  • It checks that the directory exists before adding it to the PATH.
  • It avoids duplicates in the PATH: if the directory is already there, it is removed prior to reinsertion, so the directory is moved within the PATH to the desired position.
  • It can prefix or append the directory to the PATH.
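The duplicate-removal step can also be tried out on an ordinary string, without touching the real PATH. Below is a minimal sketch that uses only shell word splitting instead of external tools; the helper name path_without is hypothetical, and it is not how addtopath itself does the job.

```bash
# Hypothetical helper: print the colon-separated list $1 without any
# entry equal to $2. The body runs in a subshell, so the IFS and
# globbing changes do not leak into the calling shell.
path_without() (
    list=$1
    unwanted=$2
    result=
    set -f      # do not expand wildcards while splitting
    IFS=:       # split the list on colons
    for entry in $list; do
        [ "$entry" = "$unwanted" ] && continue
        result=${result:+$result:}$entry
    done
    printf '%s\n' "$result"
)

path_without '/usr/local/bin:/usr/bin:/bin' '/usr/bin'
# prints /usr/local/bin:/bin
```

Prefixing or appending the directory to that result then gives the "move within the PATH" behaviour described in the list above.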

I have also written two small wrappers, pathprepend and pathappend, that prefix and append, respectively, a directory to the PATH, thus making the calls even easier and more legible.

I can now write something like this:

pathappend /home/jdevera/mynewbinaries
pathprepend /home/jdevera/myoverrides

And I will get them both in the PATH like this:

jdevera@aurora:~$ echo $PATH
/home/jdevera/myoverrides:/usr/local/bin:/usr/bin:/bin:/usr/games:/home/jdevera/mynewbinaries
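To see why the position within the PATH matters, here is a small experiment that can be run safely in a scratch directory. Everything in it is made up for illustration: the directory names, the program name hello_demo and the helper first_hit. It assumes that mktemp is available and that the temporary directory allows executable files.

```bash
tmp=$(mktemp -d)
mkdir "$tmp/overrides" "$tmp/system"

# Put a program with the same name in both directories
for d in overrides system; do
    printf '#!/bin/sh\necho "%s"\n' "$d" > "$tmp/$d/hello_demo"
    chmod +x "$tmp/$d/hello_demo"
done

# Hypothetical helper: report which hello_demo a given PATH would
# pick. The subshell body keeps the real PATH untouched.
first_hit() (
    PATH=$1
    command -v hello_demo
)

first=$(first_hit "$tmp/overrides:$tmp/system")
second=$(first_hit "$tmp/system:$tmp/overrides")
echo "$first"    # the copy under overrides/ wins
echo "$second"   # the copy under system/ wins

rm -rf "$tmp"
```

The shell stops at the first directory that contains a matching executable, which is why overrides have to be prefixed while plain additions can go at either end.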

Edit: I added semicolons after the return statements in the wrapper functions and this now works on the Korn shell as well.