Split strings and join lists using Bash

This weekend I wrote some more Bash functions for my .bashrc file that I thought were worth sharing. The idea came after trying to do a similar thing while writing my addtopath function.

The first one is mysplit, which is little more than a Bash wrapper around a short Awk script. It takes the contents of a file, or a string from standard input (so that another program’s output can be piped to it), and splits it into separate lines, using a user-provided string as the delimiter. This delimiter is actually treated as an Awk-compatible regular expression, which makes the function even more powerful.

The script changes Awk’s field delimiter and then, for each line fed to it, iterates through all the fields and prints each one of them on a line of its own.

function mysplit
{
    local DEFAULT_DELIMITER=" "

    local delimiter=${1:-$DEFAULT_DELIMITER}
    local inputfile=$2

    # With no input file given, Awk reads from standard input
    awk -F "$delimiter" \
    '{  for(i = 1; i <= NF; i++) {
            print $i
        }
    }' ${inputfile:+"$inputfile"}

    return $?
}
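
To make the regular expression behaviour concrete, here is a minimal sketch. It uses a hypothetical stand-alone helper called split_fields (not part of my .bashrc) that repeats the same Awk loop, so the snippet can be pasted into a shell on its own. The delimiter is assumed to be longer than one character, which is when Awk treats it as a regex:

```shell
# Hypothetical stand-alone copy of the splitting idea, for illustration only.
split_fields() {
    # $1: delimiter (an Awk regular expression); input comes from stdin
    awk -F "$1" '{ for (i = 1; i <= NF; i++) print $i }'
}

# Split on a comma or a semicolon, optionally followed by spaces
printf 'a, b;c\n' | split_fields '[,;] *'
# Prints a, b and c, each on a line of its own
```

A single character such as ':' works too, as the PATH example further down shows.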

Of course, if there is a mysplit, there has to be a myjoin. And here it is indeed. This function does exactly the opposite of mysplit. It takes several lines from a file or from stdin and turns them into a string, using a user provided delimiter as “glue”.

Here I use an Awk/Sed combination (you have to love those two once you get to know them a bit). With Awk, I change ORS (the Output Record Separator) from the default newline character to whatever the user wants as a delimiter, and then simply print everything. There are two problems with this: Awk also prints the delimiter after the last record and, unless the delimiter itself contains a newline, no newline is printed at the end. We need that final newline to have output that we can pipe to other programs, as we will see in the examples below. Enter Sed, which I use to replace the last delimiter with a newline character. Problem solved:

function myjoin
{
    local DEFAULT_DELIMITER=""

    local delimiter=${1:-$DEFAULT_DELIMITER}
    local inputfile=$2

    # Avoid treating this delimiter as a regex:
    local delimiteresc
    delimiteresc=$(printf '%s\n' "$delimiter" | addslashesforsed)

    awk -v ORS="$delimiter" '{print}' ${inputfile:+"$inputfile"} | \
        sed "s/$delimiteresc\$/\n/" # Remove the last delimiter and
                                    # add a new line char at the end

    return $?
}
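
The two problems with the Awk half of that pipeline are easy to see in isolation. The following is only a small demonstration of what ORS does, not part of myjoin itself; the sample input and the '*' delimiter are arbitrary choices:

```shell
# ORS replaces the newline after every record, including the last one,
# so the output ends with the delimiter and has no final newline.
out=$(printf '1\n2\n3\n' | awk -v ORS='*' '{print}')
printf '%s\n' "$out"
# Prints: 1*2*3*
```

That trailing '*' is what the Sed step swaps for a newline.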

The delimiter, in this case, is a normal string but we need to prevent Sed from treating it like a regex, and for that I use addslashesforsed:

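function addslashesforsed
{
    local filename=$1

    # The g flag escapes every occurrence, not only the first one.
    # With no file name given, Sed reads from standard input.
    sed \
        -e 's#\\#\\\\#g' \
        -e 's#/#\\/#g' \
        -e 's#\.#\\.#g' \
        -e 's#\*#\\*#g' \
        -e 's#\[#\\[#g' \
        ${filename:+"$filename"}

    return $?
}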
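
The escaping step can be sidestepped if the trailing delimiter is never printed in the first place. What follows is only a sketch of that idea under my own assumptions, not the myjoin shown above: join_lines and JOIN_DELIM are hypothetical names, and the delimiter is passed through the environment so that Awk takes it literally.

```shell
# Hypothetical helper, not part of the original functions.
join_lines() {
    # $1: delimiter, taken literally; input comes from stdin.
    # The delimiter travels through the environment rather than -v,
    # so Awk does not interpret backslash escapes in it.
    JOIN_DELIM="$1" awk '
        NR > 1 { printf "%s", ENVIRON["JOIN_DELIM"] }
               { printf "%s", $0 }
        END    { print "" }'
}

printf '1\n2\n3\n4\n' | join_lines '*'
# Prints: 1*2*3*4
```

Printing the delimiter before every record except the first means nothing has to be removed afterwards, and the END block supplies the final newline.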

By combining those two functions (mysplit and myjoin) we get a lot of extra functionality. I have coded a simple function that uses both of them to remove a specific element from a delimited list stored in a string. I have called this function myrmlistitems, which says a lot about my (lack of) function naming skills. I wrote it with my previous post’s addtopath function in mind, so that the step where the directory to be inserted in the PATH is first removed from it could be done in a more elegant (and correct, as I found out later) way.

function myrmlistitems
{
    local DEFAULT_DELIMITER=""

    local element=$1
    local delimiter=${2:-$DEFAULT_DELIMITER}
    local inputfile=$3

    # Nothing to remove if no element was given
    [ -z "$element" ] && return 1

    mysplit "$delimiter" "$inputfile" | grep -v "^$element\$" \
        | myjoin "$delimiter"
}
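
One thing to keep in mind is that grep reads an anchored pattern like the one above as a regular expression, so an element such as /opt/a.b would also match /opt/aXb. The sketch below shows a fixed-string alternative using grep's -F and -x options. It is a hypothetical stand-alone variant (remove_item is a made-up name) that swaps in tr and paste for mysplit and myjoin to stay self-contained, and it assumes a single-character delimiter:

```shell
# Hypothetical variant, for illustration only.
remove_item() {
    # $1: element to drop, $2: single-character delimiter; list on stdin
    # -F: treat the element as a fixed string, -x: match whole lines only
    tr "$2" '\n' | grep -v -x -F -e "$1" | paste -s -d "$2" -
}

printf '%s\n' '/opt/a.b:/opt/aXb:/bin' | remove_item '/opt/a.b' ':'
# Prints: /opt/aXb:/bin
```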

Finally, as trivial examples and just for fun, I am using myjoin as part of calculating the factorial of a number, myrmlistitems to get rid of one of the directories in the PATH, and mysplit to show the PATH contents on separate lines:

jdevera@aurora:~$ seq 4
1
2
3
4

jdevera@aurora:~$ seq 4 | myjoin '*'
1*2*3*4

jdevera@aurora:~$ seq 4 | myjoin '*' | bc
24

jdevera@aurora:~$ echo $PATH
/usr/local/bin:/usr/bin:/bin:/usr/bin/X11:/usr/games

jdevera@aurora:~$ echo $PATH | myrmlistitems /usr/local/bin :
/usr/bin:/bin:/usr/bin/X11:/usr/games

jdevera@aurora:~$ echo $PATH | mysplit ':'
/usr/local/bin
/usr/bin
/bin
/usr/bin/X11
/usr/games
  1. Reena says:

    Mr. Jake:

    CO TI LLE OS (gossip). That is what a blog is for. For gos sip ing. I want to hear about your life in Dublin, so cut out the geeky stuff ¬¬

    (I think I have spoken for many of your readers, hahaha)

    OK, just kidding, eh? If you are going to use the blog for things I don't understand, I'll just remove it from my feeds :$

    • jdevera says:

      @Reena, apologies first for replying in English and also for disappointing you. This new blog not only brings a change in the language, but also in style and subjects. The old blog was mainly about me whereas this one is mainly about things I learn. It is too early for me to know if all the posts will be as technical as those first ones, although I’d like them not to be, but I can already tell you that I doubt there will be the kind of gossip known as CO TI LLE OS here.

      In the future, once I have generated more content, I might even categorize it or tag it according to the geek factor or technical depth of each article, so that you can filter out the über-geek content that you don’t want to read, but right now, again, it is too early to say.

  2. hongchao says:

    nice article, mate. but when it comes to the list handling, lisp (LISt Processing language) rules all:)

    just kidding. very elegant and useful bash function, i am gonna use them:)

  3. Peeter Joot says:

    if your aim is just to split $PATH, then the following less general solution is one that I often use:

    echo $PATH | tr : '\n'

    sometimes we have problematic network components in the $PATH at work … doing something like this, and then reversing it with

    tr '\n' :

    can be very handy until machines come back to life.

    • jdevera says:

      Indeed, using tr is much simpler if your delimiter is only one character. At the time I wrote those functions I was obsessed with the idea of doing this without using the “\n” character, which I failed to do in myjoin. In fact, a more recent version of myjoin (that I still have to post here) uses neither “\n” nor the addslashesforsed helper function.
