
You're already running it

If you check, you'll probably find that you are running bash right now. Even if you
changed your default shell, bash is probably still running somewhere on your system,
because it's the standard Linux shell and is used for a variety of purposes. Because bash
is already running, any additional bash scripts that you run are inherently memory-
efficient because they share memory with any already-running bash processes. Why
load a 500K interpreter if you already are running something that will do the job, and do
it well?


You're already using it

Not only are you already running bash, but you're actually interacting with bash on a
daily basis. It's always there, so it makes sense to learn how to use it to its fullest
potential. Doing so will make your bash experience more fun and productive. But why
should you learn bash programming? Easy: because you already think in terms of
running commands, cp'ing files, and piping and redirecting output. Shouldn't you learn
a language that allows you to use and build upon these powerful time-saving constructs
you already know how to use? Command shells unlock the potential of a UNIX system,
and bash is the Linux shell. It's the high-level glue between you and the machine. Grow
in your knowledge of bash, and you'll automatically increase your productivity under
Linux and UNIX -- it's that simple.


Bash confusion

Learning bash the wrong way can be a very confusing process. Many newbies type
"man bash" to view the bash man page, only to be confronted with a very terse and
technical description of shell functionality. Others type "info bash" (to view the GNU
info documentation), causing either the man page to be redisplayed, or (if they are
lucky) only slightly more friendly info documentation to appear.

While this may be somewhat disappointing to novices, the standard bash documentation
can't be all things to all people; it caters to those already familiar with shell
programming in general. There's definitely a lot of excellent technical information in the
man page, but its helpfulness to beginners is limited.

That's where this series comes in. In it, I'll show you how to actually use bash
programming constructs, so that you will be able to write your own scripts. Instead of
technical descriptions, I'll provide you with explanations in plain English, so that you
will know not only what something does, but when you should actually use it. By the
end of this three-part series, you'll be able to write your own intricate bash scripts, and
be at the level where you can comfortably use bash and supplement your knowledge by
reading (and understanding!) the standard bash documentation. Let's begin.

Environment variables

Under bash and almost all other shells, the user can define environment variables, which
are stored internally as ASCII strings. One of the handiest things about environment
variables is that they are a standard part of the UNIX process model. This means that
environment variables are not exclusive to shell scripts; they can be used by
standard compiled programs as well. When we "export" an environment variable under
bash, any subsequent program that we run can read our setting, whether it is a shell
script or not. A good example is the vipw command, which normally allows root to edit
the system password file. By setting the EDITOR environment variable to the name of
your favorite text editor, you can configure vipw to use it instead of vi, a handy thing if
you are used to xemacs and really dislike vi.
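
For example, assuming the nano editor is installed (any editor will do), a root user
could type the following, and vipw would then start nano instead of vi:

$ export EDITOR=nano
$ vipw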

The standard way to define an environment variable under bash is:

$ myvar='This is my environment variable!'

Quoting specifics
For extremely detailed information on how quotes should be used in bash, you may
want to look at the "QUOTING" section in the bash man page. The existence of special
character sequences that get "expanded" (replaced) with other values does complicate
how strings are handled in bash. We will just cover the most often-used quoting
functionality in this series.

The above command defined an environment variable called "myvar" that contains the
string "This is my environment variable!". There are several things to notice above:
first, there is no space on either side of the "=" sign; any space will result in an error (try
it and see). The second thing to notice is that while we could have done away with the
quotes if we were defining a single word, they are necessary when the value of the
environment variable is more than a single word (contains spaces or tabs).

Thirdly, while we can normally use double quotes instead of single quotes, doing so in
the above example would have caused an error. Why? Because using single quotes
disables a bash feature called expansion, where special characters and sequences of
characters are replaced with values. For example, the "!" character is the history
expansion character, which bash normally replaces with a previously-typed command.
(We won't be covering history expansion in this series of articles, because it is not
frequently used in bash programming. For more information on it, see the "HISTORY
EXPANSION" section in the bash man page.) While this macro-like functionality can
come in handy, right now we want a literal exclamation point at the end of our
environment variable, rather than a macro.
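
To see the difference between the two kinds of quotes for yourself, try the following
(we'll look at how "$myvar" gets expanded in just a moment; the strings here avoid "!"
so that history expansion doesn't interfere):

$ echo 'Single quotes: $myvar'
Single quotes: $myvar
$ echo "Double quotes: $myvar"
Double quotes: This is my environment variable!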

Now, let's take a look at how one actually uses environment variables. Here's an
example:

$ echo $myvar
This is my environment variable!

By preceding the name of our environment variable with a $, we can cause bash to
replace it with the value of myvar. In bash terminology, this is called "variable
expansion". But, what if we try the following:

$ echo foo$myvarbar
foo

We wanted this to echo "fooThis is my environment variable!bar", but it didn't work.


What went wrong? In a nutshell, bash's variable expansion facility got confused. It
couldn't tell whether we wanted to expand the variable $m, $my, $myvar, $myvarbar,
etc. How can we be more explicit and clearly tell bash what variable we are referring
to? Try this:

$ echo foo${myvar}bar
fooThis is my environment variable!bar

As you can see, we can enclose the environment variable name in curly braces when it
is not clearly separated from the surrounding text. While $myvar is faster to type and
will work most of the time, ${myvar} can be parsed correctly in almost any situation.
Other than that, they both do the same thing, and you will see both forms of variable
expansion in the rest of this series. You'll want to remember to use the more explicit
curly-brace form when your environment variable is not isolated from the surrounding
text by whitespace (spaces or tabs).

Recall that we also mentioned that we can "export" variables. When we export an
environment variable, it's automatically available in the environment of any
subsequently-run script or executable. Shell scripts can "get to" the environment
variable using that shell's built-in environment-variable support, while C programs can
use the getenv() function call. Here's some example C code that you should type in and
compile -- it'll allow us to understand environment variables from the perspective of C:

myenv.c -- a sample environment variable C program

#include <stdio.h>
#include <stdlib.h>

int main(void) {
    /* getenv() returns a pointer to the variable's value, or NULL if it is unset */
    char *myenvvar = getenv("EDITOR");
    printf("The editor environment variable is set to %s\n", myenvvar);
    return 0;
}

Save the above source into a file called myenv.c, and then compile it by issuing the
command:

$ gcc myenv.c -o myenv


Now, there will be an executable program in your directory that, when run, will print
the value of the EDITOR environment variable, if any. This is what happens when I run
it on my machine:

$ ./myenv
The editor environment variable is set to (null)

Hmmm... because the EDITOR environment variable was not set to anything, the C
program gets a null string. Let's try setting it to a specific value:

$ EDITOR=xemacs
$ ./myenv
The editor environment variable is set to (null)

While you might have expected myenv to print the value "xemacs", it didn't quite work,
because we didn't export the EDITOR environment variable. This time, we'll get it
working:

$ export EDITOR
$ ./myenv
The editor environment variable is set to xemacs

So, you have seen with your very own eyes that another process (in this case our
example C program) cannot see the environment variable until it is exported.
Incidentally, if you want, you can define and export an environment variable using one
line, as follows:

$ export EDITOR=xemacs

It works identically to the two-line version. This would be a good time to show how to
erase an environment variable by using unset:

$ unset EDITOR
$ ./myenv
The editor environment variable is set to (null)


Chopping strings overview


Chopping strings -- that is, splitting an original string into smaller, separate chunks --
is one of those tasks that is performed daily by your average shell script. Many times,
shell scripts need to take a fully-qualified path, and find the terminating file or
directory. While it's possible (and fun!) to code this in bash, the standard basename
UNIX executable performs this extremely well:

$ basename /usr/local/share/doc/foo/foo.txt
foo.txt
$ basename /usr/home/drobbins
drobbins

Basename is quite a handy tool for chopping up strings. Its companion, called dirname,
returns the "other" part of the path that basename throws away:

$ dirname /usr/local/share/doc/foo/foo.txt
/usr/local/share/doc/foo
$ dirname /usr/home/drobbins/
/usr/home

dirname and basename

Note that dirname and basename do not look at any files or directories on disk; they are
purely string manipulation commands.


Command substitution

One very handy thing to know is how to create an environment variable that contains
the result of an executable command. This is very easy to do:

$ MYDIR=`dirname /usr/local/share/doc/foo/foo.txt`
$ echo $MYDIR
/usr/local/share/doc/foo

What we did above is called "command substitution". Several things are worth noticing
in this example. On the first line, we simply enclosed the command we wanted to
execute in back quotes. Those are not standard single quotes, but instead come from the
keyboard key that normally sits above the Tab key. We can do exactly the same thing
with bash's alternate command substitution syntax:

$ MYDIR=$(dirname /usr/local/share/doc/foo/foo.txt)
$ echo $MYDIR
/usr/local/share/doc/foo

As you can see, bash provides multiple ways to perform exactly the same thing. Using
command substitution, we can place any command or pipeline of commands in between
` ` or $( ) and assign it to an environment variable. Handy stuff! Here's an example of
how to use a pipeline with command substitution:

$ MYFILES=$(ls /etc | grep pa)
$ echo $MYFILES
pam.d passwd

Chopping strings like a pro

While basename and dirname are great tools, there are times where we may need to
perform more advanced string "chopping" operations than just standard pathname
manipulations. When we need more punch, we can take advantage of bash's advanced
built-in variable expansion functionality. We've already used the standard kind of
variable expansion, which looks like this: ${MYVAR}. But bash can also perform some
handy string chopping on its own. Take a look at these examples:

$ MYVAR=foodforthought.jpg
$ echo ${MYVAR##*fo}
rthought.jpg
$ echo ${MYVAR#*fo}
odforthought.jpg

In the first example, we typed ${MYVAR##*fo}. What exactly does this mean?
Basically, inside the ${ }, we typed the name of the environment variable, two ##s, and
a wildcard ("*fo"). Then, bash took MYVAR, found the longest substring from the
beginning of the string "foodforthought.jpg" that matched the wildcard "*fo", and
chopped it off the beginning of the string. That's a bit hard to grasp at first, so to get a
feel for how this special "##" option works, let's step through how bash completed this
expansion. First, it began searching for substrings at the beginning of
"foodforthought.jpg" that matched the "*fo" wildcard. Here are the substrings that it
checked:

f
fo MATCHES *fo
foo
food
foodf
foodfo MATCHES *fo
foodfor
foodfort
foodforth
foodfortho
foodforthou
foodforthoug
foodforthought
foodforthought.j
foodforthought.jp
foodforthought.jpg

After searching the string for matches, you can see that bash found two. It selects the
longest match, removes it from the beginning of the original string, and returns the
result.

The second form of variable expansion shown above appears identical to the first,
except it uses only one "#" -- and bash performs an almost identical process. It checks
the same set of substrings as our first example did, except that bash removes the
shortest match from our original string, and returns the result. So, as soon as it checks
the "fo" substring, it removes "fo" from our string and returns "odforthought.jpg".

This may seem extremely cryptic, so I'll show you an easy way to remember this
functionality. When searching for the longest match, use ## (because ## is longer than
#). When searching for the shortest match, use #. See, not that hard to remember at all!
Wait, how do you remember that we are supposed to use the '#' character to remove
from the *beginning* of a string? Simple! You will notice that on a US keyboard, shift-
4 is "$", which is the bash variable expansion character. On the keyboard, immediately
to the left of "$" is "#". So, you can see that "#" is "at the beginning" of "$", and thus
(according to our mnemonic), "#" removes characters from the beginning of the string.
You may wonder how we remove characters from the end of the string. If you guessed
that we use the character immediately to the right of "$" on the US keyboard ("%"),
you're right! Here are some quick examples of how to chop off trailing portions of
strings:

$ MYFOO="chickensoup.tar.gz"
$ echo ${MYFOO%%.*}
chickensoup
$ echo ${MYFOO%.*}
chickensoup.tar

As you can see, the % and %% variable expansion options work identically to # and ##,
except they remove the matching wildcard from the end of the string. Note that you
don't have to use the "*" character if you wish to remove a specific substring from the
end:

$ MYFOOD="chickensoup"
$ echo ${MYFOOD%%soup}
chicken

In this example, it doesn't matter whether we use "%%" or "%", since only one match is
possible. And remember, if you forget whether to use "#" or "%", look at the 3, 4, and 5
keys on your keyboard and figure it out.
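
As an aside, remember that we said it was possible (and fun!) to code basename and
dirname in pure bash? With "##" and "%", now we can. Here's a sketch that mimics the
two commands for simple paths (unlike the real executables, it doesn't handle trailing
slashes or slashless names specially):

$ MYPATH=/usr/local/share/doc/foo/foo.txt
$ echo ${MYPATH##*/}
foo.txt
$ echo ${MYPATH%/*}
/usr/local/share/doc/foo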

We can use another form of variable expansion to select a specific substring, based on a
specific character offset and length. Try typing in the following lines under bash:

$ EXCLAIM=cowabunga
$ echo ${EXCLAIM:0:3}
cow
$ echo ${EXCLAIM:3:7}
abunga

This form of string chopping can come in quite handy; simply specify the character to
start from and the length of the substring, all separated by colons.
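
If you leave off the length, bash returns everything from the offset to the end of the
string:

$ echo ${EXCLAIM:3}
abunga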

Applying string chopping

Now that we've learned all about chopping strings, let's write a simple little shell script.
Our script will accept a single file as an argument, and will print out whether it appears
to be a tarball. To determine if it is a tarball, it will look for the pattern ".tar" at the end
of the file. Here it is:

mytar.sh -- a sample script


#!/bin/bash

if [ "${1##*.}" = "tar" ]
then
    echo This appears to be a tarball.
else
    echo At first glance, this does not appear to be a tarball.
fi

To run this script, enter it into a file called mytar.sh, and type "chmod 755 mytar.sh" to
make it executable. Then, give it a try on a tarball, as follows:

$ ./mytar.sh thisfile.tar
This appears to be a tarball.
$ ./mytar.sh thatfile.gz
At first glance, this does not appear to be a tarball.

OK, it works, but it's not very functional. Before we make it more useful, let's take a
look at the "if" statement used above. In it, we have a boolean expression. In bash, the
"=" comparison operator checks for string equality, and all boolean expressions are
enclosed in square brackets. But what does the boolean expression actually test for?
Let's take a look at the left side. According to what we've learned about string chopping,
"${1##*.}" will remove the longest match of "*." from the beginning of the string
contained in the environment variable "1", returning the result. This will cause
everything after the last "." in the file to be returned. Obviously, if the file ends in ".tar",
we will get "tar" as a result, and the condition will be true.

You may be wondering what the "1" environment variable is in the first place. Very
simple -- $1 is the first command-line argument to the script, $2 is the second, etc. OK,
now that we've reviewed the function, we can take our first look at "if" statements.


If statements

Like most languages, bash has its own form of conditional. When using them, stick to
the format above; that is, keep the "if" and the "then" on separate lines, and keep the
"else" and the terminating and required "fi" in horizontal alignment with them. This
makes the code easier to read and debug. In addition to the "if,else" form, there are
several other forms of "if" statements:

if [ condition ]
then
    action
fi

This one performs an action only if condition is true, otherwise it performs no action
and continues executing any lines following the "fi".

if [ condition ]
then
    action
elif [ condition2 ]
then
    action2
.
.
.
elif [ condition3 ]
then
    action3
else
    actionx
fi

The above "elif" form will consecutively test each condition and execute the action
corresponding to the first true condition. If none of the conditions are true, it will
execute the "else" action, if one is present, and then continue executing lines following
the entire "if,elif,else" statement.


Next time

Now that we've covered the most basic bash functionality, it's time to pick up the pace
and get ready to write some real scripts. In the next article, I'll cover looping constructs,
functions, namespace, and other essential topics. Then, we'll be ready to write some
more complicated scripts. In the third article, we'll focus almost exclusively on very
complex scripts and functions, as well as several bash script design options. See you
then!

Let's start with a brief tip on handling command-line arguments, and then look at bash's
basic programming constructs.

Accepting arguments

In the sample program in the introductory article, we used the environment variable
"$1", which referred to the first command-line argument. Similarly, you can use "$2",
"$3", etc. to refer to the second and third arguments passed to your script. Here's an
example:

#!/usr/bin/env bash

echo name of script is $0
echo first argument is $1
echo second argument is $2
echo seventeenth argument is ${17}
echo number of arguments is $#

The example is self explanatory except for a few small details. First, "$0" will expand to
the name of the script, as called from the command line, and "$#" will expand to the
number of arguments passed to the script. Second, arguments past the ninth must be
written with curly braces -- "${17}" -- because bash would parse "$17" as "${1}"
followed by a literal "7". Play around with the above script, passing different kinds of
command-line arguments to get the hang of how it works.
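
For instance, if the script above were saved as showargs.sh (a name picked just for this
demonstration) and made executable, a session might look like this:

$ ./showargs.sh alpha beta
name of script is ./showargs.sh
first argument is alpha
second argument is beta
seventeenth argument is
number of arguments is 2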

Sometimes, it's helpful to refer to all command-line arguments at once. For this
purpose, bash features the "$@" variable, which expands to all command-line
parameters separated by spaces. We'll see an example of its use when we take a look at
"for" loops, a bit later in this article.


Bash programming constructs

If you've programmed in a procedural language like C, Pascal, Python, or Perl, then
you're familiar with standard programming constructs like "if" statements, "for" loops,
and the like. Bash has its own versions of most of these standard constructs. In the next
several sections, I will introduce several bash constructs and demonstrate the
differences between these constructs and others you are already familiar with from other
programming languages. If you haven't programmed much before, don't worry. I
include enough information and examples so that you can follow the text.


Conditional love

If you've ever programmed any file-related code in C, you know that it requires a
significant amount of effort to see if a particular file is newer than another. That's
because C doesn't have any built-in syntax for performing such a comparison; instead,
two stat() calls and two stat structures must be used to perform the comparison by hand.
In contrast, bash has standard file comparison operators built in, so determining if
"/tmp/myfile is readable" is as easy as checking to see if "$myvar is greater than 4".

The following table lists the most frequently used bash comparison operators, along with
a note on what each one tests. Each comparison is meant to be placed immediately after
the "if". For example:

if [ -z "$myvar" ]
then
    echo "myvar is not defined"
fi

Operator                   True if...
-z "$myvar"                $myvar is empty
-n "$myvar"                $myvar is not empty
"$myvar" = "foo"           $myvar equals the string "foo"
"$myvar" != "foo"          $myvar does not equal "foo"
"$myvar" -eq 3             $myvar equals the number 3
"$myvar" -ne 3             $myvar does not equal 3
"$myvar" -lt 3             $myvar is less than 3 (-le: less than or equal)
"$myvar" -gt 3             $myvar is greater than 3 (-ge: greater than or equal)
-e "$myfile"               the file exists
-f "$myfile"               the file is a regular file
-d "$myfile"               the file is a directory
-r "$myfile"               the file is readable (-w: writable, -x: executable)
"$file1" -nt "$file2"      file1 is newer than file2 (-ot: older than)

Sometimes, there are several different ways that a particular comparison can be made.
For example, the following two snippets of code function identically:

if [ "$myvar" -eq 3 ]
then
echo "myvar equals 3"
fi

if [ "$myvar" = "3" ]
then
echo "myvar equals 3"
fi

The above two comparisons do exactly the same thing, but the first uses arithmetic
comparison operators, while the second uses string comparison operators.


String comparison caveats

Most of the time, while you can omit the use of double quotes surrounding strings and
string variables, it's not a good idea. Why? Because your code will work perfectly,
unless an environment variable happens to have a space or a tab in it, in which case
bash will get confused. Here's an example of a fouled-up comparison:

if [ $myvar = "foo bar oni" ]


then
echo "yes"
fi

In the above example, if myvar equals "foo", the code will work as expected and not
print anything. However, if myvar equals "foo bar oni", the code will fail with the
following error:

[: too many arguments

In this case, the spaces in "$myvar" (which equals "foo bar oni") end up confusing bash.
After bash expands "$myvar", it ends up with the following comparison:

[ foo bar oni = "foo bar oni" ]


Because the environment variable wasn't placed inside double quotes, bash thinks that
you stuffed too many arguments in-between the square brackets. You can easily
eliminate this problem by surrounding the string arguments with double-quotes.
Remember, if you get into the habit of surrounding all string arguments and
environment variables with double-quotes, you'll eliminate many similar programming
errors. Here's how the "foo bar oni" comparison should have been written:

if [ "$myvar" = "foo bar oni" ]


then
echo "yes"
fi

The above code will work as expected and will not create any unpleasant surprises.

More quoting specifics

If you want your environment variables to be expanded, you must enclose them in
double quotes, rather than single quotes. Single quotes disable variable (as well as
history) expansion.


Looping constructs: "for"

OK, we've covered conditionals, now it's time to explore bash looping constructs. We'll
start with the standard "for" loop. Here's a basic example:

#!/usr/bin/env bash

for x in one two three four
do
    echo number $x
done

output:
number one
number two
number three
number four

What exactly happened? The "for x" part of our "for" loop defined a new environment
variable (also called a loop control variable) called "$x", which was successively set to
the values "one", "two", "three", and "four". After each assignment, the body of the loop
(the code between the "do" ... "done") was executed once. In the body, we referred to
the loop control variable "$x" using standard variable expansion syntax, like any other
environment variable. Also notice that "for" loops always accept some kind of word list
after the "in" statement. In this case we specified four English words, but the word list
can also refer to file(s) on disk or even file wildcards. Look at the following example,
which demonstrates how to use standard shell wildcards:

#!/usr/bin/env bash
for myfile in /etc/r*
do
    if [ -d "$myfile" ]
    then
        echo "$myfile (dir)"
    else
        echo "$myfile"
    fi
done

output:

/etc/rc.d (dir)
/etc/resolv.conf
/etc/resolv.conf~
/etc/rpc

The above code looped over each file in /etc that began with an "r". To do this, bash
first took our wildcard /etc/r* and expanded it, replacing it with the string /etc/rc.d
/etc/resolv.conf /etc/resolv.conf~ /etc/rpc before executing the loop. Once inside the
loop, the "-d" conditional operator was used to perform two different actions, depending
on whether myfile was a directory or not. If it was, a " (dir)" was appended to the output
line.

We can also use multiple wildcards and even environment variables in the word list:

for x in /etc/r??? /var/lo* /home/drobbins/mystuff/* /tmp/${MYPATH}/*
do
    cp $x /mnt/mydir
done

Bash will perform wildcard and variable expansion in all the right places, and
potentially create a very long word list.

While all of our wildcard expansion examples have used absolute paths, you can also
use relative paths, as follows:

for x in ../* mystuff/*
do
    echo $x is a silly file
done

In the above example, bash performs wildcard expansion relative to the current working
directory, just like when you use relative paths on the command line. Play around with
wildcard expansion a bit. You'll notice that if you use absolute paths in your wildcard,
bash will expand the wildcard to a list of absolute paths. Otherwise, bash will use
relative paths in the subsequent word list. If you simply refer to files in the current
working directory (for example, if you type "for x in *"), the resultant list of files will
not be prefixed with any path information. Remember that preceding path information
can be stripped using the "basename" executable, as follows:

for x in /var/log/*
do
    echo `basename $x` is a file living in /var/log
done

Of course, it's often handy to perform loops that operate on a script's command-line
arguments. Here's an example of how to use the "$@" variable, introduced at the
beginning of this article:

#!/usr/bin/env bash

for thing in "$@"


do
echo you typed ${thing}.
done

output:

$ allargs hello there you silly
you typed hello.
you typed there.
you typed you.
you typed silly.


Shell arithmetic

Before looking at a second type of looping construct, it's a good idea to become familiar
with performing shell arithmetic. Yes, it's true: you can perform simple integer math
using shell constructs. Simply enclose the arithmetic expression between "$((" and
"))", and bash will evaluate the expression. Here are some examples:

$ echo $(( 100 / 3 ))
33
$ myvar="56"
$ echo $(( $myvar + 12 ))
68
$ echo $(( $myvar - $myvar ))
0
$ myvar=$(( $myvar + 1 ))
$ echo $myvar
57
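
Bash's arithmetic expansion also handles the other usual integer operators, including
multiplication, remainder, and exponentiation:

$ echo $(( 3 * 7 ))
21
$ echo $(( 10 % 3 ))
1
$ echo $(( 2 ** 8 ))
256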

Now that you're familiar with performing mathematical operations, it's time to introduce
two other bash looping constructs, "while" and "until".


More looping constructs: "while" and "until"


A "while" statement will execute as long as a particular condition is true, and has the
following format:

while [ condition ]
do
    statements
done

"While" statements are typically used to loop a certain number of times, as in the
following example, which will loop exactly 10 times:

myvar=0
while [ $myvar -ne 10 ]
do
    echo $myvar
    myvar=$(( $myvar + 1 ))
done

You can see the use of arithmetic expansion to eventually cause the condition to be
false, and the loop to terminate.

"Until" statements provide the inverse functionality of "while" statements: They repeat
as long as a particular condition is false. Here's an "until" loop that functions identically
to the previous "while" loop:

myvar=0
until [ $myvar -eq 10 ]
do
    echo $myvar
    myvar=$(( $myvar + 1 ))
done


Case statements

Case statements are another conditional construct that comes in handy. Here's an
example snippet:

case "${x##*.}" in
gz)
gzunpack ${SROOT}/${x}
;;
bz2)
bz2unpack ${SROOT}/${x}
;;
*)
echo "Archive format not recognized."
exit
;;
esac

Above, bash first expands "${x##*.}". In the code, "$x" is the name of a file, and
"${x##*.}" has the effect of stripping all text except that following the last period in the
filename. Then, bash compares the resultant string against the values listed to the left of
the ")"s. In this case, "${x##*.}" gets compared against "gz", then "bz2", and finally "*".
If "${x##*.}" matches any of these strings or patterns, the lines immediately following
the ")" are executed, up until the ";;", at which point bash continues executing lines after
the terminating "esac". If no patterns or strings are matched, no lines of code are
executed; however, in this particular code snippet, at least one block of code will
execute, because the "*" pattern will catch everything that didn't match "gz" or "bz2".


Functions and namespaces

In bash, you can even define functions, similar to those in other procedural languages
like Pascal and C. In bash, functions can even accept arguments, using a system very
similar to the way scripts accept command-line arguments. Let's take a look at a sample
function definition and then proceed from there:

tarview() {
    echo -n "Displaying contents of $1 "
    if [ ${1##*.} = tar ]
    then
        echo "(uncompressed tar)"
        tar tvf $1
    elif [ ${1##*.} = gz ]
    then
        echo "(gzip-compressed tar)"
        tar tzvf $1
    elif [ ${1##*.} = bz2 ]
    then
        echo "(bzip2-compressed tar)"
        cat $1 | bzip2 -d | tar tvf -
    fi
}

Another case
The above code could have been written using a "case" statement. Can you figure out
how? (One possible answer appears a bit later in this section.)

Above, we define a function called "tarview" that accepts one argument, a tarball of
some kind. When the function is executed, it identifies what type of tarball the argument
is (either uncompressed, gzip-compressed, or bzip2-compressed), prints out a one-line
informative message, and then displays the contents of the tarball. This is how the
above function should be called (whether from a script or from the command line, after
it has been typed in, pasted in, or sourced):
$ tarview shorten.tar.gz
Displaying contents of shorten.tar.gz (gzip-compressed tar)
drwxr-xr-x ajr/abbot 0 1999-02-27 16:17 shorten-2.3a/
-rw-r--r-- ajr/abbot 1143 1997-09-04 04:06 shorten-2.3a/Makefile
-rw-r--r-- ajr/abbot 1199 1996-02-04 12:24 shorten-2.3a/INSTALL
-rw-r--r-- ajr/abbot 839 1996-05-29 00:19 shorten-2.3a/LICENSE
....

As you can see, arguments can be referenced inside the function definition by using the
same mechanism used to reference command-line arguments. In addition, the "$#"
macro will be expanded to contain the number of arguments. The only thing that may
not work completely as expected is the variable "$0", which will either expand to the
string "bash" (if you run the function from the shell, interactively) or to the name of the
script the function is called from.
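
By the way, here's one possible answer to the "case" challenge posed in the sidebar
above -- a sketch that behaves like the "if" version (plus a catch-all message the
original lacked):

tarview() {
    echo -n "Displaying contents of $1 "
    case "${1##*.}" in
        tar)
            echo "(uncompressed tar)"
            tar tvf $1
            ;;
        gz)
            echo "(gzip-compressed tar)"
            tar tzvf $1
            ;;
        bz2)
            echo "(bzip2-compressed tar)"
            cat $1 | bzip2 -d | tar tvf -
            ;;
        *)
            echo "(unknown archive type)"
            ;;
    esac
}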

Use 'em interactively


Don't forget that functions, like the one above, can be placed in your ~/.bashrc or
~/.bash_profile so that they are available for use whenever you are in bash.


Namespace

Often, you'll need to create environment variables inside a function. While possible,
there's a technicality you should know about. In most compiled languages (such as C),
when you create a variable inside a function, it's placed in a separate local namespace.
So, if you define a function in C called myfunction, and in it define a variable called
"x", any global (outside the function) variable called "x" will not be affected by it,
eliminating side effects.

While true in C, this isn't true in bash. In bash, whenever you create an environment
variable inside a function, it's added to the global namespace. This means that it will
overwrite any global variable outside the function, and will continue to exist even after
the function exits:

#!/usr/bin/env bash

myvar="hello"

myfunc() {
    myvar="one two three"
    for x in $myvar
    do
        echo $x
    done
}

myfunc

echo $myvar $x

When this script is run, it produces the output "one two three three", showing how
"$myvar" defined in the function clobbered the global variable "$myvar", and how the
loop control variable "$x" continued to exist even after the function exited (and also
would have clobbered any global "$x", if one were defined).

In this simple example, the bug is easy to spot and to compensate for by using alternate
variable names. However, this isn't the right approach; the best way to solve this
problem is to prevent the possibility of clobbering global variables in the first place, by
using the "local" command. When we use "local" to create variables inside a function,
they will be kept in the local namespace and not clobber any global variables. Here's
how to implement the above code so that no global variables are overwritten:

#!/usr/bin/env bash

myvar="hello"

myfunc() {
    local x
    local myvar="one two three"
    for x in $myvar
    do
        echo $x
    done
}

myfunc

echo $myvar $x

This script produces the output "hello" -- the global "$myvar" doesn't get
overwritten, and "$x" doesn't continue to exist outside of myfunc. In the first line of the
function, we create x, a local variable that is later used as the loop control variable,
while in the second line (local myvar="one two three") we create a local myvar and
assign it a value. The first form is handy for keeping loop control variables local, since
we're not allowed to say "for local x in $myvar". This function doesn't clobber any
global variables, and you are encouraged to design all your functions this way. The only
time you should not use "local" is when you explicitly want to modify a global variable.


Wrapping it up

Now that we've covered the most essential bash functionality, it's time to look at how to
develop an entire application based in bash. In my next installment, we'll do just that.
See you then!

Enter the ebuild system

I've really been looking forward to this third and final Bash by example article, because
now that we've already covered bash programming fundamentals in Part 1 and Part 2,
we can focus on more advanced topics, like bash application development and program
design. For this article, I will give you a good dose of practical, real-world bash
development experience by presenting a project that I've spent many hours coding and
refining: The Gentoo Linux ebuild system.

I'm the chief architect of Gentoo Linux, a next-generation Linux OS currently in beta.
One of my primary responsibilities is to make sure that all of the binary packages
(similar to RPM packages) are created properly and work together. As you probably
know, a standard Linux system is not composed of a single unified source tree (like
BSD), but is actually made up of more than 25 core packages that work together. Some
of the packages include:

Package       Description
linux         The actual kernel
util-linux    A collection of miscellaneous Linux-related programs
e2fsprogs     A collection of ext2 filesystem-related utilities
glibc         The GNU C library

Each package is in its own tarball and is maintained by separate, independent
developers or teams of developers. To create a distribution, each package has to be
separately downloaded, compiled, and packaged. Every time a package must be fixed,
upgraded, or improved, the compilation and packaging steps must be repeated (and this
gets old really fast). To help eliminate the repetitive steps involved in creating and
updating packages, I created the ebuild system, written almost entirely in bash. To
enhance your bash knowledge, I'll show you how I implemented the unpack and
compile portions of the ebuild system, step by step. As I explain each step, I'll also
discuss why certain design decisions were made. By the end of this article, not only will
you have an excellent grasp of larger-scale bash programming projects, but you'll also
have implemented a good portion of a complete auto-build system.


Why bash?

Bash is an essential component of the Gentoo Linux ebuild system. It was chosen as
ebuild's primary language for a number of reasons. First, it has an uncomplicated and
familiar syntax that is especially well suited for calling external programs. An auto-
build system is "glue code" that automates the calling of external programs, and bash is
very well suited to this type of application. Second, bash's support for functions allowed
the ebuild system to have modular, easy-to-understand code. Third, the ebuild system
takes advantage of bash's support for environment variables, allowing package
maintainers and developers to configure it easily, on-the-fly.


Build process review


Before we look at the ebuild system, let's review what's involved in getting a package
compiled and installed. For our example, we will look at the "sed" package, a standard
GNU text stream editing utility that is part of all Linux distributions. First, download
the source tarball (sed-3.02.tar.gz) (see Resources). We will store this archive in
/usr/src/distfiles, a directory we will refer to using the environment variable
"$DISTDIR". "$DISTDIR" is the directory where all of our original source tarballs live;
it's a big vault of source code.

Our next step is to create a temporary directory called "work", which houses the
uncompressed sources. We'll refer to this directory later using the "$WORKDIR"
environment variable. To do this, change to a directory where we have write permission
and type the following:

Uncompressing sed into a temporary directory


$ mkdir work
$ cd work
$ tar xzf /usr/src/distfiles/sed-3.02.tar.gz

The tarball is then decompressed, creating a directory called sed-3.02 that contains all of
the sources. We'll refer to the sed-3.02 directory later using the environment variable
"$SRCDIR". To compile the program, type the following:

Compiling sed


$ cd sed-3.02
$ ./configure --prefix=/usr
(autoconf generates appropriate makefiles, this can take a while)

$ make

(the package is compiled from sources, also takes a bit of time)

We're going to skip the "make install" step, since we are just covering the unpack and
compile steps in this article. If we wanted to write a bash script to perform all these
steps for us, it could look something like this:

Sample bash script to perform the unpack/compile process


#!/usr/bin/env bash

if [ -d work ]
then
    # remove old work directory if it exists
    rm -rf work
fi
mkdir work
cd work
tar xzf /usr/src/distfiles/sed-3.02.tar.gz
cd sed-3.02
./configure --prefix=/usr
make


Generalizing the code

Although this autocompile script works, it's not very flexible. Basically, the bash script
just contains the listing of all the commands that were typed at the command line. While
this solution works, it would be nice to make a generic script that can be configured
quickly to unpack and compile any package just by changing a few lines. That way, it's
much less work for the package maintainer to add new packages to the distribution.
Let's take a first stab at doing this by using lots of different environment variables,
making our build script more generic:

A new, more general script


#!/usr/bin/env bash

# P is the package name
P=sed-3.02

# A is the archive name
A=${P}.tar.gz

export ORIGDIR=`pwd`
export WORKDIR=${ORIGDIR}/work
export SRCDIR=${WORKDIR}/${P}

if [ -z "$DISTDIR" ]
then
# set DISTDIR to /usr/src/distfiles if not already set
DISTDIR=/usr/src/distfiles
fi
export DISTDIR

if [ -d ${WORKDIR} ]
then
# remove old work directory if it exists
rm -rf ${WORKDIR}
fi

mkdir ${WORKDIR}
cd ${WORKDIR}
tar xzf ${DISTDIR}/${A}
cd ${SRCDIR}
./configure --prefix=/usr
make

We've added a lot of environment variables to the code, but it still does basically the
same thing. However, now, to compile any standard GNU autoconf-based source
tarball, we can simply copy this file to a new file (with an appropriate name to reflect
the name of the new package it compiles), and then change the values of "$A" and "$P"
to new values. All other environment variables automatically adjust to the correct
settings, and the script works as expected. While this is handy, there's a further
improvement that can be made to the code. This particular code is much longer than the
original "transcript" script that we created. Since one of the goals for any programming
project should be the reduction of complexity for the user, it would be nice to
dramatically shrink the code, or at least organize it better. We can do this by performing
a neat trick -- we'll split the code into two separate files. Save this file as
"sed-3.02.ebuild":

sed-3.02.ebuild
#the sed ebuild file -- very simple!
P=sed-3.02
A=${P}.tar.gz

Our first file is trivial, and contains only those environment variables that must be
configured on a per-package basis. Here's the second file, which contains the brains of
the operation. Save this one as "ebuild" and make it executable:

The ebuild script


#!/usr/bin/env bash

if [ $# -ne 1 ]
then
    echo "one argument expected."
    exit 1
fi

if [ -e "$1" ]
then
    source $1
else
    echo "ebuild file $1 not found."
    exit 1
fi

export ORIGDIR=`pwd`
export WORKDIR=${ORIGDIR}/work
export SRCDIR=${WORKDIR}/${P}

if [ -z "$DISTDIR" ]
then
    # set DISTDIR to /usr/src/distfiles if not already set
    DISTDIR=/usr/src/distfiles
fi
export DISTDIR

if [ -d ${WORKDIR} ]
then
    # remove old work directory if it exists
    rm -rf ${WORKDIR}
fi

mkdir ${WORKDIR}
cd ${WORKDIR}
tar xzf ${DISTDIR}/${A}
cd ${SRCDIR}
./configure --prefix=/usr
make

Now that we've split our build system into two files, I bet you're wondering how it
works. Basically, to compile sed, type:

$ ./ebuild sed-3.02.ebuild

When "ebuild" executes, it first tries to "source" variable "$1". What does this mean?
From my previous article, recall that "$1" is the first command line argument -- in this
case, "sed-3.02.ebuild". In bash, the "source" command reads in bash statements from a
file, and executes them as if they appeared immediately in the file the "source"
command is in. So, "source ${1}" causes the "ebuild" script to execute the commands in
"sed-3.02.ebuild", which cause "$P" and "$A" to be defined. This design change is
really handy, because if we want to compile another program instead of sed, we can
simply create a new .ebuild file and pass it as an argument to our "ebuild" script. That
way, the .ebuild files end up being really simple, while the complicated brains of the
ebuild system get stored in one place -- our "ebuild" script. This way, we can upgrade or
enhance the ebuild system simply by editing the "ebuild" script, keeping the
implementation details outside of the ebuild files. Here's a sample ebuild file for gzip:

gzip-1.2.4a.ebuild
#another really simple ebuild script!
P=gzip-1.2.4a
A=${P}.tar.gz


Adding functionality

OK, we're making some progress. But, there is some additional functionality I'd like to
add. I'd like the ebuild script to accept a second command-line argument, which will be
"compile", "unpack", or "all". This second command-line argument tells the ebuild
script which particular step of the build process to perform. That way, I can tell ebuild
to unpack the archive, but not compile it (just in case I need to inspect the source
archive before compilation begins). To do this, I'll add a case statement that will test
variable "$2", and do different things based on its value. Here's what the code looks like
now:

ebuild, revision 2
#!/usr/bin/env bash

if [ $# -ne 2 ]
then
    echo "Please specify two args - .ebuild file and unpack, compile or all"
    exit 1
fi

if [ -z "$DISTDIR" ]
then
    # set DISTDIR to /usr/src/distfiles if not already set
    DISTDIR=/usr/src/distfiles
fi
export DISTDIR

ebuild_unpack() {
    #make sure we're in the right directory
    cd ${ORIGDIR}

    if [ -d ${WORKDIR} ]
    then
        rm -rf ${WORKDIR}
    fi

    mkdir ${WORKDIR}
    cd ${WORKDIR}
    if [ ! -e ${DISTDIR}/${A} ]
    then
        echo "${DISTDIR}/${A} does not exist. Please download first."
        exit 1
    fi
    tar xzf ${DISTDIR}/${A}
    echo "Unpacked ${DISTDIR}/${A}."
    #source is now correctly unpacked
}

ebuild_compile() {
    if [ ! -d "${SRCDIR}" ]
    then
        echo "${SRCDIR} does not exist -- please unpack first."
        exit 1
    fi
    #make sure we're in the right directory
    cd ${SRCDIR}
    ./configure --prefix=/usr
    make
}

export ORIGDIR=`pwd`
export WORKDIR=${ORIGDIR}/work

if [ -e "$1" ]
then
    source $1
else
    echo "Ebuild file $1 not found."
    exit 1
fi

export SRCDIR=${WORKDIR}/${P}

case "${2}" in
    unpack)
        ebuild_unpack
        ;;
    compile)
        ebuild_compile
        ;;
    all)
        ebuild_unpack
        ebuild_compile
        ;;
    *)
        echo "Please specify unpack, compile or all as the second arg"
        exit 1
        ;;
esac

We've made a lot of changes, so let's review them. First, we placed the compile and
unpack steps in their own functions, naming them ebuild_compile() and ebuild_unpack(),
respectively. This is a good move, since the code is getting more complicated, and the
new functions provide some modularity, which helps to keep things organized. On the
first line in each function, I explicitly "cd" into the directory I want to be in because, as
our code is becoming more modular rather than linear, it's more likely that we might
slip up and execute a function in the wrong current working directory. The "cd"
commands explicitly put us in the right place, and prevent us from making a mistake
later -- an important step -- especially if you will be deleting files inside the functions.

Also, I added a useful check to the beginning of the ebuild_compile() function. Now, it
checks to make sure the "$SRCDIR" exists, and, if not, it prints an error message telling
the user to unpack the archive first, and then exits. If you like, you can change this
behavior so that if "$SRCDIR" doesn't exist, our ebuild script will unpack the source
archive automatically. You can do this by replacing ebuild_compile() with the
following code:

A new spin on ebuild_compile()


ebuild_compile() {
    #make sure we're in the right directory
    if [ ! -d "${SRCDIR}" ]
    then
        ebuild_unpack
    fi
    cd ${SRCDIR}
    ./configure --prefix=/usr
    make
}

One of the most obvious changes in our second version of the ebuild script is the new
case statement at the end of the code. This case statement simply checks the second
command-line argument, and performs the correct action, depending on its value. If we
now type:

$ ebuild sed-3.02.ebuild

we'll actually get an error message. ebuild now wants to be told what to do, as follows:

$ ebuild sed-3.02.ebuild unpack

or

$ ebuild sed-3.02.ebuild compile

or

$ ebuild sed-3.02.ebuild all

If you provide a second command-line argument, other than those listed above, you get
an error message (the * clause), and the program exits.
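
For example:

$ ebuild sed-3.02.ebuild bogus
Please specify unpack, compile or all as the second arg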


Modularizing the code

Now that the code is quite advanced and functional, you may be tempted to create
several more ebuild scripts to unpack and compile your favorite programs. If you do,
sooner or later you'll come across some sources that do not use autoconf ("./configure")
or possibly others that have non-standard compilation processes. We need to make some
more changes to the ebuild system to accommodate these programs. But before we do,
it is a good idea to think a bit about how to accomplish this.

One of the great things about hard-coding "./configure --prefix=/usr; make" into our
compile stage is that, most of the time, it works. But, we must also have the ebuild
system accommodate sources that do not use autoconf or normal Makefiles. To solve
this problem, I propose that our ebuild script should, by default, do the following:

1. If there is a configure script in "${SRCDIR}", execute it as follows:

   ./configure --prefix=/usr

   Otherwise, skip this step.

2. Execute the following command: make

Since ebuild only runs configure if it actually exists, we can now automatically
accommodate those programs that don't use autoconf and have standard makefiles. But
what if a simple "make" doesn't do the trick for some sources? We need a way to
override our reasonable defaults with some specific code to handle these situations. To
do this, we'll transform our ebuild_compile() function into two functions. The first
function, which can be looked at as a "parent" function, will still be called
ebuild_compile(). However, we'll have a new function, called user_compile(), which
contains only our reasonable default actions:

ebuild_compile() split into two functions


user_compile() {
    #we're already in ${SRCDIR}
    if [ -e configure ]
    then
        #run configure script if it exists
        ./configure --prefix=/usr
    fi
    #run make
    make
}

ebuild_compile() {
    if [ ! -d "${SRCDIR}" ]
    then
        echo "${SRCDIR} does not exist -- please unpack first."
        exit 1
    fi
    #make sure we're in the right directory
    cd ${SRCDIR}
    user_compile
}

It may not seem obvious why I'm doing this right now, but bear with me. While the
code works almost identically to our previous version of ebuild, we can now do
something that we couldn't do before -- we can override user_compile() in
sed-3.02.ebuild. So, if the default user_compile() function doesn't meet our needs, we can
define a new one in our .ebuild file that contains the commands required to compile the
package. For example, here's an ebuild file for e2fsprogs-1.18, which requires a slightly
different "./configure" line:

e2fsprogs-1.18.ebuild
#this ebuild file overrides the default user_compile()
P=e2fsprogs-1.18
A=${P}.tar.gz

user_compile() {
    ./configure --enable-elf-shlibs
    make
}

Now, e2fsprogs will be compiled exactly the way we want it to be. But, for most
packages, we can omit any custom user_compile() function in the .ebuild file, and the
default user_compile() function is used instead.
How exactly does the ebuild script know which user_compile() function to use? This is
actually quite simple. In the ebuild script, the default user_compile() function is defined
before the e2fsprogs-1.18.ebuild file is sourced. If there is a user_compile() in
e2fsprogs-1.18.ebuild, it overwrites the default version defined previously. If not, the
default user_compile() function is used.

This is great stuff; we've added a lot of flexibility without requiring any complex code if
it's not needed. We won't cover it here, but you could also make similar modifications to
ebuild_unpack() so that users can override the default unpacking process. This could
come in handy if any patching has to be done, or if the files are contained in multiple
archives. It is also a good idea to modify our unpacking code so that it recognizes bzip2-
compressed tarballs by default.
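
Here's a sketch of how that unpacking change might look, following the same override
pattern we used for compilation. Note that user_unpack() is a name I'm inventing here to
match user_compile(); it isn't part of the ebuild code shown above:

user_unpack() {
    #we're already in ${WORKDIR}
    case "${A}" in
        *.tar.gz)
            tar xzf ${DISTDIR}/${A}
            ;;
        *.tar.bz2)
            #not every tar knows about bzip2, so pipe it through
            bzip2 -dc ${DISTDIR}/${A} | tar xf -
            ;;
        *)
            echo "${A}: unknown archive format."
            exit 1
            ;;
    esac
}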


Configuration files

We've covered a lot of sneaky bash techniques so far, and now it's time to cover one
more. Often, it's handy for a program to have a global configuration file that resides
in /etc. Fortunately, this is easy to do using bash. Simply create the following file and
save it as /etc/ebuild.conf:

/etc/ebuild.conf
# /etc/ebuild.conf: set system-wide ebuild options in this file

# MAKEOPTS are options passed to make
MAKEOPTS="-j2"

In this example, I've included just one configuration option, but you could include many
more. One of the beautiful things about bash is that this file can be parsed by simply
sourcing it. This is a design trick that works with most interpreted languages. After
/etc/ebuild.conf is sourced, "$MAKEOPTS" is defined inside our ebuild script. We'll
use it to allow the user to pass options to make. Normally, this option would be used to
allow the user to tell ebuild to do a parallel make.

What is a parallel make?


To speed compilation on multiprocessor systems, make supports compiling a program
in parallel. This means that instead of compiling just one source file at a time, make
compiles a user-specified number of source files simultaneously (so those extra
processors in a multiprocessor system are used). Parallel makes are enabled by passing
the -j # option to make, as follows:

make -j4 MAKE="make -j4"


This code instructs make to compile four programs simultaneously. The MAKE="make
-j4" argument tells make to pass the -j4 option to any child make processes it launches.

Here's the final version of our ebuild program:

ebuild, the final version


#!/usr/bin/env bash

if [ $# -ne 2 ]
then
    echo "Please specify ebuild file and unpack, compile or all"
    exit 1
fi

source /etc/ebuild.conf

if [ -z "$DISTDIR" ]
then
    # set DISTDIR to /usr/src/distfiles if not already set
    DISTDIR=/usr/src/distfiles
fi
export DISTDIR

ebuild_unpack() {
    #make sure we're in the right directory
    cd ${ORIGDIR}

    if [ -d ${WORKDIR} ]
    then
        rm -rf ${WORKDIR}
    fi

    mkdir ${WORKDIR}
    cd ${WORKDIR}
    if [ ! -e ${DISTDIR}/${A} ]
    then
        echo "${DISTDIR}/${A} does not exist. Please download first."
        exit 1
    fi
    tar xzf ${DISTDIR}/${A}
    echo "Unpacked ${DISTDIR}/${A}."
    #source is now correctly unpacked
}

user_compile() {
    #we're already in ${SRCDIR}
    if [ -e configure ]
    then
        #run configure script if it exists
        ./configure --prefix=/usr
    fi
    #run make
    make $MAKEOPTS MAKE="make $MAKEOPTS"
}

ebuild_compile() {
    if [ ! -d "${SRCDIR}" ]
    then
        echo "${SRCDIR} does not exist -- please unpack first."
        exit 1
    fi
    #make sure we're in the right directory
    cd ${SRCDIR}
    user_compile
}

export ORIGDIR=`pwd`
export WORKDIR=${ORIGDIR}/work

if [ -e "$1" ]
then
    source $1
else
    echo "Ebuild file $1 not found."
    exit 1
fi

export SRCDIR=${WORKDIR}/${P}

case "${2}" in
    unpack)
        ebuild_unpack
        ;;
    compile)
        ebuild_compile
        ;;
    all)
        ebuild_unpack
        ebuild_compile
        ;;
    *)
        echo "Please specify unpack, compile or all as the second arg"
        exit 1
        ;;
esac

Notice /etc/ebuild.conf is sourced near the beginning of the file. Also, notice that we use
"$MAKEOPTS" in our default user_compile() function. You may be wondering how
this will work -- after all, we refer to "$MAKEOPTS" before we source /etc/ebuild.conf,
which actually defines "$MAKEOPTS" in the first place. Fortunately for us, this is OK
because variable expansion only happens when user_compile() is executed. By the time
user_compile() is executed, /etc/ebuild.conf has already been sourced, and
"$MAKEOPTS" is set to the correct value.


Wrapping it up

We've covered a lot of bash programming techniques in this article, but we've only
touched the surface of the power of bash. For example, the production Gentoo Linux
ebuild system not only automatically unpacks and compiles each package, but it can
also:

• Automatically download the sources if they are not found in "$DISTDIR"
• Verify that the sources are not corrupted by using MD5 message digests
• If requested, install the compiled application into the live filesystem, recording
  all installed files so that the package can be easily uninstalled at a later date
• If requested, package the compiled application in a tarball (compressed the way
  you like it) so that it can be installed later, on another computer, or during the
  CD-based installation process (if you are building a distribution CD)

In addition, the production ebuild system has several other global configuration options,
allowing the user to specify options such as what optimization flags to use during
compilation, and whether optional support for packages like GNOME and slang should
be enabled by default in those packages that support it.

It's clear that bash can accomplish much more than what I've touched on in this series of
articles. I hope you've learned a lot about this incredible tool, and are excited about
using bash to speed up and enhance your development projects.

Resources

• Download the source tarball (sed-3.02.tar.gz) from ftp://ftp.gnu.org/pub/gnu/sed.

• Read "Bash by example: Part 1" on developerWorks.

• Read "Bash by example: Part 2" on developerWorks.

• Visit the home page of the Gentoo Project.

• Visit GNU's bash home page.

About the author

Residing in Albuquerque, New Mexico, Daniel Robbins is the Chief Architect of the
Gentoo Project, CEO of Gentoo Technologies, Inc., the mentor for the Linux Advanced
Multimedia Project (LAMP), and a contributing author for the Macmillan books
Caldera OpenLinux Unleashed, SuSE Linux Unleashed, and Samba Unleashed. Daniel
has been involved with computers in some fashion since the second grade, when he was
first exposed to the Logo programming language as well as a potentially dangerous dose
of Pac Man. This probably explains why he has since served as a Lead Graphic Artist at
SONY Electronic Publishing/Psygnosis. Daniel enjoys spending time with his wife,
Mary, who is expecting a child this spring. You can contact Daniel at
drobbins@gentoo.org.
