
Searching man Pages by Keyword

Unfortunately, you won't always remember the exact name of the man page that you want to view.
In these cases you can search for man pages that match a keyword by using the -k option to
the man command.
For example, what if you knew you wanted a man page that describes how to change your password, but you didn't
remember the exact name? You could run the command man -k passwd:

sysadmin@localhost:~$ man -k passwd

chgpasswd (8) - update group passwords in batch mode

chpasswd (8) - update passwords in batch mode

fgetpwent_r (3) - get passwd file entry reentrantly

getpwent_r (3) - get passwd file entry reentrantly

gpasswd (1) - administer /etc/group and /etc/gshadow

pam_localuser (8) - require users to be listed in /etc/passwd

passwd (1) - change user password

passwd (1ssl) - compute password hashes

passwd (5) - the password file

passwd2des (3) - RFS password encryption

update-passwd (8) - safely update /etc/passwd, /etc/shadow and /etc/group

sysadmin@localhost:~$

When you use this option, you may end up with a large amount of output. The preceding command,
for example, can provide over 60 results; only a portion of them is shown above.
Recall that there are thousands of man pages, so when you search for a keyword, be as specific as
possible. Using a generic word, such as "the", could result in hundreds or even thousands of results.

Note that on most Linux distributions, the apropos command does the same thing as man -k. On those
distributions, both will produce the same output.
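
For example, running apropos with the same keyword produces the same listing shown above:

sysadmin@localhost:~$ apropos passwd

chgpasswd (8) - update group passwords in batch mode

(The remaining output matches the man -k passwd example above.)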

info Command
Man pages are great sources of information, but they do tend to have a few disadvantages. One
example of a disadvantage is that each man page is a separate document, not related to any other
man page. While some man pages have a SEE ALSO section that may refer to other man pages, they
really tend to be unrelated sources of documentation.
The info command also provides documentation on operating system commands and features. The
goal of this command is slightly different from that of man pages: to provide a documentation resource
with a logical organizational structure, making the documentation easier to read.

Within info documents, information is broken down into categories that work much like a table of
contents that you would find in a book. Hyperlinks are provided to pages with information on
individual topics for a specific command or feature. In fact, all of the documentation is merged into
a single "book" in which you can go to the top level of documentation and view the table of
contents representing all of the documentation available.

Another advantage of info over man pages is that the writing style of info documents is typically
more conducive to learning a topic. Consider man pages to be more of a reference resource and info
documents to be more of a learning guide.

Displaying Info Documentation for a Command


To display the info documentation for a command, execute info command (replace command with the
name of the command that you are seeking information about). For example, the following
demonstrates the output of the command info ls:

File: coreutils.info, Node: ls invocation, Next: dir invocation, Up: Directory listing

10.1 `ls': List directory contents

==================================

The `ls' program lists information about files (of any type, including directories). Options and file arguments can
be intermixed arbitrarily, as usual.

For non-option command-line arguments that are directories, by

default `ls' lists the contents of directories, not recursively, and

omitting files with names beginning with `.'. For other non-option

arguments, by default `ls' lists just the file name. If no non-option

argument is specified, `ls' operates on the current directory, acting

as if it had been invoked with a single argument of `.'.

By default, the output is sorted alphabetically, according to the

locale settings in effect.(1) If standard output is a terminal, the

output is in columns (sorted vertically) and control characters are


output as question marks; otherwise, the output is listed one per line

and control characters are output as-is.

--zz-Info: (coreutils.info.gz)ls invocation, 58 lines --Top-------------

Welcome to Info version 5.2. Type h for help, m for menu item.

Notice that the first line provides some information that tells you where you are in the info
documentation. This documentation is broken up into nodes, and in the example above, you are
currently in the ls invocation node. If you went to the next node (like going to the next chapter in a
book), you would be in the dir invocation node. If you went up one level, you would be in the Directory
listing node.

Moving Around While Viewing an info Document


As with the man command, you can get a listing of movement commands by typing the letter h while
reading the info documentation:

Basic Info command keys

l Close this help window.

q Quit Info altogether.

H Invoke the Info tutorial.

Up Move up one line.

Down Move down one line.

DEL Scroll backward one screenful.

SPC Scroll forward one screenful.

Home Go to the beginning of this node.

End Go to the end of this node.

TAB Skip to the next hypertext link.

RET Follow the hypertext link under the cursor.

l Go back to the last node seen in this window.

[ Go to the previous node in the document.

] Go to the next node in the document.

p Go to the previous node on this level.

n Go to the next node on this level.

u Go up one level.

-----Info: *Info Help*, 466 lines --Top---------------------------------


Note that if you want to close the help screen, you type the letter l. This brings you back to your
document and allows you to continue reading. To quit entirely, you type the letter q.

The following table provides a summary of useful commands:

Command Function
Down arrow Go down one line
Space Go down one page
s Search for term
[ Go to previous node
] Go to next node
u Go up one level
TAB Skip to next hyperlink
HOME Go to beginning
END Go to end
h Display help
l Close the help page
q Quit info command
If you scroll through the document, you will eventually see the menu for the ls command:

* Menu:

* Which files are listed::

* What information is listed::

* Sorting the output::

* Details about version sort::

* General output formatting::

* Formatting file timestamps::

* Formatting the file names::

---------- Footnotes ----------

(1) If you use a non-POSIX locale (e.g., by setting `LC_ALL' to

`en_US'), then `ls' may produce output that is sorted differently than

you're accustomed to. In that case, set the `LC_ALL' environment

variable to `C'.

--zz-Info: (coreutils.info.gz)ls invocation, 58 lines --Top-------------

The items under the menu are hyperlinks that can take you to nodes that describe more about
the ls command. For example, if you placed your cursor on the line "* Sorting the output::" and
pressed the Enter key, you would be taken to a node that describes sorting the output of
the ls command:

File: coreutils.info, Node: Sorting the output, Next: Details about version sort, Prev: What information is listed, Up: ls invocation

10.1.3 Sorting the output

-------------------------

These options change the order in which `ls' sorts the information it

outputs. By default, sorting is done by character code (e.g., ASCII

order).

`-c'

`--time=ctime'

`--time=status'

If the long listing format (e.g., `-l', `-o') is being used, print

the status change time (the `ctime' in the inode) instead of the

modification time. When explicitly sorting by time (`--sort=time'

or `-t') or when not using a long listing format, sort according

to the status change time.

`-f'

Primarily, like `-U'--do not sort; list the files in whatever

order they are stored in the directory. But also enable `-a' (lis

--zz-Info: (coreutils.info.gz)Sorting the output, 68 lines --Top--------

Note that by going into the node about sorting, you essentially went into a sub-node of the one in
which you originally started. To go back, you can use the u key, which takes you to the start of the
node one level up. You could also use the l key to return to the exact location you were at before
entering the sorting node.

Using the --help Option


Many commands will provide basic information, very similar to the SYNOPSIS found in man
pages, when you apply the --help option to the command. This is useful for learning the basic usage of a
command:
sysadmin@localhost:~$ ps --help

********* simple selection ********* ********* selection by list *********

-A all processes -C by command name

-N negate selection -G by real group ID (supports names)

-a all w/ tty except session leaders -U by real user ID (supports names)

-d all except session leaders -g by session OR by effective group name

-e all processes -p by process ID

T all processes on this terminal -s processes in the sessions given

a all w/ tty, including other users -t by tty

g OBSOLETE -- DO NOT USE -u by effective user ID (supports names)

r only running processes U processes for specified users

x processes w/o controlling ttys t by tty

*********** output format ********** *********** long options ***********

-o,o user-defined -f full --Group --User --pid --cols --ppid

-j,j job control s signal --group --user --sid --rows --info

-O,O preloaded -o v virtual memory --cumulative --format --deselect

-l,l long u user-oriented --sort --tty --forest --version

-F extra full X registers --heading --no-headi

********* misc options *********

-V,V show version L list format codes f ASCII art forest

-m,m,-L,-T,H threads S children in sum -y change -l format

-M,Z security data c true command name -c scheduling class

-w,w wide output n numeric WCHAN,UID -H process hierarchy

sysadmin@localhost:~$

Find Any File or Directory


The whereis command is designed specifically to find commands and man pages. While this is useful,
there are times when you want to find a file or directory, not just files that are commands or man
pages.

To find any file or directory, you can use the locate command. This command will search a database
of all files and directories that were on the system when the database was created. Typically, the
command to generate this database is run nightly.
sysadmin@localhost:~$ locate gshadow

/etc/gshadow

/etc/gshadow-

/usr/include/gshadow.h

/usr/share/man/cs/man5/gshadow.5.gz

/usr/share/man/da/man5/gshadow.5.gz

/usr/share/man/de/man5/gshadow.5.gz

/usr/share/man/fr/man5/gshadow.5.gz

/usr/share/man/it/man5/gshadow.5.gz

/usr/share/man/man5/gshadow.5.gz

/usr/share/man/ru/man5/gshadow.5.gz

/usr/share/man/sv/man5/gshadow.5.gz

/usr/share/man/zh_CN/man5/gshadow.5.gz

sysadmin@localhost:~$

Any files that you created today will not normally be searchable with the locate command. If you
have access to the system as the root user (the system administrator account), you can manually
update the locate database by running the updatedb command. Regular users cannot update the database
file.
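
For example, an administrator could refresh the database manually, as in the following sketch (this assumes you can switch to the root account; the command may take a few moments while it scans the filesystem):

root@localhost:~# updatedb

root@localhost:~#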

Also note that when you use the locate command as a regular user, your output may be limited due to
file permissions. Essentially, if you don't have access to a file or directory on the filesystem due to
permissions, the locate command won't return those names. This is a security feature designed to
keep users from "exploring" the filesystem by using the locate database. The root user can search for
any file in the locate database.

Count the Number of Files


The output of the locate command can be quite large. When you search for a filename, such
as passwd, the locate command will produce every file that contains the string passwd, not just files
named passwd.

In many cases, you may want to start by listing how many files will match. You can do this by
using the -c option to the locate command.
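
For example, the following sketch counts the matches instead of listing them (the exact count will vary from system to system):

sysadmin@localhost:~$ locate -c passwd

98

sysadmin@localhost:~$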

Some system features also have more detailed help documents located in the /usr/share/doc directory structure.
You can use the ls command to see which packages have documentation there, as shown below.
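For example, the following sketch lists that directory; the output, one subdirectory per installed package, will vary by distribution:

sysadmin@localhost:~$ ls /usr/share/doc
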
Listing Hidden Files
When you use the ls command to display the contents of a directory, not all files are shown
automatically. The ls command doesn't display hidden files by default. A hidden file is any file (or
directory) that begins with a dot . character.

To display all files, including hidden files, use the -a option to the ls command:
sysadmin@localhost:~$ ls -a
. .bashrc .selected_editor Downloads Public
.. .cache Desktop Music Templates
.bash_logout .profile Documents Pictures Videos

Why are files hidden in the first place? Most of the hidden files are customization files, designed to
customize how Linux, your shell or programs work. For example, the .bashrc file in your home
directory customizes features of the shell, such as creating or modifying variables and aliases.
These customization files are not ones that you work with on a regular basis. There are also many of
them, as you can see, and having them displayed will make it more difficult to find the files that you
do regularly work with. So, the fact that they are hidden is to your benefit.

Long Display Listing


There is information about each file, called metadata, that is sometimes helpful to display. This may
include who owns a file, the size of a file, and the last time the contents of a file were modified. You
can display this information by using the -l option to the ls command:
sysadmin@localhost:~$ ls -l
total 0
drwxr-xr-x 1 sysadmin sysadmin 0 Jan 29 2015 Desktop
drwxr-xr-x 1 sysadmin sysadmin 0 Jan 29 2015 Documents
drwxr-xr-x 1 sysadmin sysadmin 0 Jan 29 2015 Downloads
drwxr-xr-x 1 sysadmin sysadmin 0 Jan 29 2015 Music
drwxr-xr-x 1 sysadmin sysadmin 0 Jan 29 2015 Pictures
drwxr-xr-x 1 sysadmin sysadmin 0 Jan 29 2015 Public
drwxr-xr-x 1 sysadmin sysadmin 0 Jan 29 2015 Templates
drwxr-xr-x 1 sysadmin sysadmin 0 Jan 29 2015 Videos
sysadmin@localhost:~$

In the output above, each line describes metadata about a single file: its permissions, link count,
owner, group, size, modification time, and name.
Human Readable Sizes
When you display file sizes with the -l option to the ls command, you end up with file sizes in
bytes. For text files, a byte is 1 character.
For smaller files, byte sizes are fine. However, for larger files it is hard to comprehend how large
the file is. For example, consider the output of the following command:
sysadmin@localhost:~$ ls -l /usr/bin/omshell
-rwxr-xr-x 1 root root 1561400 Oct 9 2012 /usr/bin/omshell
sysadmin@localhost:~$

As you can see, the file size is hard to determine in bytes. Is 1561400 a large file or small? It seems
fairly large, but it is hard to determine using bytes.
Think of it this way: if someone were to give you the distance between Boston and New York using
inches, that value would essentially be meaningless because for a distance like that, you think in
terms of miles.
It would be better if the file size was presented in a more human readable size, like megabytes or
gigabytes. To accomplish this, add the -h option to the ls command:
sysadmin@localhost:~$ ls -lh /usr/bin/omshell
-rwxr-xr-x 1 root root 1.5M Oct 9 2012 /usr/bin/omshell
sysadmin@localhost:~$

Important: The -h option must be used with the -l option.

Listing Directories
When the command ls -d is used, it refers to the current directory, and not the contents within it.
Without any other options, it is rather meaningless, although it is important to note that the current
directory is always referred to with a single period (.):
sysadmin@localhost:~$ ls -d
.

To use the ls -d command in a meaningful way requires the addition of the -l option. In this
case, note that the first command lists the details of the contents in the /home/sysadmin
directory, while the second command lists the /home/sysadmin directory itself.
sysadmin@localhost:~$ ls -l
total 0
drwxr-xr-x 1 sysadmin sysadmin 0 Apr 15 2015 Desktop
drwxr-xr-x 1 sysadmin sysadmin 0 Apr 15 2015 Documents
drwxr-xr-x 1 sysadmin sysadmin 0 Apr 15 2015 Downloads
drwxr-xr-x 1 sysadmin sysadmin 0 Apr 15 2015 Music
drwxr-xr-x 1 sysadmin sysadmin 0 Apr 15 2015 Pictures
drwxr-xr-x 1 sysadmin sysadmin 0 Apr 15 2015 Public
drwxr-xr-x 1 sysadmin sysadmin 0 Apr 15 2015 Templates
drwxr-xr-x 1 sysadmin sysadmin 0 Apr 15 2015 Videos
drwxr-xr-x 1 sysadmin sysadmin 420 Apr 15 2015 test
sysadmin@localhost:~$ ls -ld
drwxr-xr-x 1 sysadmin sysadmin 224 Nov 7 17:07 .
sysadmin@localhost:~$

Note the single period at the end of the second long listing. This indicates that the current directory
is being listed, and not the contents.

Recursive Listing
There will be times when you want to display all of the files in a directory as well as all of the files
in all subdirectories under a directory. This is called a recursive listing.

To perform a recursive listing, use the -R option to the ls command:


Note: The output shown below will vary from the results you will see if you execute the command
within the virtual machine environment of this course.
sysadmin@localhost:~$ ls -R /etc/ppp
/etc/ppp:
chap-secrets ip-down.ipv6to4 ip-up.ipv6to4 ipv6-up pap-secrets
ip-down ip-up ipv6-down options peers

/etc/ppp/peers:
sysadmin@localhost:~$

Note that in the previous example, the files in the /etc/ppp directory were listed first. After that,
the files in the /etc/ppp/peers directory were listed (there were no files in this case, but if there
had been any, they would have been displayed).

Be careful with this option; for example, running the command ls -R / would list every file on
the file system, including all files on any attached USB device and DVD in the system. Limit the
use of the -R option to smaller directory structures.

Sort a Listing
By default, the ls command sorts files alphabetically by file name. Sometimes, it may be useful to
sort files using different criteria.

To sort files by size, we can use the -S option. Note the difference in the output of the following
two commands:
sysadmin@localhost:~$ ls /etc/ssh
moduli            ssh_host_dsa_key.pub    ssh_host_rsa_key      sshd_config
ssh_config        ssh_host_ecdsa_key      ssh_host_rsa_key.pub
ssh_host_dsa_key  ssh_host_ecdsa_key.pub  ssh_import_id
sysadmin@localhost:~$ ls -S /etc/ssh
moduli            ssh_host_dsa_key      ssh_host_ecdsa_key
sshd_config       ssh_host_dsa_key.pub  ssh_host_ecdsa_key.pub
ssh_host_rsa_key  ssh_host_rsa_key.pub
ssh_config        ssh_import_id
sysadmin@localhost:~$
The same files and directories are listed, but in a different order. While the -S option works by
itself, you can't really tell that the output is sorted by size, so it is most useful when used with the
-l option. The following command will list files from largest to smallest and display the actual size
of each file.
sysadmin@localhost:~$ ls -lS /etc/ssh
total 160
-rw-r--r-- 1 root root 125749 Apr 29 2014 moduli
-rw-r--r-- 1 root root 2489 Jan 29 2015 sshd_config
-rw------- 1 root root 1675 Jan 29 2015 ssh_host_rsa_key
-rw-r--r-- 1 root root 1669 Apr 29 2014 ssh_config
-rw------- 1 root root 668 Jan 29 2015 ssh_host_dsa_key
-rw-r--r-- 1 root root 607 Jan 29 2015 ssh_host_dsa_key.pub
-rw-r--r-- 1 root root 399 Jan 29 2015 ssh_host_rsa_key.pub
-rw-r--r-- 1 root root 302 Jan 10 2011 ssh_import_id
-rw------- 1 root root 227 Jan 29 2015 ssh_host_ecdsa_key
-rw-r--r-- 1 root root 179 Jan 29 2015 ssh_host_ecdsa_key.pub
sysadmin@localhost:~$

It may also be useful to use the -h option to display human-readable file sizes:
sysadmin@localhost:~$ ls -lSh /etc/ssh
total 160K
-rw-r--r-- 1 root root 123K Apr 29 2014 moduli
-rw-r--r-- 1 root root 2.5K Jan 29 2015 sshd_config
-rw------- 1 root root 1.7K Jan 29 2015 ssh_host_rsa_key
-rw-r--r-- 1 root root 1.7K Apr 29 2014 ssh_config
-rw------- 1 root root 668 Jan 29 2015 ssh_host_dsa_key
-rw-r--r-- 1 root root 607 Jan 29 2015 ssh_host_dsa_key.pub
-rw-r--r-- 1 root root 399 Jan 29 2015 ssh_host_rsa_key.pub
-rw-r--r-- 1 root root 302 Jan 10 2011 ssh_import_id
-rw------- 1 root root 227 Jan 29 2015 ssh_host_ecdsa_key
-rw-r--r-- 1 root root 179 Jan 29 2015 ssh_host_ecdsa_key.pub
sysadmin@localhost:~$

It is also possible to sort files based on the time they were modified, by using the -t option.

The -t option will list the most recently modified files first. This option can be used alone, but
again, is usually more helpful when paired with the -l option:
sysadmin@localhost:~$ ls -tl /etc/ssh
total 160
-rw------- 1 root root 668 Jan 29 2015 ssh_host_dsa_key
-rw-r--r-- 1 root root 607 Jan 29 2015 ssh_host_dsa_key.pub
-rw------- 1 root root 227 Jan 29 2015 ssh_host_ecdsa_key
-rw-r--r-- 1 root root 179 Jan 29 2015 ssh_host_ecdsa_key.pub
-rw------- 1 root root 1675 Jan 29 2015 ssh_host_rsa_key
-rw-r--r-- 1 root root 399 Jan 29 2015 ssh_host_rsa_key.pub
-rw-r--r-- 1 root root 2489 Jan 29 2015 sshd_config
-rw-r--r-- 1 root root 125749 Apr 29 2014 moduli
-rw-r--r-- 1 root root 1669 Apr 29 2014 ssh_config
-rw-r--r-- 1 root root 302 Jan 10 2011 ssh_import_id
sysadmin@localhost:~$

It is important to remember that the modified date on directories represents the last time a file was
added to or removed from the directory.
If the files in a directory were modified many days or months ago, it may be harder to tell exactly
when they were modified, as only the date is provided for older files. For more detailed
modification time information, you can use the --full-time option to display the complete
timestamp (including hours, minutes, and seconds):
sysadmin@localhost:~$ ls -t --full-time /etc/ssh
total 160
-rw------- 1 root root    668 2015-01-29 03:17:33.000000000 +0000 ssh_host_dsa_key
-rw-r--r-- 1 root root    607 2015-01-29 03:17:33.000000000 +0000 ssh_host_dsa_key.pub
-rw------- 1 root root    227 2015-01-29 03:17:33.000000000 +0000 ssh_host_ecdsa_key
-rw-r--r-- 1 root root    179 2015-01-29 03:17:33.000000000 +0000 ssh_host_ecdsa_key.pub
-rw------- 1 root root   1675 2015-01-29 03:17:33.000000000 +0000 ssh_host_rsa_key
-rw-r--r-- 1 root root    399 2015-01-29 03:17:33.000000000 +0000 ssh_host_rsa_key.pub
-rw-r--r-- 1 root root   2489 2015-01-29 03:17:33.000000000 +0000 sshd_config
-rw-r--r-- 1 root root 125749 2014-04-29 23:58:51.000000000 +0000 moduli
-rw-r--r-- 1 root root   1669 2014-04-29 23:58:51.000000000 +0000 ssh_config
-rw-r--r-- 1 root root    302 2011-01-10 18:48:29.000000000 +0000 ssh_import_id
sysadmin@localhost:~$

The --full-time option will assume the -l option automatically.

It is possible to perform a reverse sort with either the -S or -t options by using the -r option. The
following command will sort files by size, smallest to largest:
sysadmin@localhost:~$ ls -lrS /etc/ssh
total 160
-rw-r--r-- 1 root root 179 Jan 29 2015 ssh_host_ecdsa_key.pub
-rw------- 1 root root 227 Jan 29 2015 ssh_host_ecdsa_key
-rw-r--r-- 1 root root 302 Jan 10 2011 ssh_import_id
-rw-r--r-- 1 root root 399 Jan 29 2015 ssh_host_rsa_key.pub
-rw-r--r-- 1 root root 607 Jan 29 2015 ssh_host_dsa_key.pub
-rw------- 1 root root 668 Jan 29 2015 ssh_host_dsa_key
-rw-r--r-- 1 root root 1669 Apr 29 2014 ssh_config
-rw------- 1 root root 1675 Jan 29 2015 ssh_host_rsa_key
-rw-r--r-- 1 root root 2489 Jan 29 2015 sshd_config
-rw-r--r-- 1 root root 125749 Apr 29 2014 moduli
sysadmin@localhost:~$

The following command will list files by modification date, oldest to newest:
sysadmin@localhost:~$ ls -lrt /etc/ssh
total 160
-rw-r--r-- 1 root root 302 Jan 10 2011 ssh_import_id
-rw-r--r-- 1 root root 1669 Apr 29 2014 ssh_config
-rw-r--r-- 1 root root 125749 Apr 29 2014 moduli
-rw-r--r-- 1 root root 2489 Jan 29 2015 sshd_config
-rw-r--r-- 1 root root 399 Jan 29 2015 ssh_host_rsa_key.pub
-rw------- 1 root root 1675 Jan 29 2015 ssh_host_rsa_key
-rw-r--r-- 1 root root 179 Jan 29 2015 ssh_host_ecdsa_key.pub
-rw------- 1 root root 227 Jan 29 2015 ssh_host_ecdsa_key
-rw-r--r-- 1 root root 607 Jan 29 2015 ssh_host_dsa_key.pub
-rw------- 1 root root 668 Jan 29 2015 ssh_host_dsa_key
sysadmin@localhost:~$
Listing With Globs
In a previous chapter, we discussed the use of file globs to match filenames using wildcard
characters. For example, we demonstrated that you can list all of the files in the /etc directory that
begin with the letter e with the following command:
sysadmin@localhost:~$ echo /etc/e*
/etc/enscript.cfg /etc/environment /etc/ethers /etc/event.d /etc/exports
sysadmin@localhost:~$

Now that you know that the ls command is normally used to list files in a directory, using the
echo command may seem to have been a strange choice. However, there is something about the
ls command that might have caused confusion while we were discussing globs. This "feature"
might also cause problems when you try to list files using glob patterns.

Keep in mind that it is the shell, not the echo or ls command, that expands the glob pattern into
corresponding file names. In other words, when you typed the echo /etc/e* command, what
the shell did before executing the echo command was replace e* with all of the files and
directories within the /etc directory that match the pattern.

So, if you were to run the ls /etc/e* command, what the shell would really run would be this:
ls /etc/enscript.cfg /etc/environment /etc/ethers /etc/event.d /etc/exports

When the ls command sees multiple arguments, it performs a list operation on each item
separately. In other words, the command ls /etc/enscript.cfg /etc/environment is
essentially the same as ls /etc/enscript.cfg; ls /etc/environment.

Now consider what happens when you run the ls command on a file, such as enscript.cfg:
sysadmin@localhost:~$ ls /etc/enscript.cfg
/etc/enscript.cfg
sysadmin@localhost:~$

As you can see, running the ls command on a single file results in the name of the file being
printed. Typically this is useful if you want to see details about a specific file by using the -l
option to the ls command:
sysadmin@localhost:~$ ls -l /etc/enscript.cfg
-r--r--r--. 1 root root 4843 Nov 11 2010 /etc/enscript.cfg
sysadmin@localhost:~$

However, what if the ls command is given a directory name as an argument? In this case, the
output of the command is different than if the argument was a file name:
sysadmin@localhost:~$ ls /etc/event.d
ck-log-system-restart ck-log-system-start ck-log-system-stop
sysadmin@localhost:~$

If you give a directory name as an argument to the ls command, the command will display the
contents of the directory (the names of the files in the directory), not just provide the directory
name. The filenames you see in the example above are the names of the files in the
/etc/event.d directory.

Why is this a problem when using globs? Consider the following output:
sysadmin@localhost:~$ ls /etc/e*
/etc/enscript.cfg /etc/environment /etc/ethers /etc/event.d /etc/exports
/etc/event.d:
ck-log-system-restart ck-log-system-start ck-log-system-stop
sysadmin@localhost:~$

As you can see, when the ls command sees a filename as an argument, it just displays the
filename. However, for any directory, it will display the contents of the directory, not just the
directory name.
This becomes even more confusing in a situation like the following:
sysadmin@localhost:~$ ls /etc/ev*
ck-log-system-restart ck-log-system-start ck-log-system-stop
sysadmin@localhost:~$

In the previous example, it seems like the ls command is just plain wrong. But what really
happened is that the only thing that matches the glob /etc/ev* is the /etc/event.d directory.
So, the ls command only displayed the files in that directory!

There is a simple solution to this problem: when you use glob arguments with the ls command,
always use the -d option. When you use the -d option, then the ls command won't display the
contents of a directory, but rather the name of the directory:
sysadmin@localhost:~$ ls -d /etc/e*
/etc/enscript.cfg /etc/environment /etc/ethers /etc/event.d /etc/exports
sysadmin@localhost:~$

Avoid Overwriting Data


The cp command can be destructive to existing data if the destination file already exists. In the case
where the destination file exists, the cp command will overwrite the existing file's contents with
the contents of the source file. To illustrate this potential problem, first a new file is created in the
sysadmin home directory by copying an existing file:
sysadmin@localhost:~$ cp /etc/skel/.bash_logout ~/example.txt
sysadmin@localhost:~$

View the output of the ls command to see the file and view the contents of the file using the more
command:
sysadmin@localhost:~$ ls -l example.txt
-rw-rw-r--. 1 sysadmin sysadmin 18 Sep 21 15:56 example.txt
sysadmin@localhost:~$ more example.txt
# ~/.bash_logout: executed by bash(1) when login shell exits.

sysadmin@localhost:~$

In the next example, you will see that the cp command destroys the original contents of the
example.txt file. Notice that after the cp command is complete, the size of the file is different (158
bytes rather than 18) from the original and the contents are different as well:
sysadmin@localhost:~$ cp /etc/hosts ~/example.txt
sysadmin@localhost:~$ ls -l example.txt
-rw-rw-r--. 1 sysadmin sysadmin 158 Sep 21 14:11 example.txt
sysadmin@localhost:~$ cat example.txt
127.0.0.1 localhost localhost.localdomain localhost4 localhost4.localdomain4
::1 localhost localhost.localdomain localhost6 localhost6.localdomain6
sysadmin@localhost:~$

There are two options that can be used to safeguard against accidental overwrites. With the -i
(interactive) option, the cp command will prompt before overwriting a file. The following example
demonstrates this option, first restoring the content of the original file:
sysadmin@localhost:~$ cp /etc/skel/.bash_logout ~/example.txt
sysadmin@localhost:~$ ls -l example.txt
-rw-r--r-- 1 sysadmin sysadmin 18 Sep 21 15:56 example.txt
sysadmin@localhost:~$ more example.txt
# ~/.bash_logout: executed by bash(1) when login shell exits.

sysadmin@localhost:~$ cp -i /etc/hosts ~/example.txt


cp: overwrite `/home/sysadmin/example.txt'? n
sysadmin@localhost:~$ ls -l example.txt
-rw-r--r-- 1 sysadmin sysadmin 18 Sep 21 15:56 example.txt
sysadmin@localhost:~$ more example.txt
# ~/.bash_logout: executed by bash(1) when login shell exits.

sysadmin@localhost:~$

Notice that since the value of n (no) was given when prompted to overwrite the file, no changes
were made to the file. If a value of y (yes) was given, then the copy process would have taken
place.

The -i option requires you to answer y or n for every copy that could end up overwriting an
existing file's contents. This can be tedious when a bunch of overwrites could occur, such as the
example demonstrated below:
sysadmin@localhost:~$ cp -i /etc/skel/.* ~
cp: omitting directory `/etc/skel/.'
cp: omitting directory `/etc/skel/..'
cp: overwrite `/home/sysadmin/.bash_logout'? n
cp: overwrite `/home/sysadmin/.bashrc'? n
cp: overwrite `/home/sysadmin/.profile'? n
cp: overwrite `/home/sysadmin/.selected_editor'? n
sysadmin@localhost:~$

As you can see from the example above, the cp command tried to overwrite four existing files,
forcing the user to answer four prompts. If this situation happened for 100 files, it could become
very annoying, very quickly.
If you want to automatically answer n to each prompt, use the -n option. It essentially stands for
"no overwrite."

Additional cp Options
The following options can be used with both the cp and mv commands:

Option Meaning
-i Interactive: ask if a file is to be overwritten
-n Do not overwrite a destination file's contents
-v Verbose: show the resulting copy or move
-r Recursive: copy an entire directory structure (cp only)
-p Preserve file attributes (cp only)
Important: The mv command has no -r option because it moves directories by default.

Creating Files
There are several ways of creating a new file, including using a program designed to edit a file (a
text editor). In a later chapter, text editors will be covered.
There is also a way to simply create an empty file that can be populated with data at a later time. This is
useful because, for some operating system features, the very existence of a file can alter how a
command or service works. It is also useful to create a file as a "placeholder" to remind you to
create the file contents at a later time.

To create an empty file, use the touch command as demonstrated below:


sysadmin@localhost:~$ ls
Desktop Documents Downloads Music Pictures Public Templates
Videos
sysadmin@localhost:~$ touch sample
sysadmin@localhost:~$ ls -l sample
-rw-rw-r-- 1 sysadmin sysadmin 0 Nov 9 16:48 sample
sysadmin@localhost:~$

Notice the size of the new file is 0 bytes. As previously mentioned, the touch command doesn't
place any data within the new file.

Compressing Files
Compressing files makes them smaller by removing duplication from a file and storing it in such a way that
the file can be restored. A file with human-readable text might have frequently used words replaced
by something smaller, or an image with a solid background might represent patches of that color by
a code. You generally don't use the compressed version of a file; instead, you decompress it before
use. The compression algorithm is a procedure the computer follows to encode the original file, and as
a result make it smaller. Computer scientists research these algorithms and come up with better ones
that can work faster or make the input file smaller.

When talking about compression, there are two types:

Lossless: No information is removed from the file. Compressing a file and decompressing it
leaves something identical to the original.

Lossy: Information might be removed from the file as it is compressed, so that uncompressing it
will result in a file that is slightly different from the original. For instance, an image with two
subtly different shades of green might be made smaller by treating those two shades as the same.
Often, the eye can't pick out the difference anyway.

Generally, human eyes and ears don't notice slight imperfections in pictures and audio, especially as
they are displayed on a monitor or played over speakers. Lossy compression often benefits media
because it results in smaller file sizes and people can't tell the difference between the original and
the version with the changed data. For things that must remain intact, such as documents, logs, and
software, you need lossless compression.

Some image formats, such as JPEG, implement lossy compression, while others, such as GIF and PNG, are
lossless. With lossy formats you can generally decide how much quality you want to preserve. A lower
quality results in a smaller file, but after decompression you may notice artifacts such as rough edges or
discolorations. High quality will look much like the original image, but the file size will be closer to the original.

Compressing an already compressed file will not make it smaller. This is often forgotten when it
comes to images, since they are already stored in a compressed format. With lossless compression,
this multiple compression is not a problem, but if you compress and decompress a file several times
using a lossy algorithm you will eventually have something that is unrecognizable.

Linux provides several tools to compress files; the most common is gzip. Here we show a log file
before and after compression.

bob:tmp $ ls -l access_log*

-rw-r--r-- 1 sean sean 372063 Oct 11 21:24 access_log

bob:tmp $ gzip access_log

bob:tmp $ ls -l access_log*

-rw-r--r-- 1 sean sean 26080 Oct 11 21:24 access_log.gz

In the example above, there is a file called access_log that is 372,063 bytes. The file is compressed
by invoking the gzip command with the name of the file as the only argument. After that command
completes, the original file is gone and a compressed version with a file extension of .gz is left in its
place. The file size is now 26,080 bytes, giving a compression ratio of about 14:1, which is common
with log files.

Gzip will give you this information if you ask, by using the -l parameter, as shown here:

bob:tmp $ gzip -l access_log.gz


compressed uncompressed ratio uncompressed_name

26080 372063 93.0% access_log

Here, you can see that the compression ratio is given as 93%, which is the inverse of the 14:1 ratio,
i.e. 13/14. Additionally, when the file is decompressed it will be called access_log.

bob:tmp $ gunzip access_log.gz

bob:tmp $ ls -l access_log*

-rw-r--r-- 1 sean sean 372063 Oct 11 21:24 access_log

The opposite of the gzip command is gunzip. Alternatively, gzip -d does the same thing (gunzip is just a
script that calls gzip with the right parameters). After gunzip does its work, you can see that
the access_log file is back to its original size.

Gzip can also act as a filter, which means it doesn't read or write anything to disk but instead
receives data through an input channel and writes it out to an output channel. You'll learn more
about how this works in the next chapter, so the next example just gives you an idea of what you
can do by being able to compress a stream.

bob:tmp $ mysqldump -A | gzip > database_backup.gz

bob:tmp $ gzip -l database_backup.gz

compressed uncompressed ratio uncompressed_name

76866 1028003 92.5% database_backup

The mysqldump -A command outputs the contents of the local MySQL databases to the console.
The | character (pipe) says to redirect the output of the previous command into the input of the next
one. The program to receive the output is gzip, which recognizes that no filenames were given, so it
should operate in pipe mode. Finally, the > database_backup.gz means redirect the output of the
previous command into a file called database_backup.gz. Inspecting this file with gzip -l shows that the
compressed version is 7.5% of the size of the original, with the added benefit that the larger file
never had to be written to disk.

There is another pair of commands that operate virtually identically to gzip and gunzip. These
are bzip2 and bunzip2. The bzip2 utilities use a different compression algorithm (called Burrows-Wheeler
block sorting, versus the Lempel-Ziv coding used by gzip) that can compress files smaller than gzip at the
expense of more CPU time. You can recognize these files because they have a .bz2 (or, historically, .bz)
extension instead of .gz.
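
For example, here is a sketch paralleling the gzip example above (the exact compressed size shown is illustrative):

bob:tmp $ bzip2 access_log

bob:tmp $ ls -l access_log*

-rw-r--r-- 1 sean sean 21288 Oct 11 21:24 access_log.bz2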

Archiving Files
If you had several files to send to someone, you could compress each one individually. You would
have a smaller amount of data in total than if you sent uncompressed files, but you would still have
to deal with many files at one time.
Archiving is the solution to this problem. The traditional UNIX utility to archive files is called tar,
which is a short form of TApe aRchive. Tar was used to stream many files to a tape for backups or
file transfer. Tar takes in several files and creates a single output file that can be split up again into
the original files on the other end of the transmission.

Tar has 3 modes you will want to be familiar with:

Create: make a new archive out of a series of files

Extract: pull one or more files out of an archive

List: show the contents of the archive without extracting

Remembering the modes is key to figuring out the command line options necessary to do what you
want. In addition to the mode, you will also want to make sure you remember where to specify the
name of the archive, as you may be entering multiple file names on a command line.

Here, we show a tar file, also called a tarball, being created from multiple access logs.

bob:tmp $ tar -cf access_logs.tar access_log*

bob:tmp $ ls -l access_logs.tar

-rw-rw-r-- 1 sean sean 542720 Oct 12 21:42 access_logs.tar

Creating an archive requires two named options. The first, c, specifies the mode. The second, f, tells
tar that the next argument is the name of the archive; in the example above, this creates an
archive called access_logs.tar. The remaining arguments are all taken to be input file names, either as
a wildcard, a list of files, or both. In this example, we use a wildcard to include all files that
begin with access_log.

The example above does a long directory listing of the created file. The final size is 542,720 bytes,
which is slightly larger than the input files. Tarballs can be compressed for easier transport, either
by gzipping the archive or by having tar do it with the z flag as follows:

bob:tmp $ tar -czf access_logs.tar.gz access_log*

bob:tmp $ ls -l access_logs.tar.gz

-rw-rw-r-- 1 sean sean 46229 Oct 12 21:50 access_logs.tar.gz

bob:tmp $ gzip -l access_logs.tar.gz

compressed uncompressed ratio uncompressed_name

46229 542720 91.5% access_logs.tar

The example above shows the same command as the prior example, but with the addition of
the z parameter. The output is much smaller than the tarball itself, and the resulting file is
compatible with gzip. You can see from the last command that the uncompressed file is the same
size as it would be if you tarred it in a separate step.
While UNIX doesn't treat file extensions specially, the convention is to use .tar for tar files,
and .tar.gz or .tgz for compressed tar files. You can use bzip2 instead of gzip by substituting the
letter j for z and using .tar.bz2, .tbz, or .tbz2 for a file extension (e.g. tar -cjf file.tbz access_log*).

Given a tar file, compressed or not, you can see what's in it by using the t mode:

bob:tmp $ tar -tjf access_logs.tbz

logs/

logs/access_log.3

logs/access_log.1

logs/access_log.4

logs/access_log

logs/access_log.2

This example uses 3 options:

t: list files in the archive

j: decompress with bzip2 before reading

f: operate on the given filename access_logs.tbz

The contents of the compressed archive are then displayed. You can see that a directory was
prefixed to the files. Tar will recurse into subdirectories automatically when compressing and will
store the path info inside the archive.

Just to show that this file is still nothing special, we will list the contents of the file in two steps
using a pipeline.

bob:tmp $ bunzip2 -c access_logs.tbz | tar -t

logs/

logs/access_log.3

logs/access_log.1

logs/access_log.4

logs/access_log

logs/access_log.2

The left side of the pipeline is bunzip2 -c access_logs.tbz, which decompresses the file, but the -c option
sends the output to the screen. The output is piped into tar -t. If you don't specify a file with
-f, then tar will read from the standard input, which in this case is the uncompressed archive.

Finally you can extract the archive with the x flag:


bob:tmp $ tar -xjf access_logs.tbz

bob:tmp $ ls -l

total 36

-rw-rw-r-- 1 sean sean 30043 Oct 14 13:27 access_logs.tbz

drwxrwxr-x 2 sean sean 4096 Oct 14 13:26 logs

bob:tmp $ ls -l logs

total 536

-rw-r--r-- 1 sean sean 372063 Oct 11 21:24 access_log

-rw-r--r-- 1 sean sean 362 Oct 12 21:41 access_log.1

-rw-r--r-- 1 sean sean 153813 Oct 12 21:41 access_log.2

-rw-r--r-- 1 sean sean 1136 Oct 12 21:41 access_log.3

-rw-r--r-- 1 sean sean 784 Oct 12 21:41 access_log.4

The example above uses a similar pattern as before, specifying the operation (eXtract), the
compression (the j flag, meaning bzip2), and a file name (-f access_logs.tbz). The original file is
untouched, and a new logs directory is created. Inside the directory are the files.

Add the v flag and you will get verbose output of the files processed. This is helpful so you can
see what's happening:

bob:tmp $ tar -xjvf access_logs.tbz

logs/

logs/access_log.3

logs/access_log.1

logs/access_log.4

logs/access_log

logs/access_log.2

It is important to keep the f flag at the end, as tar assumes whatever follows it is a filename. In the
next example, the f and v flags were transposed, leading to tar interpreting the command as an
operation on a file called "v" (note the resulting error message below).

bob:tmp $ tar -xjfv access_logs.tbz

tar (child): v: Cannot open: No such file or directory

tar (child): Error is not recoverable: exiting now

tar: Child returned status 2


tar: Error is not recoverable: exiting now

If you only want some files out of the archive you can add their names to the end of the command,
but by default they must match the name in the archive exactly or use a pattern:

bob:tmp $ tar -xjvf access_logs.tbz logs/access_log

logs/access_log

The example above shows the same archive as before, but extracting only the logs/access_log file.
The output of the command (as verbose mode was requested with the v flag) shows only the one file
has been extracted.

Tar has many more features, such as the ability to use patterns when extracting files, excluding
certain files, or outputting the extracted files to the screen instead of to disk. The documentation
for tar has in-depth information.
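
For example, here is a sketch of extracting by pattern using GNU tar's --wildcards option (assuming the archive shown earlier):

bob:tmp $ tar -xjvf access_logs.tbz --wildcards 'logs/access_log.*'

logs/access_log.3

logs/access_log.1

logs/access_log.4

logs/access_log.2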

Very important: When you use gzip, the original file is replaced by the zipped file. In the gzip example
earlier, the file access_log was replaced with access_log.gz.
When you unzip the file, the zipped file will be replaced with the original file.

STDIN
Standard input, or STDIN, is information normally entered by the user via the keyboard. When a
command prompts for data, the shell provides the user with the ability to type that data, which,
in turn, is sent to the command as STDIN.

STDOUT
Standard output, or STDOUT, is the normal output of commands. When a command functions
correctly (without errors) the output it produces is called STDOUT. By default, STDOUT is
displayed in the terminal window (screen) where the command is executing.

STDERR
Standard error, or STDERR, is the stream used for error messages generated by commands. By default,
STDERR is displayed in the terminal window (screen) where the command is executing.

I/O redirection allows the user to redirect STDIN so that data comes from a file, and STDOUT/STDERR
so that output goes to a file. Redirection is achieved by using the arrow characters: < and >.
Redirecting STDOUT
STDOUT can be directed to files. To begin, observe the output of the following command which
will display to the screen:

sysadmin@localhost:~$ echo "Line 1"

Line 1

sysadmin@localhost:~$

Using the > character the output can be redirected to a file:

sysadmin@localhost:~$ echo "Line 1" > example.txt

sysadmin@localhost:~$ ls

Desktop Downloads Pictures Templates example.txt test

Documents Music Public Videos sample.txt

sysadmin@localhost:~$ cat example.txt

Line 1

sysadmin@localhost:~$

This command displays no output, because STDOUT was sent to the file example.txt instead of the
screen. You can see the new file with the output of the ls command. The newly-created file contains
the output of the echo command when the file is viewed with the cat command.

It is important to realize that the single arrow will overwrite any contents of an existing file:

sysadmin@localhost:~$ cat example.txt

Line 1

sysadmin@localhost:~$ echo "New line 1" > example.txt

sysadmin@localhost:~$ cat example.txt

New line 1

sysadmin@localhost:~$

The original contents of the file are gone, replaced with the output of the new echo command.

It is also possible to preserve the contents of an existing file by appending to it. Use "double
arrow" >> to append to a file instead of overwriting it:

sysadmin@localhost:~$ cat example.txt

New line 1

sysadmin@localhost:~$ echo "Another line" >> example.txt


sysadmin@localhost:~$ cat example.txt

New line 1

Another line

sysadmin@localhost:~$

Instead of being overwritten, the output of the most recent echo command is added to the bottom of
the file.

Redirecting STDERR
STDERR can be redirected in a similar fashion to STDOUT. STDOUT is also known
as stream (or channel) #1. STDERR is assigned stream #2.

When using arrows to redirect, stream #1 is assumed unless another stream is specified. Thus,
stream #2 must be specified when redirecting STDERR.

To demonstrate redirecting STDERR, first observe the following command which will produce an
error because the specified directory does not exist:

sysadmin@localhost:~$ ls /fake

ls: cannot access /fake: No such file or directory

sysadmin@localhost:~$

Note that there is nothing in the example above that implies that the output is STDERR. The output
is clearly an error message, but how could you tell that it is being sent to STDERR? One easy way
to determine this is to redirect STDOUT:

sysadmin@localhost:~$ ls /fake > output.txt

ls: cannot access /fake: No such file or directory

sysadmin@localhost:~$

In the example above, STDOUT was redirected to the output.txt file. So, the output that is displayed
can't be STDOUT because it would have been placed in the output.txt file. Because all command
output goes either to STDOUT or STDERR, the output displayed above must be STDERR.

The STDERR output of a command can be sent to a file:

sysadmin@localhost:~$ ls /fake 2> error.txt

sysadmin@localhost:~$ more error.txt

ls: cannot access /fake: No such file or directory

sysadmin@localhost:~$

In the command above, the 2> indicates that all error messages should be sent to the file error.txt.
Redirecting Multiple Streams
It is possible to direct both the STDOUT and STDERR of a command at the same time. The
following command will produce both STDOUT and STDERR because one of the specified
directories exists and the other does not:

sysadmin@localhost:~$ ls /fake /etc/ppp

ls: cannot access /fake: No such file or directory

/etc/ppp:

chap-secrets ip-down ip-down.ipv6to4 ip-up ip-up.ipv6to4

ipv6-down ipv6-up options pap-secrets peers

If only the STDOUT is sent to a file, STDERR will still be printed to the screen:

sysadmin@localhost:~$ ls /fake /etc/ppp > example.txt

ls: cannot access /fake: No such file or directory

sysadmin@localhost:~$ cat example.txt

/etc/ppp:

chap-secrets

ip-down

ip-down.ipv6to4

ip-up

ip-up.ipv6to4

ipv6-down

ipv6-up

options

pap-secrets

peers

sysadmin@localhost:~$

If only the STDERR is sent to a file, STDOUT will still be printed to the screen:

sysadmin@localhost:~$ ls /fake /etc/ppp 2> error.txt

/etc/ppp:

chap-secrets ip-down ip-down.ipv6to4 ip-up ip-up.ipv6to4

ipv6-down ipv6-up options pap-secrets peers


sysadmin@localhost:~$ cat error.txt

ls: cannot access /fake: No such file or directory

sysadmin@localhost:~$

Both STDOUT and STDERR can be sent to a file by using &>, a character set that means
"both 1> and 2>":

sysadmin@localhost:~$ ls /fake /etc/ppp &> all.txt

sysadmin@localhost:~$ cat all.txt

ls: cannot access /fake: No such file or directory

/etc/ppp:

chap-secrets

ip-down

ip-down.ipv6to4

ip-up

ip-up.ipv6to4

ipv6-down

ipv6-up

options

pap-secrets

peers

sysadmin@localhost:~$

Note that when you use &>, the output appears in the file with all of the STDERR messages at the
top and all of the STDOUT messages below all STDERR messages:

sysadmin@localhost:~$ ls /fake /etc/ppp /junk /etc/sound &> all.txt

sysadmin@localhost:~$ cat all.txt

ls: cannot access /fake: No such file or directory

ls: cannot access /junk: No such file or directory

/etc/ppp:

chap-secrets

ip-down

ip-down.ipv6to4
ip-up

ip-up.ipv6to4

ipv6-down

ipv6-up

options

pap-secrets

peers

/etc/sound:

events

sysadmin@localhost:~$

If you don't want STDERR and STDOUT to both go to the same file, they can be redirected to
different files by using both > and 2>. For example:

sysadmin@localhost:~$ rm error.txt example.txt

sysadmin@localhost:~$ ls

Desktop Downloads Pictures Templates all.txt

Documents Music Public Videos

sysadmin@localhost:~$ ls /fake /etc/ppp > example.txt 2> error.txt

sysadmin@localhost:~$ ls

Desktop Downloads Pictures Templates all.txt example.txt

Documents Music Public Videos error.txt

sysadmin@localhost:~$ cat error.txt

ls: cannot access /fake: No such file or directory

sysadmin@localhost:~$ cat example.txt

/etc/ppp:

chap-secrets

ip-down

ip-down.ipv6to4

ip-up

ip-up.ipv6to4
ipv6-down

ipv6-up

options

pap-secrets

peers

sysadmin@localhost:~$

The order the streams are specified in does not matter.

Redirecting STDIN
The concept of redirecting STDIN is a difficult one, because it is less obvious why you would
want to redirect STDIN. With STDOUT and STDERR, the answer to why is fairly easy: because
sometimes you want to store the output into a file for future use.

Most Linux users end up redirecting STDOUT routinely, STDERR on occasion and STDIN...well,
very rarely. There are very few commands that require you to redirect STDIN because with most
commands if you want to read data from a file into a command, you can just specify the filename as
an argument to the command. The command will then look into the file.

For some commands, if you don't specify a filename as an argument, they will revert to using
STDIN to get data. For example, consider the following cat command:

sysadmin@localhost:~$ cat

hello

hello

how are you?

how are you?

goodbye

goodbye

sysadmin@localhost:~$

In the example above, the cat command wasn't provided a filename as an argument. So, it asked for
the data to display on the screen from STDIN. The user typed hello and then the cat command
displayed hello on the screen. Perhaps this is useful for lonely people, but not really a good use of
the cat command.
However, perhaps if the output of the cat command were redirected to a file, then this method could
be used either to add to an existing file or to place text into a new file:

sysadmin@localhost:~$ cat > new.txt

Hello

How are you?

Goodbye

sysadmin@localhost:~$ cat new.txt

Hello

How are you?

Goodbye

sysadmin@localhost:~$

While the previous example demonstrates another advantage of redirecting STDOUT, it doesn't
address why or how STDIN can be redirected. To understand this, first consider a new command
called tr. This command will take a set of characters and translate them into another set of
characters.

For example, suppose you wanted to capitalize a line of text. You could use the tr command as
follows:

sysadmin@localhost:~$ tr 'a-z' 'A-Z'

watch how this works

WATCH HOW THIS WORKS

sysadmin@localhost:~$

The tr command took the STDIN from the keyboard (watch how this works), converted all lowercase
letters to uppercase, and sent the result to STDOUT on the screen (WATCH HOW THIS WORKS).

It would seem that a better use of the tr command would be to perform translation on a file, not
keyboard input. However, the tr command does not support filename arguments:

sysadmin@localhost:~$ more example.txt

/etc/ppp:

chap-secrets

ip-down

ip-down.ipv6to4

ip-up

ip-up.ipv6to4
ipv6-down

ipv6-up

options

pap-secrets

peers

sysadmin@localhost:~$ tr 'a-z' 'A-Z' example.txt

tr: extra operand `example.txt'

Try `tr --help' for more information

sysadmin@localhost:~$

You can, however, tell the shell to get STDIN from a file instead of from the keyboard by using
the < character:

sysadmin@localhost:~$ tr 'a-z' 'A-Z' < example.txt

/ETC/PPP:

CHAP-SECRETS

IP-DOWN

IP-DOWN.IPV6TO4

IP-UP

IP-UP.IPV6TO4

IPV6-DOWN

IPV6-UP

OPTIONS

PAP-SECRETS

PEERS

sysadmin@localhost:~$

This is fairly rare because most commands do accept filenames as arguments. But, for those that do
not, this method could be used to have the shell read from the file instead of relying on the
command to have this ability.

One last note: In most cases you probably want to take the resulting output and place it back into
another file:

sysadmin@localhost:~$ tr 'a-z' 'A-Z' < example.txt > newexample.txt

sysadmin@localhost:~$ more newexample.txt

/ETC/PPP:
CHAP-SECRETS

IP-DOWN

IP-DOWN.IPV6TO4

IP-UP

IP-UP.IPV6TO4

IPV6-DOWN

IPV6-UP

OPTIONS

PAP-SECRETS

PEERS

sysadmin@localhost:~$

Searching for Files Using the Find Command


One of the challenges that users face when working with the filesystem is trying to recall the
location where files are stored. There are thousands of files and hundreds of directories on a typical
Linux filesystem, so recalling where these files are located can pose challenges.

Keep in mind that most of the files that you will work with are ones that you create. As a result, you
often will be looking in your own home directory to find files. However, sometimes you may need
to search in other places on the filesystem to find files created by other users.

The find command is a very powerful tool that you can use to search for files on the filesystem. This
command can search for files by name, including using wildcard characters for when you are not
certain of the exact filename. Additionally, you can search for files based on file metadata, such as
file type, file size and file ownership.

The syntax of the find command is:

find [starting directory] [search option] [search criteria] [result option]

A description of all of these components:

Component Description
[starting directory] Where to begin searching. The find command will search this directory and all of its subdirectories. If no starting directory is provided, the current directory is used as the starting point.
[search option] An option that determines what sort of metadata to search for; there are options for file name, file size, and many other file attributes.
[search criteria] An argument that complements the search option. For example, if the user uses the option to search for a file name, the search criteria would be the filename.
[result option] An option that specifies what action should be taken once the file is found. If no option is provided, the file name is printed to STDOUT.
Search by File Name
To search for a file by name, use the -name option to the find command:

sysadmin@localhost:~$ find /etc -name hosts

find: `/etc/dhcp': Permission denied

find: `/etc/cups/ssl': Permission denied

find: `/etc/pki/CA/private': Permission denied

find: `/etc/pki/rsyslog': Permission denied

find: `/etc/audisp': Permission denied

find: `/etc/named': Permission denied

find: `/etc/lvm/cache': Permission denied

find: `/etc/lvm/backup': Permission denied

find: `/etc/lvm/archive': Permission denied

/etc/hosts

find: `/etc/ntp/crypto': Permission denied

find: `/etc/polkit-1/localauthority': Permission denied

find: `/etc/sudoers.d': Permission denied

find: `/etc/sssd': Permission denied

/etc/avahi/hosts

find: `/etc/selinux/targeted/modules/active': Permission denied

find: `/etc/audit': Permission denied

sysadmin@localhost:~$

Note that two files were found: /etc/hosts and /etc/avahi/hosts. The rest of the output consists of
STDERR messages, because the user who ran the command didn't have permission to access certain
subdirectories.

Recall that you can redirect STDERR to a file so you don't need to see these error messages on the
screen:

sysadmin@localhost:~$ find /etc -name hosts 2> errors.txt

/etc/hosts

/etc/avahi/hosts
sysadmin@localhost:~$

While the output is easier to read, there really is no purpose in storing the error messages in
the errors.txt file. The developers of Linux realized that it would be good to have a "junk file" to
which unnecessary data could be sent; any data that you send to the /dev/null file is discarded:

sysadmin@localhost:~$ find /etc -name hosts 2> /dev/null

/etc/hosts

/etc/avahi/hosts

sysadmin@localhost:~$

Searching for Files by Size


One of the many useful searching options is the one that allows you to search for files by size.
The -size option allows you to search for files that are either larger or smaller than a specified
size, as well as search for an exact file size.

When you specify a file size, you can give the size in bytes (c), kilobytes (k), megabytes (M) or
gigabytes (G). For example, the following will search for files in the /etc directory structure that are
exactly 10 bytes large:

sysadmin@localhost:~$ find /etc -size 10c -ls 2>/dev/null

432 4 -rw-r--r-- 1 root root 10 Jan 28 2015 /etc/adjtime

8814 0 drwxr-xr-x 1 root root 10 Jan 29 2015 /etc/ppp/ip-down.d

8816 0 drwxr-xr-x 1 root root 10 Jan 29 2015 /etc/ppp/ip-up.d

8921 0 lrwxrwxrwx 1 root root 10 Jan 29 2015 /etc/ssl/certs/349f2832.0 -> EC-ACC.pem

9234 0 lrwxrwxrwx 1 root root 10 Jan 29 2015 /etc/ssl/certs/aeb67534.0 -> EC-ACC.pem

73468 4 -rw-r--r-- 1 root root 10 Nov 16 20:42 /etc/hostname

sysadmin@localhost:~$

If you want to search for files that are larger than a specified size, you place a + character before the
size. For example, the following will look for all files in the /usr directory structure that are over 100
megabytes in size:

sysadmin@localhost:~$ find /usr -size +100M -ls 2> /dev/null


574683 104652 -rw-r--r-- 1 root root 107158256 Aug 7 11:06 /usr/share/icons/oxygen/icon-theme.cache

sysadmin@localhost:~$

To search for files that are smaller than a specified size, place a - character before the file size.
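
As a minimal sketch of such a search (the directory and size here are illustrative, not taken from
the course examples), the following would list files in the /etc directory structure that are smaller
than 100 bytes, again discarding the permission errors:

find /etc -size -100c -ls 2> /dev/null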

Additional Useful Search Options


There are many search options. The following table illustrates a few of these options:

Option     Meaning
-maxdepth  Allows the user to specify how deep in the directory structure to search. For
           example, -maxdepth 1 would mean only search the specified directory and its
           immediate subdirectories.
-group     Returns files owned by a specified group. For example, -group payroll would
           return files owned by the payroll group.
-iname     Returns files that match the specified filename but, unlike -name, -iname is
           case insensitive. For example, -iname hosts would match files named hosts,
           Hosts, HOSTS, etc.
-mmin      Returns files based on their modification time in minutes. For example,
           -mmin 10 would match files that were modified 10 minutes ago.
-type      Returns files that match a file type. For example, -type f would return files
           that are regular files.
-user      Returns files owned by a specified user. For example, -user bob would return
           files owned by the bob user.
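
As a sketch combining two of these options (the directory and pattern are illustrative), the
following would search only the top level of the /etc directory for files whose names end in
.conf, regardless of case:

find /etc -maxdepth 1 -iname "*.conf"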

Using Multiple Options


If you use multiple options, they act as an "and", meaning for a match to occur, all of the criteria
must match, not just one. For example, the following command will display all files in
the /etc directory structure that are 10 bytes in size and are plain files:

sysadmin@localhost:~$ find /etc -size 10c -type f -ls 2>/dev/null

432 4 -rw-r--r-- 1 root root 10 Jan 28 2015 /etc/adjtime

73468 4 -rw-r--r-- 1 root root 10 Nov 16 20:42 /etc/hostname

sysadmin@localhost:~$

Viewing Files Using the less Command


While viewing small files with the cat command poses no problems, it is not an ideal choice for
large files. The cat command doesn't provide any way to easily pause and restart the display, so the
entire file contents are dumped to the screen.

For larger files, you will want to use a pager command to view the contents. Pager commands will
display one page of data at a time, allowing you to move forward and backwards in the file by using
movement keys.

There are two commonly used pager commands:


The less command: This command provides a very advanced paging capability. It is normally
the default pager used by commands like the man command.

The more command: This command has been around since the early days of UNIX. While it
has fewer features than the less command, it does have one important advantage:
The less command isn't always included with all Linux distributions (and on some
distributions, it isn't installed by default). The more command is always available.

When you use the more or less commands, they allow you to "move around" a document by
using keystroke commands. Because the developers of the less command based it on
the functionality of the more command, all of the keystroke commands available in the more command
also work in the less command.

For the purpose of this manual, the focus will be on the more advanced command (less).
The more command is still useful to remember for times when the less command isn't available.
Remember that most of the keystroke commands provided work for both commands.

less Movement Commands


There are many movement commands for the less command, each with multiple possible keys or key
combinations. While this may seem intimidating, remember you don't need to memorize all of these
movement commands; you can always use the h key whenever you need to get help.

The first group of movement commands that you may want to focus upon are the ones that are most
commonly used. To make this even easier to learn, the keys that are identical in more and less will be
summarized. In this way, you will be learning how to move in more and less at the same time:

Movement Key
Window forward Spacebar
Window backward b
Line forward Enter
Exit q
Help h
When simply using less as a pager, the easiest way to advance forward a page is to press
the spacebar.

less Searching Commands


There are two ways to search in the less command: you can either search forward or backwards from
your current position using patterns called regular expressions. More details regarding regular
expressions are provided later in this chapter.

To start a search to look forward from your current position, use the / key. Then, type the text or
pattern to match and press the Enter key.

If a match can be found, then your cursor will move in the document to the match. For example, in
the following graphic the expression "frog" was searched for in the /usr/share/dict/words file:

bullfrog
bullfrog's

bullfrogs

bullheaded

bullhorn

bullhorn's

bullhorns

bullied

bullies

bulling

bullion

bullion's

bullish

bullock

bullock's

bullocks

bullpen

bullpen's

bullpens

bullring

bullring's

bullrings

bulls

Notice that "frog" didn't have to be a word by itself. Also notice that while the less command took
you to the first match from the current position, all matches were highlighted.

If no matches forward from your current position can be found, then the last line of the screen will
report Pattern not found:

Pattern not found (press RETURN)

To start a search to look backwards from your current position, press the ? key, then type the text or
pattern to match and press the Enter key. Your cursor will move backward to the first match it can
find or report that the pattern cannot be found.
If more than one match can be found by a search, then using the n key will allow you to move to the
next match and using the N key will allow you to go to a previous match.

Revisiting the head and tail Commands


Recall that the head and tail commands are used to filter files to show a limited number of lines. If
you want to view a select number of lines from the top of the file, you use the head command and if
you want to view a select number of lines at the bottom of a file, then you use the tail command.

By default, both commands display ten lines from the file. The following table provides some
examples:

Command Example        Explanation of Displayed Text

head /etc/passwd       First ten lines of /etc/passwd
head -3 /etc/group     First three lines of /etc/group
head -n 3 /etc/group   First three lines of /etc/group
help | head            First ten lines of output piped from the help command
tail /etc/group        Last ten lines of /etc/group
tail -5 /etc/passwd    Last five lines of /etc/passwd
tail -n 5 /etc/passwd  Last five lines of /etc/passwd
help | tail            Last ten lines of output piped from the help command

As seen from the above examples, both commands will output text from either a regular file or from
the output of any command sent through a pipe. They both use the -n option to indicate how many
lines to output.

Negative Value with the -n Option


Traditionally in UNIX, the number of lines to output would be specified as an option with either
command, so -3 meant show three lines. For the tail command, either -3 or -n -3 still means show three
lines. However, the GNU version of the head command recognizes -n -3 as show all but the last
three lines, and yet the head command still recognizes the option -3 as show the first three lines.
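
For example, assuming the /etc/group file used in the earlier table, the following sketch contrasts
the two behaviors of the GNU head command:

head -3 /etc/group      # displays the first three lines
head -n -3 /etc/group   # displays all lines except the last three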

Positive Value With the tail Command


The GNU version of the tail command allows for a variation of how to specify the number of lines to
be printed. If you use the -n option with a number prefixed by the plus sign, then the tail command
recognizes this to mean to display the contents starting at the specified line and continuing all the
way to the end.

For example, the following will display line #22 to the end of the output of the nl command:

sysadmin@localhost:~$ nl /etc/passwd | tail -n +22

22 sshd:x:103:65534::/var/run/sshd:/usr/sbin/nologin

23 operator:x:1000:37::/root:/bin/sh

24 sysadmin:x:1001:1001:System Administrator,,,,:/home/sysadmin:/bin/bash

sysadmin@localhost:~$
Following Changes to a File
You can view live file changes by using the -f option to the tail command. This is useful when you
want to see changes to a file as they are happening.

A good example of this would be when viewing log files as a system administrator. Log files can be
used to troubleshoot problems and administrators will often view them "interactively" with
the tail command as they are performing the commands they are trying to troubleshoot in a separate
window.

For example, if you were to log in as the root user, you could troubleshoot issues with the email
server by viewing live changes to its log file with the following command: tail -f /var/log/mail.log

Sorting Files or Input


The sort command can be used to rearrange the lines of files or input in either dictionary or numeric
order based upon the contents of one or more fields. Fields are determined by a field separator
contained on each line, which defaults to whitespace (spaces and tabs).

The following example creates a small file by using the head command to grab the first 5 lines of
the /etc/passwd file and send the output to a file called mypasswd.

sysadmin@localhost:~$ head -5 /etc/passwd > mypasswd

sysadmin@localhost:~$

sysadmin@localhost:~$ cat mypasswd

root:x:0:0:root:/root:/bin/bash

daemon:x:1:1:daemon:/usr/sbin:/bin/sh

bin:x:2:2:bin:/bin:/bin/sh

sys:x:3:3:sys:/dev:/bin/sh

sync:x:4:65534:sync:/bin:/bin/sync

sysadmin@localhost:~$

Now we will sort the mypasswd file:

sysadmin@localhost:~$ sort mypasswd

bin:x:2:2:bin:/bin:/bin/sh

daemon:x:1:1:daemon:/usr/sbin:/bin/sh

root:x:0:0:root:/root:/bin/bash

sync:x:4:65534:sync:/bin:/bin/sync

sys:x:3:3:sys:/dev:/bin/sh
sysadmin@localhost:~$

Fields and Sort Options


In the event that the file or input might be separated by another delimiter like a comma or colon,
the -t option will allow for another field separator to be specified. To specify fields to sort by, use
the -k option with an argument to indicate the field number (starting with 1 for the first field).

The other commonly used options for the sort command are the -n to perform a numeric sort and -r to
perform a reverse sort.

In the next example, the -t option is used to separate fields by a colon character and performs a
numeric sort using the third field of each line:

sysadmin@localhost:~$ sort -t: -n -k3 mypasswd

root:x:0:0:root:/root:/bin/bash

daemon:x:1:1:daemon:/usr/sbin:/bin/sh

bin:x:2:2:bin:/bin:/bin/sh

sys:x:3:3:sys:/dev:/bin/sh

sync:x:4:65534:sync:/bin:/bin/sync

sysadmin@localhost:~$

Note that the -r option could have been used to reverse the sort, making the higher numbers in the
third field appear at the top of the output:

sysadmin@localhost:~$ sort -t: -n -r -k3 mypasswd

sync:x:4:65534:sync:/bin:/bin/sync

sys:x:3:3:sys:/dev:/bin/sh

bin:x:2:2:bin:/bin:/bin/sh

daemon:x:1:1:daemon:/usr/sbin:/bin/sh

root:x:0:0:root:/root:/bin/bash

sysadmin@localhost:~$

Lastly, you may want to perform more complex sorts, such as sort by a primary field and then by a
secondary field. For example, consider the following data:

bob:smith:23

nick:jones:56
sue:smith:67

You might want to sort first by the last name (field #2) and then first name (field #1) and then by age
(field #3). This can be done with the following command:

sysadmin@localhost:~$ sort -t: -k2 -k1 -k3n filename
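
Assuming the three lines above were saved in a file named people.txt (a hypothetical name standing
in for filename above), the command would produce:

nick:jones:56
bob:smith:23
sue:smith:67

The jones line sorts first because the primary key is the last name; the two smith lines are then
ordered by the first name.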

Viewing File Statistics With the wc Command


The wc command allows for up to three statistics to be printed for each file provided, as well as the
total of these statistics if more than one filename is provided. By default, the wc command provides
the number of lines, words and bytes (1 byte = 1 character in a text file):

sysadmin@localhost:~$ wc /etc/passwd /etc/passwd-

35 56 1710 /etc/passwd

34 55 1665 /etc/passwd-

69 111 3375 total

sysadmin@localhost:~$

The above example shows the output from executing: wc /etc/passwd /etc/passwd-. The output has four
columns: number of lines in the file, number of words in the file, number of bytes in the file and the
file name or total.

If you are interested in viewing just specific statistics, then you can use -l to show just the number of
lines, -w to show just the number of words and -c to show just the number of bytes.

The wc command can be useful for counting the number of lines output by some other command
through a pipe. For example, if you wanted to know the total number of files in the /etc directory,
you could execute ls /etc | wc -l:

sysadmin@localhost:~$ ls /etc/ | wc -l

136

sysadmin@localhost:~$

Using the cut Command to Filter File Contents


The cut command can extract columns of text from a file or standard input. A primary use of
the cut command is for working with delimited database files. These files are very common on
Linux systems.
By default, it considers its input to be separated by the Tab character, but the -d option can specify
alternative delimiters such as the colon or comma.
Using the -f option, you can specify which fields to display, either as a hyphenated range or a
comma-separated list.

In the following example, the first, fifth, sixth and seventh fields from the mypasswd database file are
displayed:

sysadmin@localhost:~$ cut -d: -f1,5-7 mypasswd

root:root:/root:/bin/bash

daemon:daemon:/usr/sbin:/bin/sh

bin:bin:/bin:/bin/sh

sys:sys:/dev:/bin/sh

sync:sync:/bin:/bin/sync

sysadmin@localhost:~$

Using the cut command, you can also extract columns of text based upon character position with
the -c option. This can be useful for extracting fields from fixed-width database files. For example,
the following will display just the file type (character #1), permissions (characters #2-10) and
filename (characters #50+) of the output of the ls -l command:

sysadmin@localhost:~$ ls -l | cut -c1-11,50-

total 12

drwxr-xr-x Desktop

drwxr-xr-x Documents

drwxr-xr-x Downloads

drwxr-xr-x Music

drwxr-xr-x Pictures

drwxr-xr-x Public

drwxr-xr-x Templates

drwxr-xr-x Videos

-rw-rw-r-- errors.txt

-rw-rw-r-- mypasswd

-rw-rw-r-- new.txt

sysadmin@localhost:~$
Using the grep Command to Filter File Contents
The grep command can be used to filter lines in a file or the output of another command based on
matching a pattern. That pattern can be as simple as the exact text that you want to match or it can
be much more advanced through the use of regular expressions (discussed later in this chapter).

For example, you may want to find all the users who can log in to the system with the Bash shell,
so you could use the grep command to filter the lines from the /etc/passwd file for the lines containing
the characters bash:

sysadmin@localhost:~$ grep bash /etc/passwd

root:x:0:0:root:/root:/bin/bash

sysadmin:x:1001:1001:System Administrator,,,,:/home/sysadmin:/bin/bash

sysadmin@localhost:~$

To make it easier to see what exactly is matched, use the --color option. This option will highlight the
matched items in red:

sysadmin@localhost:~$ grep --color bash /etc/passwd

root:x:0:0:root:/root:/bin/bash

sysadmin:x:1001:1001:System Administrator,,,,:/home/sysadmin:/bin/bash

sysadmin@localhost:~$

In some cases you don't care about the specific lines that match the pattern, but rather how many
lines match the pattern. With the -c option, you can get a count of how many lines that match:

sysadmin@localhost:~$ grep -c bash /etc/passwd

2

sysadmin@localhost:~$

When you are viewing the output from the grep command, it can be hard to determine the original
line numbers. This information can be useful when you go back into the file (perhaps to edit the
file) as you can use this information to quickly find one of the matched lines.

The -n option to the grep command will display original line numbers:

sysadmin@localhost:~$ grep -n bash /etc/passwd

1:root:x:0:0:root:/root:/bin/bash

24:sysadmin:x:1001:1001:System Administrator,,,,:/home/sysadmin:/bin/bash

sysadmin@localhost:~$

Some additional useful grep options:


Example                       Output
grep -v nologin /etc/passwd   All lines not containing nologin in the /etc/passwd file
grep -l linux /etc/*          List of files in the /etc directory containing linux
grep -i linux /etc/*          Lines from files in the /etc directory containing any case
                              (capital or lower) of the character pattern linux
grep -w linux /etc/*          Lines from files in the /etc directory containing the word
                              pattern linux
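
Like most commands, grep allows these options to be combined. As a sketch based on the earlier
examples, the following would print matching lines with their original line numbers while ignoring
case:

grep -in bash /etc/passwd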

Basic Regular Expressions - the . Character


In the example below, a simple file is first created using redirection. Then the grep command is used
to demonstrate a simple pattern match:

sysadmin@localhost:~$ echo 'abcddd' > example.txt

sysadmin@localhost:~$ cat example.txt

abcddd

sysadmin@localhost:~$ grep --color 'a..' example.txt

abcddd

sysadmin@localhost:~$

In the previous example, you can see that the pattern a.. matched abc. The first . character matched
the b and the second matched the c.

In the next example, the pattern a..c won't match anything, so the grep command will not produce any
output. For the match to be successful, there would need to be two characters between the a and
the c in example.txt:

sysadmin@localhost:~$ grep --color 'a..c' example.txt

sysadmin@localhost:~$

Basic Regular Expressions - the [ ] Characters


If you use the . character, then any possible character could match. In some cases you want to
specify exactly which characters you want to match. For example, maybe you just want to match a
lower-case alpha character or a number character. For this, you can use the [ ] Regular Expression
characters and specify the valid characters inside the [ ] characters.

For example, the following command matches two characters, the first is either an a or a b while the
second is either an a, b, c or d:

sysadmin@localhost:~$ grep --color '[ab][a-d]' example.txt

abcddd

sysadmin@localhost:~$
Note that you can either list out each possible character [abcd] or provide a range [a-d] as long as the
range is in the correct order. For example, [d-a] wouldn't work because it isn't a valid range:

sysadmin@localhost:~$ grep --color '[d-a]' example.txt

grep: Invalid range end

sysadmin@localhost:~$

The range is determined by a standard called ASCII (the American Standard Code for Information
Interchange). This standard places all printable characters in a specific order. You can see the
ASCII table with the man ascii command. A small example:

041 33 21 !    141 97 61 a

042 34 22 "    142 98 62 b

043 35 23 #    143 99 63 c

044 36 24 $    144 100 64 d

045 37 25 %    145 101 65 e

046 38 26 &    146 102 66 f

Since a has a smaller numeric value (octal 141) than d (octal 144), the range a-d includes all
characters from a to d.

What if you want to match a character that can be anything but an x, y or z? You wouldn't want to
have to provide a [ ] set with all of the characters except x, y or z.

To indicate that you want to match a character that is not one of the listed characters, start your
[ ] set with a ^ symbol. For example, the following demonstrates matching a pattern that includes
a character that isn't an a, b or c, followed by a d:

sysadmin@localhost:~$ grep --color '[^abc]d' example.txt

abcddd

sysadmin@localhost:~$

Basic Regular Expressions - the * Character


The * character can be used to match "zero or more of the previous character". For example, the
following will match zero or more d characters:

sysadmin@localhost:~$ grep --color 'd*' example.txt

abcddd

sysadmin@localhost:~$
Basic Regular Expressions - the ^ and $ Characters
When you perform a pattern match, the match could occur anywhere on the line. You may want to
specify that the match occurs at the beginning of the line or the end of the line. To match at the
beginning of the line, begin the pattern with a ^ symbol.

In the following example, another line is added to the example.txt file to demonstrate the use of
the ^ symbol:

sysadmin@localhost:~$ echo "xyzabc" >> example.txt

sysadmin@localhost:~$ cat example.txt

abcddd

xyzabc

sysadmin@localhost:~$ grep --color "a" example.txt

abcddd

xyzabc

sysadmin@localhost:~$ grep --color "^a" example.txt

abcddd

sysadmin@localhost:~$

Note that in the first grep output, both lines match because they both contain the letter a. In the
second grep output, only the line that began with the letter a matched.

In order to specify that the match occurs at the end of the line, end the pattern with the $ character.
For example, in order to only find lines which end with the letter c:

sysadmin@localhost:~$ grep "c$" example.txt

xyzabc

sysadmin@localhost:~$
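
The two anchors can also be combined in a single pattern. As a sketch based on the example.txt
file above, the following would match only a line that begins with x and ends with c, which at this
point is just the xyzabc line:

grep --color "^x.*c$" example.txt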

Basic Regular Expressions - the \ Character


In some cases you may want to match a character that happens to be a special Regular Expression
character. For example, consider the following:

sysadmin@localhost:~$ echo "abcd*" >> example.txt

sysadmin@localhost:~$ cat example.txt

abcddd
xyzabc

abcd*

sysadmin@localhost:~$ grep --color "cd*" example.txt

abcddd

xyzabc

abcd*

sysadmin@localhost:~$

In the output of the grep command above, you will see that every line matches because you are
looking for a c character followed by zero or more d characters. If you want to look for an
actual * character, place a \ character before the * character:

sysadmin@localhost:~$ grep --color "cd\*" example.txt

abcd*

sysadmin@localhost:~$

Extended Regular Expressions


The use of Extended Regular Expressions often requires a special option be provided to the
command to recognize them. Historically, there is a command called egrep, which is similar to grep,
but is able to understand extended regular expressions. Now, the egrep command is deprecated in
favor of using grep with the -E option.

The following regular expressions are considered "extended":

RE Meaning
? Matches previous character zero or one time, so it is an optional character
+ Matches previous character repeated one or more times
| Alternation or like a logical or operator
Some extended regular expressions examples:

Command                     Meaning                                          Matches

grep -E 'colou?r' 2.txt     Match colo followed by zero or one u character   color colour
grep -E 'd+' 2.txt          Match one or more d characters                   d dd ddd dddd
grep -E 'gray|grey' 2.txt   Match either gray or grey                        gray grey
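
As a sketch using the example.txt file from earlier in this chapter, the + character could be
demonstrated as follows; since d+ requires at least one d, the abcddd and abcd* lines would match
but the xyzabc line would not:

grep -E --color "d+" example.txt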

xargs Command
The xargs command is used to build and execute command lines from standard input. This command
is very helpful when you need to execute a command with a very long list of arguments, which in
some cases can result in an error if the list of arguments is too long.
The xargs command has a -0 option, which causes it to treat input items as terminated by a null
character instead of whitespace, disabling the end-of-file string and allowing the safe use of
arguments containing spaces, quotes, or backslashes.

The xargs command is useful for allowing commands to be executed more efficiently. Its goal is to
build the command line for a command to execute as few times as possible with as many arguments
as possible, rather than to execute the command many times with one argument each time.

The xargs command functions by breaking up the list of arguments into sublists and executing the
command with each sublist. The number of arguments in each sublist will not exceed the maximum
number of arguments for the command being executed, thereby avoiding an Argument list too long
error.

The following example shows a scenario where the xargs command allowed for many files to be
removed, where using a normal wildcard (glob) character failed:

sysadmin@localhost:~/many$ rm *

bash: /bin/rm: Argument list too long

sysadmin@localhost:~/many$ ls | xargs rm

sysadmin@localhost:~/many$
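
A common companion technique, sketched below under the assumption that GNU find and xargs are
available (the *.tmp pattern is illustrative): the -print0 option of the find command separates
each filename with a null character, which pairs with the -0 option of xargs described above so that
filenames containing spaces are handled safely:

find . -name "*.tmp" -print0 | xargs -0 rm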

Shell Scripts in a Nutshell


A shell script is a file of executable commands that has been stored in a text file. When the file is
run, each command is executed. Shell scripts have access to all the commands of the shell,
including logic. A script can therefore test for the presence of a file or look for particular output and
change its behavior accordingly. You can build scripts to automate repetitive parts of your work,
which frees your time and ensures consistency each time you use the script. For instance, if you run
the same five commands every day, you can turn them into a shell script that reduces your work to
one command.

A script can be as simple as one command:

echo Hello, World!

The script, test.sh, consists of just one line that prints the string Hello, World! to the console.

Running a script can be done either by passing it as an argument to your shell or by running it
directly:

sysadmin@localhost:~$ sh test.sh

Hello, World!

sysadmin@localhost:~$ ./test.sh

-bash: ./test.sh: Permission denied

sysadmin@localhost:~$ chmod +x ./test.sh


sysadmin@localhost:~$ ./test.sh

Hello, World!

In the example above, first, the script is run as an argument to the shell. Next, the script is run
directly from the shell. It is rare to have the current directory in the binary search path $PATH so the
name is prefixed with ./ to indicate it should be run out of the current directory.

The error Permission denied means that the script has not been marked as executable. A
quick chmod later and the script works. chmod is used to change the permissions of a file, which will
be explained in detail in a later chapter.

There are various shells with their own language syntax. Therefore, more complicated scripts will
indicate a particular shell by specifying the absolute path to the interpreter as the first line, prefixed
by #! as shown:

#!/bin/sh

echo Hello, World!

or

#!/bin/bash

echo Hello, World!

The two characters, #!, are traditionally called the hash and the bang respectively, which leads to the
shortened form of shebang when they're used at the beginning of a script.

Incidentally, the shebang (or crunchbang) is used for traditional shell scripts and other text-based
languages like Perl, Ruby, and Python. Any text file marked as executable will be run under the
interpreter specified in the first line, as long as the script is run directly. If the script is invoked
as an argument to an interpreter, such as sh script or bash script, the given shell will be used
no matter what's in the shebang line.
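
Putting these pieces together, the following is a minimal sketch of a script that uses the logic
mentioned earlier to test for the presence of a file (the script name check.sh and the path being
tested are illustrative assumptions, not from the course material):

#!/bin/bash
# check.sh - report whether a file exists (illustrative example)
if [ -e /etc/hosts ]; then
    echo "/etc/hosts exists"
else
    echo "/etc/hosts is missing"
fi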

It helps to become comfortable using a text editor before writing shell scripts, since you will need to
create files in plain text. Traditional office tools like LibreOffice that output file formats containing
formatting and other information are not appropriate for this task.
Processors
A Central Processing Unit (CPU or processor) is one of the most important hardware components
of a computer. It performs the decision making as well as the calculations that need to be performed
to properly run an operating system. The processor is essentially a computer chip.
The processor is connected to the other hardware via a motherboard, also known as the system
board. Motherboards are designed to work with specific types of processors.
If a hardware system has more than one processor, the system is referred to as a multiprocessor. If
more than one processor is combined into a single processor chip, then it is called multi-core.
Although support is available for more types of processors in Linux than any other operating
system, there are primarily just two types of processors used on desktop and server computers: x86
and x86_64. On an x86 system, the system processes data 32 bits at a time; on an x86_64 system, it
processes data 64 bits at a time. An x86_64 system is also capable of processing data 32 bits at a
time in a backward compatible mode. One of the main advantages of a 64 bit system is that it
is able to work with more memory.
The x86 family of processors was originated by Intel in 1978 with the release of the 8086 processor.
Since that time, Intel has produced many other processors that are improvements to the original
8086; they are known generically as x86 processors. These processors include the 80386 (also
known as the i386), 80486 (i486), the Pentium series (i586) and the Pentium Pro series ( i686). In
addition to Intel, other companies like AMD and Cyrix have also produced x86 compatible
processors. While Linux is capable of supporting processors back to the i386 generation, many
distributions limit their support to i686 or later.
The x86_64 family of processors, including the 64 bit processors from Intel and AMD, have been in
production since around the year 2000. As a result, most of the modern processors built today are
x86_64. While the hardware has been available for over a decade now, the software to support this
family of processors has been much slower to develop. Even as of 2013, there are many software
packages that are available for the x86 architecture, but not the x86_64.

You can see the family your CPU belongs to by using the arch command:
sysadmin@localhost:~$ arch
x86_64
sysadmin@localhost:~$

Another command you can use to identify the type of CPU in your system is the lscpu command:
sysadmin@localhost:~$ lscpu
Architecture: x86_64
CPU op-mode(s): 32-bit, 64-bit
Byte Order: Little Endian
CPU(s): 4
On-line CPU(s) list: 0-3
Thread(s) per core: 1
Core(s) per socket: 4
Socket(s): 1
NUMA node(s): 1
Vendor ID: GenuineIntel
CPU family: 6
Model: 44
Stepping: 2
CPU MHz: 2394.000
BogoMIPS: 4788.00
Virtualization: VT-x
Hypervisor vendor: VMware
Virtualization type: full
L1d cache: 32K
L1i cache: 32K
L2 cache: 256K
L3 cache: 12288K
NUMA node0 CPU(s): 0-3
sysadmin@localhost:~$
The first line of this output shows that the CPU is being used in a 64 bit mode, as the architecture
reported is x86_64. The second line of output shows that the CPU is capable of operating in either a
32 or 64 bit mode, therefore it is actually a 64 bit CPU.
The most detailed way of displaying information about your CPU(s) is viewing the
/proc/cpuinfo file with the cat command:
sysadmin@localhost:~$ cat /proc/cpuinfo
processor : 0
vendor_id : GenuineIntel
cpu family : 6
model : 44
model name : Intel(R) Xeon(R) CPU E5620 @ 2.40GHz
stepping : 2
microcode : 0x15
cpu MHz : 2394.000
cache size : 12288 KB
physical id : 0
siblings : 4
core id : 0
cpu cores : 4
apicid : 0
initial apicid : 0
fpu : yes
fpu_exception : yes
cpuid level : 11
wp : yes
flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov
pat pse36 clflush dts mmx fxsr sse sse2 ss ht syscall nx rdtscp lm constant_tsc
arch_perfmon pebs bts nopl xtopology tsc_reliable nonstop_tsc aperfmperf pni
pclmulqdq vmx ssse3 cx16 sse4_1 sse4_2 x2apic popcnt aes hypervisor lahf_lm ida
arat dtherm tpr_shadow vnmi ept vpid

While much of the output of the lscpu and the contents of the /proc/cpuinfo file appears to
be the same, one benefit to viewing the /proc/cpuinfo file is that the flags of the CPU are
displayed. The flags of a CPU are a very important component, since they indicate which
features the CPU supports and the capabilities of the CPU.

For example, the output from the previous example contains the flag lm (long mode), indicating
that this CPU is 64-bit capable. There are also flags that indicate if the CPU is capable of
supporting virtual machines (the ability to have multiple operating systems on a single computer).
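
Since the flags appear on a single long line, the grep command covered earlier in this chapter can
be used to check for a specific flag. As a sketch, the following uses the -w (word match) option to
test whether the CPU reports the lm flag:

grep -w lm /proc/cpuinfo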

dmidecode
The system board of many computers contains what is known as Basic Input and Output System
(BIOS). System Management BIOS (SMBIOS) is the standard that defines the data structures and
how to communicate information about computer hardware. The dmidecode command is able to
read and display the information from SMBIOS.

For devices directly attached to the motherboard, an administrator can use the dmidecode
command to view them. There is a great deal of information provided by the output of this
command.
The examples below provide you with a few ideas of what you can learn from the output of the
dmidecode command. This command is not available within the virtual machine environment of
this course.
In the first example, you can see that the BIOS supports booting directly from the CD-ROM. This is
important since operating system installs are often done by booting directly from the install CD:
# dmidecode 2.11
SMBIOS 2.4 present.
364 structures occupying 16040 bytes.
Table at 0x000E0010

Handle 0x0000, DMI type 0, 24 bytes


BIOS Information
Vendor: Phoenix Technologies LTD
Version: 6.00
Release Date: 06/22/2012
Address: 0xEA0C0
Runtime Size: 89920 bytes
ROM Size: 64 kB
Characteristics:
ISA is supported
PCI is supported
PC Card (PCMCIA) is supported
PNP is supported
APM is supported
BIOS is upgradeable
BIOS shadowing is allowed
ESCD support is available
Boot from CD is supported
--More--

In the next example, you can see that a total of 2048 (about 2GB) of RAM is installed on the
system:
Socket Designation: RAM socket #0
Bank Connections: None
Current Speed: Unknown
Type: EDO DIMM
Installed Size: 2048 MB (Single-bank Connection)
Enabled Size: 2048 MB (Single-bank Connection)
Error Status: OK

Random Access Memory


The motherboard normally has slots where Random Access Memory (RAM) can be connected to the
system. 32 bit architecture systems can use up to 4 gigabytes (GB) of RAM, while 64 bit
architectures are capable of addressing and using far more RAM.
In some cases, the RAM your system has might not be enough to handle all of the operating system
requirements. Each program needs to store data in RAM and the programs themselves are loaded
into RAM when they execute.
To avoid having the system fail due to a lack of RAM, virtual RAM (or swap space) is utilized.
Virtual RAM is hard drive space that is used to temporarily store RAM data when the system is
running out of RAM. Data that is stored in RAM and that has not been used recently is copied onto
the hard drive so currently active programs can use the RAM. If needed, this swapped data can be
moved back into RAM at a later time.

To view the amount of RAM in your system, including the virtual RAM, execute the free
command. The free command has a -m option to force the output to be rounded to the nearest
megabyte and a -g option to force the output to be rounded to the nearest gigabyte:
sysadmin@localhost:~$ free -m
total used free shared buffers cached
Mem: 1894 356 1537 0 25 177
-/+ buffers/cache: 153 1741
Swap: 4063 0 4063
sysadmin@localhost:~$

The output of executing this free command shows that the system it was executed on has a total of
1,894 megabytes of RAM and is currently using 356 megabytes.
The amount of swap appears to be approximately 4 gigabytes, although none of it appears to be in
use. This makes sense because so much of the physical RAM is free, so there is no need at this time
for virtual RAM to be used.

Peripheral Devices
The motherboard has buses that allow for multiple devices to connect to the system, including the
Peripheral Component Interconnect (PCI) and Universal Serial Bus (USB). The motherboard also
has connectors for monitors, keyboards and mice.

In order to view all of the devices connected by the PCI bus, execute the lspci command. The
following is a sample output of this command. As you can see in the output below,
this system has a VGA controller (a monitor connector), a SCSI storage controller (a type of hard
drive) and an Ethernet controller (a network connector):

The graphics below provide examples of using the lspci command. This command is not
available within the virtual machine environment of this course.
sysadmin@localhost:~$ lspci
00:00.0 Host bridge: Intel Corporation 440BX/ZX/DX - 82443BX/ZX/DX Host bridge
(rev 01)
00:01.0 PCI bridge: Intel Corporation 440BX/ZX/DX - 82443BX/ZX/DX AGP bridge
(rev 01)
00:07.0 ISA bridge: Intel Corporation 82371AB/EB/MB PIIX4 ISA (rev 08)
00:07.1 IDE interface: Intel Corporation 82371AB/EB/MB PIIX4 IDE (rev 01)
00:07.3 Bridge: Intel Corporation 82371AB/EB/MB PIIX4 ACPI (rev 08)
00:07.7 System peripheral: VMware Virtual Machine Communication Interface (rev
10)
00:0f.0 VGA compatible controller: VMware SVGA II Adapter
03:00.0 Serial Attached SCSI controller: VMware PVSCSI SCSI Controller (rev 02
0b:00.0 Ethernet controller: VMware VMXNET3 Ethernet Controller (rev 01)

Executing the lspci command with the -nn option shows both a numeric identifier for each
device, as well as the original text description:
sysadmin@localhost:~$ lspci -nn
00:00.0 Host bridge [0600]: Intel Corporation 440BX/ZX/DX - 82443BX/ZX/DX Host
bridge [8086:7190] (rev 01)
00:01.0 PCI bridge [0604]: Intel Corporation 440BX/ZX/DX - 82443BX/ZX/DX AGP
bridge [8086:7191] (rev 01)
00:07.0 ISA bridge [0601]: Intel Corporation 82371AB/EB/MB PIIX4 ISA
[8086:7110](rev 08)
00:07.1 IDE interface [0101]: Intel Corporation 82371AB/EB/MB PIIX4 IDE
[8086:7111] (rev 01)
00:07.3 Bridge [0680]: Intel Corporation 82371AB/EB/MB PIIX4 ACPI
[8086:7113](rev 08)
00:07.7 System peripheral [0880]: VMware Virtual Machine Communication Interface
[15ad:0740] (rev 10)
00:0f.0 VGA compatible controller [0300]: VMware SVGA II Adapter [15ad:0405]
03:00.0 Serial Attached SCSI controller [0107]: VMware PVSCSI SCSI Controller
[15ad:07c0] (rev 02)
0b:00.0 Ethernet controller [0200]: VMware VMXNET3 Ethernet Controller
[15ad:07b0] (rev 01)

The highlighted section, [15ad:07b0], is referred to as the [vendor:device] section.

Using the [vendor:device] information can be useful for displaying detailed information about a
specific device. By using the -d vendor:device option, you can select to view information about
just one device.

You can also view more detailed information by using either the -v, -vv or -vvv option. The
more v characters, the more verbose the output will be. For example:
sysadmin@localhost:~$ lspci -d 15ad:07b0 -vvv
0b:00.0 Ethernet controller: VMware VMXNET3 Ethernet Controller (rev 01)
Subsystem: VMware VMXNET3 Ethernet Controller
Physical Slot: 192
Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Step
ping- SERR- FastB2B- DisINTx+
Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort-
<MAbort- >SERR- <PERR- INTx-
Latency: 0, Cache Line Size: 32 bytes
Interrupt: pin A routed to IRQ 19
Region 0: Memory at fd4fb000 (32-bit, non-prefetchable) [size=4K]
Region 1: Memory at fd4fc000 (32-bit, non-prefetchable) [size=4K]
Region 2: Memory at fd4fe000 (32-bit, non-prefetchable) [size=8K]
Region 3: I/O ports at 5000 [size=16]
[virtual] Expansion ROM at fd400000 [disabled] [size=64K]
Capabilities: <access denied>
Kernel driver in use: vmxnet3
Kernel modules: vmxnet3
sysadmin@localhost:~$

The lspci command shows detailed information about devices connected to the system via the
PCI bus. This information can be helpful to determine if the device is supported by the system, as
indicated by a Kernel driver or Kernel module in use, as shown in the last couple of lines of output
above.

Universal Serial Bus Devices


While the PCI bus is used for many internal devices such as sound and network cards, many
external devices (or peripherals) are connected to the computer via USB. Devices connected
internally are usually cold-plug, meaning the system must be shut down in order to connect or
disconnect a device. USB devices are hot-plug, meaning they can be connected or disconnected
while the system is running.
Note: The graphics below provide examples of using the lsusb command. This command is not
available within the virtual machine environment of this course.

To display the devices connected to the system via USB, execute the lsusb command:
sysadmin@localhost:~$ lsusb
Bus 001 Device 001: ID 1d6b:0001 Linux Foundation 1.1 root hub
sysadmin@localhost:~$

The verbose option, -v, for the lsusb command shows a great amount of detail about each
device:
sysadmin@localhost:~$ lsusb -v

Bus 001 Device 001: ID 1d6b:0001 Linux Foundation 1.1 root hub
Couldn't open device, some information will be missing
Device Descriptor:
bLength 18
bDescriptorType 1
bcdUSB 1.10
bDeviceClass 9 Hub
bDeviceSubClass 0 Unused
bDeviceProtocol 0 Full speed (or root) hub
bMaxPacketSize0 64
idVendor 0x1d6b Linux Foundation
idProduct 0x0001 1.1 Linux Foundation
bcdDevice 2.06
iManufacturer 3
iProduct 2
iSerial 1

Hardware Abstraction Layer


HAL is the Hardware Abstraction Layer. The daemon for HAL is hald, a process that gathers
information about all devices connected to the system. When events occur that change the state of
the connected devices, such as when a USB device is attached to the system, then hald broadcasts
this new information to any processes that have registered to be notified about new events.

Note: The graphic below provides an example of using the lshal command. This command is not
available within the virtual machine environment of this course.

The lshal command allows you to view the devices detected by HAL. This command produces a
huge amount of output, listing dozens of properties for each device.
Disk Devices
Disk devices (AKA, hard drives) may be attached to the system in a number of ways; the controller
may be integrated into the motherboard, on a PCI (Peripheral Component Interconnect) card or a
USB device.
Hard drives are divided into partitions. A partition is a logical division of a hard drive, designed to
take a large amount of available storage space and break it up into smaller "chunks". While it is
common on Microsoft Windows to have a single partition for each hard drive, on Linux
distributions, multiple partitions per hard drive is common.
Some hard drives make use of a partitioning technology called Master Boot Record (MBR) while
others make use of a partitioning type called GUID Partition Table (GPT). The MBR type of
partitioning has been used since the early days of the Personal Computer (PC) and the GPT type has
been available since the year 2000.
An old term used to describe an internal hard disk is "fixed disk", as the disk is fixed (not
removable). This term gave rise to several command names: the fdisk, cfdisk and sfdisk
commands, which are tools for working with the MBR partitioned disks.
The GPT disks use a newer type of partitioning, which allows the user to divide the disk into more
partitions than what MBR supports. GPT also allows having partitions which can be larger than two
terabytes (MBR does not). The tools for managing GPT disks are named similar to the fdisk
counterparts: gdisk, cgdisk, and sgdisk.
There is also a family of tools that attempts to support both MBR and GPT type disks. This set of
tools includes the parted command and the graphical gparted tool.

Hard drives are associated with file names (called device files) that are stored in the /dev directory.
Different types of hard drives are given slightly different names: hd for IDE (Intelligent Drive
Electronics) hard drives and sd for USB, SATA (Serial Advanced Technology Attachment) and
SCSI (Small Computer System Interface) hard drives.
Each hard drive is assigned a letter; for example, the first IDE hard drive would be associated with
the /dev/hda device file and the second IDE hard drive would be associated with the /dev/hdb
device file.
Partitions are given unique numbers for each device. For example, if a USB hard drive had two
partitions, they could be associated with the /dev/sda1 and /dev/sda2 device files.

In the following output, you can see that this system has three sd devices: /dev/sda, /dev/sdb
and /dev/sdc. Also, you can see there are two partitions on the first device (as evidenced by the
/dev/sda1 and /dev/sda2 files) and one partition on the second device (as evidenced by the
/dev/sdb1 file):
root@localhost:~# ls /dev/sd*
/dev/sda /dev/sda1 /dev/sda2 /dev/sdb /dev/sdb1 /dev/sdc
root@localhost:~#

In the following example, the fdisk command is used to display partition information on the first
sd device.
Note: The following command requires root access
root@localhost:~# fdisk -l /dev/sda
Disk /dev/sda: 21.5 GB, 21474836480 bytes
255 heads, 63 sectors/track, 2610 cylinders, total 41943040 sectors
Units = sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disk identifier: 0x000571a2

Device Boot Start End Blocks Id System


/dev/sda1 * 2048 39845887 19921920 83 Linux
/dev/sda2 39847934 41940991 1046529 5 Extended
/dev/sda5 39847936 41940991 1046528 82 Linux swap / Solaris
root@localhost:~#

Creating and modifying partitions is beyond the scope of this course.

Optical Disks
Optical disks, often referred to as CD-ROMs, DVDs, or Blu-ray discs, are removable storage media.
While some devices used with optical disks are read-only, others are capable of burning (writing to)
disks, when using a writable type of disk. There are various standards for writable and rewritable
disks, such as CD-R, CD-RW, DVD+RW, and DVD-RW. These media standards go beyond the
scope of the curriculum.
Where these removable disks are mounted in the file system is an important consideration for a
Linux administrator. Modern distributions often mount the disks under the /media folder, while
older distributions typically mount them under the /mnt folder.
Upon mounting, most GUI interfaces will prompt the user to take an action, such as open the
contents of the disk in a file browser or start a media program. When the user is finished using the
disk, it is prudent to unmount it using a menu or the eject command. While pressing the eject
button will open the disk tray, some programs will not realize that the disk is no longer mounted on
the filesystem.

Video Display Devices


In order to display video (output to the monitor), the computer system must have a video display
device (AKA, video card) and a monitor. Video display devices are often directly attached to the
motherboard, although they can also be connected through the PCI bus slots on the motherboard.
Unfortunately, since the early days of the PC, no video standard has been approved by the major
vendors, so each video display device usually requires a proprietary driver provided by the vendor.
Drivers are software programs that allow the operating system to communicate with the device.
Drivers must be written for the specific operating system, something that is commonly done for
Microsoft Windows, but not always for Linux. Fortunately, the three largest video display vendors
all now provide at least some level of Linux support.
There are two types of video cables commonly used: the analog 15 pin Video Graphics Array
(VGA) cable and the 29 pin Digital Visual Interface (DVI).
In order for monitors to work properly with video display devices, they must be able to support the
same resolution as the video display device. Normally, the software driving the video display device
(commonly the X.org server) will be able to automatically detect the maximum resolution
that the video display and monitor can both support and set the screen resolution to that value.
Graphical tools are normally provided to change your resolution, as well as the maximum number
of colors that can be displayed (known as the color depth) with your Linux distribution. For
distributions using the X.org server, the /etc/X11/xorg.conf file can be used to change
resolution, color depth and other settings.

Managing Devices
In order for a device to be used in Linux, there may be several different kinds of software required.
First of all there is the driver software. The driver could be compiled as part of the Linux kernel,
loaded into the kernel as a module or loaded by a user command or application. Most devices have
the driver either built-in to the kernel or have it loaded into the kernel, as the driver may require the
low-level kind of access that the kernel has with devices.
External devices, like scanners and printers, typically have their drivers loaded by an application;
these drivers in turn communicate with the device via the kernel through an interface such as
USB.
In order to be successful in enabling devices in Linux, it is best to check the Linux distribution to
see if the device is certified to work with that distribution. Commercial distributions like Red Hat
and SUSE have web pages dedicated to listing hardware that is certified or approved to work with
their software.
Additional tips on being successful with connecting your devices: avoid brand new or highly
specialized devices and check with the vendor of the device to see if they support Linux before
making a purchase.

Package Management
Package management is a system by which software can be installed, updated, queried or removed
from a filesystem. In Linux, there are many different software package management systems, but
the two most popular are those from Debian and Red Hat.

Debian Package Management


The Debian distribution, and its derivatives such as Ubuntu and Mint, use the Debian package
management system. At the heart of Debian-derived distributions' package management are the
software packages that are distributed as files ending in ".deb".

The lowest level tool for managing these files is the dpkg command. This command can be tricky
for novice Linux users, so the Advanced Package Tool, apt-get, a front-end program to the
dpkg tool, makes management of packages even easier. There are other command line tools which
serve as front-ends to dpkg, such as aptitude, as well as GUI front-ends like synaptic and
software-center.

Debian - Adding Packages


The Debian repositories contain more than 65,000 different packages of software. To get an updated
list from these Internet repositories, you can execute the sudo apt-get update command.

To search for keywords within these packages, you can use the sudo apt-cache search
keyword command.

Once you've found the package that you want to install, you can install it with the sudo apt-get
install package command.

Important: To execute these commands, your system will need access to the Internet. The apt-
cache command searches the package lists that the apt-get update command downloads from the
Internet repositories.
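
Putting these commands together, a typical sequence for adding new software might look like the
following sketch (the keyword browser and the package name firefox are illustrative, not taken
from the course material):

sudo apt-get update
sudo apt-cache search browser
sudo apt-get install firefox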

Debian - Updating Packages


If you want to update an individual package, you perform the command to install that package:
sudo apt-get install package
If an older version of the package is already installed, then it will be upgraded. Otherwise, a new
installation would take place.
If you want to update all possible packages, then you would execute the sudo apt-get
upgrade command.
Users who log in with a graphical interface may have a message appear in the notification area from
the update-manager indicating that updates are available.

Debian - Removing Packages


Beware that removing one package of software may result in the removal of other packages. Due to
the dependencies between packages, if you remove a package, then all packages that need, or
depend on that package will be removed as well.
If you want to remove all the files of a software package, except for the configuration files, then you
can execute the sudo apt-get remove package command.
If you want to remove all the files of a software package, including the configuration files, then you
can execute the sudo apt-get --purge remove package command.
You may want to keep the configuration files in the event that you plan to reinstall the software
package at a later time.
Debian - Querying Packages
There are several different kinds of queries that administrators need to use. To get a list of all the
packages that are currently installed on the system, execute the dpkg -l command.

To list the files that comprise a particular package, you can execute the dpkg -L package
command.

To query a package for information, or its state, use the dpkg -s package command.
To determine if a particular file was put on the filesystem as the result of installing a package, use
the dpkg -S /path/to/file command. If the file was part of a package, then the package
name could be provided. For example:
sysadmin@localhost:~$ dpkg -S /usr/bin/who
coreutils: /usr/bin/who

The previous example shows the file /usr/bin/who is part of the coreutils package.
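
As another sketch combining these queries, the files installed by the coreutils package identified
above could then be listed with:

dpkg -L coreutils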

RPM Package Management


The Linux Standards Base, which is a Linux Foundation project, is designed to specify (through a
consensus) a set of standards that increase the compatibility between conforming Linux systems.
According to the Linux Standards Base, the standard package management system is RPM.

RPM makes use of an .rpm file for each software package. This system is what Red Hat-derived
distributions (like Red Hat, CentOS, and Fedora) use to manage software. In addition, several other
distributions that are not Red Hat-derived (such as SUSE, OpenSUSE and Mandriva) also use
RPM.
Note: RPM commands are not available within the virtual machine environment of this course.
Like the Debian system, RPM Package Management systems track dependencies between
packages. Tracking dependencies ensures that when you install a package, the system will also
install any packages needed by that package to function correctly. Dependencies also ensure that
software updates and removals are performed properly.

The back-end tool most commonly used for RPM Package Management is the rpm command.
While the rpm command can install, update, query and remove packages, the command line front
end tools such as yum and up2date automate the process of resolving dependency issues.

In addition, there are GUI-based front end tools such as yumex and gpk-application
that also make RPM package management easier.
You should note that many of the following commands will require root privileges. The rule of
thumb is that if a command affects the state of a package, you will need to have administrative
access. In other words, a regular user can perform a query or a search, but to add, update or remove
a package requires the command be executed as the root user.

RPM - Adding Packages


To search for a package from the configured repositories, execute the yum search keyword
command.

To install a package, along with its dependencies, execute the yum install package command.

RPM - Updating Packages


If you want to update an individual software package, you can execute the yum update package
command.

If you want to update all packages, you can execute the yum update command.
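
For example, to update just the nano package from the previous example, or all installed packages
at once:

[root@localhost ~]# yum update nano
[root@localhost ~]# yum update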

If updates are available and the user is using the GUI, then the gpk-update-viewer may show
a message in the notification area of the screen indicating that updates are available.
RPM - Removing Packages
As is the case with any package management system that tracks dependencies, if you want to
remove one package, then you may end up removing more than one, due to the dependencies. The
easiest way to automatically resolve the dependency issues is to use a yum command:
yum remove package

While you can remove software packages with the rpm command, it won't remove dependency
packages automatically.
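
For example, removing the nano package used in the earlier examples would look like this; yum
will also list and remove any packages that depend on it:

[root@localhost ~]# yum remove nano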

RPM - Querying Packages
Red Hat package management is similar to Debian package management when it comes to
performing queries. It is best to use the back-end tool, rpm, instead of the front-end tool, yum.
While front-end tools can perform some of these queries, performance suffers because these
commands typically connect to multiple repositories across the network when executing any
command. The rpm command performs its queries by connecting to a database that is local to the
machine and doesn't connect over the network to any repositories.

To get a list of all the packages that are currently installed on the system, execute the rpm -qa
command.

To list the files that comprise a particular package, execute the rpm -ql package command.

To query a package for information, or its state, execute the rpm -qi package command.
To determine if a particular file was put on the filesystem as the result of installing a package,
execute the rpm -qf /path/to/file command.
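
For example, to determine which package provided the /bin/bash file (the exact version string in
the output will vary by system; the one shown here is illustrative):

[root@localhost ~]# rpm -qf /bin/bash
bash-4.2.46-34.el7.x86_64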

Linux Kernel
When most people refer to Linux, they are really referring to GNU/Linux, which defines the
operating system. The GNU's Not Unix (GNU) part of this combination is provided by a Free
Software Foundation project. GNU is what provides the open source equivalents of many common
UNIX commands, the bulk of the essential command line commands. The Linux part of this
combination is the Linux kernel, which is the core of the operating system. The kernel is loaded at
boot time and stays loaded to manage every aspect of the running system.
The implementation of the Linux kernel includes many subsystems that are a part of the kernel itself
and others that may be loaded in a modular fashion when needed. Some of the key functions of the
Linux kernel include a system call interface, process management, memory management, virtual
filesystem, networking, and device drivers.
In a nutshell, the kernel accepts commands from the user and manages the processes that carry out
those commands by giving them access to devices like memory, disks, network interfaces,
keyboards, mice, monitors and more.
The kernel provides access to information about running processes through a pseudo filesystem that
is visible under the /proc directory. Hardware devices are made available through special files
under the /dev directory, while information about those devices can be found in another pseudo
filesystem under the /sys directory.

The /proc directory not only contains information about running processes, as its name would
suggest (process), but it also contains information about the system hardware and the current kernel
configuration.
Keep in mind that the information displayed in the examples below will be different from what you
may see within the virtual machine environment of this course.
The output from executing ls /proc shows more than one hundred numbered directories. There
is a numbered directory for each running process on the system, where the name of the directory
matches the PID (process ID) for the running process.

Because the /sbin/init process is always the first process, it has a PID of 1 and the information
about the /sbin/init process can be found in the /proc/1 directory. As you will see later in
this chapter, there are several commands that allow you to view information about running
processes, so it is rarely necessary for users to have to view the files for each running process
directly.

You might also see that there are a number of regular files in the /proc directory, such as
/proc/cmdline, /proc/meminfo and /proc/modules. These files provide information
about the running kernel:

The /proc/cmdline file can be important because it contains all of the information that
was passed to the kernel when it was first started.
The /proc/meminfo file contains information about the use of memory by the kernel.
The /proc/modules file holds a list of modules currently loaded into the kernel to add
extra functionality.
Again, there is rarely a need to view these files directly, as other commands offer more "user
friendly" output and an alternative way to view this information.
While most of the "files" underneath the /proc directory cannot be modified, even by the root
user, the "files" underneath the /proc/sys directory can be changed by the root user. Modifying
these files will change the behavior of the Linux kernel.
Direct modification of these files causes only temporary changes to the kernel. To make changes
permanent, entries can be added to the /etc/sysctl.conf file.
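
For example, a sketch of an /etc/sysctl.conf entry that would make the ICMP setting
discussed below permanent (note that sysctl keys use dots where the /proc/sys path uses
slashes):

# In /etc/sysctl.conf - ignore all ICMP echo requests at boot
net.ipv4.icmp_echo_ignore_all = 1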

For example, the /proc/sys/net/ipv4 directory contains a file named
icmp_echo_ignore_all. If that file contains a zero (0), as it normally does, then the system
will respond to ICMP echo requests. If that file contains a one (1), then the system will not respond
to ICMP echo requests:
[user@localhost ~]$ su -
Password:
[root@localhost ~]# cat /proc/sys/net/ipv4/icmp_echo_ignore_all
0
[root@localhost ~]# ping -c1 localhost
PING localhost.localdomain (127.0.0.1) 56(84) bytes of data.
64 bytes from localhost.localdomain (127.0.0.1): icmp_seq=1 ttl=64 time=0.026 ms

--- localhost.localdomain ping statistics ---
1 packets transmitted, 1 received, 0% packet loss, time 0ms
rtt min/avg/max/mdev = 0.026/0.026/0.026/0.000 ms
[root@localhost ~]# echo 1 > /proc/sys/net/ipv4/icmp_echo_ignore_all
[root@localhost ~]# ping -c1 localhost
PING localhost.localdomain (127.0.0.1) 56(84) bytes of data.

--- localhost.localdomain ping statistics ---
1 packets transmitted, 0 received, 100% packet loss, time 10000ms
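
To restore the default behavior, write a 0 back to the same file:

[root@localhost ~]# echo 0 > /proc/sys/net/ipv4/icmp_echo_ignore_all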
ps (Process) Command
Another way of viewing processes is with the ps command. By default, the ps command only
shows the processes running in the current shell. Ironically, you will see ps itself running when
you want to see what else is running in the current shell:
sysadmin@localhost:~$ ps
PID TTY TIME CMD
6054 ? 00:00:00 bash
6070 ? 00:00:01 xeyes
6090 ? 00:00:01 firefox
6146 ? 00:00:00 ps
sysadmin@localhost:~$
Similar to the pstree command, if you run ps with the option --forest, then it will show lines
indicating the parent and child relationship:
sysadmin@localhost:~$ ps --forest
PID TTY TIME CMD
6054 ? 00:00:00 bash
6090 ? 00:00:02 \_ firefox
6180 ? 00:00:00 \_ dash
6181 ? 00:00:00 \_ xeyes
6188 ? 00:00:00 \_ ps
sysadmin@localhost:~$

To be able to view all processes on the system you can execute either the ps aux command or the
ps -ef command:
sysadmin@localhost:~$ ps aux | head
USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND
root 1 0.0 0.0 17872 2892 ? Ss 08:06 0:00 /sbin/init
syslog 17 0.0 0.0 175744 2768 ? Sl 08:06 0:00
/usr/sbin/rsyslogd -c5
root 21 0.0 0.0 19124 2092 ? Ss 08:06 0:00 /usr/sbin/cron
root 23 0.0 0.0 50048 3460 ? Ss 08:06 0:00 /usr/sbin/sshd
bind 39 0.0 0.0 385988 19888 ? Ssl 08:06 0:00 /usr/sbin/named
-u bind
root 48 0.0 0.0 54464 2680 ? S 08:06 0:00 /bin/login -f
sysadmin 60 0.0 0.0 18088 3260 ? S 08:06 0:00 -bash
sysadmin 122 0.0 0.0 15288 2164 ? R+ 16:26 0:00 ps aux
sysadmin 123 0.0 0.0 18088 496 ? D+ 16:26 0:00 -bash
sysadmin@localhost:~$ ps -ef | head
UID PID PPID C STIME TTY TIME CMD
root 1 0 0 08:06 ? 00:00:00 /sbin/init
syslog 17 1 0 08:06 ? 00:00:00 /usr/sbin/rsyslogd -c5
root 21 1 0 08:06 ? 00:00:00 /usr/sbin/cron
root 23 1 0 08:06 ? 00:00:00 /usr/sbin/sshd
bind 39 1 0 08:06 ? 00:00:00 /usr/sbin/named -u bind
root 48 1 0 08:06 ? 00:00:00 /bin/login -f
sysadmin 60 48 0 08:06 ? 00:00:00 -bash
sysadmin 124 60 0 16:46 ? 00:00:00 ps -ef
sysadmin 125 60 0 16:46 ? 00:00:00 head
sysadmin@localhost:~$

The output of all processes running on a system can definitely be overwhelming. In the example
provided, the output of the ps command was filtered by the head command, so only the first ten
processes were shown. If you don't filter the output of the ps command, then you are likely to have
to scroll through hundreds of processes to find what might interest you.

A common way to run the ps command is to use the grep command to filter the output, displaying
only lines that match a keyword, such as the process name. For example, if you wanted to view the
information about the firefox process, you might execute a command like:
sysadmin@localhost:~$ ps -e | grep firefox
6090 pts/0 00:00:07 firefox

As the root user, you may be more concerned about the processes of another user than about
your own processes. Because the ps command supports several styles of options, there
are different ways to view an individual user's processes. Using the traditional UNIX style of
options, to view the processes of the sysadmin user, execute the following command:
[root@localhost ~]# ps -u sysadmin

Or use the BSD style of options and execute:

[root@localhost ~]# ps u U sysadmin

top Command
The ps command provides a "snapshot" of the processes running at the instant the command is
executed; the top command, by contrast, regularly updates its display of running processes. The top
command is executed as follows:
sysadmin@localhost:~$ top

By default, the output of the top command is sorted by the % of CPU time that each process is
currently using, with the higher values listed first. This means processes that are "CPU hogs" are
listed first:
top - 16:58:13 up 26 days, 19:15, 1 user, load average: 0.60, 0.74, 0.60
Tasks: 8 total, 1 running, 7 sleeping, 0 stopped, 0 zombie
Cpu(s): 6.0%us, 2.5%sy, 0.0%ni, 90.2%id, 0.0%wa, 1.1%hi, 0.2%si, 0.0%st
Mem: 32953528k total, 28126272k used, 4827256k free, 4136k buffers
Swap: 0k total, 0k used, 0k free, 22941192k cached

PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND


1 root 20 0 17872 2892 2640 S 0 0.0 0:00.02 init
17 syslog 20 0 171m 2768 2392 S 0 0.0 0:00.20 rsyslogd
21 root 20 0 19124 2092 1884 S 0 0.0 0:00.02 cron
23 root 20 0 50048 3460 2852 S 0 0.0 0:00.00 sshd
39 bind 20 0 376m 19m 6100 S 0 0.1 0:00.12 named
48 root 20 0 54464 2680 2268 S 0 0.0 0:00.00 login
60 sysadmin 20 0 18088 3260 2764 S 0 0.0 0:00.01 bash
127 sysadmin 20 0 17216 2308 2072 R 0 0.0 0:00.01 top

There is an extensive list of commands that can be executed from within the running top program:

Keys Meaning
h or ? Help
l Toggle load statistics
t Toggle time statistics
m Toggle memory usage statistics
< Move sorted column to the left
> Move sorted column to the right
F Choose sorted field
R Toggle sort direction
P Sort by % CPU
M Sort by % memory used
k Kill a process (or send it a signal)
r Renice priority of a process
One of the advantages of the top command is that it can be left running to stay on "top" of
processes for monitoring purposes. If a process begins to dominate, or "run away" with, the system,
then by default it will appear at the top of the list presented by the top command. An administrator
who is running the top command can then take one of two actions:

1. Terminate the "run away" process: Pressing the k key while the top command is running
will prompt the user to provide the PID and then a signal number. Sending the default signal
will request the process terminate, but sending signal number 9, the KILL signal, will force
the process to terminate.
2. Adjust the priority of the process: Pressing the r key while the top command is running
will prompt the user for the process to renice, and then a niceness value. Niceness values
can range from -20 to 19, and affect priority. Only the root user can use a niceness that is a
lower number than the current niceness, or a negative niceness value, which causes the
process to run with an increased priority. Any user can provide a niceness value that is
higher than the current niceness value, which will cause the process to run with a lowered
priority.

Another advantage of the top command is that it is able to give you an overall representation of
how busy the system is currently and the trend over time. The load averages shown in the first line
of output from the top command indicate how busy the system has been during the last one, five
and fifteen minutes. This information can also be viewed by executing the uptime command or
directly by displaying the contents of the /proc/loadavg file:
sysadmin@localhost:~$ cat /proc/loadavg
0.12 0.46 0.25 1/254 3052

The first three numbers in this file indicate the load average over the last one, five, and fifteen
minute intervals. The fourth value is a fraction that shows the number of processes currently
executing code on the CPU (1) and the total number of processes (254). The fifth value is the last
PID value that executed code on the CPU.
The number reported as a load average is proportional to the number of CPU cores that are able to
execute processes. On a single core CPU, a value of one would mean that the system is fully loaded.
On a four core CPU, a value of one would mean that the system is only 1/4 or 25% loaded.
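
To interpret the load average on a particular machine, you can check the number of available CPU
cores with the nproc command and compare that number to the values reported by the uptime
command (the numbers below are illustrative):

sysadmin@localhost:~$ nproc
4
sysadmin@localhost:~$ uptime
 16:58:13 up 26 days, 19:15,  1 user,  load average: 0.60, 0.74, 0.60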

Another reason administrators like to keep the top command running is the ability to monitor
memory usage in real time. Both the top and the free commands display statistics for how overall
memory is being used.

The top command also has the ability to show the percent of memory used by each process, so a
process that is consuming an inordinate amount of memory can quickly be identified.
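
For example, the free command with the -m option reports memory usage in megabytes (the
values shown here are illustrative):

sysadmin@localhost:~$ free -m
             total       used       free     shared    buffers     cached
Mem:         32181      27467       4714          0          4      22403
-/+ buffers/cache:       5060      27121
Swap:            0          0          0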

Log Files
As the kernel and various processes run on the system, they produce output that describes how they
are running. Some of this output is displayed in the terminal window where the process was
executed, but some of this data is not sent to the screen; instead, it is written to various files. This
is called "log data" or "log messages".
These log files are very important for a number of reasons: they can be helpful for troubleshooting
problems, and they can be used to determine whether or not unauthorized access has been
attempted.
Some processes are able to "log" their own data to these files; other processes rely on another
process (a daemon) to handle these log data files.
These logging daemons can vary from one distribution to another. For example, on some
distributions, the daemons that run in the background to perform logging are called syslogd
and klogd. On other distributions, a single daemon, such as rsyslogd on Red Hat and CentOS or
systemd-journald on Fedora, may serve this logging function.
Regardless of what the daemon process is named, the log files themselves are almost always placed
into the /var/log directory structure. Although some of the file names may vary, here are some
of the more common files to be found in this directory:

File Contents
boot.log Messages generated as services are started during the startup of the system.
cron Messages generated by the crond daemon for jobs to be executed on a recurring basis.
dmesg Messages generated by the kernel during system boot up.
maillog Messages produced by the mail daemon for e-mail messages sent or received.
messages Messages from the kernel and other processes that don't belong elsewhere. Sometimes named syslog instead of messages, after the daemon that writes this file.
secure Messages from processes that required authorization or authentication (such as the login process).
Xorg.0.log Messages from the X windows (GUI) server.
Log files are rotated, meaning older log files are renamed and replaced with newer log files. The
file names that appear in the table above may have a numeric or date suffix added to them, for
example: secure.0 or secure-20131103
Rotating a log file typically occurs on a regularly scheduled basis, for example, once a week. When
a log file is rotated, the system stops writing to the log file and adds a suffix to it. Then a new file
with the original name is created and the logging process continues using this new file.
With the modern daemons, a date suffix is typically used. So, at the end of the week ending
November 3, 2013, the logging daemon might stop writing to /var/log/messages, rename
that file /var/log/messages-20131103, and then begin writing to a new
/var/log/messages file.
Although most log files contain text, which can be viewed safely with many tools, other files, such
as /var/log/btmp and /var/log/wtmp, contain binary data. By using the file command,
you can check the content type of a file before you view it to make sure that it is safe to view.
For the files that contain binary data, there are normally commands available that will read the files,
interpret their contents and then output text. For example, the lastb and last commands can be
used to view the /var/log/btmp and /var/log/wtmp files respectively.
For security reasons, most of the files found in /var/log are not readable by ordinary users, so be
sure to execute commands that interact with these files with root privileges.
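
For example, the last command reads the binary /var/log/wtmp file and produces readable
output (the entries shown here are illustrative):

[root@localhost ~]# last | head -2
sysadmin pts/0        Sun Nov  3 08:06   still logged in
reboot   system boot  Sun Nov  3 08:05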

dmesg Command
The /var/log/dmesg file contains the kernel messages that were produced during system
startup. The /var/log/messages file will contain kernel messages that are produced as the
system is running, but those messages will be mixed in with other messages from daemons or
processes.
Although the kernel doesn't normally have its own log file, one can be configured, typically
by modifying either the /etc/syslog.conf or the /etc/rsyslog.conf file. In addition,
the dmesg command can be used to view the kernel ring buffer, which holds a large number of
messages that are generated by the kernel.
On an active system, or one experiencing many kernel errors, the capacity of this buffer may be
exceeded and some messages might be lost. The size of this buffer is set at the time the kernel is
compiled, so it is not trivial to change.

Executing the dmesg command can produce up to 512 kilobytes of text, so filtering the output
with a pipe to another command, like less or grep, is recommended. For example, if you were
troubleshooting problems with a USB device, then searching for the text "USB" with a
case-insensitive grep may be helpful:
sysadmin@localhost:~$ dmesg | grep -i usb
usbcore: registered new interface driver usbfs
usbcore: registered new interface driver hub
usbcore: registered new device driver usb
ehci_hcd: USB 2.0 'Enhanced' Host Controller (EHCI) Driver
ohci_hcd: USB 1.1 'Open' Host Controller (OHCI) Driver
ohci_hcd 0000:00:06.0: new USB bus registered, assigned bus number 1
usb usb1: New USB device found, idVendor=1d6b, idProduct=0001
usb usb1: New USB device strings: Mfr=3, Product=2, SerialNumber=1

The kernel and /proc


In this task, you will explore the /proc directory and commands which communicate with the
Linux kernel. The /proc directory appears to be an ordinary directory, like /usr or /etc, but it
is not. Unlike /usr or /etc directories, which are usually written to a disk drive, the /proc
directory is a pseudo filesystem maintained in the memory of the computer.

The /proc directory contains a subdirectory for each running process on the system. Programs
such as ps and top read the information about running processes from these directories. The
/proc directory also contains information about the operating system and its hardware in files like
/proc/cpuinfo, /proc/meminfo and /proc/devices.
The /proc/sys subdirectory contains pseudo files that can be used to alter the settings of the
running kernel. As these files are not "real" files, an editor should not be used to change them;
instead, you should use either the echo or the sysctl command to overwrite their contents. For
the same reason, do not attempt to view these files in an editor; use the cat or
sysctl command instead.
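
For example, the sysctl command can read the same ICMP setting shown earlier; note how the
dotted key name maps to the path under /proc/sys:

sysadmin@localhost:~$ sysctl net.ipv4.icmp_echo_ignore_all
net.ipv4.icmp_echo_ignore_all = 0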

For permanent configuration changes, the kernel uses the /etc/sysctl.conf file. Typically,
this file is used by the kernel to make changes to the /proc files when the system is starting up.
Recall that the directories that have numbers for names represent running processes on the system.
The first process is always /sbin/init, so the directory /proc/1 will contain files with
information about the running init process.

The cmdline file inside the process directory (/proc/1/cmdline, for example) will show the
command that was executed. The order in which other processes are started varies greatly from
system to system. Since the content of this file does not contain a new line character, an echo
command will be executed to cause the prompt to go to a new line.

Use cat and then ps to view information about the /sbin/init process (Process IDentifier
(PID) of 1):
cat /proc/1/cmdline; echo
ps -p 1

Your output should look something like this:


sysadmin@localhost:~$ cat /proc/1/cmdline; echo
/sbin/init

Note: The echo command in this example is executed immediately after the cat command. Since
it has no argument, it functions only to put the following command prompt on a new line. Execute
the cat command alone to see the difference.
sysadmin@localhost:~$ ps -p 1
PID TTY TIME CMD
1 ? 00:00:00 init

The other files in the /proc directory contain information about the operating system. The
following tasks will be used to view and modify these files.

View the /proc/cmdline file to see what arguments were passed to the kernel at boot time:
cat /proc/cmdline

The output of the command should be similar to this:


sysadmin@localhost:~$ cat /proc/cmdline
BOOT_IMAGE=/vmlinuz-4.2.0-34-generic root=/dev/mapper/vlabs--vg-root ro
cgroup_enable=memory swapaccount=1
sysadmin@localhost:~$

To start a ping process in the background, type:


ping localhost > /dev/null &

Your output should be similar to the following:


sysadmin@localhost:~$ ping localhost > /dev/null &
[1] 158

By adding the ampersand & to the end of the command, the process is started in the background
allowing the user to maintain control of the terminal.
An easier way to enter the above command would be to take advantage of the command history.
You could have pressed the Up Arrow Key on the keyboard, add a Space and & to the end of the
command and then pressed the Enter key. This is a time-saver when entering similar commands.
Notice that the previous command returns the following information:
[1] 158

This means that this process has a job number of 1 (as shown by the [1] in the output) and a
Process ID (PID) of 158. Each terminal/shell has its own set of job numbers; the PID, however, is
system-wide, and each process has a unique ID number.
This information is important when performing certain process manipulations, such as stopping
processes or changing their priority value.
Note: Your Process ID will likely be different from the one in the example.

Bring the first command to the foreground by typing the following:


fg %1

Your output should be similar to the following:


sysadmin@localhost:~$ fg %1
ping localhost > /dev/null

To have this process continue executing in the background, execute the following command:
bg %1

Your output should be similar to the following:


sysadmin@localhost:~$ bg %1
[1]+ ping localhost > /dev/null &

Using the job number, stop the last ping command with the kill command and verify that it has
stopped by executing the jobs command:
kill %1
jobs

Your output should be similar to the following:


sysadmin@localhost:~$ kill %1
sysadmin@localhost:~$ jobs
[1] Terminated ping localhost > /dev/null &
Finally, you can stop all of the ping commands with the killall command. After executing the
killall command, wait a few moments, and then run the jobs command to verify that all
processes have stopped:
killall ping
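
If any ping processes were still running when killall was executed, the jobs command will
briefly report them as Terminated; once the shell has cleaned up, jobs produces no output at all,
as in this sketch:

sysadmin@localhost:~$ killall ping
sysadmin@localhost:~$ jobs
sysadmin@localhost:~$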
