Lecture 4
Variables - review
Arrays
An array can store multiple pieces of data.
They are essential for the most useful
functions of Perl. They can store data such as:
the lines of a text file (e.g. primer sequences)
a list of numbers (e.g. BLAST E-values)
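A minimal sketch of an array in use. The variable name and the values are made up for illustration and do not come from the lecture.

```perl
use strict;
use warnings;

# Hypothetical example data: a list of BLAST E-values
my @evalues = (1e-50, 3e-20, 0.001);

# Add one more value to the end of the array
push @evalues, 0.5;

# An array in scalar context gives the number of elements
my $count = scalar @evalues;
print "There are $count values\n";

# join glues the elements together for printing
print join(", ", @evalues), "\n";
```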
Loops
A loop repeats a group of statements until it is done.
The statements are placed in a BLOCK: some code
delimited with curly brackets {}
Loops are really useful with arrays.
The foreach loop is probably the most useful of all:
foreach my $base (@dnaarray) {
    print "$base ";
}
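As a runnable sketch of the same loop doing some work, the example below counts the G and C bases in an array. The contents of @dnaarray are assumed here purely for illustration.

```perl
use strict;
use warnings;

# Hypothetical data: one DNA base per array element
my @dnaarray = ('A', 'C', 'T', 'G', 'G', 'C');

# Visit each element in turn and count the G and C bases
my $gc = 0;
foreach my $base (@dnaarray) {
    if ($base eq 'G' or $base eq 'C') {
        $gc++;
    }
}
print "GC bases: $gc\n";
```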
Comparing strings
String comparison (is the text the same?)
eq (equal)
ne (not equal)
There are others but beware of them!
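One plausible reading of that warning is the numeric operator ==, which is easy to type by habit. The sketch below, using two made-up sequences, shows how eq and ne compare the text while == compares numbers and so gives a misleading answer for strings.

```perl
use strict;
use warnings;

my $seq1 = "ACTG";
my $seq2 = "GGCTA";

# eq and ne compare the text itself
my $same_text = ($seq1 eq $seq2) ? "yes" : "no";
my $different = ($seq1 ne $seq2) ? "yes" : "no";

# == compares numbers; both strings count as 0, so they look "equal"
my $same_number;
{
    no warnings 'numeric';    # silence the "isn't numeric" warning for this demo
    $same_number = ($seq1 == $seq2) ? "yes" : "no";
}

print "eq: $same_text, ne: $different, ==: $same_number\n";
```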
[Diagram labels: which string; where in the string; how many letters to take]
Combining strings
Strings can be concatenated (joined).
Use the dot . operator
$seq1 = "ACTG";
$seq2 = "GGCTA";
$seq3 = $seq1 . $seq2;
print $seq3;
ACTGGGCTA
Reading a file
Once the file is open, you can read from it, line by
line, using the readline <> operator again
(put the filehandle between the angle brackets)
Reading in a loop goes through the whole file, and puts each line into the
variable $longsequence, one at a time.
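A minimal sketch of reading a file line by line. The while loop, the filehandle name INPUT, and the throwaway file with its made-up contents are assumptions for illustration; only the variable name $longsequence comes from the text above.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Make a small throwaway file so the example can run anywhere
my ($tmp, $filename) = tempfile();
print $tmp "ACTGGCTA\n";
print $tmp "GGCTAACT\n";
close $tmp;

# Open the file for reading; INPUT is the filehandle
open(INPUT, "<", $filename) or die "Cannot open $filename: $!";

my $linecount    = 0;
my $total_length = 0;
# Each pass through the loop puts the next line into $longsequence
while (my $longsequence = <INPUT>) {
    chomp $longsequence;    # remove the newline at the end of the line
    $linecount++;
    $total_length += length $longsequence;
}
close INPUT;
unlink $filename;           # tidy up the throwaway file

print "Read $linecount lines, $total_length letters in total\n";
```

The loop stops by itself when the readline operator runs out of lines.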
Writing to a File
Writing to a file is similar to reading from it
Use the > operator to open a file for writing:
open OUTPUT, ">/home/class30/output.txt";
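A minimal sketch of writing to a file and checking the result. It writes into a temporary directory rather than the class path above, and uses the three-argument form of open; both choices, and the CHECK filehandle, are assumptions made so the example runs anywhere.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);

# Write into a temporary directory so no real files are touched
my $dir     = tempdir(CLEANUP => 1);
my $outfile = "$dir/output.txt";

# ">" opens the file for writing (and empties it if it already exists)
open(OUTPUT, ">", $outfile) or die "Cannot open $outfile: $!";

# Put the filehandle after print, with no comma, to write to the file
print OUTPUT "ACTGGGCTA\n";
close OUTPUT;

# Read the file back to check what was written
open(CHECK, "<", $outfile) or die "Cannot open $outfile: $!";
my $written = <CHECK>;
close CHECK;
print "The file contains: $written";
```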
elsif
You can string a lot of ifs together using elsif
if ($site eq "GAATTC") {
    print "EcoRI site\n";
}
elsif ($site eq "GGATCC") {
    print "BamHI site\n";
}
elsif ($site eq "AAGCTT") {
    print "HindIII site\n";
}
else {    # only happens if none of the conditions above is true
}
Subscripts
Bioinformatics data can often be put into array
format:
multi-line sequence files
Microarray or statistics data in tab-delimited
format
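A minimal sketch of subscripts at work on tab-delimited data. The line of data is made up, and using split to build the array is an assumption about how the lecture would do it.

```perl
use strict;
use warnings;

# A made-up line of tab-delimited data: gene name, then two measurements
my $line = "geneA\t2.5\t3.1";

# split breaks the line at each tab and returns the pieces as an array
my @fields = split /\t/, $line;

# Subscripts count from 0: $fields[0] is the first piece
my $name  = $fields[0];
my $third = $fields[2];
my $last  = $fields[-1];    # a negative subscript counts from the end

print "$name: third field is $third, last field is $last\n";
```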
Regular Expressions
Sounds odd, doesn't it? It means a pattern that
the computer can match, in a standard format.
Very useful in bioinformatics work
DNA patterns
restriction sites
promoters/transcription factor binding sites
intron splice sites
Protein patterns
conserved domains (motifs)
active sites
structural motifs (membrane spanning, signal peptide, etc.)
A regular expression...
is a joy forever. And a pattern to match. It
can be just a text string, such as:
/GATC/
or include alternative characters:
/G[AT]TC/
or look thoroughly cryptic:
/\/[^\/]*\/\.\./
Alternative Characters
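Square brackets within the match expression allow for
alternative characters:
if ($dna =~ /CAG[AT]CAG/)
This will match any DNA string that contains CAG, then A or T in
the 4th position of the match, followed by another CAG.
A vertical bar allows for alternative patterns:
if ($dna =~ /GAAT|ATTC/)
This will match any DNA string that contains GAAT or ATTC.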
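A short sketch that tries both patterns against a few made-up sequences. The sequences and the array names are assumptions for illustration.

```perl
use strict;
use warnings;

# Made-up sequences for illustration
my @seqs = ("TTCAGACAGTT", "TTCAGTCAGTT", "TTCAGGCAGTT", "CCGAATCC");

my @class_hits;    # sequences matching /CAG[AT]CAG/
my @alt_hits;      # sequences matching /GAAT|ATTC/
foreach my $dna (@seqs) {
    if ($dna =~ /CAG[AT]CAG/) {
        push @class_hits, $dna;
    }
    if ($dna =~ /GAAT|ATTC/) {
        push @alt_hits, $dna;
    }
}
print "Square bracket pattern matched: @class_hits\n";
print "Vertical bar pattern matched: @alt_hits\n";
```

The third sequence has a G between the two CAGs, so the square bracket pattern does not match it.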
Special characters
Perl has a large set of special characters to use
in regular expressions:
the dot (.) matches any character
\d matches any digit (a number from 0-9)
\w matches any word character
(a letter, a number, or an underscore; not punctuation or
space)
\s matches a single white space character (use \s+ for any amount)
\t matches a tab (useful for tab delimited files)
^ matches the beginning of a line
$ matches the end of a line
Knowing this makes you lots of fun at parties.
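A small sketch that tries several of these special characters on one made-up line of tab-delimited text. The line and the descriptions collected in @results are illustrative assumptions.

```perl
use strict;
use warnings;

# A made-up line: a name, a tab, then a number
my $line = "seq42\t1500";

# Collect a description for every pattern that matches
my @results;
push @results, "has a digit"             if $line =~ /\d/;
push @results, "has a tab"               if $line =~ /\t/;
push @results, "begins with seq"         if $line =~ /^seq/;
push @results, "ends with two digits"    if $line =~ /\d\d$/;
push @results, "s, any character, q"     if $line =~ /s.q/;
push @results, "word, space, digits"     if $line =~ /^\w+\s\d+$/;
push @results, "begins with a digit"     if $line =~ /^\d/;

foreach my $result (@results) {
    print "$result\n";
}
```

Only the last test fails, because the line begins with a letter.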
Special characters
What if you need to match text that contains a special
character?
Aren't there dots at the end of sentences?
Put a backslash in front of the character: \. matches a real dot.
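A minimal sketch of the difference between a plain dot and a dot with a backslash in front of it. The two test strings are made up for illustration.

```perl
use strict;
use warnings;

my $sentence = "Perl is fun.";
my $question = "Perl is fun?";

# A plain dot matches any character, so both strings match
my $loose_sentence = ($sentence =~ /fun.$/) ? 1 : 0;
my $loose_question = ($question =~ /fun.$/) ? 1 : 0;

# A backslash makes the dot literal, so only the real full stop matches
my $strict_sentence = ($sentence =~ /fun\.$/) ? 1 : 0;
my $strict_question = ($question =~ /fun\.$/) ? 1 : 0;

print "plain dot:   $loose_sentence $loose_question\n";
print "escaped dot: $strict_sentence $strict_question\n";
```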
Bringing it together
So now, when you think about it, you can:
Open a file
Check whether each line of the file contains a particular
pattern
Recover part of that line
Write it out to another file
So... isn't that what you wanted to know?
But really, it's very useful combined with the UNIX
command line.
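The four steps can be sketched in one short script. Everything specific here is an assumption for illustration: the FASTA-style input, the file names, the filehandle names, and the use of parentheses with $1 to recover part of the line.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);

# Work in a temporary directory so no real files are touched
my $dir     = tempdir(CLEANUP => 1);
my $infile  = "$dir/input.txt";
my $outfile = "$dir/output.txt";

# Create a small made-up input file in FASTA style
open(SETUP, ">", $infile) or die "Cannot open $infile: $!";
print SETUP ">seq1 first test sequence\n";
print SETUP "ACTGGCTA\n";
print SETUP ">seq2 second test sequence\n";
print SETUP "GGCTAACT\n";
close SETUP;

open(INPUT,  "<", $infile)  or die "Cannot open $infile: $!";
open(OUTPUT, ">", $outfile) or die "Cannot open $outfile: $!";

my $found = 0;
while (my $line = <INPUT>) {
    # Header lines begin with ">"; the parentheses capture the name into $1
    if ($line =~ /^>(\w+)/) {
        print OUTPUT "$1\n";
        $found++;
    }
}
close INPUT;
close OUTPUT;

# Read the output file back to see what was written
open(CHECK, "<", $outfile) or die "Cannot open $outfile: $!";
my @names = <CHECK>;
close CHECK;
chomp @names;
print "Found $found names: @names\n";
```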
A last exercise?...
Now we're getting up to speed with Perl,
let's try something more fun:
Open up a BLAST output file
Spit out the name of the query sequence,
the top hit, and how many hits there were.