Perl and Bioperl
CONTROL STRUCTURES
Syntax
for (START; STOP; ACTION) { BODY }
Initially execute START statements once.
Repeatedly execute BODY until STOP is false.
Execute ACTION after each iteration.
Example
for ($i=0; $i<10; $i++) {
print("Iteration: $i\n");
}
THE FOREACH STATEMENT
Syntax
foreach SCALAR ( ARRAY ) { BODY }
Assign ARRAY element to SCALAR.
Execute BODY.
Repeat for each element in ARRAY.
Example
@asTmp = qw(One Two Three);
foreach $s (@asTmp) { $s .= "sy "; }
print(@asTmp); # Onesy Twosy Threesy
CONTROL STRUCTURES
“while” loop
while (condition) { code }
$cars = 7;
while ($cars > 0) {
print "cars left: ", $cars--, "\n";
}
while ($game_not_over) {…}
CONTROL STRUCTURES
Bottom-check Loops
do { code } while (condition);
do { code } until (condition);
$value = 0;
do {
print "Enter Value: ";
$value = <STDIN>;
} until ($value > 0);
SUBROUTINES (FUNCTIONS)
Defining a Subroutine
sub name { code }
Arguments passed in via “@_” list
sub multiply {
my ($a, $b) = @_;
return $a * $b;
}
The value of the last expression evaluated is the return value
(the word “return” could have been left out above)
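As a minimal sketch of the implicit return rule, the two subroutines below give the same result. The name multiply_implicit is illustrative and not from the slides, and the variables are called $x and $y here only because $a and $b are also used by Perl's sort.

```perl
use strict;
use warnings;

# Explicit return, as on the slide.
sub multiply {
    my ($x, $y) = @_;
    return $x * $y;
}

# Same result without the word "return": the value of the last
# expression evaluated becomes the return value.
sub multiply_implicit {
    my ($x, $y) = @_;
    $x * $y;
}

print multiply(3, 4), "\n";           # 12
print multiply_implicit(3, 4), "\n";  # 12
```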
SUBROUTINES (FUNCTIONS)
Calling a Subroutine
subname; # no args, no return value
subname (args);
$retval = &subname (args);
The “&” is optional so long as…
subname is not a reserved word
subroutine was defined before being called (this matters only when the call omits the parentheses)
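The calling styles can be compared side by side. This is a small sketch; the subroutine greet and its argument are made up for illustration.

```perl
use strict;
use warnings;

sub greet {
    my ($name) = @_;
    return "hello, $name";
}

my $with_amp    = &greet('world');  # explicit "&"
my $with_parens = greet('world');   # "&" omitted, parentheses kept
my $bare        = greet 'world';    # no parentheses: accepted because greet is already defined

print "$with_amp\n$with_parens\n$bare\n";
```

All three calls return the same string. If greet were defined further down the file, the last form would not compile under "use strict", while the first two would still work.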
SUBROUTINES (FUNCTIONS)
Passing Arguments
Copying @_ into “my” variables passes the value (the elements of @_ themselves alias the caller's variables)
Lists are expanded
@a = (5,10,15);
@b = (20,25);
&mysub(@a,@b);
this passes five arguments: 5,10,15,20,25
mysub can receive them as 5 scalars, or one array
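A short sketch of the list expansion described above. The names count_args and count_each are hypothetical. The second subroutine takes array references, a technique these slides do not cover, to show one way of keeping the two arrays separate.

```perl
use strict;
use warnings;

# Counts whatever arrives in @_.
sub count_args {
    return scalar @_;
}

# Takes two array references so the arrays stay separate.
sub count_each {
    my ($first, $second) = @_;
    return (scalar @$first, scalar @$second);
}

my @a = (5, 10, 15);
my @b = (20, 25);

print count_args(@a, @b), "\n";                # 5
print join(',', count_each(\@a, \@b)), "\n";   # 3,2
```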
SUBROUTINES (FUNCTIONS)
Examples
sub good1 {
my($a,$b,$c) = @_;
}
&good1 (@triplet);
sub good2 {
my(@a) = @_;
}
&good2 ($one, $two, $three);
DEALING WITH HASHES
chop
chop(VARIABLE)
chop(LIST)
index(STR, SUBSTR)
length(EXPR)
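The three functions listed above can be tried on a short string. This is an illustrative sketch; the sample string is arbitrary.

```perl
use strict;
use warnings;

my $line = "ATGGGTA\n";

chop($line);                         # removes the last character (here the newline)
my $len     = length($line);         # 7
my $pos     = index($line, 'GGG');   # 2 (positions count from 0)
my $missing = index($line, 'CCC');   # -1 when the substring is absent

print "len=$len pos=$pos missing=$missing\n";
```

Note that chop removes the last character whatever it is; chomp removes it only if it is a newline.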
STRING MANIPULATION (CONT.)
Example: string.pl
PATTERN MATCHING
$0 = program name
@ARGV array of arguments to program
Example
yourprog -a somefile
$0 is “yourprog”
$ARGV[0] is “-a”
$ARGV[1] is “somefile”
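A minimal sketch of inspecting the arguments. To keep it runnable without a command line, it fills @ARGV with the example values when no arguments were given; that fallback is an assumption made for illustration.

```perl
use strict;
use warnings;

# Simulate "yourprog -a somefile" when the script is run with no arguments.
@ARGV = ('-a', 'somefile') unless @ARGV;

print "program: $0\n";
print "number of arguments: ", scalar(@ARGV), "\n";
for my $i (0 .. $#ARGV) {
    print "\$ARGV[$i] is $ARGV[$i]\n";
}
```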
BASIC FILE I/O
Reading a File
open (FILEHANDLE, "$filename") || die "open of $filename failed: $!";
while (<FILEHANDLE>) {
chop $_; # or just: chop;
print "$_\n";
}
close FILEHANDLE;
BASIC FILE I/O
Writing a File
open (FILEHANDLE, ">$filename") || die "open of $filename failed: $!";
foreach (@data) {
print FILEHANDLE "$_\n";
# note, no comma!
}
close FILEHANDLE;
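The write and read steps can be combined into one round trip. This sketch swaps in a different style from the slides: lexical filehandles, three-argument open, and chomp instead of chop. The temporary file from File::Temp and the sample data are assumptions made so the example can run anywhere.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

my @data = ('first line', 'second line', 'third line');

# tempfile() returns an open handle and the file name; only the name is needed here.
my ($tmp_fh, $filename) = tempfile(UNLINK => 1);
close $tmp_fh;

# Write: three-argument open with a lexical filehandle.
open(my $out, '>', $filename) or die "open of $filename failed: $!";
print $out "$_\n" foreach @data;    # still no comma after the filehandle
close $out;

# Read the lines back.
open(my $in, '<', $filename) or die "open of $filename failed: $!";
my @lines;
while (my $line = <$in>) {
    chomp $line;                    # removes only a trailing newline
    push @lines, $line;
}
close $in;

print scalar(@lines), " lines read\n";   # 3 lines read
```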
BASIC FILE I/O
#
# Get Day Of Month Package
#
package getDay;
sub main::getDayOfMonth {
my ($sec, $min, $hour, $mday) = localtime;
return $mday;
}
1; # otherwise "require" or "use" would fail
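The effect of the qualified name main::getDayOfMonth can be seen in a single file. This is a sketch only: the package and the calling code are placed together here so it runs on its own, whereas the slide's package would normally live in its own file and be loaded with "require" or "use".

```perl
use strict;
use warnings;

package getDay;

# Naming the sub main::getDayOfMonth installs it in package main,
# even though this code sits inside package getDay.
sub main::getDayOfMonth {
    my ($sec, $min, $hour, $mday) = localtime;
    return $mday;
}

package main;

my $day = getDayOfMonth();   # callable without a package prefix
print "day of month: $day\n";
```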
PACKAGES
use List::Util;
my @list = 10..20;
my $sum = List::Util::sum(@list);
print "sum (@list) is $sum\n";
use List::Util qw(shuffle sum);
my @list = (10,10,12,11,17,89);
my $sum = sum(@list);
print "sum (@list) is $sum\n";
my @shuff = shuffle(@list);
print "shuff is @shuff\n";
MODULE NAMING
my $adder3 = MyAdder->new(75);
$adder3->add(7);
print $adder3->value, "\n";
WRITING A MODULE: INSTANTIATION
Starts with package to define the module name
multiple packages can be defined in a single module file -
but this is not recommended at this stage
The method name new is usually used for instantiation
bless associates a data structure (through a reference) with a package, making it an object
WRITING A MODULE: SUBROUTINES
The first argument to a method called on an object is
always a reference to that object; we usually call it ‘$self’
in the code.
This is an implicit aspect of Object-Oriented Perl
sub new {
my ($package, $val) = @_;
$val ||= 0;
my $obj = bless { 'value' => $val }, $package;
return $obj;
}
sub add {
my ($self,$val) = @_;
$self->{'value'} += $val;
}
sub value {
my $self = shift;
return $self->{'value'};
}
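The hash-based subroutines can be assembled with the usage lines into one runnable script. This is a sketch: the package and the calling code share a file here only so the example is self-contained, and the explicit return value in add is a small addition not on the slide.

```perl
use strict;
use warnings;

package MyAdder;

# Constructor: $package is the class name, $val the optional start value.
sub new {
    my ($package, $val) = @_;
    $val ||= 0;
    return bless { value => $val }, $package;
}

# Adds $val to the stored value and returns the new total.
sub add {
    my ($self, $val) = @_;
    $self->{value} += $val;
    return $self->{value};
}

# Accessor for the stored value.
sub value {
    my $self = shift;
    return $self->{value};
}

package main;

my $adder = MyAdder->new(75);
$adder->add(7);
print $adder->value, "\n";    # 82
```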
WRITING A MODULE II (ARRAY)
package MyAdder;
use strict;
sub new {
my ($package, $val) = @_;
$val ||= 0;
my $obj = bless [$val], $package;
return $obj;
}
sub add {
my ($self,$val) = @_;
$self->[0] += $val;
}
sub value {
my $self = shift;
return $self->[0];
}
USING THE MODULE
use LWP::UserAgent;
my $url = 'http://us.expasy.org/uniprot/P42003.txt';
my $ua = LWP::UserAgent->new(); # initialize an object
$ua->timeout(10); # set the timeout value
my $response = $ua->get($url);
if ($response->is_success) {
# print $response->content; # or whatever
if( $response->content =~ /DE\s+(.+)\n/ ) {
print "description is '$1'\n";
}
if( $response->content =~ /OS\s+(.+)\n/ ) {
print "species is '$1'\n";
}
}
else {
die $response->status_line;
}
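The two pattern matches can be tried without a network connection. In this sketch the string $content is a hypothetical stand-in for $response->content; its lines are placeholders, not a real database record.

```perl
use strict;
use warnings;

# Hypothetical stand-in for $response->content; not a real record.
my $content = "ID   EXAMPLE_ID\n"
            . "DE   Example protein description.\n"
            . "OS   Example species.\n";

my ($description, $species);

# (.+) captures the rest of the line: "." does not match a newline by default.
if ($content =~ /DE\s+(.+)\n/) {
    $description = $1;
}
if ($content =~ /OS\s+(.+)\n/) {
    $species = $1;
}

print "description is '$description'\n";
print "species is '$species'\n";
```

Because the patterns are not anchored, "DE" or "OS" appearing earlier in some other line would match first; anchoring with ^ and the /m modifier is one way to guard against that.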
OVERVIEW OF BIOPERL TOOLKIT
Bioperl is...
A set of Perl modules for manipulating genomic and other
biological data
An Open Source Toolkit with many contributors
A flexible and extensible system for doing bioinformatics
data manipulation
SOME THINGS YOU CAN DO
Read in sequence data from a file in standard formats
(FASTA, GenBank, EMBL, SwissProt,...)
Manipulate sequences: reverse complement, translate
coding DNA sequence to protein
Parse a BLAST report, get access to every bit of data in
the report
Dr. Mikler will post some detailed tutorials
MAJOR DOMAINS COVERED
Bibliographic data
Taxonomy
Protein Structure
SEQUENCE FILE FORMATS
Bio::SeqIO
multiple drivers: genbank, embl, fasta,...
Sequence objects
Bio::PrimarySeq
Bio::Seq
Bio::Seq::RichSeq
LOOK AT THE SEQUENCE OBJECT
@com = $annotation->get_Annotations('comment');
$annotation->add_Annotation('comment', $an);
ANNOTATIONS
Annotation::Comment
comment field
Annotation::Reference
author,journal,title, etc
Annotation::DBLink
database,primary_id,optional_id,comment
Annotation::SimpleValue
CREATE A SEQUENCE OUT OF THIN AIR
use Bio::Seq;
my $seq = Bio::Seq->new(-seq => 'ATGGGTA',
-display_id => 'MySeq',
-description => 'a description');
print "bases 4-5 are ", $seq->subseq(4,5), "\n";
print "my whole sequence is ", $seq->seq(), "\n";
print "reverse complement is ",
$seq->revcom->seq(), "\n";
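For comparison, what a reverse complement computes can be sketched in plain Perl. This is not Bioperl's implementation: the subroutine reverse_complement is a hypothetical helper that handles only the four unambiguous bases and returns a string rather than a sequence object.

```perl
use strict;
use warnings;

# Plain-Perl sketch of a reverse complement for A, C, G and T only.
sub reverse_complement {
    my ($dna) = @_;
    my $rc = reverse $dna;          # reverse the string (scalar context)
    $rc =~ tr/ACGTacgt/TGCAtgca/;   # complement each base
    return $rc;
}

print reverse_complement('ATGGGTA'), "\n";   # TACCCAT
```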
READING IN A SEQUENCE
use Bio::SeqIO;
my $in = Bio::SeqIO->new(-format => 'genbank',
-file => 'file.gb');
while( my $seq = $in->next_seq ) {
Database: wormpep87
20,881 sequences; 9,238,759 total letters.
Searching....10....20....30....40....50....60....70....80....90....100% done
                                                        Smallest
                                                          Sum
                                                 High  Probability
Sequences producing High-scoring Segment Pairs:  Score  P(N)      N
……
USING THE SEARCH::RESULT OBJECT
use Bio::SearchIO;
use strict;
my $parser = Bio::SearchIO->new(-format => 'blast', -file => 'file.bls');
while( my $result = $parser->next_result ){
print "query name=", $result->query_name, " desc=",
$result->query_description, ", len=", $result->query_length, "\n";
print "algorithm=", $result->algorithm, "\n";
print "db name=", $result->database_name, " #lets=",
$result->database_letters, " #seqs=", $result->database_entries, "\n";
print "available params ", join(',',
$result->available_parameters), "\n";
print "available stats ", join(',',
$result->available_statistics), "\n";
print "num of hits ", $result->num_hits, "\n";
}
USING THE SEARCH::HIT OBJECT
use Bio::SearchIO;
use strict;
my $parser = Bio::SearchIO->new(-format => 'blast', -file => 'file.bls');
while( my $result = $parser->next_result ){
while( my $hit = $result->next_hit ) {
print "hit name=", $hit->name, " desc=", $hit->description,
"\n len=", $hit->length, " acc=", $hit->accession, "\n";
print "raw score ", $hit->raw_score, " bits ", $hit->bits,
" significance/evalue=", $hit->evalue, "\n";
}
}
TURNING BLAST INTO HTML
use Bio::SearchIO;
use Bio::SearchIO::Writer::HTMLResultWriter;
sub intro_with_overview {
my ($result) = @_;
my $f = &generate_overview($result,$result->{"_FILEBASE"});
$result->rewind();
return sprintf(
qq{
<center>
<b>Hit Overview<br>
Score: <font color="red">Red= (>=200)</font>, <font color="purple">Purple 200-80</font>,
<font color="green">Green 80-50</font>, <font color="blue">Blue 50-40</font>,
<font color="black">Black <40</font>
MULTIPLE SEQUENCE ALIGNMENTS
Bio::AlignIO to read alignment files
Produces Bio::SimpleAlign objects
use Bio::Perl;
#!/usr/bin/perl -w
use strict;
use Bio::DB::GenPept;
use Bio::DB::GenBank;
use Bio::SeqIO;
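# Assumed setup: the lines that created $db and $out are missing from the slide text.
my $db = Bio::DB::GenBank->new;
my $out = Bio::SeqIO->new(-format => 'fasta', -fh => \*STDOUT); # output format is an assumption
my $acc = 'AB077698';
my $seq = $db->get_Seq_by_acc($acc);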
if( $seq ) {
$out->write_seq($seq);
} else {
print STDERR "cannot find seq for acc $acc\n";
}
$out->close();
SEQUENCE RETRIEVAL FROM LOCAL DATABASE
use Bio::DB::Flat;
# $db is a Bio::DB::Flat object; the line that constructs it is missing from the slide text
$db->make_index('/data/protein/swissprot');
my $seq = $db->get_Seq_by_acc('BOSS_DROME');