
Ministry of Education of the Republic of Moldova

Faculty of Computers, Informatics and Microelectronics
Department of Applied Informatics

Report
Laboratory Work No. 1, 2, 3
Topic: Text Analysis
For:

Completed by: Ursu Ion
Group: MAI-131
Examiner:

Chișinău 2014

I used a fragment from the following work:

Ligeia
by Edgar Allan Poe
(published 1838)
I CANNOT, for my soul, remember how, when, or even precisely where, I first became
acquainted with the lady Ligeia. Long years have since elapsed, and my memory is feeble
through much suffering. Or, perhaps, I cannot now bring these points to mind, because, in truth,
the character of my beloved, her rare learning, her singular yet placid cast of beauty, and the
thrilling and enthralling eloquence of her low musical language, made their way into my heart by
paces so steadily and stealthily progressive that they have been unnoticed and unknown. Yet I
believe that I met her first and most frequently in some large, old, decaying city near the Rhine.
Of her family --I have surely heard her speak. That it is of a remotely ancient date cannot be
doubted. Ligeia! Ligeia! in studies of a nature more than all else adapted to deaden impressions
of the outward world, it is by that sweet word alone --by Ligeia --that I bring before mine eyes in
fancy the image of her who is no more. And now, while I write, a recollection flashes upon me
that I have never known the paternal name of her who was my friend and my betrothed, and who
became the partner of my studies, and finally the wife of my bosom. Was it a playful charge on
the part of my Ligeia? or was it a test of my strength of affection, that I should institute no
inquiries upon this point? or was it rather a caprice of my own --a wildly romantic offering on
the shrine of the most passionate devotion? I but indistinctly recall the fact itself --what wonder
that I have utterly forgotten the circumstances which originated or attended it? And, indeed, if
ever she, the wan and the misty-winged Ashtophet of idolatrous Egypt, presided, as they tell,
over marriages ill-omened, then most surely she presided over mine.

Program 1:
# Find the first occurrence of a word with index()
$line = "For eyes we have no models in the remotely antique.";
$target = "have";
$position = index($line, $target, 0);   # Zero-based offset, or -1 if absent
print "The word \"$target\" is at position $position.\n";
Output of program 1:

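Program 1 reports only the first occurrence. As a minimal sketch (not part of the original lab code), the third argument of index can be advanced in a loop to collect every occurrence; find_all is a hypothetical helper name introduced here for illustration.

```perl
use strict;
use warnings;

# Return the zero-based offsets of every occurrence of $target in $line.
# find_all is a hypothetical helper name, not part of the report.
sub find_all {
    my ($line, $target) = @_;
    return () if $target eq '';    # An empty target would never advance
    my @positions;
    my $from = 0;
    while ( (my $pos = index($line, $target, $from)) != -1 ) {
        push @positions, $pos;
        $from = $pos + 1;          # Resume the search just after this hit
    }
    return @positions;
}

my $line = "For eyes we have no models in the remotely antique.";
my @hits = find_all($line, "have");
print "Found at: @hits\n";
my @none = find_all($line, "Ligeia");
print "Matches for a missing word: ", scalar(@none), "\n";
```

Unlike the regex loop in Program 2, this search is case sensitive and treats the target as a literal string, so regex metacharacters in the target need no escaping.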
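Program 2:
# Find every occurrence of a word with a case-insensitive global match
$line = "Dog, dog, dog, dogest.";
$target = "dog";
while ( $line =~ /$target/gi ) {
    $pos = pos($line);    # Offset just past the end of the current match
    print " $target $pos";
}
Output of program 2: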

Program 3:
open (FILE, "C://Perl//bin//Mi101//poveste.txt") or die("File not found");
$/ = ""; # Paragraph mode for first while loop
# Initialize variables
$target = '\b(the)\b';
$radius = 20;
$width = 2*$radius; # Width of extract without target
# First while loop: read one paragraph at a time
while (<FILE>) {
    chomp;
    s/\n/ /g;     # Replace newlines by spaces
    s/--/ -- /g;  # Add spaces around dashes
    # Second while loop: visit every match in the paragraph
    while ( $_ =~ /$target/gi ) {
        $match = $1;
        $pos = pos($_);   # Offset just past the end of the match
        $start = $pos - $radius - length($match);
        if ( $start < 0 ) {
            # Match is near the start: take what exists and pad on the left
            $extract = substr($_, 0, $width+$start+length($match));
            $extract = ( " " x -$start ) . $extract;
            $len = length($extract);
        } else {
            $extract = substr($_, $start, $width+length($match));
        }
        # Print the next concordance line
        print "$extract\n";
    }
}

Output of program 3:

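The extract arithmetic in Program 3 can be tried without an input file. The following is a minimal sketch under stated assumptions: concordance is a hypothetical helper name, the sample sentence is adapted from the Ligeia fragment above, and right-hand padding is added so that every line has the same width.

```perl
use strict;
use warnings;

# Build fixed-width concordance lines for every match of $regex in $text.
# concordance() is a hypothetical helper; $radius is the number of
# characters of context kept on each side of the match.
sub concordance {
    my ($text, $regex, $radius) = @_;
    my @lines;
    while ( $text =~ /($regex)/gi ) {
        my $match = $1;
        my $start = pos($text) - length($match) - $radius;
        my $width = 2 * $radius + length($match);
        my $extract;
        if ( $start < 0 ) {
            # Match is near the beginning: pad on the left with spaces
            $extract = ( " " x -$start ) . substr($text, 0, $width + $start);
        } else {
            $extract = substr($text, $start, $width);
        }
        # Pad on the right so that every line has the same width
        $extract .= " " x ( $width - length($extract) );
        push @lines, $extract;
    }
    return @lines;
}

my $text = "I met her first in some large, old, decaying city near the Rhine.";
print "[$_]\n" for concordance($text, '\bthe\b', 10);
```

Keeping every line at the same width places the matched word in the same column on each line, which is what makes a concordance easy to scan.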
Program 4:
open (FILE, "C://Perl//bin//Mi101//poveste1.txt") or die("File not found");
$/ = ""; # Paragraph mode for while loop
# One sentence: optional quotes, a letter, text up to end punctuation
$regex = '(["\']{0,2}[a-zA-Z][^.?!]*["\']{0,2}[.?!]["\']{0,2}\s*)';
while ($_ = <FILE>) {
    chomp;
    s/\n/ /g;        # Replace newlines by spaces
    s/--/ -- /g;     # Add spaces around dashes
    s/Mr\./Mr/g;     # Remove period in Mr.
    s/Mrs\./Mrs/g;   # Remove period in Mrs.
    $buffer = "";
    while ( $_ =~ /$regex/g ) {
        $match = $1;
        if ( $' =~ /^"?[A-Z]/ ) {        # Check for capital letter
            print "$buffer$match\n";     # Print sentence
            $buffer = "";
        } else {
            $buffer .= $match;
            if ( $' =~ /^\w*$/ ) {       # Check for end of paragraph
                print "$buffer\n";       # Print sentence
            }
        }
    }
    print "\n";
}

Output of program 4: Replace newlines by spaces

Add spaces around dashes:

Remove period in I.

Remove period in Ligeia.

Program 5:
# USAGE > perl regex-concordance.pl FileName.txt Regex Radius
# This program is case insensitive.
open (FILE, "$ARGV[0]") or die("$ARGV[0] not found");
$/ = ""; # Paragraph mode for first while loop
# Initialize variables
$target = "($ARGV[1])"; # Parentheses needed for $1 in 2nd while
$radius = $ARGV[2];
$width = 2*$radius;
while (<FILE>) {
    chomp;
    s/\n/ /g;     # Replace newlines by spaces
    s/--/ -- /g;  # Add spaces around dashes
    while ( $_ =~ /$target/gi ) {
        $match = $1;
        $pos = pos($_);
        $start = $pos - $radius - length($match);
        if ($start < 0) {
            # Pad on the left when the match is near the paragraph start
            $extract = substr($_, 0, $width+$start+length($match));
            $extract = (" " x -$start) . $extract;
            $len = length($extract);
        } else {
            $extract = substr($_, $start, $width+length($match));
        }
        print "$extract\n";
    }
}

Output of program 5, which prints out extracts containing spaces:

Program 6:
open(FILE, "C://Perl//bin//Mi101//poveste1.txt") or die("Text not found");
# The output file must be opened for writing (">")
open(OUT, ">", "C://Perl//bin//Mi101//poveste2.csv") or die("Cannot open output file");
while (<FILE>) {
    chomp;
    $_ = lc;             # Convert to lowercase
    s/--/ /g;            # Replace dash with space
    s/ +/ /g;            # Replace multiple spaces with one space
    s/[.,:;?"!()]//g;    # Remove punctuation (except ')
    @words = split(/ /);
    foreach $word (@words) {
        next if $word eq "";               # Skip the empty field left by a leading space
        if ( $word =~ /^(\w+)'$/ ) {       # Word ends with a single quote
            if ( $1 eq 'ligeia' ) {        # Lowercase, because the line was lowercased
                $word =~ s/'//g;           # Remove single quote
            }
        }
        if ( $word =~ /^'(\w+)/ ) {        # Word starts with a single quote
            if ( ($1 ne 'change') and ($1 ne 'em') and
                 ($1 ne 'prentices') ) {
                $word =~ s/'//g;           # Remove single quote
            }
        }
        $dictionary{$word} += 1; # Increment count for $word
    }
}

# Write the words to the CSV file, most frequent first
foreach $word (sort byDescendingValues keys %dictionary) {
    print OUT "$word, $dictionary{$word}\n";
}
close(OUT);
close(FILE);

# Sort by descending count; break ties alphabetically
sub byDescendingValues {
    $value = $dictionary{$b} <=> $dictionary{$a};
    if ($value == 0) {
        return $a cmp $b;
    } else {
        return $value;
    }
}
Output of program 6:

Program 7:
open (FILE, "C://Perl//bin//Mi101//poveste95.txt") or die;
$nstory = -1; # Counter for number of stories (the first story gets index 0)
while (<FILE>) { # Read the file
    chomp;
    if ( $_ =~ /<TITLE> *(.*) *<\/TITLE>/ ) {
        $name[++$nstory] = $1; # Save the name of the story
        print "$1 detected.\n";
    } else {
        next if $nstory < 0;     # Ignore text before the first <TITLE> line
        $_ = lc;                 # Convert to lower case
        s/ -- / /g;              # Remove double hyphen dashes
        s/ - / /g;               # Remove single hyphen dashes
        s/ +/ /g;                # Replace multiple spaces with one space
        s/[.,:;?"!_()\[\]]//g;   # Remove punctuation
        @words = split;
        foreach $w (@words) {
            ++$dict[$nstory]{$w}; # Array of hashes for each story
            ++$combined{$w};      # Hash with all words
        }
    }
}
@count = (); # Clear this array
# Count the words that occur in story $i but not in story $j
# (these counts are not printed)
for $i ( 0 .. $#dict-1 ) {
    for $j ( $i .. $#dict ) {
        foreach $word (keys %combined) {
            if ( exists($dict[$i]{$word}) and
                 not exists($dict[$j]{$word}) )
            { ++$count[$i][$j] }
        }
    }
}
# Separate fragment: print the words that differ from $ARGV[0] by one letter
# $len = length of the word in $ARGV[0]
# The keys of %list are from a word list
for ($i = 0; $i < $len; ++$i) {
    foreach $letter ( 'a' .. 'z' ) {
        $word = $ARGV[0];
        substr($word, $i, 1) = $letter;
        if ( exists($list{$word}) and $word ne $ARGV[0] ) {
            print "$word\n";
        }
    }
}
@count = (); # Clear the array again so the two kinds of counts are not mixed
# Compute number of words in common between any two stories
for $i ( 0 .. $#dict ) {
    for $j ( 0 .. $#dict ) {
        foreach $word (keys %combined) {
            if ( exists($dict[$i]{$word}) and exists($dict[$j]{$word}) )
            { ++$count[$i][$j] }
        }
    }
}
# Print results
for $i ( 0 .. $#dict ) {
    print "@{$count[$i]}\n";
}
Output of program 7:

Program 8:
# EXAMPLE: perl concordance.pl regex radius
open (FILE, "C://Perl//bin//Mi101//poveste1.txt") or die("File not found");
$/ = "";                  # Paragraph mode
$target = "($ARGV[0])";   # Parentheses needed for $1
$radius = $ARGV[1];
$width = 2*$radius;
$count = 0;
while (<FILE>) {
    chomp;
    s/\n/ /g;          # Replace newlines by spaces
    s/\b--\b/ -- /g;   # Add spaces around dashes adjacent to words
    while ( $_ =~ /$target/gi ) {
        $match = $1;
        $pos = pos;
        $start = $pos - $radius - length($match);
        # Extracts are padded with spaces if needed
        if ($start < 0) {
            $extract = substr($_, 0, $width+$start+length($match));
            $extract = (" " x -$start) . $extract;
        } else {
            $extract = substr($_, $start, $width+length($match));
            $deficit = $width+length($match) - length($extract);
            if ($deficit > 0) {
                $extract .= (" " x $deficit);
            }
        }
        $lines[$count] = $extract;
        ++$count;
    }
}
# Print the concordance lines with line numbers, in the order found
$line_number = 0;
foreach $x (@lines) {
    ++$line_number;
    printf "%5d", $line_number;
    print " $x\n";
}
sub removePunctuation {
    # USAGE: $unpunctuated = removePunctuation($string);
    my $string = $_[0];
    $string = lc($string);      # Convert to lowercase
    $string =~ s/[^-a-z ]//g;   # Remove non-alphabetic characters
    $string =~ s/--+/ /g;       # Replace 2+ hyphens with a space
    $string =~ s/-//g;          # Remove hyphens
    $string =~ s/\s+/ /g;       # Replace whitespaces with a space
    return($string);
}
# Comparison function for sort: orders extracts by their matched text
sub byMatch {
    my $middle_a = substr($a, $radius, length($a) - 2*$radius);
    my $middle_b = substr($b, $radius, length($b) - 2*$radius);
    $middle_a = removePunctuation($middle_a);
    $middle_b = removePunctuation($middle_b);
    $middle_a cmp $middle_b;
}

Output of program 8: Representative lines containing I. 6.4

Program: Sorting by array

$data = "Four score and seven years ago";
@words = split(/ /, $data);
@sorted_words = sort(@words);       # Sort the words
print "@sorted_words\n";
@numbers = (1, 8, 11, 18, 88, 111, 118, 181, 188);
@sorted_numbers = sort(@numbers);   # Sort the numbers (compared as strings)
print "@sorted_numbers";
Output of program 3.13: Sorting by array

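Without a comparison block, Perl's sort compares its arguments as strings, so the numbers in the sorting program above come out in dictionary order rather than by value. A minimal sketch of the difference, using the same list and the numeric comparison operator <=>:

```perl
use strict;
use warnings;

my @numbers = (1, 8, 11, 18, 88, 111, 118, 181, 188);

# Default sort compares the values as strings
my @as_strings = sort @numbers;
# Numeric sort uses the <=> operator in the comparison block
my @as_numbers = sort { $a <=> $b } @numbers;

print "String order:  @as_strings\n";
print "Numeric order: @as_numbers\n";
```

The same string comparison explains why "Four" sorts before "ago" in the word list: uppercase letters precede lowercase letters in character-code order.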
I added code samples 6.3 (a function to remove punctuation) and 6.2 (code to print out the concordance lines found) and obtained the result.
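Program 8 defines the comparison function byMatch but prints the lines in the order found. As a hedged sketch of how such a function could be passed to sort, the example below uses hypothetical extracts with an assumed radius of 5, and the names normalize and by_match are introduced here for illustration only.

```perl
use strict;
use warnings;

my $radius = 5;    # Assumed context width on each side of the match

# Lowercase a string and strip everything except letters and spaces
sub normalize {
    my ($string) = @_;
    $string = lc $string;
    $string =~ s/[^a-z ]//g;
    return $string;
}

# Comparison function for sort: compare the matched text in the middle
sub by_match {
    my $middle_a = substr($a, $radius, length($a) - 2 * $radius);
    my $middle_b = substr($b, $radius, length($b) - 2 * $radius);
    return normalize($middle_a) cmp normalize($middle_b);
}

# Hypothetical extracts, each with 5 characters of context on both sides
my @lines = ( "that She said", "when I knew", " and Her eyes" );
my @sorted = sort by_match @lines;
print "$_\n" for @sorted;
```

Because the middle of each extract is normalized before comparison, "Her", "I" and "She" are ordered without regard to case.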

Conclusion:
Perl is a general-purpose programming language, originally developed for text manipulation and now used for a wide range of applications, including system administration, web development, network applications, graphical interfaces, and more.
The language is practical, easy to use, efficient, and complete. Its main features are ease of use, support for both procedural and object-oriented programming, powerful built-in support for text processing, and a large collection of third-party modules.
By completing this laboratory work, I became familiar with the language. I used various functions and worked on text analysis.
