10,11,12,12,14
NLP Pipeline
Information overload:
a state of being overwhelmed by
Processing
Solution approaches:
• Clustering
Text summarization: the process of creating a shorter
Indicative summary: provides a general overview of the text
• Identifies the main points and overall message
Informative summary: provides more detailed information about the text
• Goes beyond main points to provide more context and explanation
Critical summary: provides the author’s perspective on the text and gives the chance to critically delve into it
• Helps the reader to think more critically about the text and develop their own understanding of it
Participants were randomly assigned to either a group that ate chocolate on
a regular basis or a group that did not eat chocolate. The
participants in the chocolate group ate 70 grams of dark
chocolate per day for 12 weeks. The participants in the control
group did not eat chocolate for the 12-week period. At the end
of the study, the participants in the chocolate group had lost an
average of 5 pounds and had smaller waistlines than the
participants in the control group.
• Critical summary – Overall, this is a good article that provides
valuable information about the potential benefits of eating
chocolate for weight loss. However, it is important to note that
more research is needed to confirm these findings.
Automatic Text Summarization (ATS) Categories
(two main categories)
• Indicative summaries
• Generate sentences describing the content of the text
• Are more complex than extractive summarization
approaches, but they can
produce more informative and coherent summaries.
• Informative and critical summaries
Dimensions:
• Single-document vs. multi-document
• Context: query-specific vs. query-independent
• Generic summarization: generating summaries for general-purpose text documents.
• Query-focused summarization: generating summaries of text documents that are relevant to a specific query.
• Update summarization: generating summaries of text documents that contain new information relevant to a previously generated summary.
• Abstractive summarization: generating summaries that are faithful to the meaning of the original document, but which may not contain any of the original sentences.
Examples: movie summaries, biographies, minutes, news articles, headlines, movie/TV series summaries, research papers.
Stages:
Content identification: involves identifying important information in the input text.
• i.e. Extracting keywords and phrases, named entities, main topic
Conceptual organization: involves organizing the identified content into a coherent structure.
• i.e. Finding relations between pieces of information and grouping related ones
Realization: involves generating the summary text based on the conceptual organization.
• i.e. Selecting existing sentences and/or generating new ones
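The three stages can be sketched as a very small extractive summarizer. This is only an illustration under stated assumptions: the stopword list, the frequency-based scoring, and the function names `split_sentences`, `tokenize`, and `summarize` are hypothetical choices, not something given in the notes.

```python
import re
from collections import Counter

# Tiny illustrative stopword list (an assumption, not from the notes).
STOPWORDS = {"the", "a", "an", "of", "in", "on", "and", "to", "for", "is", "that"}

def split_sentences(text):
    # Naive splitter: break after '.', '!' or '?' followed by whitespace.
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]

def tokenize(sentence):
    return [w for w in re.findall(r"[a-z]+", sentence.lower()) if w not in STOPWORDS]

def summarize(text, n_sentences=1):
    sentences = split_sentences(text)
    # Stage 1, content identification: frequent non-stopwords act as keywords.
    freq = Counter(w for s in sentences for w in tokenize(s))

    # Stage 2, conceptual organization: score sentences by average keyword
    # frequency, then restore the original order of the selected ones.
    def score(sentence):
        tokens = tokenize(sentence)
        return sum(freq[w] for w in tokens) / len(tokens) if tokens else 0.0

    ranked = sorted(range(len(sentences)), key=lambda i: score(sentences[i]), reverse=True)
    chosen = sorted(ranked[:n_sentences])
    # Stage 3, realization: emit the selected existing sentences.
    return " ".join(sentences[i] for i in chosen)
```

Because realization here only selects existing sentences, the sketch is extractive; generating new sentences would need a different approach.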
Evaluation Techniques
[Figure: summarization system evaluation; annotation mentions a ratio (5 : 10)]
Ch 11
Text Categorization (TC)
Assign documents to one or more predefined categories.
Examples: spam filtering
[Figure: documents mapped to class (category 1), class (category 2), class (category 3); related fields: NLP, data mining, machine learning]
Learning algorithms for TC
Boundaries between categories are decision boundaries.
The Vector Space Model
Term weights: term frequency (TF)
Term weights: inverse document frequency (IDF)
TF-IDF weighting
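A minimal sketch of TF-IDF weighting follows. The formulas on the slide are not recoverable, so this assumes one common variant: raw term count multiplied by log(N / df). The function name `tf_idf` is hypothetical.

```python
import math
from collections import Counter

def tf_idf(docs):
    """docs: list of token lists. Returns one {term: weight} dict per document.

    Assumed variant: weight = tf * log(N / df), where tf is the raw count of
    the term in the document and df is the number of documents containing it.
    """
    n_docs = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    weights = []
    for doc in docs:
        tf = Counter(doc)
        weights.append({t: tf[t] * math.log(n_docs / df[t]) for t in tf})
    return weights
```

With this variant, a term that occurs in every document gets weight 0, while a term that is frequent in one document and rare elsewhere gets a high weight.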
Similarity measure: a function that computes the degree of similarity between two vectors.
Similarity is computed between a query and one document at a time, e.g. Dice similarity or normalized cosine similarity.
• For each category, compute a prototype vector.
• Assign the text to the category whose prototype vector is most similar under cosine similarity.
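Two of the similarity measures named in these notes, cosine and Dice, can be sketched as below. The real-valued Dice form used here, 2(u·v) / (|u|² + |v|²), is an assumption about which variant the slide intends; the helper names are hypothetical.

```python
import math

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def cosine(u, v):
    # Dot product normalized by both vector lengths; 0.0 for a zero vector.
    norm_u, norm_v = math.sqrt(dot(u, u)), math.sqrt(dot(v, v))
    return dot(u, v) / (norm_u * norm_v) if norm_u and norm_v else 0.0

def dice(u, v):
    # Assumed real-valued Dice coefficient: 2(u.v) / (|u|^2 + |v|^2).
    denom = dot(u, u) + dot(v, v)
    return 2 * dot(u, v) / denom if denom else 0.0
```

Both measures return 1 for identical non-zero vectors and 0 for vectors that share no terms.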
Vector space model: decision boundaries are defined by centroids.
Centroid computations
Rocchio properties
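A minimal sketch of centroid-based (Rocchio-style) classification, assuming each document is already a fixed-length weight vector and that the prototype of a category is the plain average of its training vectors. The names `train_centroids` and `classify` are hypothetical.

```python
import math

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0

def train_centroids(vectors, labels):
    # Prototype vector per category: component-wise mean of its training vectors.
    sums, counts = {}, {}
    for vec, label in zip(vectors, labels):
        if label not in sums:
            sums[label] = [0.0] * len(vec)
            counts[label] = 0
        sums[label] = [s + x for s, x in zip(sums[label], vec)]
        counts[label] += 1
    return {c: [s / counts[c] for s in sums[c]] for c in sums}

def classify(centroids, vec):
    # Assign the category whose centroid has the highest cosine similarity.
    return max(centroids, key=lambda c: cosine(centroids[c], vec))
```

Since each document goes to its most similar centroid, the decision boundary between two categories is the set of points equally similar to both centroids.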
• Modeled as generating a bag of words for a document in a given category by repeatedly sampling with replacement from a vocabulary V = {w1, w2, … wm} based on the probabilities P(wj | ci).
• Smooth probability estimates with Laplace m-estimates assuming a uniform distribution over all words (p = 1/|V|) and m = |V|
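The smoothing rule can be sketched as follows. With the m-estimate (n_ij + m·p) / (n_i + m), setting p = 1/|V| and m = |V| reduces to the add-one form (n_ij + 1) / (n_i + |V|). The function name `smoothed_word_probs` and the toy data in the usage note are hypothetical.

```python
from collections import Counter

def smoothed_word_probs(docs_in_class, vocabulary):
    """Estimate P(w | c) for every w in the vocabulary from the token lists
    of one category, using the m-estimate with p = 1/|V| and m = |V|."""
    counts = Counter(w for doc in docs_in_class for w in doc if w in vocabulary)
    total = sum(counts.values())   # n_i: word occurrences in the category
    m = len(vocabulary)
    p = 1 / m
    return {w: (counts[w] + m * p) / (total + m) for w in vocabulary}
```

For a vocabulary {a, b, c} and a single training document "a a b", this gives P(a) = 3/6, P(b) = 2/6 and P(c) = 1/6, so an unseen word still gets a non-zero probability.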
Ch 12
Clustering Approaches
Agglomerative (bottom-up) vs. divisive (partitional, top-down) clustering
Clustering evaluation function: evaluates which value of … was best (the best number of clusters for the partitioning).
Cluster Similarity
• Similarity of two most similar members.
• Similarity of two least similar members.
• Average similarity between members.
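The three ways of measuring similarity between clusters are commonly called single-link, complete-link, and average-link. A minimal sketch follows, assuming a pairwise similarity function `sim` is supplied by the caller and that the average is taken over pairs drawn from the two different clusters. The function names are hypothetical.

```python
def single_link(c1, c2, sim):
    # Similarity of the two most similar members.
    return max(sim(x, y) for x in c1 for y in c2)

def complete_link(c1, c2, sim):
    # Similarity of the two least similar members.
    return min(sim(x, y) for x in c1 for y in c2)

def average_link(c1, c2, sim):
    # Average similarity over all cross-cluster pairs.
    return sum(sim(x, y) for x in c1 for y in c2) / (len(c1) * len(c2))
```

An agglomerative algorithm would repeatedly merge the pair of clusters with the highest value under whichever of these functions is chosen.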
Issues in Clustering
Internal:
• Tightness and separation of clusters (e.g. k-means objective)
• Fit of probabilistic model to data
External:
• Compare to known class labels on benchmark data
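One internal and one external criterion can be sketched as below. The k-means objective is the sum of squared distances from each point to its assigned centroid. Purity is used here as one common way to compare against known class labels; it is an assumption, not something named in the notes. Function names are hypothetical.

```python
from collections import Counter

def kmeans_objective(points, assignments, centroids):
    # Internal criterion: sum of squared distances to the assigned centroid
    # (lower means tighter clusters).
    return sum(
        sum((x - c) ** 2 for x, c in zip(point, centroids[cluster]))
        for point, cluster in zip(points, assignments)
    )

def purity(assignments, labels):
    # External criterion: fraction of points that carry the majority
    # class label of their cluster.
    by_cluster = {}
    for cluster, label in zip(assignments, labels):
        by_cluster.setdefault(cluster, Counter())[label] += 1
    return sum(max(c.values()) for c in by_cluster.values()) / len(labels)
```

The internal measure needs only the data and the clustering, while the external one needs gold labels, which is why it is typically reported on benchmark data.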
Ch 13
MT is good for:
– Web pages, emails
– A first pass for computer-aided human translation (post-editing)
MT is bad for:
– Literature translation
– Meetings/court recordings
– Medical (in hospital)
– War tactics
Challenges of MT
– Examples: Weather
forecasting, air travel queries,
restaurant recommendation.
Language Divergence
Typology – is the study of systematic cross-linguistic similarities and divergences
– Study of the structure of the world’s languages for the purpose of classification, comparisons, and analysis.
– Morphological variation – differences in word structures (morphemes)
– Syntactic variation – differences in sentence structures
– Semantic variation – differences in meaning of words and phrases
– Segmentation variation – differences in sound patterns
– Inferential load – differences in context inferences and assumptions
Lexical Divergences
MT Approaches
Rule-Based (RBMT)
is a traditional approach to machine
translation that utilizes a set of manually
crafted (hand-written) rules to translate
text from one language to another.
Direct translation
– involves directly translating words from the source language to the target language using a bilingual dictionary or word list.
– also known as word-for-word translation
– is a straightforward approach but often produces inaccurate translations due to differences in grammar, word order, and idiomatic expressions
– Example:
• Input sentence in English: The cat sits on the mat
• Output sentence in German: Der Katze sitzt auf der Matte
Transfer model
– involves translating the source language text into an intermediate language, then into the target language.
– Steps:
• Analysis: syntactically parse the source language text
• Transfer: turn this parse into a parse for the target language
• Generation: generate the target sentence from the parse tree
– Pros: can produce more accurate translations than direct translation using language-specific rules at each stage of the translation process
Interlingua-based MT
– utilizes a semantic intermediate representation of the source language called interlingua, which is used to generate the target language text using language-specific rules.
– captures the meaning of the source text (semantic representation)
– Steps:
• Translate source sentence into meaning representation
• Generate target sentence from meaning representation
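Direct (word-for-word) translation can be sketched with a dictionary lookup. The five-entry word list and the names `BILINGUAL_DICT` and `direct_translate` are hypothetical, chosen only to reproduce the English-to-German example from these notes.

```python
# Hypothetical word list: each English word maps to exactly one German form.
BILINGUAL_DICT = {
    "the": "der",
    "cat": "Katze",
    "sits": "sitzt",
    "on": "auf",
    "mat": "Matte",
}

def direct_translate(sentence, dictionary):
    # Look up each word independently; unknown words pass through unchanged.
    words = sentence.lower().split()
    translated = " ".join(dictionary.get(w, w) for w in words)
    # Restore sentence-initial capitalization.
    return translated[:1].upper() + translated[1:]
```

Because "the" always maps to "der", the output is "Der Katze sitzt auf der Matte", although grammatical German needs "Die Katze" here. This is the kind of inaccuracy the direct approach produces when grammar differs between the two languages.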
Human evaluation is expensive and very slow.
Need an evaluation metric that takes seconds, not months.
Speech: the most common analog signal produced by humans
Acoustic signal: captured through microphones, contains the speaker's voice but can also be mixed with background noise and other environmental factors.
• Microphone: close-mic, throat-mic, microphone array
• Sources: band-limited, background noise
• Speaker: speaker dependent, speaker independent
Language: the specific spoken language, with its vocabulary, grammar, and pronunciation rules, guides the interpretation of the acoustic signal.
Feature extraction: identifying the relevant aspects of the acoustic signal that represent the spoken words, like pitch, formants, and spectral energy.
Acoustic modeling: this stage decodes the sequence of sounds (phonemes) from the extracted features, essentially recognizing the building blocks of speech.
Language modeling: this stage decodes the sequence of sounds (phonemes) from the extracted features, essentially recognizing the building blocks of speech.
Text transcript: the final product, the text version of the spoken language, is the result of the previous stages.
• Accuracy and fluency of sentences are crucial
• but factors like keywords, punctuation and speaker identification can also be part of the output.
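Spectral energy, one of the features mentioned for the feature-extraction stage, can be sketched as below. This assumes a digitized signal given as a list of samples, fixed-size overlapping frames, and a naive discrete Fourier transform. Real systems use faster transforms and further processing, and the function names are hypothetical.

```python
import cmath
import math

def frames(signal, size, hop):
    # Cut the signal into fixed-size frames, advancing by `hop` samples.
    return [signal[i:i + size] for i in range(0, len(signal) - size + 1, hop)]

def power_spectrum(frame):
    # Naive DFT: squared magnitude for frequency bins 0 .. N/2.
    n = len(frame)
    return [
        abs(sum(x * cmath.exp(-2j * math.pi * k * t / n) for t, x in enumerate(frame))) ** 2
        for k in range(n // 2 + 1)
    ]
```

For a 16-sample frame holding exactly two cycles of a sine wave, the energy is concentrated in bin 2; an acoustic model would consume per-frame features derived from such spectra.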
Dictionary