For Seminar
Presented by:
Ansh Abhay Balde
715EE4018
Outline
• Summarize where the field stands and outline the future work to be covered.
Introduction
• Information retrieval (IR) is the activity of obtaining, from a collection of information resources, those resources that are relevant to an information need. Searches can be based on full-text or other content-based indexing.
• Some of the best-known examples of IR in use are search engines such as Google, Yahoo and Yandex (the latter is particularly popular in Russia).
[Figure: a feed-forward neural network. The input x = (x1, x2, x3, x4) is connected through weights w to a hidden layer; node j at level i computes x_{i,j} = φ(o_{i,j}), where o_{i,j} is the weighted sum of its inputs x_{i−1,1}, …, x_{i−1,4} and φ is an activation function, e.g. the sigmoid φ(o) = 1/(1 + e^{−o}). The output layer produces predictions ŷ1, ŷ2, ŷ3, which a cost function compares to the targets y1, y2, y3, e.g. 1/2 (ŷ − y)².]
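The forward pass sketched in the figure can be written in a few lines of plain Python. The layer sizes, weights, and inputs below are illustrative assumptions, not values from the slides:

```python
import math

def sigmoid(o):
    """Activation function phi: squashes a pre-activation into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-o))

def forward(x, w_hidden, w_out):
    """Forward pass: input -> hidden layer -> output predictions y_hat."""
    hidden = [sigmoid(sum(wi * xi for wi, xi in zip(row, x))) for row in w_hidden]
    return [sigmoid(sum(wi * hi for wi, hi in zip(row, hidden))) for row in w_out]

def cost(y_hat, y):
    """Squared-error cost 1/2 * (y_hat - y)^2, summed over the outputs."""
    return 0.5 * sum((p - t) ** 2 for p, t in zip(y_hat, y))

# Toy network: 4 inputs, 2 hidden units, 3 outputs (weights are made up)
x = [1.0, 0.5, -0.5, 0.2]
w_hidden = [[0.1, -0.2, 0.4, 0.3], [-0.3, 0.2, 0.1, -0.1]]
w_out = [[0.5, -0.5], [0.2, 0.3], [-0.4, 0.1]]
y_hat = forward(x, w_hidden, w_out)
print(cost(y_hat, [1.0, 0.0, 0.0]))
```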
Back propagation
• Until convergence:
  i. do a forward pass to compute the predictions ŷ1, ŷ2, ŷ3 and the cost, e.g. 1/2 (ŷ − y)²
  ii. update each weight by gradient descent, applying the chain rule:
     Δw_{i,j} = −α ∂cost/∂w_{i,j} = −α (∂cost/∂x_{i,j}) (∂x_{i,j}/∂w_{i,j}) = −α (∂cost/∂x_{i,j}) (∂x_{i,j}/∂o_{i,j}) (∂o_{i,j}/∂w_{i,j})
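A minimal sketch of one such update for a single sigmoid neuron with the squared-error cost; the weights, input, and learning rate are made up for illustration:

```python
import math

def sigmoid(o):
    return 1.0 / (1.0 + math.exp(-o))

def train_step(w, x, y, alpha):
    """One backpropagation step for a single sigmoid neuron, cost 1/2*(y_hat - y)^2."""
    o = sum(wi * xi for wi, xi in zip(w, x))   # pre-activation o
    y_hat = sigmoid(o)                         # forward pass
    # Chain rule: dcost/dw_k = (y_hat - y) * y_hat * (1 - y_hat) * x_k
    dcost_dyhat = y_hat - y
    dyhat_do = y_hat * (1.0 - y_hat)
    return [wi - alpha * dcost_dyhat * dyhat_do * xi for wi, xi in zip(w, x)]

# Repeated steps drive the cost down on a toy example
w, x, y = [0.1, -0.2], [1.0, 0.5], 1.0
for _ in range(200):
    w = train_step(w, x, y, alpha=0.5)
```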
7
Language modeling using RNNs
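As a hedged illustration of the idea (the vocabulary size, hidden size, and all weights below are toy assumptions), an RNN language model updates a hidden state token by token and predicts a distribution over the next token from that state:

```python
import math

def rnn_step(h, x, w_hh, w_xh):
    """One recurrence: h_t = tanh(W_hh @ h_{t-1} + W_xh @ x_t)."""
    return [math.tanh(sum(w * hj for w, hj in zip(row_h, h)) +
                      sum(w * xj for w, xj in zip(row_x, x)))
            for row_h, row_x in zip(w_hh, w_xh)]

def softmax(logits):
    """Turn logits into a probability distribution over the vocabulary."""
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    s = sum(exps)
    return [e / s for e in exps]

# Toy language model over a 3-word vocabulary, 2 hidden units (weights made up)
w_hh = [[0.1, -0.2], [0.3, 0.1]]
w_xh = [[0.2, 0.0, -0.1], [0.1, -0.3, 0.2]]
w_hy = [[0.5, -0.5], [0.1, 0.4], [-0.2, 0.3]]   # hidden -> vocabulary logits

h = [0.0, 0.0]
for token in ([1, 0, 0], [0, 1, 0]):            # one-hot input tokens
    h = rnn_step(h, token, w_hh, w_xh)
probs = softmax([sum(w * hj for w, hj in zip(row, h)) for row in w_hy])
# probs is the model's distribution over the next token
```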
Long Short Term Memory [Hochreiter and Schmidhuber, 1997]
LSTMs are designed to combat vanishing gradients through a gating mechanism
Image credits: [Sutskever et al., 2014; Cho et al., 2014; Bahdanau et al., 2014]
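A minimal sketch of the gating mechanism, reduced to a single hidden unit with made-up scalar weights: the forget, input, and output gates control what the cell state keeps, writes, and exposes, and the additive cell-state update is what helps gradients flow:

```python
import math

def sigmoid(o):
    return 1.0 / (1.0 + math.exp(-o))

def lstm_step(h, c, x, W):
    """One LSTM step (single hidden unit, scalar gates, toy weights)."""
    f = sigmoid(W['f_h'] * h + W['f_x'] * x)    # forget gate: what to keep of c
    i = sigmoid(W['i_h'] * h + W['i_x'] * x)    # input gate: how much to write
    o = sigmoid(W['o_h'] * h + W['o_x'] * x)    # output gate: how much to expose
    g = math.tanh(W['g_h'] * h + W['g_x'] * x)  # candidate cell update
    c = f * c + i * g                           # additive update of the cell state
    h = o * math.tanh(c)                        # new hidden state
    return h, c

# Toy run over a short sequence (all weights are illustrative assumptions)
W = {'f_h': 0.1, 'f_x': 0.2, 'i_h': 0.3, 'i_x': 0.1,
     'o_h': 0.2, 'o_x': 0.4, 'g_h': 0.1, 'g_x': 0.5}
h, c = 0.0, 0.0
for x in [1.0, -0.5, 0.25]:
    h, c = lstm_step(h, c, x, W)
```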
Sequence-to-sequence models
Used for “traditional” information retrieval tasks: an encoder network reads the input sequence into a representation, and a decoder network generates the output sequence from it
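The encoder–decoder structure can be sketched as a toy scalar model (all weights below are illustrative assumptions): the encoder compresses the source sequence into a single state, and the decoder unrolls from that state, feeding each output back in as the next input:

```python
import math

def rnn_step(h, x, w_h, w_x):
    """Shared recurrence for both encoder and decoder (single hidden unit)."""
    return math.tanh(w_h * h + w_x * x)

def seq2seq(src, tgt_len, w_h=0.5, w_x=0.8, w_out=1.0):
    """Encoder-decoder sketch: encode the source, then decode step by step."""
    h = 0.0
    for x in src:                    # encoder: compress the input sequence
        h = rnn_step(h, x, w_h, w_x)
    out, y = [], 0.0
    for _ in range(tgt_len):         # decoder: generate the output sequence,
        h = rnn_step(h, y, w_h, w_x) # feeding the previous output back in
        y = w_out * h
        out.append(y)
    return out

ys = seq2seq([1.0, 0.5, -0.25], tgt_len=3)
```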
Convolutional neural networks
Major breakthroughs in image classification – at the core of many computer vision systems
Some initial applications of CNNs to problems in text and information retrieval
What is a convolution? Intuition: sliding window function applied to a matrix
Example: convolution with 3 × 3 filter
Multiply values element-wise with original matrix, then sum. Slide over whole matrix.
Image credits: http://deeplearning.stanford.edu/wiki/index.php/Feature_extraction_using_convolution
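The sliding-window intuition translates directly into code; the 5×5 input and 3×3 filter below are a toy example:

```python
def convolve2d(matrix, kernel):
    """'Valid' 2D convolution: slide the kernel over the matrix, multiply
    element-wise with the window underneath, and sum -- one number per position."""
    kh, kw = len(kernel), len(kernel[0])
    out = []
    for r in range(len(matrix) - kh + 1):
        row = []
        for c in range(len(matrix[0]) - kw + 1):
            row.append(sum(matrix[r + i][c + j] * kernel[i][j]
                           for i in range(kh) for j in range(kw)))
        out.append(row)
    return out

# A 3x3 filter applied to a binary 5x5 input
image = [[1, 1, 1, 0, 0],
         [0, 1, 1, 1, 0],
         [0, 0, 1, 1, 1],
         [0, 0, 1, 1, 0],
         [0, 1, 1, 0, 0]]
kernel = [[1, 0, 1],
          [0, 1, 0],
          [1, 0, 1]]
print(convolve2d(image, kernel))  # -> [[4, 3, 4], [2, 4, 3], [2, 3, 4]]
```

Note that deep learning frameworks usually implement cross-correlation (no kernel flip), as done here; for the symmetric filter above the two coincide.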
Convolutional neural networks
• Use convolutions over the input layer to compute the output
• Yields local connections: each region of the input is connected to a neuron in the output
• Each layer applies different filters and combines the results
• Pooling (subsampling) layers
• During training, the CNN learns the values of its filters
• In image classification, a CNN may learn to detect edges from raw pixels in the first layer, then use the edges to detect simple shapes in the second layer, then use the shapes to detect higher-level features, such as facial shapes, in higher layers
• The last layer is then a classifier that uses the high-level features
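A pooling layer can be sketched the same way; this toy max-pooling routine keeps only the largest value in each non-overlapping 2×2 window, subsampling the feature map:

```python
def max_pool2d(matrix, size=2):
    """Max pooling with stride equal to the pool size: each non-overlapping
    size x size window is replaced by its maximum value."""
    out = []
    for r in range(0, len(matrix) - size + 1, size):
        row = []
        for c in range(0, len(matrix[0]) - size + 1, size):
            row.append(max(matrix[r + i][c + j]
                           for i in range(size) for j in range(size)))
        out.append(row)
    return out

# A made-up 4x4 feature map shrinks to 2x2
feature_map = [[1, 3, 2, 1],
               [4, 2, 1, 5],
               [3, 1, 1, 0],
               [1, 2, 4, 2]]
print(max_pool2d(feature_map))  # -> [[4, 5], [3, 4]]
```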
CNNs in text
Example uses in IR
• MSR: how to learn semantically meaningful representations of sentences that can be used for information retrieval
• Recommending potentially interesting documents to users based on what they are currently reading
• Sentence representations are trained on search engine log data
• Gao et al. Modeling Interestingness with Deep Neural Networks. EMNLP 2014; Shen et al. A Latent Semantic Model with Convolutional-Pooling Structure for Information Retrieval. CIKM 2014.
Future Presentations
• We’ll learn about text matching: how queries are matched with documents.
• How neural click models are used in practice to improve results over probabilistic click models.
References
• Qingyao Ai, Liu Yang, Jiafeng Guo, and W. Bruce Croft. 2016a. Analysis of the Paragraph Vector Model for Information Retrieval. In ICTIR. ACM,
133–142.
• Qingyao Ai, Liu Yang, Jiafeng Guo, and W. Bruce Croft. 2016b. Improving language estimation with the paragraph vector model for ad-hoc retrieval. In SIGIR. ACM, 869–872.
• Qingyao Ai, Yongfeng Zhang, Keping Bi, Xu Chen, and W. Bruce Croft. 2017. Learning a Hierarchical Embedding Model for Personalized Product Search. In SIGIR.
• Nima Asadi, Donald Metzler, Tamer Elsayed, and Jimmy Lin. 2011. Pseudo test collections for learning web search ranking functions. In SIGIR.
ACM, 1073–1082.
• Leif Azzopardi, Maarten de Rijke, and Krisztian Balog. 2007. Building simulated queries for known-item topics: An analysis using six European
languages. In SIGIR. ACM.
• Marco Baroni, Georgiana Dinu, and Germán Kruszewski. 2014. Don’t count, predict! A systematic comparison of context-counting vs. context-predicting semantic vectors. In ACL (1). 238–247.
• Yoshua Bengio and Jean-Sébastien Sénécal. 2008. Adaptive importance sampling to accelerate training of a neural probabilistic language model. IEEE Transactions on Neural Networks 19, 4 (2008), 713–722.
• Yoshua Bengio, Jean-Sébastien Sénécal, and others. 2003. Quick Training of Probabilistic Neural Nets by Importance Sampling. In AISTATS.
• Richard Berendsen, Manos Tsagkias, Wouter Weerkamp, and Maarten de Rijke. 2013. Pseudo test collections for training and tuning
microblog rankers. In SIGIR. ACM, 53–62.
• David M. Blei, Andrew Y. Ng, and Michael I. Jordan. 2003. Latent Dirichlet Allocation. JMLR 3 (2003), 993–1022.
• Antoine Bordes and Jason Weston. 2017. Learning end-to-end goal-oriented dialog. ICLR (2017).
• Alexey Borisov, Ilya Markov, Maarten de Rijke, and Pavel Serdyukov. 2016. A neural click model for web search. In Proceedings of the 25th
International Conference on World Wide Web. International World Wide Web Conferences Steering Committee, 531–541.
• Chris Burges. 2015. RankNet: A ranking retrospective. (2015). https://www.microsoft.com/en-us/research/blog/ranknet-a-ranking-retrospective/ Accessed January 15, 2019.