Transfer Learning Using PNN
D Ravi Shankar
December, 2017
4 Datasets:
IMDB: A large dataset for binary sentiment classification (positive vs. negative), ∼ 25k sentences.
MR: A small dataset for binary sentiment classification, ∼ 10k sentences.
QC: A small dataset for 6-way question classification (e.g., location, time, and number), ∼ 5000 questions.
SNLI: A large dataset for sentence entailment recognition. The classification objectives are entailment, contradiction, and neutral, ∼ 500k pairs.
SICK: A small dataset with exactly the same classification objective as SNLI, ∼ 10k pairs.
MSRP: A small dataset for paraphrase detection. The objective is binary classification: judging whether two sentences have the same meaning, ∼ 5000 pairs.
Quora dataset: Duplicate question pairs with labels indicating whether the two questions request the same information, ∼ 400k question pairs.

6 Experiments
6.1 Experiment 1:
For this experiment we used an LSTM architecture. We trained the model on IMDB and then transferred the weights to the MR and QC datasets. The results are shown in Table 1. Transferring the parameters from IMDB to MR improved accuracy by 1.95 percentage points (from 79.21% to 81.16%). Transferring from IMDB to QC left accuracy essentially unchanged (96.93% vs. 96.90%). The reason is that IMDB and MR are semantically similar datasets, whereas IMDB and QC are semantically different.

Table 1: Accuracy (%) on each dataset, as reported in Paper [1] and as obtained in our experiments.
Dataset | Paper [1] (without transfer) | Paper [1] (with transfer) | Ours (without transfer) | Ours (with transfer)
IMDB    | 87.0                         | -                         | 84.10                   | -
MR      | 75.1                         | 81.4                      | 79.21                   | 81.16
QC      | 90.8                         | 93.2                      | 96.93                   | 96.90
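The weight transfer in Experiment 1 (the INIT scheme of [1]) can be sketched as follows. The target model starts from a copy of the source model's parameters for the chosen layers, and every other layer keeps a fresh initialization. This is a minimal sketch under stated assumptions: parameters are plain Python dictionaries of weight lists, and the layer names (`embedding`, `lstm`, `output`), the toy sizes and the helpers `random_params` and `init_transfer` are hypothetical. None of them come from the original experiments. For IMDB to MR (both binary), the output layer has the same shape and could also be copied. For IMDB to QC (2 vs. 6 classes), it cannot, so the sketch copies only the lower layers.

```python
import copy
import random
from typing import Dict, Iterable, List

# Hypothetical parameter store: layer name -> flat list of weights.
Params = Dict[str, List[float]]


def random_params(shapes: Dict[str, int], seed: int) -> Params:
    """Initialize every layer with small random weights (illustrative only)."""
    rng = random.Random(seed)
    return {name: [rng.uniform(-0.1, 0.1) for _ in range(size)]
            for name, size in shapes.items()}


def init_transfer(source: Params, target: Params, layers: Iterable[str]) -> Params:
    """INIT-style transfer: copy the named layers from source into a copy of target.

    Layers that are not listed keep the target's fresh initialization. A layer is
    copied only when its size matches; e.g. a 2-class output layer cannot
    initialize a 6-class one, so a mismatch raises ValueError.
    """
    result = copy.deepcopy(target)  # leave the original target untouched
    for name in layers:
        if len(source[name]) != len(target[name]):
            raise ValueError(f"shape mismatch for layer {name!r}")
        result[name] = list(source[name])  # copy, do not alias, the source weights
    return result


# Toy sizes (hypothetical): source trained on IMDB (2 classes), target is QC (6 classes).
hidden = 4
source_shapes = {"embedding": 10, "lstm": 16, "output": hidden * 2}
target_shapes = {"embedding": 10, "lstm": 16, "output": hidden * 6}

imdb_model = random_params(source_shapes, seed=0)  # stands in for trained IMDB weights
qc_model = random_params(target_shapes, seed=1)    # freshly initialized QC model

# Transfer the embedding and LSTM layers; the QC output layer stays freshly initialized.
qc_init = init_transfer(imdb_model, qc_model, layers=["embedding", "lstm"])
```

After this step the target model would be fine-tuned on the target data as usual. That training loop is not shown. Keeping the layer list as an explicit argument makes it easy to compare transferring different subsets of layers.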
8 Conclusions:
1. Transfer learning is successful when the source and target tasks are semantically similar.
2. It is helpful when the target dataset is small.
3. Transfer learning also depends on which layers are transferred.
4. The INIT method performs slightly better than MULT.
5. Do we lose general information if the model is trained on the source data for best accuracy? The answer seems to be no, as evident from Figure 3: the accuracy on the Quora dataset peaked along with that of the SNLI dataset.

Figure 2: Comparison of INIT and MULT. The graphs show accuracy on the Quora (target) dataset for different transfer learning schemes.
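Conclusion 4 compares two transfer schemes. INIT trains on the source data first and then uses those weights to initialize the target model. MULT trains on both datasets at the same time with shared layers. As we read [1], MULT minimizes the weighted cost J = λ·J_T + (1 − λ)·J_S, where J_T and J_S are the target and source costs and λ ∈ (0, 1) balances the two domains. To our understanding, [1] realizes this weighting stochastically: each training batch is drawn from the target domain with probability λ and from the source domain otherwise. The helper names `mult_objective` and `mult_batch_schedule` are hypothetical, and the costs below are toy values, not results.

```python
import random
from typing import List


def mult_objective(j_target: float, j_source: float, lam: float) -> float:
    """Joint MULT cost as we read [1]: J = lam * J_T + (1 - lam) * J_S."""
    if not 0.0 < lam < 1.0:
        raise ValueError("lam must lie strictly between 0 and 1")
    return lam * j_target + (1.0 - lam) * j_source


def mult_batch_schedule(n_steps: int, lam: float, seed: int = 0) -> List[str]:
    """Stochastic version of the same weighting: at each training step, draw a
    target-domain batch with probability lam, otherwise a source-domain batch.
    In expectation the gradient matches lam * dJ_T + (1 - lam) * dJ_S."""
    if not 0.0 < lam < 1.0:
        raise ValueError("lam must lie strictly between 0 and 1")
    rng = random.Random(seed)
    return ["target" if rng.random() < lam else "source" for _ in range(n_steps)]


# Toy per-batch costs (hypothetical, not from the experiments).
joint_cost = mult_objective(j_target=0.4, j_source=0.8, lam=0.5)
schedule = mult_batch_schedule(n_steps=10, lam=0.5)
```

Sampling batches avoids computing both costs at every step. Larger λ makes the shared layers follow the target task more closely.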
Figure 3: Effect of overfitting the source dataset. The graphs show how the accuracy on the Quora (target) dataset varies with the accuracy on the SNLI (source) dataset.

References
[1] Lili Mou, Zhao Meng, Rui Yan, Ge Li, Yan Xu, Lu Zhang, Zhi Jin. How Transferable are Neural Networks in NLP Applications? In Proceedings of the 2016 Conference on Empirical Methods in Natural Language Processing (EMNLP), pages 478–489, 2016.
[2] Zhilin Yang, Ruslan Salakhutdinov, William W. Cohen. Transfer Learning for Sequence Tagging with Hierarchical Recurrent Networks. In International Conference on Learning Representations (ICLR), 2017.