Week 1 - Tutorial: Open Source & Git: Sebastian Ebert April 16, 2015

Today

· Organizational things
· Open source project structure
· Gitlab & Git
· Resources

Organizational Things

· new room for tutorials (TBD)
· warning: programming skills required
· Master students: email your name, matriculation number, and
email address to us
· got your Gitlab password?
  · check at https://webmail2.cip.ifi.lmu.de
  · if necessary, set up email forwarding in CipConf

· students writing Bachelor’s thesis: due date for grade?

Open Source Project Structure

· src/: source code
  · 01
  · 02
  · project

· res/: static (external) resource files


· var/: ever-changing files, e.g. logs
· etc/: configuration files

· lib/: external libraries
  · perl/
  · python/
  · java/
  · …

· build/: compiled binaries


· bin/: executables, e.g. shell script wrapper
· test/: code for (unit) tests

· doc/: documentation
· README.md: Markdown-formatted readme
· Makefile: routines for compiling and/or installing (even script
projects!)
· LICENSE: http://choosealicense.com
· .gitignore: https://www.gitignore.io
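The layout above can be created in one go. This is a minimal POSIX sh sketch. The project name ap-demo and the empty .gitkeep placeholder files are assumptions and are not part of the slides. The .gitkeep files are a common convention, because Git does not track empty directories.

```shell
#!/bin/sh
# Sketch: create the skeleton from the slides in a scratch directory.
# "ap-demo" is a placeholder; run the mkdir/touch part inside your own clone.
set -e
proj="$(mktemp -d)/ap-demo"
mkdir -p "$proj"
cd "$proj"

# Directories from the slides (lib/ only gets the three listed languages).
dirs="src/01 src/02 src/project res var etc lib/perl lib/python lib/java build bin test doc"
mkdir -p $dirs            # unquoted on purpose: split the list into arguments

# Top-level files; fill them in later.
touch README.md Makefile LICENSE .gitignore

# Git does not track empty directories, so put a placeholder file in each one.
for d in $dirs; do
  touch "$d/.gitkeep"
done

ls -A
```

Afterwards, git add . stages the whole skeleton. Whether build/ and var/ are committed or listed in .gitignore is up to the project.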

Git & Gitlab

Why Version Control?

· collaboration of multiple project members from different places
· keep track of changes in the project
· roll back changes easily
· keep different source versions with branching

Why Git?

Benefits

· decentralized/distributed revision control
  · developers do not need to share a common network
  · work offline until you want to publish your code

· widely used
· fast and easy branching and merging
· free and open source

Drawbacks

· steep learning curve
  · (weird command names)
  · maybe at first
  · but then you learn where they come from

Why Gitlab?

· Easy collaboration: branches, bug reports (issues), bug assignments
· Easy inspection for instructors
· Documentation support (markdown, Wiki)
· Everyone already has an account.

Task 1: Groups

· form a group
· 2-3 people
· mix skill levels

Task 2: Setup Gitlab

1. Before first use: activate your CIP Gitlab account on CipConf.
2. CIP Gitlab: https://gitlab.cip.ifi.lmu.de
3. Create an SSH key and add its public part to your Gitlab profile (necessary for every PC you work from).
4. Create a new project “ap-[GROUP NAME]”.

5. Go to Settings -> Members and add your group members with
Developer or Master privileges.
6. Give us (the instructors) access by
· either making the project public at Project -> Visibility Level
· or adding us (David Kaumanns, Sebastian Ebert) as new members
with Reporter privileges.

7. Email us the link to the project repository, the group name and
your email addresses.

Git Walk Through

Task 3: Exercise

1. Create the skeleton directory structure.
2. Create a simple Hello world app in your designated programming
language, along with an executable and a basic readme and/or
Makefile to compile (if necessary) and run.
3. Stage, commit, push.
4. Tag the correct commit (identified by its hash) with the name “ex_01” and push the tag.
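This is a minimal sketch of Task 3, assuming shell is your designated language. The directory ap-demo, the files src/01/hello.sh and bin/hello, and the demo identity are hypothetical names chosen for illustration. The tag is only created locally here; pushing it is a separate step.

```shell
#!/bin/sh
# Sketch of Task 3 with shell as the (assumed) programming language.
set -e
proj="$(mktemp -d)/ap-demo"                  # placeholder project directory
mkdir -p "$proj/src/01" "$proj/bin"
cd "$proj"
git init -q
git config user.name "Demo User"             # placeholder identity, this repo only
git config user.email "demo@example.com"

# The app itself
cat > src/01/hello.sh <<'EOF'
#!/bin/sh
echo "Hello CIS"
EOF

# Executable wrapper in bin/ that finds the app relative to its own location
cat > bin/hello <<'EOF'
#!/bin/sh
exec sh "$(dirname "$0")/../src/01/hello.sh" "$@"
EOF
chmod +x bin/hello

printf '# ap-demo\n\nRun: bin/hello\n' > README.md

bin/hello                                    # prints: Hello CIS

git add README.md bin/hello src/01/hello.sh  # stage
git commit -q -m "Hello world app"           # commit
git tag ex_01 "$(git rev-parse HEAD)"        # tag the commit by its hash
git log -1 --format=%H ex_01                 # the hash the tag points to
```

This creates a lightweight tag. git tag -a ex_01 -m "Exercise 01" would create an annotated tag instead. Either kind can be pushed with git push origin ex_01.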

Idea of Git

Figure 1: source: http://www.git-scm.com/book/en/v2/Getting-Started-Git-Basics

Let’s do it

· see the shell script git_handson.sh for all commands and a walk-through

Commands You Need

· Clone your project:
  · git clone git@gitlab.cip.ifi.lmu.de:<user name>/<project name>.git

· Make your changes.
· Stage your changes (i.e., mark them to be included in the next commit):
  · git add <file name|patterns>

· Commit your changes to your local repository:
  · git commit -am "initial commit"
  · -a (--all): automatically stage files that have been modified or
deleted (only files Git already tracks; new files still need git add).
  · -m (--message): use an inline commit message.

· Push your changes to the remote repository:
  · git push
  · For first push: git push -u origin master

· Make more changes. Repeat: stage, commit, push.
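The cycle above can be practised offline. This sketch uses a local bare repository as a stand-in for the Gitlab remote, which is an assumption for the demo and not part of the course setup. The file names and identity are placeholders. The sketch also shows that -a does not pick up new files.

```shell
#!/bin/sh
# Sketch: stage -> commit -> push against a local stand-in remote.
set -e
work="$(mktemp -d)"
cd "$work"
git init -q --bare remote.git               # plays the role of the Gitlab repo
git --git-dir=remote.git symbolic-ref HEAD refs/heads/master
git clone -q remote.git project
cd project
git symbolic-ref HEAD refs/heads/master     # use "master" as on the slides
git config user.name "Demo User"            # placeholder identity, this repo only
git config user.email "demo@example.com"

echo 'echo "Hello CIS"' > hello.sh
git add hello.sh                            # stage the new file
git commit -q -m "initial commit"
git push -q -u origin master                # first push: also sets the upstream

echo 'echo "Hello again"' >> hello.sh       # modify a tracked file
echo "todo" > notes.txt                     # create a new, untracked file
git commit -q -am "extend greeting"         # -a picks up hello.sh, not notes.txt
git push -q                                 # upstream is known now

git status --short                          # prints: ?? notes.txt
```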

· Pull regularly to get your group members' changes:
  · git pull

· Check your status:
  · git status

· Check differences:
  · git diff
  · if problems with color occur: git config --global core.pager "less -r"

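To see how pull, status and diff interact, this sketch simulates two group members, alice and bob (hypothetical names). They share a local bare repository that stands in for Gitlab.

```shell
#!/bin/sh
# Sketch: two clones of one remote, like two group members.
# remote.git is a local stand-in for the Gitlab repository (assumption).
set -e
work="$(mktemp -d)"
cd "$work"
git init -q --bare remote.git
git --git-dir=remote.git symbolic-ref HEAD refs/heads/master

# alice makes the first commit and pushes it
git clone -q remote.git alice
cd alice
git symbolic-ref HEAD refs/heads/master
git config user.name "Alice"
git config user.email "alice@example.com"
echo "Hello CIS" > README.md
git add README.md
git commit -q -m "add readme"
git push -q -u origin master
cd ..

# bob clones once master exists on the remote
git clone -q remote.git bob

# alice pushes another change ...
cd alice
echo "Run: bin/hello" >> README.md
git commit -q -am "document usage"
git push -q
cd ..

# ... which bob picks up with a plain pull (a fast-forward here)
cd bob
git pull -q

# bob edits locally and inspects his working tree
echo "Authors: alice, bob" >> README.md
git status --short      # prints " M README.md" (modified, not staged)
git diff                # shows the added line with a leading "+"
```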
· Review the latest commit:
  · git show

· Use an alias for nicely formatted logs:
  · git config --global alias.lga "log --pretty=format:'%C(auto)%h %C(110)%ad%Creset%C(auto)%d %s' --graph --date=short --all"
  · git lga
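You can try both commands without touching your global configuration. This sketch defines the same lga alias, but only in a throwaway repository (no --global). The file name and commit messages are placeholders.

```shell
#!/bin/sh
# Sketch: git show and the lga alias in a scratch repository.
set -e
repo="$(mktemp -d)"
cd "$repo"
git init -q
git config user.name "Demo User"            # placeholder identity
git config user.email "demo@example.com"
echo "v1" > file.txt
git add file.txt
git commit -q -m "first"
echo "v2" > file.txt
git commit -q -am "second"

git show --stat HEAD                        # metadata and changed files of the latest commit
git show HEAD~1:file.txt                    # prints: v1

# Same alias as on the slide, stored in this repository's config only
git config alias.lga "log --pretty=format:'%C(auto)%h %C(110)%ad%Creset%C(auto)%d %s' --graph --date=short --all"
git lga
```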

Branching

· Create a new branch:
  · git branch awesome-feature

· Switch to the new branch:
  · git checkout awesome-feature
  · (Shorthand for last two steps: git checkout -b awesome-feature)
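This sketch shows the branch commands in a throwaway repository. other-feature is a hypothetical second branch used only to show the -b shorthand, and the identity is a placeholder.

```shell
#!/bin/sh
# Sketch: create and switch branches in a scratch repository.
set -e
repo="$(mktemp -d)"
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master      # name the first branch "master"
git config user.name "Demo User"             # placeholder identity
git config user.email "demo@example.com"
echo "hello" > app.txt
git add app.txt
git commit -q -m "initial commit"            # branching needs at least one commit

git branch awesome-feature                   # create; HEAD stays on master
git rev-parse --abbrev-ref HEAD              # prints: master
git checkout -q awesome-feature              # switch
git rev-parse --abbrev-ref HEAD              # prints: awesome-feature

git checkout -q master
git checkout -q -b other-feature             # create and switch in one step
git branch                                   # * marks the checked-out branch
```

Git 2.23 and newer also offer git switch awesome-feature and git switch -c awesome-feature as alternatives. The checkout forms from the slide keep working.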

Ready to merge your new feature into the master branch?

· Switch to the master branch:
  · git checkout master

· Merge your branch into the current one (master):
  · git merge awesome-feature

· Delete the merged branch (-d refuses if the branch has unmerged commits):
  · git branch -d awesome-feature
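This sketch runs the merge steps end to end in a scratch repository; the file names and identity are placeholders. Because master gains no commits while the feature is developed, the merge is a fast-forward.

```shell
#!/bin/sh
# Sketch: merge a finished feature branch into master, then delete it.
set -e
repo="$(mktemp -d)"
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master
git config user.name "Demo User"             # placeholder identity
git config user.email "demo@example.com"
echo "hello" > app.txt
git add app.txt
git commit -q -m "initial commit"

git checkout -q -b awesome-feature           # work happens on the branch
echo "feature" > feature.txt
git add feature.txt
git commit -q -m "add awesome feature"

git checkout -q master                       # back to master
git merge awesome-feature                    # master has no new commits: fast-forward
git branch -d awesome-feature                # -d only deletes merged branches
git log --oneline                            # both commits, no merge commit
```

If master had gained commits of its own in the meantime, git merge would create a merge commit instead of fast-forwarding.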

Words to remember

· HEAD: pointer to your current position in the history (usually the
tip of the checked-out branch)
· origin: default name of the remote repository you cloned from
· master: the (hopefully) stable master branch
· upstream: usually means the remote repository, i.e., where the
code is coming from (“up the stream”) or where you push to and
pull from
· fast-forward: a merge in which the branch pointer simply moves
forward in history, because the target branch has no commits of its own
· .gitignore: list of files and patterns Git should ignore
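Several of these names can be queried directly. This sketch uses a throwaway clone of a local bare repository as a stand-in for Gitlab; the file names and identity are placeholders.

```shell
#!/bin/sh
# Sketch: inspect HEAD, origin and upstream in a fresh clone.
set -e
work="$(mktemp -d)"
cd "$work"
git init -q --bare remote.git                       # local stand-in for Gitlab
git --git-dir=remote.git symbolic-ref HEAD refs/heads/master
git clone -q remote.git project
cd project
git symbolic-ref HEAD refs/heads/master
git config user.name "Demo User"                    # placeholder identity
git config user.email "demo@example.com"
echo "hello" > README.md
git add README.md
git commit -q -m "initial commit"
git push -q -u origin master

git symbolic-ref HEAD                               # prints: refs/heads/master
git rev-parse --short HEAD                          # the commit HEAD points to
git remote                                          # prints: origin
git rev-parse --abbrev-ref 'master@{upstream}'      # prints: origin/master
```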

Great Git tutorials

· http://www.git-scm.com/doc
· http://gitimmersion.com

Resources

Figure 3: Which resources do you know?

Lexicons

· Multi-language frequency lists
  · https://invokeit.wordpress.com/frequency-word-lists
  · created from subtitles

· English frequency list
  · http://www.wordfrequency.info/free.asp
  · created from the Corpus of Contemporary American English

· The CMU Pronouncing Dictionary
  · http://www.speech.cs.cmu.edu/cgi-bin/cmudict

· MPQA Subjectivity Clues
  · http://mpqa.cs.pitt.edu/lexicons/subj_lexicon/

· CISLex

Treebanks

(S
  (NP
    (NNP John)
  )
  (VP
    (VBZ loves)
    (NP
      (NNP Mary)
    )
  )
  (...)
)

· The Penn Treebank Project
  · https://www.cis.upenn.edu/~treebank/

· German Treebank: Tiger
  · http://www.ims.uni-stuttgart.de/forschung/ressourcen/korpora/tiger.en.html

· SMULTRON - Stockholm MULtilingual TReebank
  · http://www.cl.uzh.ch/research/parallelcorpora/paralleltreebanks/smultron_en.html

· University of Arizona TreeBank Viewer
  · http://www.dingo.sbs.arizona.edu/~sandiway/treebankviewer/index.html

Knowledge bases & ontologies

· WordNet
  · https://wordnet.princeton.edu

· GermaNet
  · http://www.sfs.uni-tuebingen.de/GermaNet

· Chinese Wordnet, BalkaNet, IndoWordNet, FinnWordNet, …

· DBpedia
  · machine-readable Wikipedia content
  · knowledge base of more than 4 million “things”
  · http://wiki.dbpedia.org/Ontology

Parallel text corpora

· Europarl
  · proceedings of the European Parliament in 21 languages
  · http://www.statmt.org/europarl/

· WMT11
  · data from the WMT 2011 shared translation task
  · http://www.statmt.org/wmt11/translation-task.html#download

Questions & answers

· Microsoft Research Question-Answering Corpus
  · text of Encarta 98
  · http://research.microsoft.com/en-us/downloads/88c0021c-328a-4148-a158-a42d7331c6cf/

· PASCAL RTE datasets for textual entailment tasks
  · http://pascallin.ecs.soton.ac.uk/Challenges/RTE/Datasets/

Collocations & NGrams

· Google Books Ngram Corpus
  · http://storage.googleapis.com/books/ngrams/books/datasetsv2.html

· http://corpus.byu.edu
· English collocations
  · http://collocations.ooz.ie/

· CIS Wittfind
  · http://wittfind.cis.uni-muenchen.de

Pretrained models & representations

· Stanford CoreNLP
  · tokenizer, sentence splitter, POS tagger, coreference resolution, etc.
  · http://nlp.stanford.edu/software/corenlp.shtml

· Polyglot word embeddings
  · low-dimensional word representations for many Wikipedia languages
  · https://sites.google.com/site/rmyeid/projects/polyglot#TOC-Download-the-Embeddings

Text corpora

· UMBC WebBase corpus
  · http://ebiquity.umbc.edu/resource/html/id/351

· News articles
  · Reuters news: http://trec.nist.gov/data/reuters/reuters.html
  · Wall Street Journal: https://catalog.ldc.upenn.edu/LDC94S13A
  · North American News Text Corpus: https://catalog.ldc.upenn.edu/LDC95T21

· Wikipedia

Wikipedia

· Raw dumps: https://dumps.wikimedia.org
  · preprocessing scripts: Wikipedia Extractor
    · http://medialab.di.unipi.it/wiki/Wikipedia_Extractor

· Preprocessed Wikipedia dumps
  · https://sites.google.com/site/rmyeid/projects/polyglot#TOC-Download-Wikipedia-Text-Dumps

Assignment

Exercise 01 - Hello CIS

1. Create a course project repository in CIP Gitlab (see instructions
above). Add your group members and us.
2. Create the skeleton directory structure.
3. Create a simple Hello world app in your designated programming
language, along with an executable and a basic readme and/or
Makefile to compile (if necessary) and run.
4. Stage, commit, push.
5. Tag the correct commit (identified by its hash) with the name “ex_01” and push the tag.

Due: Thursday April 23, 2015, 16:00, i.e., the tag must point to a
commit made before the deadline
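A plain git push does not transfer tags, so the tag needs its own push before it shows up in Gitlab. This sketch works against a local bare repository that stands in for the course remote. That setup is an assumption, as are the file name and identity. The last command prints the commit date you would compare with the deadline.

```shell
#!/bin/sh
# Sketch: tag the submission commit and publish the tag.
set -e
work="$(mktemp -d)"
cd "$work"
git init -q --bare remote.git                       # local stand-in for Gitlab
git --git-dir=remote.git symbolic-ref HEAD refs/heads/master
git clone -q remote.git project
cd project
git symbolic-ref HEAD refs/heads/master
git config user.name "Demo User"                    # placeholder identity
git config user.email "demo@example.com"
echo 'echo "Hello CIS"' > hello.sh
git add hello.sh
git commit -q -m "Hello world app"
git push -q -u origin master

git tag ex_01                         # no hash given: tags the current HEAD
git push -q origin ex_01              # tags need their own push
git ls-remote --tags origin           # the remote now lists refs/tags/ex_01
git log -1 --format='%h %ci' ex_01    # hash and commit date of the tagged commit
```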

Have fun
