StormCrawler
Crawling the German Health Web: Exploratory Study and Graph Analysis.[6]
The generation of a multi-million page corpus for the Persian language.[7]
SIREN, the Security Information Retrieval and Extraction engine.[8]
The project Wiki contains a list of videos and slides available online.[9]
See also
Apache Storm
Apache Nutch
Apache Solr
Elasticsearch
References
1. "Powered By · DigitalPebble/storm-crawler Wiki · GitHub" (https://github.com/DigitalPebble/storm-crawler/wiki/Powered-By). Github.com. 2017-03-02. Retrieved 2017-04-19.
2. "News Dataset Available – Common Crawl" (http://commoncrawl.org/2016/10/news-dataset-available/).
3. "StormCrawler: An Open Source SDK for Building Web Crawlers with ApacheStorm | Linux.com | The source for Linux information" (https://www.linux.com/news/stormcrawler-open-source-sdk-building-web-crawlers-apachestorm). Linux.com. 2016-10-12. Retrieved 2017-04-19.
4. "Julien Nioche on StormCrawler, Open-Source Crawler Pipelines Backed by Apache Storm" (http://www.infoq.com/news/2016/12/nioche-stormcrawler-web-crawler). Infoq.com. 2016-12-15. Retrieved 2017-04-19.
5. "The Battle of the Crawlers: Apache Nutch vs. StormCrawler - DZone Big Data" (https://dzone.com/articles/the-battle-of-the-crawlers-apache-nutch-vs-stormcr). Dzone.com. Retrieved 2017-04-19.
6. Zowalla, Richard; Wetter, Thomas; Pfeifer, Daniel (2020). "Crawling the German Health Web: Exploratory Study and Graph Analysis" (https://www.jmir.org/2020/7/e17853/). Journal of Medical Internet Research. 22 (7): e17853. doi:10.2196/17853 (https://doi.org/10.2196%2F17853). PMC 7414401 (https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7414401). PMID 32706701 (https://pubmed.ncbi.nlm.nih.gov/32706701).
7. "MirasText: An Automatically Generated Text Corpus for Persian" (https://www.researchgate.net/publication/325324201).
8. Sanagavarapu, Lalit Mohan; Mathur, Neeraj; Agrawal, Shriyansh; Reddy, Y. Raghu (2018). Advances in Information Retrieval. Lecture Notes in Computer Science. Vol. 10772. pp. 811–814. doi:10.1007/978-3-319-76941-7_81 (https://doi.org/10.1007%2F978-3-319-76941-7_81). ISBN 978-3-319-76940-0.
9. "Presentations · DigitalPebble/storm-crawler Wiki · GitHub" (https://github.com/DigitalPebble/storm-crawler/wiki/Presentations). Github.com. 2017-04-04. Retrieved 2017-04-19.