A Multi-Levels Geo-Location Based Crawling Method For Social Media Platforms
Abstract—The large size and the dynamic nature of the Web highlight the need for continuous support and updating of Web-based information retrieval systems. Crawlers facilitate this process by following the hyperlinks in Web pages to automatically download a partial snapshot of the Web. While some systems rely on crawlers that exhaustively crawl the Web, others incorporate focus within their crawlers to harvest application- or topic-specific collections. This project studied web crawling and scraping at many different levels. It aggregates information from multiple sources into one central location and specifies a program for downloading web pages: given an initial set of seed URLs, it recursively downloads every page that is linked from pages in the set and whose content satisfies a specific criterion. Social media, web applications, and mobile applications have been employed together in the proposed system to manage search in the rapidly growing World Wide Web. Applying the proposed system results in a fast and convenient search engine that fulfills user requests based on specific geo-locations.

Index Terms—Social Media, Data set sampling, Crawling, Scraping, Search engines, Geo-Locations

I. INTRODUCTION

A web crawler is a piece of code that travels over the Internet and collects data from various web pages, a process also known as web scraping. An application program interface (API) is a set of routines, protocols, and tools for building software applications, and APIs have been used widely with web crawlers. Basically, an API specifies how software components should interact; it allows programmers to use predefined functions to interact with systems instead of writing them from scratch [1].
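As a brief illustration of this point, the short sketch below fetches a page and lists its hyperlinks using only predefined library functions (Python's urllib.request and html.parser are assumed here purely for illustration; the paper itself does not prescribe a language), rather than hand-writing the HTTP and HTML handling from scratch.

from html.parser import HTMLParser
from urllib.request import urlopen

class LinkCollector(HTMLParser):
    """Collects the href attribute of every <a> tag encountered."""
    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)

def fetch_links(url):
    # The library handles the HTTP request and response for us.
    html = urlopen(url).read().decode("utf-8", errors="replace")
    parser = LinkCollector()
    parser.feed(html)
    return parser.links

if __name__ == "__main__":
    for link in fetch_links("https://example.com"):
        print(link)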
Social media applications have been employed significantly in this field; retrieving information from social networks is the first and primordial step in many data-analysis fields such as Natural Language Processing, Sentiment Analysis, and Machine Learning. The use of the Facebook API, LinkedIn API, Twitter API, and other public platforms for collecting public streams of information makes this possible [2].

Many researchers have surveyed the motivations for crawling or scraping, including content indexing for search engines, automated security testing and vulnerability assessment, and automated testing and model checking [3]–[6]. Therefore, in this research we aim to seek out pages that are relevant to a pre-defined set of topics while avoiding irrelevant regions of the web, which leads to significant savings in hardware and network resources without losing data sources.
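The following minimal sketch mirrors this selective crawl: starting from a set of seed URLs, it keeps only pages whose content satisfies a simple keyword criterion and follows their links. The keyword list, the regular-expression link extraction, and the page budget are illustrative assumptions, not part of the proposed system.

from urllib.request import urlopen
from urllib.parse import urljoin
import re

KEYWORDS = {"event", "concert", "festival"}   # assumed topic keywords
HREF_RE = re.compile(r'href=["\'](http[^"\']+)["\']')

def is_relevant(text):
    """A page is kept only if its content satisfies the keyword criterion."""
    lowered = text.lower()
    return any(word in lowered for word in KEYWORDS)

def crawl(seeds, max_pages=50):
    frontier, visited, kept = list(seeds), set(), []
    while frontier and len(visited) < max_pages:
        url = frontier.pop(0)
        if url in visited:
            continue
        visited.add(url)
        try:
            page = urlopen(url, timeout=10).read().decode("utf-8", errors="replace")
        except OSError:
            continue                      # skip unreachable pages
        if is_relevant(page):
            kept.append(url)              # keep the relevant page
            for link in HREF_RE.findall(page):
                frontier.append(urljoin(url, link))
    return kept

if __name__ == "__main__":
    print(crawl(["https://example.com"]))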
The idea behind this work is to have the website's server talk directly to another server with a request to create an event with the given details. The server then receives the response, processes it, and sends the relevant information back to the browser, such as a confirmation message to the user. This project studied web crawling and scraping at many different levels. It aggregates information from multiple sources into one central location and specifies a program for downloading web pages: given an initial set of seed URLs, it recursively downloads every page that is linked from pages in the set and whose content satisfies a specific criterion. Social media, web applications, and mobile applications have been employed together in the proposed system to manage the searching process in the web.

The remainder of this paper is structured as follows. Section II reviews the related literature in the field and what previous researchers have achieved in the same area. The methodology followed in this research is introduced in Section III, including the data set preparation and how the focused crawler is implemented. Section IV presents experimental results and a discussion of the proposed work. Finally, the work is concluded in Section V.

II. LITERATURE REVIEW

Many studies have investigated the quality of data in social media and social networks, but there is still a huge gap between what has been achieved and what is expected. Several studies of social media and social networks rely mainly on data acquired from Twitter, and such data carry the risk of being a disturbed and misleading sample of the complete data. Many research efforts in the last decade have focused on acquiring data sets from social media [7]–[13]. The proposed project idea is unique in that it focuses on implementing and providing a new research platform related to big data, serving the community with different services and events associated with companies; data collected from social media is therefore a significant concern in this research.

Hai Dong et al. reviewed in [14] recent studies on one category of semantic focused crawlers: a series of crawlers that utilize ontologies to link the documents acquired by the fetching process with ontological concepts. They organized documents on the web and filtered out webpages irrelevant to the search topics. The research team compared the crawlers from several perspectives, including domain, working environment, evaluation metrics, special functions, technologies utilized, and evaluation results.

Gautam Pant et al. discussed in [15] the issues related to developing crawling infrastructures. They reviewed several crawling algorithms that might be used to evaluate crawl quality. Crawling social media has been considered by many researchers as well. In [16]–[19], research studies investigated the quality of social media data, focusing on how online recommendation systems and social media data can be evaluated.

According to Gjoka et al. in [20], social network sampling studies, which are quite common, can be considered part of social media crawling. Gjoka used in [21] the original graph sampling study by Leskovec and Faloutsos [22] as a baseline. A motivating work on sampling social networks efficiently with a restricted budget was presented by Wang et al. in [23].

It is known that Facebook users' content is hard to access because of the default privacy policy of Facebook [16], [24]–[28]; therefore, the amount of private Facebook data that can be collected is limited. Furthermore, since Facebook does not offer the option of selling data, crawling methods that collect social interactions from publicly available Facebook content are needed, which is the main challenge of the proposed work.

Buccafurri et al. discussed in [29] different methods to traverse social networks from a crawling viewpoint. They focused on groups instead of personal user profiles.

Erlandsson et al. presented in [30] a novel User-guided Social Media Crawling method (USMC). USMC was built to gather data from social networks; it employs the knowledge of users engaging with user-generated content in order to cover as many user interactions as possible. The research team validated USMC by crawling a large number of Facebook pages, with content from millions of users and billions of interactions, and compared USMC with other crawling methods. The achieved results showed that most of a Facebook page's interactions can be covered by sampling only a few posts.

Ahlers and Boll presented in [31]–[34] a search engine based on geo-location. Their engine automatically derives spatial context from unorganized resources on the web and allows for location-based search. A focused crawler presented in that research applies heuristics to the crawl and analyzes web pages that probably relate to a region or place; the actual location is identified using a location extractor. The presented work showed good results for location-based web search applications that deliver fast, on-the-spot results.

III. METHODOLOGY

A. Data Collection and Preparation

Modern APIs adhere to standards (typically HTTP and REST) that are developer-friendly, easily accessible, and broadly understood. An API has its own software development lifecycle (SDLC) of designing, testing, building, and managing [35]. Modern APIs are well documented for consumption and versioning for specific audiences; they are much more standardized, follow stronger discipline for security and governance, and are monitored and managed for performance and scale.

Most of the collected data come from several sources, including direct user input (surveys and search forms), third-party APIs (social media), server logs (logs from web servers and tools such as Apache, Heritrix, and Octoparse), and web crawling or scraping. Several components are needed to prepare the data set and implement the proposed system, including Apache Nutch, Apache Tomcat, CYGWIN, Apache Hadoop, HERITRIX, the Cloudera virtual machine, Oracle VirtualBox, Octoparse, Facebook Graph Search, LinkedIn Lead Extractor, and Netvizz. Data for the Artificial Intelligence agent has been collected through the Facebook Graph API, which provides many spoken or published texts in English. Given that Facebook is the most widely used social media network, it is the best place to get random and accurate data [36]. We have also collected data from LinkedIn and Twitter through the LinkedIn Lead Extractor tool and REST APIs.
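A minimal sketch of this kind of REST-based collection is shown below. It pages through a JSON endpoint and stores the returned records; the endpoint URL, the access-token parameter, and the field names are placeholders standing in for the actual Graph API or Twitter REST calls, whose exact contracts are not reproduced here.

import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Placeholder endpoint and token: the real Graph API / Twitter REST
# endpoints, parameters, and response fields differ and require credentials.
BASE_URL = "https://api.example.com/v1/posts"
ACCESS_TOKEN = "YOUR_TOKEN_HERE"

def collect_posts(query, max_pages=5):
    """Collect post records from a paginated REST endpoint."""
    records = []
    url = BASE_URL + "?" + urlencode({"q": query, "access_token": ACCESS_TOKEN})
    for _ in range(max_pages):
        with urlopen(url, timeout=10) as response:
            payload = json.load(response)
        records.extend(payload.get("data", []))
        # Follow the "next page" link if the API provides one.
        url = payload.get("paging", {}).get("next")
        if not url:
            break
    return records

if __name__ == "__main__":
    posts = collect_posts("events in Amman")
    print(len(posts), "records collected")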
B. Focused Crawler

The role of the focused crawler in the proposed system is to selectively seek out pages that are relevant to a pre-defined set of topics. The topics are specified using keywords, rather than by collecting and indexing all accessible hypertext links. The focused crawler analyzes its crawl boundary to find the links that are likely to be most relevant for the crawl and avoids irrelevant regions of the web. This leads to significant savings in hardware and network resources and helps keep the crawl more up to date. A web crawler is a relatively simple automated program, used for example by linguists and market researchers, that fetches information from the Internet in an organized manner [37]. Alternative names for a web crawler include web spider, web robot, bot, crawler, and automatic indexer.

The crawler begins as a basic exercise in search algorithms and can then be extended in several directions to include information retrieval, statistical learning, unsupervised learning, natural language processing, and knowledge representation.
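To make the boundary analysis concrete, the sketch below keeps a priority frontier in which each discovered link is scored by how many topic keywords appear in its surrounding text, so the most promising links are fetched first. The scoring rule and keyword set are illustrative assumptions rather than the exact heuristic used in the proposed system.

import heapq

TOPIC_KEYWORDS = {"festival", "event", "amman", "jordan"}   # assumed topics

def relevance(text):
    """Score a piece of text by the number of topic keywords it contains."""
    words = set(text.lower().split())
    return len(words & TOPIC_KEYWORDS)

class CrawlFrontier:
    """Best-first frontier: links with higher relevance are fetched earlier."""
    def __init__(self):
        self._heap = []
        self._seen = set()

    def add(self, url, context_text):
        if url not in self._seen:
            self._seen.add(url)
            # heapq is a min-heap, so store the negated score.
            heapq.heappush(self._heap, (-relevance(context_text), url))

    def next_url(self):
        return heapq.heappop(self._heap)[1] if self._heap else None

if __name__ == "__main__":
    frontier = CrawlFrontier()
    frontier.add("https://example.com/festival", "Jerash festival event programme")
    frontier.add("https://example.com/contact", "contact us page")
    print(frontier.next_url())   # the festival page is scheduled first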
IV. EXPERIMENTAL RESULTS

A. Apache Nutch

In 2003 Doug Cutting, the creator of Lucene, and Mike Cafarella founded Apache Nutch [38], an open-source web crawler written in Java and used for crawling websites. Apache Nutch facilitates parsing, indexing, creating a search engine, customizing the search according to needs, scalability, robustness, and a Scoring Filter for custom implementations. Apache Nutch can run on a single machine as well as in a distributed environment such as Apache Hadoop. It can be integrated with Eclipse and CYGWIN easily and can index all the web pages it crawls to Cygwin or to Eclipse. Figure 1 illustrates the operation classes within Nutch.

Fig. 1. Operation classes within Nutch

Crawling is driven by the Apache Nutch crawling tool; once Apache Nutch has indexed the web pages to Cygwin or to Eclipse, the user can search the required web pages in Cygwin. According to [39], the CrawlDB is generated by Apache Nutch, and a crawling cycle has four steps, each implemented as a Hadoop MapReduce job: GeneratorJob, FetcherJob, ParserJob, and DbUpdaterJob [40]. The Nutch crawl is distributed by creating the seed file, copying it into a "urls" directory, copying that directory up to HDFS, and copying the configuration to the Hadoop configuration directory.
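A possible way to drive this cycle is sketched below as a small Python wrapper around the usual Nutch 1.x step commands (inject, generate, fetch, parse, updatedb). The paths, the number of rounds, and the exact command layout are assumptions that may differ across Nutch and Hadoop versions; this is not the authors' own setup.

import glob
import subprocess

# Assumed local layout (single-machine run): seed URLs in ./urls, crawl data in ./crawl.
NUTCH = "bin/nutch"
CRAWLDB = "crawl/crawldb"
SEGMENTS = "crawl/segments"

def run(*cmd):
    print("+", " ".join(cmd))
    subprocess.run(cmd, check=True)

def latest_segment():
    # Each generate step creates a new timestamped segment directory.
    return sorted(glob.glob(SEGMENTS + "/*"))[-1]

def crawl_cycle(rounds=3):
    # For a distributed run the seed directory would first be copied to HDFS,
    # e.g. with "hadoop fs -put urls urls"; here we stay on the local filesystem.
    run(NUTCH, "inject", CRAWLDB, "urls")
    for _ in range(rounds):
        run(NUTCH, "generate", CRAWLDB, SEGMENTS)       # GeneratorJob
        segment = latest_segment()
        run(NUTCH, "fetch", segment)                    # FetcherJob
        run(NUTCH, "parse", segment)                    # ParserJob
        run(NUTCH, "updatedb", CRAWLDB, segment)        # DbUpdaterJob

if __name__ == "__main__":
    crawl_cycle()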
Apache Nutch can be easily integrated with Apache Hadoop, which makes the process much faster than running Apache Nutch on a single machine. After integrating Apache Nutch with Apache Hadoop, crawling can be performed on a Hadoop cluster environment, so the process is much faster and throughput is maximized.
B. Cygwin

In [41], Morteza defined Cygwin as a POSIX-compatible environment that runs natively on Microsoft Windows. Its goal is to allow Unix programs to be recompiled and run natively on Windows with minimal source code modifications, while providing the same underlying POSIX API they would expect. Figures 2, 3, and 4 illustrate a successful crawl within CYGWIN.

Fig. 2. Injector progress in CYGWIN
Fig. 3. Starting Tomcat with CYGWIN
Fig. 4. Successful crawling within CYGWIN

C. Heritrix

Heritrix is a command-line tool that can optionally be used to initiate crawls. It was developed jointly by the Internet Archive and the Nordic national libraries on specifications written in early 2003 [42], and it has since been continually improved by employees of the Internet Archive and other interested parties. In the proposed method, Heritrix visits web pages and searches for links; it then follows those links to new pages, where it once again identifies links, follows them, and so on, so that a large number of links is gathered rapidly. In the proposed system a three-hop limit has been set: at the limit, Heritrix stops collecting links and moves on to the next seed in the list. This allows many territories to be covered while moving rapidly through the large governmental domains. The following figure illustrates how Heritrix is extracted in the proposed system at the Cloudera home.
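The hop-limit policy described above can be summarized by the small sketch below: each seed is expanded breadth-first, a hop counter is carried with every discovered link, and once the three-hop limit is reached the crawler abandons that branch and moves on to the next seed. The link-extraction function is a stand-in; this is not Heritrix itself, only an illustration of the limiting behaviour.

from collections import deque

HOP_LIMIT = 3   # matches the three-hop limit used in the proposed system

def get_links(url):
    """Stand-in for real link extraction from a fetched page."""
    return []   # replace with actual fetching and parsing

def crawl_with_hop_limit(seeds):
    collected = set()
    for seed in seeds:                     # one seed at a time, as in the seed list
        queue = deque([(seed, 0)])
        while queue:
            url, hops = queue.popleft()
            if url in collected:
                continue
            collected.add(url)
            if hops >= HOP_LIMIT:
                continue                   # limit reached: stop expanding this branch
            for link in get_links(url):
                queue.append((link, hops + 1))
    return collected

if __name__ == "__main__":
    print(len(crawl_with_hop_limit(["https://example.gov"])), "URLs collected")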