
PhD Research Proposal

An Integrated Comprehensive Approach Towards Road Traffic Accident Reduction
Supervisor: Advisors:
Dr. Seyed Jafar Sadjadi Dr. Ali Tavakoli Kashani
Dr. Roozbeh Ghousi
PhD Candidate:
Zohreh Alizadeh Elizee
Jan. 2019
Outline
• Accident prediction modeling: state of the art
• Research purpose
• Motivation for research
• Research challenges
• Research gap
• Text analysis features
• Research questions
• Research Methods
• Research design
• Key publications
Accident prediction modeling: state of the art

1. Scenario analysis
2. Regression methods
3. Time series methods
4. Markov chain model
5. Grey model
6. Neural networks
7. Bayesian networks
Zheng, Xiaoping, and Mengting Liu. "An overview of accident forecasting
Research purpose

To use natural language processing (NLP), integrated with machine learning methods, to extract, identify, and derive high-quality information from texts on road traffic safety, in order to better predict and prevent crashes.

Structured vs unstructured data
[Figure: structured data vs. unstructured data]
Research Challenges
• Information is in unstructured textual form
• Every word and phrase type in the language is a dimension of the model, so the feature space is very large but sparse
• Complex and subtle relationships exist between concepts in text, e.g. “Company X merges with company Y” vs. “Company X is bought by company Y”
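The sparsity point can be illustrated with a small bag-of-words sketch. This is only an illustration under simple assumptions (whitespace tokenization, a three-sentence toy corpus, and the hypothetical helper names `bag_of_words` and `sparsity`), not the method proposed here.

```python
from collections import Counter

def bag_of_words(docs):
    """Build a document-term count matrix (one dense row per document)."""
    tokenized = [doc.lower().split() for doc in docs]
    vocab = sorted({tok for toks in tokenized for tok in toks})
    rows = []
    for toks in tokenized:
        counts = Counter(toks)
        # Counter returns 0 for terms that do not occur in this document.
        rows.append([counts[term] for term in vocab])
    return vocab, rows

def sparsity(rows):
    """Fraction of cells in the matrix that are zero."""
    cells = [c for row in rows for c in row]
    return sum(1 for c in cells if c == 0) / len(cells)

# Toy corpus; the third sentence is an invented example.
docs = [
    "company x merges with company y",
    "company x is bought by company y",
    "the driver lost control on a wet road",
]
vocab, rows = bag_of_words(docs)
print(len(vocab), round(sparsity(rows), 2))
```

Even with three short sentences most cells are zero, and the first two rows share most of their words although they describe different relationships between the two companies.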
Research Challenges contd
• Word ambiguity and context sensitivity
automobile = car = vehicle = Toyota
apple (the company) or apple (the fruit)

• Noisy data: spelling mistakes

• Persian-language application: language functions need programming

• Social media microblogs are not prevalent
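One common way to reduce noise from spelling mistakes is to map each token to its closest entry in a known vocabulary. The sketch below uses Python's standard `difflib` module; the vocabulary `VOCAB`, the helper `normalize`, and the 0.8 cutoff are assumptions for illustration, and a Persian application would need its own lexicon.

```python
import difflib

# Hypothetical domain vocabulary; a real one would come from a lexicon or corpus.
VOCAB = ["accident", "collision", "vehicle", "pedestrian", "intersection", "highway"]

def normalize(token, vocab=VOCAB, cutoff=0.8):
    """Map a possibly misspelled token to its closest vocabulary entry, if any."""
    matches = difflib.get_close_matches(token.lower(), vocab, n=1, cutoff=cutoff)
    # Leave the token unchanged when nothing in the vocabulary is similar enough.
    return matches[0] if matches else token

cleaned = [normalize(t) for t in "acident at the intersecton".split()]
print(cleaned)
```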


Research motivations
• Approximately 90% of the world’s data is held in unstructured formats
• Growing rapidly in size and importance
• Untapped information contained in text
• Captures opinions not easily quantified
Research gap
[Figure: topic modeling applications]
• Political science
• Social network / microblogs
• Software engineering
• Linguistic science
• Geographical / locations
• Medical / biomedical
• Crime prediction
• Transportation research? (the research gap)
Research questions
• R1: What accessible texts contain latent variables that contribute to accident prediction and prevention?

• R2: Can mining this type of text produce results competitive with traditional predictive analytics?
Research method: Latent Dirichlet Allocation
Key assumptions:

• Documents exhibit multiple topics
• LDA is a probabilistic model with a corresponding generative process: each document is assumed to be generated by this process
• A topic is a distribution over a fixed vocabulary
Idea behind LDA
• Suppose you have the following set of sentences:

1. I ate a banana and spinach smoothie for breakfast.
2. I like to eat broccoli and bananas.
3. Chinchillas and kittens are cute.
4. My sister adopted a kitten yesterday.
5. Look at this cute hamster munching on a piece of broccoli.
Idea behind LDA contd
• Latent Dirichlet allocation is a way of automatically discovering the topics that these sentences contain. For example, given these sentences and asked for 2 topics, LDA might produce something like:

• Sentences 1 and 2: 100% Topic A
• Sentences 3 and 4: 100% Topic B
• Sentence 5: 60% Topic A, 40% Topic B
Idea behind LDA contd
• Topic A: 30% broccoli, 15% bananas, 10% breakfast, 10% munching, ...
(at which point, you could interpret topic A to be about food)

• Topic B: 20% chinchillas, 20% kittens, 20% cute, 15% hamster, ... (at
which point, you could interpret topic B to be about cute animals)
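To make the example concrete, the sketch below fits a two-topic LDA model to the five sentences with collapsed Gibbs sampling, one standard inference method for LDA. The hand-made tokenization (stop words dropped, plurals reduced), the hyperparameters `alpha` and `beta`, and the function name `lda_gibbs` are assumptions for illustration. With so little text the proportions vary with the random seed and will not necessarily match the percentages above.

```python
import random

def lda_gibbs(docs, n_topics, n_iter=200, alpha=0.1, beta=0.01, seed=0):
    """Collapsed Gibbs sampling for LDA on tokenized documents."""
    rng = random.Random(seed)
    vocab = sorted({w for doc in docs for w in doc})
    word_id = {w: i for i, w in enumerate(vocab)}
    V = len(vocab)
    doc_topic = [[0] * n_topics for _ in docs]        # tokens per (document, topic)
    topic_word = [[0] * V for _ in range(n_topics)]   # tokens per (topic, word)
    topic_total = [0] * n_topics                      # tokens per topic
    assignments = []
    # Start from a random topic for every word token.
    for d, doc in enumerate(docs):
        zs = []
        for w in doc:
            k = rng.randrange(n_topics)
            zs.append(k)
            doc_topic[d][k] += 1
            topic_word[k][word_id[w]] += 1
            topic_total[k] += 1
        assignments.append(zs)
    for _ in range(n_iter):
        for d, doc in enumerate(docs):
            for n, w in enumerate(doc):
                v, k = word_id[w], assignments[d][n]
                # Remove this token's current assignment from the counts.
                doc_topic[d][k] -= 1
                topic_word[k][v] -= 1
                topic_total[k] -= 1
                # Weight of each topic given all other assignments.
                weights = [
                    (doc_topic[d][t] + alpha)
                    * (topic_word[t][v] + beta) / (topic_total[t] + V * beta)
                    for t in range(n_topics)
                ]
                k = rng.choices(range(n_topics), weights=weights)[0]
                assignments[d][n] = k
                doc_topic[d][k] += 1
                topic_word[k][v] += 1
                topic_total[k] += 1
    # Smoothed per-document topic proportions.
    theta = [
        [(c + alpha) / (len(doc) + n_topics * alpha) for c in counts]
        for doc, counts in zip(docs, doc_topic)
    ]
    return theta, topic_word, vocab

# The five example sentences, tokenized by hand (an assumed preprocessing step).
docs = [
    ["ate", "banana", "spinach", "smoothie", "breakfast"],
    ["like", "eat", "broccoli", "banana"],
    ["chinchilla", "kitten", "cute"],
    ["sister", "adopted", "kitten", "yesterday"],
    ["look", "cute", "hamster", "munching", "piece", "broccoli"],
]
theta, topic_word, vocab = lda_gibbs(docs, n_topics=2)
for i, proportions in enumerate(theta, start=1):
    print(i, [round(p, 2) for p in proportions])
```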
The Generative Process
To generate a document:
1. Randomly choose a distribution over topics
2. For each word in the document:
a. Randomly choose a topic from the distribution over topics
b. Randomly choose a word from the corresponding topic (distribution over the vocabulary)
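The steps above can be sketched in a few lines of Python. The two toy topics, the Dirichlet parameter `alpha`, and the helper names `sample_dirichlet` and `generate_document` are assumptions for illustration; in LDA the distribution over topics in step 1 is drawn from a Dirichlet distribution.

```python
import random

def sample_dirichlet(alphas, rng):
    """Draw from a Dirichlet distribution by normalizing Gamma draws."""
    draws = [rng.gammavariate(a, 1.0) for a in alphas]
    total = sum(draws)
    return [x / total for x in draws]

def generate_document(topics, n_words, alpha=0.5, seed=None):
    """topics: one dict per topic, mapping word -> probability."""
    rng = random.Random(seed)
    # Step 1: randomly choose a distribution over topics.
    theta = sample_dirichlet([alpha] * len(topics), rng)
    words = []
    for _ in range(n_words):
        # Step 2a: choose a topic from the distribution over topics.
        k = rng.choices(range(len(topics)), weights=theta)[0]
        # Step 2b: choose a word from that topic's distribution over the vocabulary.
        vocab, probs = zip(*topics[k].items())
        words.append(rng.choices(vocab, weights=probs)[0])
    return theta, words

# Hypothetical toy topics, for illustration only.
topics = [
    {"broccoli": 0.4, "banana": 0.3, "breakfast": 0.3},
    {"kitten": 0.4, "hamster": 0.3, "cute": 0.3},
]
theta, words = generate_document(topics, n_words=8, seed=1)
print([round(p, 2) for p in theta], words)
```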
LDA generative Process:
The joint distribution (of the hidden and observed variables):
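The equation itself is not present in this text. For reference, the joint distribution is usually written as follows in the standard LDA notation, which is assumed here: $\beta_{1:K}$ are the topics, $\theta_d$ the topic proportions of document $d$, $z_{d,n}$ the topic assignment of the $n$-th word in document $d$, and $w_{d,n}$ the observed word.

```latex
p(\beta_{1:K}, \theta_{1:D}, z_{1:D}, w_{1:D})
  = \prod_{i=1}^{K} p(\beta_i)
    \prod_{d=1}^{D} p(\theta_d)
    \left( \prod_{n=1}^{N} p(z_{d,n} \mid \theta_d)\,
           p(w_{d,n} \mid \beta_{1:K}, z_{d,n}) \right)
```

Each factor corresponds to one step of the generative process: drawing the topics, drawing a distribution over topics for each document, then drawing a topic and a word for each word position.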
Text analysis features
[Figure: taxonomy of text analysis features]
• Categorization: classification by taxonomy, language detection
• Sentiment analysis: document level, entity level, aspect based
• Entities: entity extraction, concept extraction
• Processing: article extraction, summarization, hashtag suggestion
• Vision: image tagging
Text analysis features-Classification by Taxonomy
• Classifies a piece of text or a URL according to a predefined taxonomy
Text analysis features-Language Detection
• Detects the main language a document is written in
Text analysis features-Document-Level
• Detects the sentiment of a document in terms of polarity (positive or
negative) and subjectivity (subjective or objective).
Text analysis features-Entity-Level
• Analyzes sentiment towards entities found in text. Extracts mentions
of named entities (Person, Organization, Location), and evaluates
sentiment towards each of the entities.
Text analysis features-Aspect-Based
• Given a review of a product or service, analyzes the sentiment of the review towards each aspect of the product or service that is mentioned in it.
Text analysis features-Entity Extraction
• Extracts named entities (people, organizations, products and locations) and values (URLs, emails, telephone numbers, currency amounts and percentages) mentioned in a body of text.
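The "values" part of this feature can be approximated with regular expressions, as in the sketch below. The patterns are deliberately simple assumptions and the helper `extract_values` is hypothetical. Named entities such as people or organizations generally need a trained model and are not covered.

```python
import re

# Deliberately simple patterns for illustration; real extractors handle many more cases.
PATTERNS = {
    "url": re.compile(r"https?://[^\s,;]+"),
    "email": re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+"),
    "percentage": re.compile(r"\d+(?:\.\d+)?\s?%"),
}

def extract_values(text):
    """Return every match of each pattern, keyed by value type."""
    return {name: pattern.findall(text) for name, pattern in PATTERNS.items()}

sample = "Crashes fell 12.5% last year; see https://example.org/report or mail info@example.org."
values = extract_values(sample)
print(values)
```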
Text analysis features-Concept Extraction
• Extracts named entities mentioned in a document, disambiguates and
cross-links them to Linked Data entities, along with their semantic
types
Text analysis features-Article Extraction
• Extracts the main body of text from an article, as well as author name,
publish date, embedded RSS feeds and media such as images &
videos.
Text analysis features-Summarization
• Summarizes an article into a few key sentences.
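A minimal sketch of extractive summarization is given below: sentences are scored by the average corpus frequency of their content words and the top-scoring ones are kept. The scoring rule, the stop-word list, and the function name `summarize` are assumptions for illustration and are not necessarily how any particular text analysis service implements this feature.

```python
import re
from collections import Counter

STOPWORDS = {"the", "a", "an", "of", "in", "on", "and", "to", "is", "was", "were", "at", "for"}

def content_words(sentence):
    """Lowercased words of a sentence with stop words removed."""
    return [w for w in re.findall(r"[a-z']+", sentence.lower()) if w not in STOPWORDS]

def summarize(text, n_sentences=2):
    """Keep the n sentences whose content words are most frequent in the text."""
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
    freq = Counter(w for s in sentences for w in content_words(s))

    def score(sentence):
        words = content_words(sentence)
        return sum(freq[w] for w in words) / len(words) if words else 0.0

    top = sorted(sentences, key=score, reverse=True)[:n_sentences]
    # Return the chosen sentences in their original order.
    return [s for s in sentences if s in top]

# Invented example text.
text = (
    "Heavy rain caused a crash on the highway. "
    "The crash blocked two lanes of the highway. "
    "Officials said the weather will improve tomorrow."
)
summary = summarize(text, n_sentences=2)
print(summary)
```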
Text analysis features-Hashtag Suggestion
• Automatically suggests hashtags for optimal content sharing on Social
Media.
Text analysis features-Image Tagging
• Detects and tags objects and concepts in an image and returns them along with their confidence scores.
Research design
Key publications:
• Transportation Research Part C (2017). A deep learning approach for detecting traffic accidents from social media data
• Transportation Research Part F (2018). Using topic modeling to develop multi-level descriptions of naturalistic driving data from drivers with and without sleep apnea
• Transportation Research Part D (2018). Recognizing driving styles based on topic models
Key publications contd:
• J of Transport and Health (2017). Investigating injury severity risk factors in automobile crashes with predictive analytics and sensitivity analysis methods
• Transportation Research Part C (2017). Automating a framework to extract and analyse transport related social media content: The potential and the challenges
• Transportation Research Part C (2016). From Twitter to detector: Real-time traffic incident detection using social media data
Key publications contd:
• Accident Analysis and Prevention (2017). Detection of driver engagement in secondary tasks from observed naturalistic driving behavior
• Transportation Research Part C (2015). Competing risk mixture model and text analysis for sequential incident duration prediction
• Transportation Research Part C (2016). From Twitter to detector: Real-time traffic incident detection
• Transportation Research Part C (2017). Exploring the capacity of social media data for modelling travel behaviour: Opportunities and challenges
