Dav Ia 2
Where would the team expect to spend the least time?
In a typical project lifecycle, the team would expect to invest most of the project time
during the execution phase. This is because the execution phase is where the bulk of
the project activities, tasks, and deliverables are completed. It involves putting the
project plan into action, coordinating resources, managing stakeholders, and carrying
out the actual work outlined in the project scope.
Several factors contribute to the high time investment during the execution phase:
1. Task Execution: This phase involves carrying out all the planned activities,
which can be time-consuming depending on the complexity of the project.
4. Quality Control: Monitoring and ensuring the quality of work being produced
during execution is essential, often requiring thorough reviews and
adjustments.
5. Risk Management: Mitigating risks and addressing any issues that arise
during execution can be time-intensive, as unexpected challenges may require
immediate attention.
Conversely, the team would expect to spend the least time during the project closure
phase. This phase occurs after all project activities are completed and the project
objectives are achieved.
What are the benefits of doing a pilot program before a full-scale rollout of a new
analytical methodology? Discuss this in the context of the mini case study.
4. Feedback Collection: During the pilot program, the company can gather
feedback from users, technicians, and stakeholders involved in the testing
process. This feedback is invaluable for identifying areas of improvement,
addressing user concerns, and refining the methodology before full-scale
implementation.
2. Plot ACF: Once the ACF is computed, it is often visualized using a plot called
the autocorrelation plot. In this plot, the lagged values are plotted on the x-
axis, and the corresponding autocorrelation coefficients are plotted on the y-
axis. This plot helps identify any significant correlations or patterns in the time
series data.
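For illustration, here is a minimal base-R sketch of computing and plotting the ACF; the series ts_data is a hypothetical simulated example, not data from the text:

# Simulate an AR(1) series as a stand-in for real data (hypothetical example)
set.seed(42)
ts_data <- arima.sim(model = list(ar = 0.7), n = 200)
# Compute and plot the autocorrelation function: lags on the x-axis,
# autocorrelation coefficients on the y-axis
acf(ts_data, lag.max = 30, main = "Autocorrelation plot")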
1. Simplifies Analysis: Stationary time series are easier to analyze and model
because their statistical properties remain constant over time. This simplifies
the process of identifying patterns, estimating parameters, and making
forecasts.
1. Augmented Dickey-Fuller (ADF) Test: The ADF test is used to test the null
hypothesis that a unit root is present in a time series, indicating that the series
is non-stationary. A unit root implies that the time series has a stochastic trend
and does not revert to a constant mean over time. The ADF test estimates a
regression model of the form:
Δy_t = α + β·t + γ·y_(t−1) + δ_1·Δy_(t−1) + … + δ_(p−1)·Δy_(t−p+1) + ε_t
where Δ denotes first differencing, α is a constant, β·t is a deterministic trend
term, and the null hypothesis of a unit root corresponds to γ = 0.
The test statistic from the ADF test is compared to critical values from the
Dickey-Fuller distribution to determine whether to reject the null hypothesis of a
unit root. If the test statistic is less (more negative) than the critical value, the
null hypothesis is rejected, indicating that the time series is stationary.
3. Kwiatkowski-Phillips-Schmidt-Shin (KPSS) Test: The KPSS test reverses the
hypotheses: its test statistic is compared to tabulated critical values to determine
whether to reject the null hypothesis of stationarity. If the test statistic is
greater than the critical value, the null hypothesis is rejected, indicating that
the time series is non-stationary.
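A minimal sketch of how these tests can be run in R, assuming the tseries package is available and ts_data is the series of interest (e.g., the simulated series from the ACF sketch above):

library(tseries)   # provides adf.test() and kpss.test()
# ADF test: null hypothesis is a unit root (non-stationary);
# a small p-value suggests stationarity
adf.test(ts_data)
# KPSS test: null hypothesis is stationarity;
# a small p-value suggests non-stationarity
kpss.test(ts_data)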
Difference between ARMA and ARIMA.
ARMA | ARIMA
Assumes a stationary time series | Can handle both stationary and non-stationary time series
Does not include differencing | Includes differencing to make the series stationary
Model parameters: p (AR order), q (MA order) | Model parameters: p (AR order), d (differencing order), q (MA order)
Suitable for stationary processes | Widely used for modeling time series with trends and seasonality
Commonly used for financial time series, climate data | Commonly used for economic indicators, sales forecasts
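As a sketch of the practical difference, both models can be fitted with the base-R arima() function; the orders below are illustrative and ts_data is the hypothetical series used earlier:

# ARMA(2, 1): d = 0, so no differencing is applied (series assumed stationary)
arma_fit <- arima(ts_data, order = c(2, 0, 1))
# ARIMA(1, 1, 1): d = 1 first-differences the series before fitting
arima_fit <- arima(ts_data, order = c(1, 1, 1))
arma_fit
arima_fit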
List and explain the seven practice areas of text analytics.
1. Text Preprocessing:
• Text preprocessing involves cleaning and preparing the raw text data
for analysis. This includes tasks such as removing irrelevant characters,
punctuation, and special symbols, as well as converting text to
lowercase and removing stop words (commonly occurring words that
carry little meaning). Text preprocessing is essential for improving the
quality of text analysis results and reducing noise in the data.
2. Text Tokenization:
• Text tokenization involves breaking down the text into smaller units
called tokens, which can be words, phrases, or sentences. This process
facilitates further analysis by converting the text into manageable units.
Tokenization can be performed using various techniques, such as word
tokenization (splitting text into individual words), sentence tokenization
(splitting text into sentences), and n-gram tokenization (splitting text
into sequences of n contiguous words).
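A minimal base-R sketch of the preprocessing and word-tokenization steps described above; the sentence and the stop-word list are illustrative only:

text <- "Text preprocessing cleans the raw data; tokenization then splits it into tokens."
# Preprocessing: lowercase, strip punctuation and special symbols, squeeze whitespace
clean <- tolower(text)
clean <- gsub("[[:punct:]]+", " ", clean)
clean <- gsub("\\s+", " ", trimws(clean))
# Word tokenization, followed by removal of a small illustrative stop-word list
stopwords <- c("the", "it", "then", "into")
tokens <- strsplit(clean, " ")[[1]]
tokens <- tokens[!tokens %in% stopwords]
tokens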
3. Text Summarization:
1. Scatter Plot:
2. Line Plot:
3. Histogram:
4. Bar Plot:
5. Box Plot:
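Each of the five plot types listed above can be produced with base-R graphics; a minimal sketch using the built-in mtcars data set as a stand-in:

data(mtcars)
plot(mtcars$wt, mtcars$mpg)           # 1. Scatter plot: weight vs. fuel economy
plot(sort(mtcars$mpg), type = "l")    # 2. Line plot of sorted mpg values
hist(mtcars$mpg)                      # 3. Histogram of mpg
barplot(table(mtcars$cyl))            # 4. Bar plot: cars per cylinder count
boxplot(mpg ~ cyl, data = mtcars)     # 5. Box plot: mpg distribution by cylinders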
Tokenization is the process of breaking down a text document into smaller units
called tokens. These tokens can be words, phrases, sentences, or other meaningful
units depending on the specific task or analysis. Tokenization is a fundamental step
in natural language processing (NLP) and text analytics, and it serves several
important purposes:
1. Text Analysis: Tokenization allows for the analysis of text data at a granular
level by breaking it down into its constituent parts. This facilitates various text
processing tasks such as counting word frequencies, identifying patterns, and
extracting meaningful information.
Input: "Tokenization is an important step in text analysis." Output: [ "Tokenization" , "is" , "an" ,
"important" , "step" , "in" , "text" , "analysis" , "." ]
1. Data Collection:
• The first step in text analysis is to collect the raw textual data from
various sources such as websites, documents, social media, or other
sources. The data can be collected manually or automatically using web
scraping tools, APIs, or databases.
2. Text Preprocessing:
4. Feature Engineering:
5. Model Building:
• Once the features are engineered, machine learning or statistical
models can be built to analyze the text data and extract insights.
Common text analysis tasks include:
• Sentiment analysis: Determining the sentiment or emotional
tone expressed in text.
• Text classification: Categorizing text documents into predefined
categories or classes.
• Named entity recognition (NER): Identifying and extracting
entities such as names, organizations, or locations from text.
• Topic modeling: Discovering latent topics or themes present in a
collection of text documents.
6. Interpretation and Visualization:
• Finally, the results of the text analysis are interpreted and visualized to
communicate insights and findings effectively. This may include
generating summary reports, creating visualizations such as bar plots,
heatmaps, or word clouds, and presenting the results to stakeholders.
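As a small illustration of the visualization step, a base-R sketch that plots word frequencies (tokens is assumed to come from the tokenization sketch above):

freq <- sort(table(tokens), decreasing = TRUE)     # word counts, most frequent first
barplot(freq, las = 2, main = "Word frequencies")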
What are the five questions used to target the practice areas?
• Identifying the insights or information you want to extract from the text
will guide the selection of practice areas and methods. Are you
interested in identifying trends, sentiment analysis, categorizing topics,
or extracting named entities? Each of these tasks corresponds to
specific practice areas within text analysis.
1. Extractive Summarization: Selects the most important sentences or phrases
directly from the source text and combines them into a summary.
2. Abstractive Summarization: Generates new sentences that paraphrase and
condense the main ideas of the text, rather than copying sentences verbatim.
Extract insights from text data | Solve business problems with text insights
Data Import:
• read.table() and read.delim(): Read data from tabular text files with
different delimiters.
• read.csv(): Read data from CSV files.
Data Export:
• write.table(): Write data to tabular text files with a chosen delimiter.
• write.csv(): Write data to CSV files.
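A minimal sketch of these import and export functions; the file names are hypothetical:

sales  <- read.csv("sales.csv")                             # comma-separated values
survey <- read.delim("survey.txt")                          # tab-delimited text
misc   <- read.table("data.txt", header = TRUE, sep = ";")  # custom delimiter
write.csv(sales, "sales_clean.csv", row.names = FALSE)      # export to CSV
write.table(survey, "survey_out.txt", sep = "\t", row.names = FALSE)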
Create a data frame of students with name, subject, and marks. Display a summary of
subject and marks. Plot and interpret a boxplot for subject and marks in R.
Name = c("John", "Emma", "Liam", "Olivia", "Noah", "Ava", "William", "Sophia", "James",
"Isabella"),
Marks = c(85, 90, 75, 80, 95, 85, 70, 92, 88, 82)
summary(students$Subject)
summary(students$Marks)
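The requested boxplot can then be drawn with base R; its interpretation depends on the (here hypothetical) subject assignments:

# One box per subject: the box spans the interquartile range of marks,
# the heavy bar is the median, whiskers show the range, points mark outliers
boxplot(Marks ~ Subject, data = students,
        main = "Marks by Subject", xlab = "Subject", ylab = "Marks")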
Exploratory Data Analysis (EDA) is an essential step in the data analysis process that involves
summarizing the main characteristics of a dataset, often using visual methods. In R
programming language, EDA is typically performed using various statistical and graphical
techniques to gain insights into the underlying structure, patterns, and relationships within
the data.
1. Summary Statistics:
2. Data Visualization:
4. Outlier Detection:
• Outliers, or data points that significantly deviate from the rest of the
data, are identified using statistical methods such as Z-score, box plots,
or scatter plots. Outliers may require further investigation or treatment
depending on the context of the analysis.
5. Data Transformation:
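A minimal sketch of these EDA steps in R, using the built-in mtcars data set as a stand-in for a real data set:

data(mtcars)
# Summary statistics
summary(mtcars$mpg)
# Data visualization
hist(mtcars$mpg)
boxplot(mtcars$mpg)
# Outlier detection via Z-scores: flag values more than 3 standard deviations from the mean
z <- (mtcars$mpg - mean(mtcars$mpg)) / sd(mtcars$mpg)
mtcars[abs(z) > 3, ]
# Data transformation: log transform to reduce right skew
mtcars$log_mpg <- log(mtcars$mpg)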