LP-II Oral Question Bank


Data Mining

Oral Questions LP-II


1. What are the different Data Warehousing Schemas?

ANS:- There are three different Data Warehousing Schemas:
1) Star Schema
2) Snowflake Schema
3) Fact constellation Schema

2. Explain Star Schema vs. Snowflake Schema

ANS:-

1. Tables
   Star schema: contains fact tables and dimension tables.
   Snowflake schema: contains fact tables, dimension tables and sub-dimension tables.

2. Design approach
   Star schema: a top-down model.
   Snowflake schema: a bottom-up model.

3. Space
   Star schema: uses more space.
   Snowflake schema: uses less space.

4. Query execution time
   Star schema: queries take less time to execute.
   Snowflake schema: queries take more time to execute than in a star schema, because more joins are needed.

5. Normalization
   Star schema: normalization is not used (dimension tables are denormalized).
   Snowflake schema: both normalization and denormalization are used.

6. Design
   Star schema: its design is very simple.
   Snowflake schema: its design is complex.

7. Query complexity
   Star schema: low.
   Snowflake schema: higher than in a star schema.

8. Understandability
   Star schema: very easy to understand.
   Snowflake schema: difficult to understand.

9. Foreign keys
   Star schema: fewer foreign keys.
   Snowflake schema: more foreign keys.

10. Data redundancy
   Star schema: high data redundancy.
   Snowflake schema: low data redundancy.
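To make the structural difference concrete, here is a small sketch using Python's built-in sqlite3 module. The table and column names (fact_sales_*, dim_product_*, dim_category) and the sample rows are hypothetical; the point is only that the snowflake version moves the category into a sub-dimension table and therefore needs one more join to answer the same question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()

cur.executescript("""
-- Star schema: the fact table joins directly to a denormalized dimension.
CREATE TABLE dim_product_star (
    product_id INTEGER PRIMARY KEY,
    product_name TEXT,
    category_name TEXT            -- category stored inline (redundant)
);
CREATE TABLE fact_sales_star (
    product_id INTEGER REFERENCES dim_product_star(product_id),
    amount REAL
);

-- Snowflake schema: the category is normalized into a sub-dimension table.
CREATE TABLE dim_category (
    category_id INTEGER PRIMARY KEY,
    category_name TEXT
);
CREATE TABLE dim_product_snow (
    product_id INTEGER PRIMARY KEY,
    product_name TEXT,
    category_id INTEGER REFERENCES dim_category(category_id)
);
CREATE TABLE fact_sales_snow (
    product_id INTEGER REFERENCES dim_product_snow(product_id),
    amount REAL
);
""")

cur.executemany("INSERT INTO dim_product_star VALUES (?, ?, ?)",
                [(1, "Apple", "Fruit"), (2, "Milk", "Dairy")])
cur.executemany("INSERT INTO dim_category VALUES (?, ?)",
                [(10, "Fruit"), (20, "Dairy")])
cur.executemany("INSERT INTO dim_product_snow VALUES (?, ?, ?)",
                [(1, "Apple", 10), (2, "Milk", 20)])
sales = [(1, 3.0), (2, 2.5), (1, 1.0)]
cur.executemany("INSERT INTO fact_sales_star VALUES (?, ?)", sales)
cur.executemany("INSERT INTO fact_sales_snow VALUES (?, ?)", sales)

# Star: a single join from the fact table reaches the category.
star = cur.execute("""
    SELECT d.category_name, SUM(f.amount)
    FROM fact_sales_star f JOIN dim_product_star d USING (product_id)
    GROUP BY d.category_name ORDER BY d.category_name
""").fetchall()

# Snowflake: an extra join through the sub-dimension table is required.
snow = cur.execute("""
    SELECT c.category_name, SUM(f.amount)
    FROM fact_sales_snow f
    JOIN dim_product_snow p USING (product_id)
    JOIN dim_category c ON p.category_id = c.category_id
    GROUP BY c.category_name ORDER BY c.category_name
""").fetchall()

print(star)  # [('Dairy', 2.5), ('Fruit', 4.0)]
print(snow)  # [('Dairy', 2.5), ('Fruit', 4.0)]
```

Both queries return the same totals; the snowflake design stores "Fruit" and "Dairy" only once, at the cost of the extra join.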

3. What are the responsibilities of a Data analyst?

ANS:- Responsibilities of a Data analyst are:


1) Data Mining
2) Maintaining Databases
3) Data Preparation
4) Quality assurance
5) Collaborating with other teams
6) Maintaining confidentiality of data
7) Preparing reports
8) Troubleshooting

4. List out some of the best practices for data cleaning?


ANS:- The best practices for data cleaning are as follows:
1) Parsing the data
2) Correcting data
3) Standardizing data
4) Matching data
5) Consolidating data
6) Dealing with missing data
7) Dealing with incorrect and noisy data

5. What is data cleansing?

ANS:- Data cleansing (data cleaning) is the process of fixing or removing incorrect, corrupted,
incorrectly formatted, duplicate, or incomplete data within a dataset. When
multiple data sources are combined, there are many opportunities for data to be duplicated
or mislabeled.
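The cleaning practices listed in the last two answers (standardizing, matching/consolidating duplicates, dealing with missing and incorrect values) can be sketched in plain Python. This is a minimal illustration on hypothetical records; median imputation is just one assumed strategy for missing values, and real projects usually use a library such as pandas.

```python
import statistics

# Hypothetical raw records combined from two sources.
raw = [
    {"name": " Alice ", "city": "pune", "age": "30"},
    {"name": "alice", "city": "PUNE", "age": "30"},    # duplicate once standardized
    {"name": "Bob", "city": "Mumbai", "age": ""},      # missing value
    {"name": "Carol", "city": "Delhi", "age": "abc"},  # incorrectly formatted value
]

def standardize(rec):
    """Trim spaces, normalize capitalization and parse age (None if missing or invalid)."""
    age = rec["age"].strip()
    return {
        "name": rec["name"].strip().title(),
        "city": rec["city"].strip().title(),
        "age": int(age) if age.isdecimal() else None,
    }

def clean(records):
    standardized = [standardize(r) for r in records]

    # Matching/consolidation: drop exact duplicates, keeping the first occurrence.
    seen, unique = set(), []
    for rec in standardized:
        key = (rec["name"], rec["city"], rec["age"])
        if key not in seen:
            seen.add(key)
            unique.append(rec)

    # Missing data: fill unknown ages with the median of the known ages.
    known = [r["age"] for r in unique if r["age"] is not None]
    fill = statistics.median(known) if known else None
    return [{**r, "age": r["age"] if r["age"] is not None else fill} for r in unique]

cleaned = clean(raw)
for rec in cleaned:
    print(rec)
```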

6. List out some common problems faced by a data analyst.


ANS:- 1) Poor data quality
2) Inaccessible data
3) Security of data
4) Duplicated data
5) Missing values in data
6) Inconsistent data

7. List some of the best tools that can be useful for data analysis.


ANS:- 1) RapidMiner
2) Google Search Operators
3) OpenRefine
4) KNIME
5) Solver
6) NodeXL
7) IO

8. What is difference between Supervised and Unsupervised Learning?


ANS:-
1. Types of problems
   Supervised: used for two types of problems, regression and classification.
   Unsupervised: used for two types of problems, clustering and association.

2. Input data
   Supervised: input data is provided to the model along with the output.
   Unsupervised: only input data is provided.

3. Output
   Supervised: the model predicts an output.
   Unsupervised: the model finds hidden patterns in the data.

4. Training data
   Supervised: algorithms are trained on labeled data.
   Unsupervised: algorithms are trained on unlabeled data.

5. Accuracy
   Supervised: generally produces more accurate results.
   Unsupervised: results are generally less accurate.

6. Objective
   Supervised: train the model to predict the output when new data is provided.
   Unsupervised: find useful insights and hidden patterns in an unknown dataset.

7. Algorithms
   Supervised: Bayesian logic, decision tree, logistic regression, linear regression, multi-class classification, support vector machine, KNN, etc.
   Unsupervised: K-means, Apriori algorithm and other clustering algorithms.

8. Feedback
   Supervised: the model takes direct feedback to check whether the right output is being predicted.
   Unsupervised: the model takes no feedback.

9. Resemblance to Artificial Intelligence
   Supervised: for right prediction, the model has to be trained for each kind of data, so it has less resemblance to Artificial Intelligence.
   Unsupervised: has more resemblance to Artificial Intelligence, as it keeps learning new things with more experience.

10. Number of classes
   Supervised: known.
   Unsupervised: not known.

11. When to use
   Supervised: when both the input and the output data are known.
   Unsupervised: when only the input data is known and the output data is not.

12. Computational complexity
   Supervised: computationally more complex.
   Unsupervised: computationally less complex.

13. Analysis
   Supervised: uses off-line analysis.
   Unsupervised: uses real-time analysis of data.

14. Applications
   Supervised: spam detection, handwriting detection, pattern recognition, speech recognition, etc.
   Unsupervised: detecting fraudulent transactions, data preprocessing, etc.

9. What are the similarities and differences between the K-means and KNN algorithms?

ANS:- Both algorithms use a parameter k and rely on a distance measure (usually Euclidean distance) between data points. The difference is that KNN is a supervised classification algorithm that classifies a new data point according to the k closest labeled data points, while K-means is an unsupervised clustering algorithm that gathers and groups unlabeled data into k clusters.
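A minimal side-by-side sketch of the two algorithms in plain Python, assuming Euclidean distance (math.dist), a majority vote for KNN and a naive "first k points" initialization for K-means. The function names and data are illustrative only; library implementations (for example scikit-learn) add smarter initialization and stopping rules.

```python
import math
from collections import Counter

def knn_predict(train, query, k=3):
    """Supervised: `train` is a list of (point, label) pairs; majority vote of the k nearest."""
    nearest = sorted(train, key=lambda item: math.dist(item[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]

def kmeans(points, k, iterations=10):
    """Unsupervised: group unlabeled points into k clusters around moving centroids."""
    centroids = list(points[:k])  # naive initialization: the first k points
    clusters = []
    for _ in range(iterations):
        # Assignment step: each point joins the cluster of its nearest centroid.
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: math.dist(p, centroids[i]))
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its cluster.
        centroids = [
            tuple(sum(coord) / len(cl) for coord in zip(*cl)) if cl else centroids[i]
            for i, cl in enumerate(clusters)
        ]
    return centroids, clusters

train = [((1, 1), "A"), ((1, 2), "A"), ((8, 8), "B"), ((9, 8), "B"), ((8, 9), "B")]
print(knn_predict(train, (2, 1)))  # A

centroids, clusters = kmeans([(1, 1), (8, 8), (1, 2), (9, 8)], k=2)
print(centroids)  # [(1.0, 1.5), (8.5, 8.0)]
```

KNN needs the labels in `train`; K-means receives only the points and discovers the two groups itself.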

10. What is Euclidean distance? Explain with a suitable example.


ANS:- Euclidean distance is the traditional metric for problems involving
geometry. It can be simply explained as the ordinary straight-line
distance between two points. It is one of the most widely used distance
measures in cluster analysis; K-means is one of the algorithms that
uses it.

EXAMPLE:- In a plane with P at coordinate (x1, y1) and Q at (x2, y2):
Euclidean distance between P and Q = √((x1 – x2)² + (y1 – y2)²)
For P = (1, 2) and Q = (4, 6): √(3² + 4²) = √25 = 5.

By contrast, Manhattan distance is the sum of the absolute differences of the coordinates,
i.e. the distance travelled parallel to the X-axis plus the distance travelled parallel to the Y-axis:
Manhattan distance between P and Q = |x1 – x2| + |y1 – y2|
For the same points: |1 – 4| + |2 – 6| = 3 + 4 = 7.
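The two formulas above can be checked with a few lines of Python. The helper names are hypothetical; math.dist (Python 3.8+) is the standard-library Euclidean distance and is used only as a cross-check.

```python
import math

def euclidean(p, q):
    """Straight-line distance: square root of the sum of squared coordinate differences."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))

def manhattan(p, q):
    """Sum of absolute coordinate differences."""
    return sum(abs(a - b) for a, b in zip(p, q))

P, Q = (1, 2), (4, 6)
print(euclidean(P, Q))   # 5.0
print(manhattan(P, Q))   # 7
print(math.dist(P, Q))   # 5.0
```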

11. What is Hamming distance? Explain with a suitable example.


ANS:- Hamming distance is a metric for comparing two binary data strings. When
comparing two binary strings of equal length, the Hamming distance is the number of bit
positions in which the two bits are different.
The Hamming distance between two strings a and b is denoted as d(a,b).
It is used for error detection or error correction when data is transmitted over computer
networks. It is also used in coding theory for comparing equal-length data words.
EXAMPLE:- Suppose there are two strings 11011001 and 10011101.
11011001 ⊕ 10011101 = 01000100. Since this contains two 1s, the Hamming distance
d(11011001, 10011101) = 2.
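The same computation in Python, as a small sketch: `hamming` compares equal-length strings position by position, and `hamming_int` uses the XOR-and-count-ones approach from the example. Both function names are made up for illustration.

```python
def hamming(a: str, b: str) -> int:
    """Number of positions at which two equal-length strings differ."""
    if len(a) != len(b):
        raise ValueError("Hamming distance is defined only for equal-length strings")
    return sum(x != y for x, y in zip(a, b))

def hamming_int(a: int, b: int) -> int:
    """XOR the two numbers and count the 1 bits, as in the worked example."""
    return bin(a ^ b).count("1")

print(hamming("11011001", "10011101"))       # 2
print(hamming_int(0b11011001, 0b10011101))   # 2
print(hamming("karolin", "kathrin"))         # 3 (works for any symbols, not just bits)
```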

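12. What is Chi-Square Distance? Explain with a suitable example.


ANS:- A chi-square (χ2) statistic is a test that measures how a model compares
to actual observed data. ... The chi-square statistic compares the size of any
discrepancies between the expected results and the actual results, given the size of the
sample and the number of variables in the relationship.
Chi-square distance is derived from this statistic and is commonly used to compare two
histograms x and y. One common form is:
d(x, y) = ½ Σ (xi – yi)² / (xi + yi)
For x = (1, 0) and y = (0, 1): d = ½ [(1 – 0)²/1 + (0 – 1)²/1] = 1.
Example:- It is used in applications like similar-image retrieval, image texture analysis, feature extraction, etc.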
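A minimal Python sketch of the histogram form of chi-square distance given above. The function name is hypothetical, and skipping bins that are empty in both histograms (to avoid division by zero) is an assumed convention.

```python
def chi_square_distance(x, y):
    """0.5 * sum((xi - yi)^2 / (xi + yi)) over the bins of two histograms."""
    total = 0.0
    for a, b in zip(x, y):
        if a + b > 0:  # skip bins that are empty in both histograms
            total += (a - b) ** 2 / (a + b)
    return 0.5 * total

print(chi_square_distance([1, 0], [0, 1]))        # 1.0
print(chi_square_distance([1, 2, 3], [1, 2, 3]))  # 0.0 (identical histograms)

# For example, two normalized colour histograms of images.
print(chi_square_distance([0.2, 0.3, 0.5], [0.1, 0.4, 0.5]))
```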

13. What are different types of Clustering?


ANS:-

1) Partitioning Clustering Method. In this method, let us say that “m” partitions are made
of the “p” objects of the database. ...
2) Hierarchical Clustering Methods. ...
3) Density-Based Clustering Method. ...
4) Grid-Based Clustering Method. ...
5) Model-Based Clustering Methods. ...
6) Constraint-Based Clustering Method.
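As an illustration of the hierarchical method listed above, here is a minimal agglomerative sketch using single linkage: every point starts as its own cluster, and the two closest clusters are merged until k clusters remain. The function name and the choice of single linkage are assumptions for illustration; this brute-force version is far too slow for large data sets.

```python
import math
from itertools import combinations

def single_linkage(points, k):
    """Agglomerative clustering: repeatedly merge the two closest clusters until k remain."""
    clusters = [[p] for p in points]  # start: every point is its own cluster

    def linkage(pair):
        i, j = pair
        # Single linkage: distance between the closest members of the two clusters.
        return min(math.dist(a, b) for a in clusters[i] for b in clusters[j])

    while len(clusters) > k:
        i, j = min(combinations(range(len(clusters)), 2), key=linkage)
        clusters[i].extend(clusters[j])  # merge cluster j into cluster i (i < j)
        del clusters[j]
    return clusters

print(single_linkage([(0, 0), (10, 10), (0, 1), (10, 11)], k=2))
# [[(0, 0), (0, 1)], [(10, 10), (10, 11)]]
```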

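14. What is Weka Tool? Explain the steps to perform clustering on a sample data set.
ANS:-

Weka (Waikato Environment for Knowledge Analysis) is an open-source collection of
machine-learning and data-mining algorithms, written in Java and developed at the
University of Waikato, New Zealand. It provides tools for data preprocessing,
classification, clustering, association rules and visualization.

Steps to perform clustering on a sample data set:
1) Open the Weka Explorer and, in the "Preprocess" tab, click "Open file..." to load a
data set (for example, iris.arff from Weka's data folder).
2) Select the "Cluster" tab in the Explorer and click on the "Choose" button. This results
in a drop-down list of available clustering algorithms; select SimpleKMeans.
3) Click on the algorithm name to set its options, such as numClusters (the number of clusters).
4) Choose the cluster mode (for example, "Use training set") and click "Start". The result
appears in the "Clusterer output" panel.
The WEKA SimpleKMeans algorithm uses the Euclidean distance measure to compute
distances between instances and clusters.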

15. Explain Association Rule?

ANS:- Association rule mining finds interesting associations and relationships
among large sets of data items. An association rule shows how frequently an itemset
occurs in a transaction. A typical example is Market Basket Analysis.

16. What are the applications of the Apriori algorithm?


ANS:- 1) In the education field: extracting association rules from data on admitted students
through their characteristics and specialities.
2) In the medical field: analysis of patients' databases.
3) In forestry: analysis of the probability of forest fires.
4) In the Amazon recommender system.
5) In the Google Autocomplete feature.

17. What is Market Basket Analysis? Explain with suitable example?


ANS:- In market basket analysis (also called association analysis or frequent itemset mining),
you analyze purchases that commonly happen together. For example, people who buy bread
and peanut butter also buy jelly. Or people who buy shampoo might also buy conditioner. What
relationships there are between items is the target of the analysis. Knowing what your
customers tend to buy together can help with marketing efforts and store/website layout.
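The bread, peanut butter and jelly example can be quantified with support and confidence. This is a small sketch on made-up transactions; `support` and `confidence` are illustrative helper names, not a library API.

```python
# Hypothetical transactions, each a set of purchased items.
transactions = [
    {"bread", "peanut butter", "jelly"},
    {"bread", "peanut butter"},
    {"bread", "peanut butter", "jelly", "milk"},
    {"shampoo", "conditioner"},
]

def support(itemset, transactions):
    """Fraction of transactions that contain every item in `itemset`."""
    return sum(itemset <= t for t in transactions) / len(transactions)

def confidence(antecedent, consequent, transactions):
    """How often the consequent is bought when the antecedent is bought."""
    return support(antecedent | consequent, transactions) / support(antecedent, transactions)

lhs, rhs = {"bread", "peanut butter"}, {"jelly"}
print(support(lhs | rhs, transactions))    # 0.5
print(confidence(lhs, rhs, transactions))  # 0.6666666666666666
```

Read as: half of all baskets contain all three items, and two out of three baskets with bread and peanut butter also contain jelly.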

18. Who proposed the Apriori algorithm?


ANS:- The Apriori algorithm was given by R. Agrawal and R. Srikant in 1994 for finding
frequent itemsets in a dataset for boolean association rules.

19. What is minimum support and minimum confidence?


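ANS:- The support of an itemset X is the fraction of transactions that contain X. The
confidence of a rule X ⇒ Y is support(X ∪ Y) / support(X).

1. Minimum support is a user-specified threshold applied to find all frequent itemsets in a
data set: an itemset is frequent if its support is at least the minimum support.

2. These frequent itemsets and the minimum confidence constraint are used to
compose the rules: a rule is kept only if its confidence is at least the minimum confidence.
Finding all frequent itemsets in a data set is a complex procedure
since it involves analyzing all possible itemsets.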
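A compact Apriori sketch showing where minimum support and minimum confidence are applied. It is a simplified, in-memory version on hypothetical transactions (the function names are illustrative), using the join and prune steps: a k-itemset is only counted if all of its (k-1)-subsets are already frequent.

```python
from itertools import combinations

def apriori(transactions, min_support):
    """Return {frequent itemset: support} for itemsets with support >= min_support."""
    n = len(transactions)

    def support(itemset):
        return sum(itemset <= t for t in transactions) / n

    # Level 1: frequent single items.
    singletons = {frozenset([item]) for t in transactions for item in t}
    current = {s for s in singletons if support(s) >= min_support}
    frequent, k = {}, 1
    while current:
        for s in current:
            frequent[s] = support(s)
        k += 1
        # Join step: combine frequent (k-1)-itemsets into k-item candidates.
        candidates = {a | b for a in current for b in current if len(a | b) == k}
        # Prune step (Apriori property): every (k-1)-subset must itself be frequent.
        candidates = {
            c for c in candidates
            if all(frozenset(sub) in current for sub in combinations(c, k - 1))
        }
        current = {c for c in candidates if support(c) >= min_support}
    return frequent

def association_rules(frequent, min_confidence):
    """Build rules X => Y with confidence = support(X u Y) / support(X)."""
    rules = []
    for itemset, sup in frequent.items():
        for r in range(1, len(itemset)):
            for lhs in map(frozenset, combinations(itemset, r)):
                conf = sup / frequent[lhs]
                if conf >= min_confidence:
                    rules.append((set(lhs), set(itemset - lhs), conf))
    return rules

transactions = [{"bread", "milk"}, {"bread", "butter"}, {"bread", "milk", "butter"}, {"milk"}]
frequent = apriori(transactions, min_support=0.5)
rules = association_rules(frequent, min_confidence=0.6)
for lhs, rhs, conf in rules:
    print(lhs, "=>", rhs, round(conf, 2))
```

Here {milk, butter} appears in only one of four baskets, so it fails the minimum support and no rule is built from it.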

20. What is the use of the Tokenize operator?


ANS:- The Tokenize operator performs tokenization: the process of breaking up a given text
into units called tokens. Tokens can be individual words, phrases or even whole sentences.
In the process of tokenization, some characters like punctuation marks may be discarded.
The tokens usually become the input for processes like parsing and text mining.
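Tokenization itself can be sketched with Python's re module. This is not RapidMiner's implementation; the two hypothetical functions simply show two common ways of splitting text: at every non-letter character, and at a user-specified set of characters.

```python
import re

def tokenize_non_letters(text):
    """Split at every run of non-letter characters (spaces, punctuation, digits)."""
    return [tok for tok in re.split(r"[^A-Za-z]+", text) if tok]

def tokenize_on_chars(text, chars=".,;: "):
    """Split only at the given characters; re.escape makes them safe inside [...]."""
    pattern = "[" + re.escape(chars) + "]+"
    return [tok for tok in re.split(pattern, text) if tok]

print(tokenize_non_letters("Data mining, text mining!"))  # ['Data', 'mining', 'text', 'mining']
print(tokenize_on_chars("a.b, c"))                        # ['a', 'b', 'c']
```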

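21. What are the different modes of the Tokenize operator?


ANS:- The "mode" parameter of the Tokenize operator determines how the text is split into tokens:
1) non letters: splits the text at every non-letter character.
2) specify characters: splits the text at the characters specified by the user.
3) regular expression: splits the text at matches of a given regular expression.
4) linguistic sentences: splits the text into sentences using language-specific rules.
5) linguistic tokens: splits the text into words using language-specific rules.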

22. How to use Read Document operator?


ANS:- 1. Create a folder under C: and place an empty TXT file inside it.
2. Open the TXT file and add any text to it.

3. Create a process with the "Read from File" operator and configure it with the path of this file.
A few things to notice:

- The number of backslashes ("\") in the path matters: 3 after the drive and 2 for the folder.
- In this example the 'Administrator' user is used directly. This can be changed using process
variables, or no user is needed at all if the Process Automation service is configured to run
as a specific user with higher permissions.
When this process is executed, the operator shows the following information:

- Name of the file read.
- Path where this file is located.
- User used to read the file.
- Content of the file.

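23. Why do we use Filter Tokens and Filter Stopwords?


ANS:- Filter Stopwords removes stop words (very common words such as "the", "is" and "and"
that carry little meaning) from a token stream, which reduces noise and the number of
attributes produced from the text. In addition to English, the stop filter supports predefined
stop word lists for several languages. You can also specify your own stop words as an array
or file. The stop filter uses Lucene's StopFilter.
Filter Tokens removes tokens that do not meet a condition, for example tokens shorter or
longer than a given number of characters, so that only meaningful tokens remain for text mining.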

24. How to use Filter Class operator?


ANS:- The filter() method filters the given sequence with the help of a
function that tests whether each element in the sequence is true or not.

Syntax:- filter(function, sequence)

Parameters:
function: a function that tests whether each element of the
sequence is true or not. If None is passed, elements that are false are removed.
sequence: the sequence to be filtered; it can be a set, list, tuple,
or any other iterable.
Returns:
an iterator that yields only the elements for which the function returns true.
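A short usage example of Python's built-in filter() with a lambda, a named function and None; the sample data is made up for illustration. Because filter() returns a lazy iterator, the results are wrapped in list() to materialize them.

```python
numbers = [1, 2, 3, 4, 5, 6]

# Keep only the even numbers.
evens = list(filter(lambda n: n % 2 == 0, numbers))

def is_vowel(ch):
    return ch.lower() in "aeiou"

# Any iterable works; a string is filtered character by character.
vowels = list(filter(is_vowel, "Data Mining"))

# With None as the function, falsy elements (0, "", None, []) are dropped.
truthy = list(filter(None, [0, 1, "", "x", None, []]))

print(evens)   # [2, 4, 6]
print(vowels)  # ['a', 'a', 'i', 'i']
print(truthy)  # [1, 'x']
```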

STQA Questions:-
Q #1) What is the difference between Quality Assurance, Quality Control, and
Testing?
Answer: Quality Assurance is the process of planning and defining the way of
monitoring and implementing the quality (test) processes within a team and
organization. This method defines and sets the quality standards of the projects.
Quality Control is the process of finding defects and providing suggestions to
improve the quality of the software. The methods used by Quality Control are
usually established by Quality Assurance. It is the primary responsibility of the
testing team to implement quality control.

Testing is the process of finding defects/bugs. It validates whether the software
built by the development team meets the requirements set by the user and the
standards set by the organization.

Here, the main focus is on finding bugs, and the testing team works as a quality
gatekeeper.

Q #2) When do you think QA activities should start?


Answer: QA activities should start at the beginning of the project. The earlier they
start, the more beneficial it is for setting the standard for achieving quality.
Cost, time and effort become much harder to manage if the QA activities get
delayed.

Q #3) What is the difference between the Test Plan and Test Strategy?
Answer: The Test Strategy is at a higher level, mostly created by the Project Manager,
and describes the overall approach of the testing for the entire project,
whereas the Test Plan describes how the testing should be performed for a
particular application that falls under the project.
Q #4) Can you explain the Software Testing Life Cycle?
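Answer: The Software Testing Life Cycle (STLC) refers to a testing process with specific
steps that are executed in a definite sequence to ensure that the quality goals have
been met. Its typical phases are requirement analysis, test planning, test case
development, test environment setup, test execution and test cycle closure.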
Q #5) How do you define a format of writing a good test case?
Answer: The format of Test Case includes:
• Test case ID
• Test case description
• Severity
• Priority
• Environment
• Build version
• Steps to execute
• Expected results
• Actual results
Q #6) What is a good test case?
Answer: In simple words, a good test case is one that finds a defect. But not every test
case will find a defect, so a good test case can also be one that has all the
prescribed details and coverage.
Q #7) What would you do if you have a large suite to execute in very little time?
Answer: If we have little time and a large volume of test cases to execute,
we should prioritize the test cases, execute the high-priority test cases
first, and then move on to the lower-priority ones.
This way we can make sure that the important aspects of the software are
tested.

Alternatively, we may ask the customer which functions of the software are the most
important to them, start testing those areas first, and then gradually move to
the areas of less importance.
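Prioritizing a suite under a time limit can be sketched as a greedy selection. The TestCase fields, the "1 = highest priority" convention and the time estimates are hypothetical; real test management tools store this information differently.

```python
from dataclasses import dataclass

@dataclass
class TestCase:
    case_id: str
    priority: int  # 1 = highest priority (assumed convention)
    minutes: int   # estimated execution time

def select_for_budget(cases, budget_minutes):
    """Greedy: take cases in priority order, skipping any that no longer fit the budget."""
    selected, used = [], 0
    for case in sorted(cases, key=lambda c: (c.priority, c.minutes)):
        if used + case.minutes <= budget_minutes:
            selected.append(case.case_id)
            used += case.minutes
    return selected

suite = [
    TestCase("TC1", priority=2, minutes=30),
    TestCase("TC2", priority=1, minutes=20),
    TestCase("TC3", priority=1, minutes=50),
    TestCase("TC4", priority=3, minutes=10),
]
print(select_for_budget(suite, budget_minutes=60))  # ['TC2', 'TC1', 'TC4']
```

Note that the greedy rule skips TC3 even though it is high priority, because it no longer fits; such trade-offs are exactly where the customer's preference should decide.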

Q #8) Do you think QAs can also participate in resolving production issues?
Answer: Definitely! Participating in resolving production issues is a good learning
opportunity for QAs. Many times, production issues can be resolved by clearing the
logs, changing some registry settings or restarting the services.
These kinds of environmental issues can very well be fixed by the QA team.

Also, if QA has insight into resolving production issues, they can include those
scenarios while writing test cases, and in this way contribute to improving
quality and minimizing production defects.

Q #9) Suppose you find a bug in production, how would you make sure that the
same bug is not introduced again?
Answer: The best way is to immediately write a test case for the production
defect and include it in the regression suite. This way we ensure that the bug
does not get introduced again.
Also, we can think of alternate test cases or similar kinds of test cases and
include them in our planned execution.
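Turning a production defect into a permanent regression test might look like this with Python's unittest. The function `parse_quantity` and its "1,200" bug are hypothetical; run the file with `python -m unittest` alongside the rest of the regression suite.

```python
import unittest

def parse_quantity(text):
    """Hypothetical function. Production bug: "1,200" raised ValueError.
    Fix: remove thousands separators before converting."""
    return int(text.replace(",", ""))

class RegressionTests(unittest.TestCase):
    def test_production_bug_thousands_separator(self):
        # Reproduces the exact production input so the bug cannot silently return.
        self.assertEqual(parse_quantity("1,200"), 1200)

    def test_similar_case_multiple_separators(self):
        # An alternate, similar case added to the planned execution.
        self.assertEqual(parse_quantity("1,000,000"), 1000000)

    def test_existing_behaviour_unchanged(self):
        # Guards against the fix breaking inputs that already worked.
        self.assertEqual(parse_quantity("42"), 42)
```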

Q #10) What is the difference between Functional and Non-functional testing?


Answer:
Functional testing deals with the functional aspects of the application. This
technique tests that the system behaves as per the requirements and
specifications, which are directly linked with customer requirements. We validate
the test cases against the specified requirements and mark the test results as
pass or fail accordingly.
Examples include regression, integration, system and smoke testing.
Non-functional testing, on the other hand, tests the non-functional aspects of the
application. It does not focus on the functional requirements but on factors like
performance, load, and stress. These are often not explicitly specified in the
requirements but are prescribed in the quality standards. So, as QA we have to
make sure that these tests are also given sufficient time and priority.
Q #11) What is Negative testing? How is it different from Positive testing?
Answer: Negative testing is a technique that validates that the system behaves
gracefully in case of any invalid inputs. For example, in case the user enters any
invalid data in a text box, the system should display a proper message instead of
the technical message which the user does not understand.
Negative testing is different from positive testing in a way that positive testing
validates that our system works as expected and compares the test results with
the expected results.
Most of the time, negative testing scenarios are not mentioned in the
functional requirement documents. As QA, we have to identify the negative
scenarios and make provisions to test them.
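A small sketch contrasting positive and negative tests on a hypothetical input handler, `validate_age`. The accepted range (1 to 129) and the messages are invented for illustration; the point is that invalid input produces a readable message rather than a technical error.

```python
def validate_age(value: str) -> int:
    """Hypothetical handler: return the age, or raise ValueError with a user-friendly message."""
    if not value.strip().isdecimal():
        raise ValueError("Please enter your age as a whole number.")
    age = int(value)
    if not 0 < age < 130:
        raise ValueError("Please enter an age between 1 and 129.")
    return age

# Positive test: valid input gives the expected result.
assert validate_age("42") == 42

# Negative tests: invalid input is rejected gracefully with a readable message.
negative_results = {}
for bad in ["", "abc", "-5", "200"]:
    try:
        validate_age(bad)
    except ValueError as err:
        negative_results[bad] = str(err)
    else:
        raise AssertionError(f"{bad!r} should have been rejected")

for bad, message in negative_results.items():
    print(f"{bad!r}: {message}")
```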

Q #12) How would you ensure that your testing is complete and has good
coverage?
Answer: Requirement Traceability Matrix and Test coverage matrices will help us
to determine that our test cases have good coverage.
Requirement traceability matrix will help us to determine that the test conditions
are enough so that all the requirements are covered. Coverage matrices will help
us to determine that the test cases are enough to satisfy all the identified test
conditions in RTM.
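A Requirement Traceability Matrix can be represented as a mapping from each requirement to the test cases that cover it, which makes gaps easy to spot. The requirement and test case IDs are hypothetical, and "coverage" here simply means the percentage of requirements with at least one test case.

```python
# Hypothetical requirements and the requirements each test case covers.
requirements = ["REQ-1", "REQ-2", "REQ-3"]
test_cases = {
    "TC-01": ["REQ-1"],
    "TC-02": ["REQ-1", "REQ-2"],
}

def build_rtm(requirements, test_cases):
    """Map each requirement to the test cases that trace to it.
    A KeyError means a test case references an unknown requirement."""
    rtm = {req: [] for req in requirements}
    for tc, covered in test_cases.items():
        for req in covered:
            rtm[req].append(tc)
    return rtm

rtm = build_rtm(requirements, test_cases)
uncovered = [req for req, tcs in rtm.items() if not tcs]
coverage = 100 * (len(requirements) - len(uncovered)) / len(requirements)
print(rtm)
print(f"Uncovered: {uncovered}, coverage: {coverage:.1f}%")  # Uncovered: ['REQ-3'], coverage: 66.7%
```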

Q #13) What are the different artifacts you refer to when you write the test
cases?
Answer: The main artifacts used are:
• Functional requirement specification
• Requirement understanding document
• Use Cases
• Wireframes
• User Stories
• Acceptance criteria
• UAT test cases (in many cases)
Q #14) Have you ever managed to write test cases without having any
documents?
Answer: Yes, there are cases when we have a situation where we have to write
test cases without having any concrete documents.
In that case, the best way is to:
• Collaborate with the BA and development team.
• Dig into emails which have some information.
• Dig into older test cases/regression suite
• If the feature is new, try to read the wiki pages or help of the
application to have an idea
• Sit with the developer and try to understand the changes being
made.
• Based on your understanding, identify the test condition and send it
to BA or stakeholders to review them.
Q #15) What is meant by Verification and Validation?
Answer:
Validation is the process of evaluating the final product to check whether the
software meets the business needs. The test execution which we do in our day to
day life is the validation activity which includes smoke testing, functional testing,
regression testing, systems testing, etc.
Verification is a process of evaluating the intermediary work products of a
software development lifecycle to check if we are on the correct track of creating
the final product.
Q #16) What are the different verification techniques you know?
Answer: Verification techniques are static. There are 3 verification techniques.
These are explained as follows:
(i) Review – This is a method by which the code/test cases are examined by an
individual other than the author who produced them. It is one of the easiest and
best ways to ensure coverage and quality.
(ii) Inspection – This is a technical and disciplined way to examine and correct
the defects in the test artifact or code. Because it is disciplined, it has various
roles:
• Moderator – Facilitates and leads the entire inspection meeting.
• Recorder – Records the minutes of the meeting, the defects found,
and other points discussed.
• Reader – Reads out the document/code to the team during the meeting.
• Producer – The author, who is ultimately responsible for updating
the document/code as per the comments.
• Reviewer – All the team members can be considered reviewers.
This role can also be played by a group of experts if the project
demands it.
(iii) Walkthrough – This is a process in which the author of the document/code
reads through the content and gets feedback. This is mostly an FYI (For Your
Information) session rather than a session for seeking corrections.
Q #17) What is the difference between Load and Stress testing?
Answer:
Stress testing is a technique which validates the behavior of the system when it
executes under stress, i.e. beyond its normal operating conditions. We first
determine the upper limit of the system, then gradually reduce the resources (or
push the load past that limit) and check the system behavior.
In load testing, we validate the system behavior under the expected load. The
load can be concurrent users or resources accessing the system at the same
time.
Q #18) In case you have any doubts regarding your project, how do you approach them?
Answer: In case of any doubts, first try to get them cleared by reading the available
artifacts/application help. If the doubts persist, ask your immediate
supervisor or a senior member of your team.
Business Analysts are also a good choice for clearing doubts, and we can take
other queries to the development team. The last option is to follow up with
the manager and finally with the stakeholders.

Q #19) Have you used any Automation tools?


Answer: The answer to this question is very much specific to the individual.
Mention all the automation tools and strategies that you have used in your
project.
Q #20) How do you determine which piece of software requires how much
testing?
Answer: We can determine this by finding the Cyclomatic Complexity of the code.
For a control-flow graph, cyclomatic complexity M = E – N + 2P, where E is the number
of edges, N the number of nodes and P the number of connected components.
The technique helps answer the following 3 questions for the programs/features:
• Is the feature/program testable?
• Is the feature/program understood by everyone?
• Is the feature/program reliable enough?
As QA, we can use this technique to identify the “level” of our testing.

In practice, if the cyclomatic complexity is a large number, we consider that
piece of functionality to be complex, and as testers we conclude that the
code/functionality requires in-depth testing.
On the other hand, if the cyclomatic complexity is a small number, we conclude
as QA that the functionality is less complex and decide the scope accordingly.
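The formula M = E – N + 2P can be applied directly to a control-flow graph written as a list of edges. This is a minimal sketch; the graphs and the function name are hypothetical, and nodes are inferred from the edges, so isolated nodes are not counted.

```python
def cyclomatic_complexity(edges, components=1):
    """M = E - N + 2P for a control-flow graph given as a list of (from, to) edges."""
    nodes = {node for edge in edges for node in edge}
    return len(edges) - len(nodes) + 2 * components

# if/else: 1 = condition, 2 = then-branch, 3 = else-branch, 4 = exit
if_else = [(1, 2), (1, 3), (2, 4), (3, 4)]

# while loop containing an if: 1 = loop test, 2 = if test, 3/4 = branches, 5 = exit
loop_with_if = [(1, 2), (2, 3), (2, 4), (3, 1), (4, 1), (1, 5)]

print(cyclomatic_complexity(if_else))       # 2
print(cyclomatic_complexity(loop_with_if))  # 3
```

The results match the shortcut "number of decision points + 1": one decision gives 2, two decisions give 3, so the second piece of code needs at least three independent test paths.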

A QA should understand the entire testing lifecycle and be able to suggest
changes to the process if required. The goal is to deliver high-quality
software, and to that end a QA should take all the necessary measures to
improve the process and the way the testing team executes tests.
