Bits & Bytes - Data Digest - January 2023 Edition

BITS & BYTES
JANUARY
2023 EDITION
This file is meant for personal use by manish.kakumani22@gmail.com only.
Sharing or publishing the contents in part or full is liable for legal action.
WHAT’S INSIDE?
Leadership Speaks
Great Learning Journey
Discover
That’s A Good Question!
What’s New?
Industry Trends
Data Science at Work
AI at Work
Mentor Speaks
Crossword Solution

LEADERSHIP SPEAKS
RISHABH GUPTA
Associate Director, Sales, Great Learning

1. What are your core values and how do you ensure that the organization and its activities are aligned with them?

For me, the values of any individual or organization should be easily understandable and implementable rather than lofty concepts. I conduct my business keeping the following in mind:

a) Integrity – Being true to your mission and maintaining integrity in your communication and delivery will provide a long-term advantage in a world where sales and marketing are taking over and enabling the distribution of low-quality products and services. False promises are strictly prohibited.

b) Loyalty – Loyalty is a two-way street, and you should be loyal to both your organisation and your team (juniors). Loyalty to the organisation does not simply imply a long tenure; it also entails making honest and continuous efforts to achieve the organization’s goals. On the other hand, you must be loyal to your junior team members who look to you for guidance and inspiration. Leaders and organisations can demonstrate loyalty to their teams by not overburdening them, maintaining respectful interactions, identifying developmental areas, and providing the necessary support.

c) Meritocracy – Meritocracy is not diametrically opposed to loyalty. A meritocratic workplace promotes high performers rather than rewarding someone based on tenure, relationship, or any other factor. These are also the places where high performers feel valued, are not dragged down by bean counters, and can give their all.

2. Talk about a leader that inspires you and why?

I admire all entrepreneurs, but especially those who build long-term businesses rather than those that rely on borrowed funds. Mr Sridhar Vembu was born in a small village in India but went on to build a hugely profitable business (Zoho Corporation) with no outside funding. Despite being a billionaire, he has remained true to his roots and is extremely humble in his demeanour. He operates in rural Tamil Nadu and has established schools and training centres for rural people to learn software development, making a significant impact on the ground.

3. What advice would you give someone going into a leadership position for the first time?

One of the most difficult but crucial things for young leaders to remember, in my opinion, is not to develop a big ego and become carried away by success. What worked for you may or may not work for someone else, and rather than imposing your thought process on your juniors, you should create a customised path for their success. Second, don’t be afraid to continue seeking advice from your seniors. Nobody will judge you for not knowing all of the answers.

4. What’s the best book/movie/series you’ve read/watched this year? Do share takeaways.

Nowadays, I don’t have much time to watch Netflix or other entertainment, and I prefer to watch live sports. Due to my love of sports, I did watch a documentary called ‘Down Underdogs’ about India’s most recent tour of Australia. It contains some wonderful behind-the-scenes anecdotes as well as lessons on leadership and team spirit: how a team can become much greater than the sum of its parts and achieve the unthinkable. Some of the stories of Indian cricketers who learned their trade through gully cricket but went on to represent their country are extremely inspiring.

GREAT LEARNING JOURNEY
VINYAS SREEDHAR
PGPDSBA ALUMNUS

Following the completion of my B.A. (Hons) in Economics, I began working as a financial analyst at the custodian bank Northern Trust Corporation in the United States. I had the opportunity to work there for nine years before moving on to a position as an associate manager in the risk and compliance department at Standard Chartered Bank, a bank headquartered in the United Kingdom.

During the COVID lockdown, I planned to learn new skills, leave banking operations, and work on something intriguing and difficult. Because I come from a non-IT background, I expected learning to code would be challenging. At the time I first learned about the Great Learning PGPDSBA Program, data science was already a popular buzzword on the internet and was regarded as the sexiest career of 2022. Before enrolling in this course, I gave it a lot of thought and research. I wasn’t worried about learning online because I knew that regardless of the method, I had to put in 80% of the preparation time offline by reading through a lot of internet content and resources.

Since Python is presently the industry standard for DS and ML, I was concerned about my ability to learn it. However, the training included a built-from-scratch Python module that greatly aided my learning of the language. Furthermore, Python is easier to learn than languages such as C# and Java.

I also believe that data science appears to be very difficult without a strong mentor because there are so many concepts to understand, such as advanced statistics, Python, SQL, Tableau, machine learning, and so on. The mentoring sessions are the best part of Great Learning; they were essential for clarifying our fundamental ideas and principles, and mentors provide precisely this function.

Mentors would explain the concepts in simple terms as I went through the modules and studied them each week, which assisted me in learning and establishing a solid foundation because they are skilled industry veterans who are excellent in their respective fields. The combination of theory and actual industry experience is extremely beneficial for hopefuls like me.

Because of my newly acquired expertise, I can see facts from a different perspective. I am capable of investigating and comprehending the behaviour and properties of any provided data. The data can help me learn new things. I am also capable of developing ML and statistical models for forecasting or resolving business issues.

GREAT LEARNING JOURNEY

MOHIT BENI
PGP AIML ALUMNUS

I am currently employed as a Java and big data engineer at Impetus Technology Solutions. I have 2.5 years of experience in the IT industry with experience in GCP, DevOps, Jenkins, Linux, and Hadoop, and a strong command of data structures and algorithms. At Cognizant Technology Solutions, I worked as a Java and BigQuery developer.

Prior to beginning this programme, I had encountered difficulties with coding concepts. For me, the data structure section was quite perplexing. I began working on it every day, and now I’m getting better at understanding the concepts, which is assisting me in learning Machine Learning algorithms more efficiently.

I had not considered enrolling in a self-paced PGPAIML programme that would be conducted online. However, looking at the curriculum and faculty, together with my desire to learn more and upskill in the AI and ML fields, aided me in this decision. I mentally prepared myself for more work and commitment.

The mentoring sessions are the most important part of it. The sessions assist us in better understanding the use case and case studies. They clarify why we are doing what we are doing and what a better approach could be.

The mentoring sessions are extremely beneficial. The mentor covers every single case study in detail and with lengthy explanations, and clears all of our doubts during the sessions. The case studies are my favourite part of the mentoring sessions. Case studies help us understand the business use case and apply the appropriate ML techniques.

I’ve learned a lot during this learning phase, but there’s still a long way to go. My analysis abilities have greatly improved. I’ve learned a new language (Python), which is crucial for AI and ML.

I believe we would have to work harder to improve our skills. We may not want to do it at times, but remember that we started it to improve our skills and learn more. It will undoubtedly be uncomfortable, but growth occurs only during such times. Just don’t think or worry about it. Simply do it.

DISCOVER
Data Science vs Machine Learning and Artificial Intelligence: The Difference Explained (2023)

While the terms Data Science, Artificial Intelligence (AI), and Machine Learning are all related to the same domain, they have different applications and meanings. There may be some overlap in these domains from time to time, but each of these three terms serves a distinct purpose.

To know more about it, please visit:
https://www.mygreatlearning.com/blog/difference-data-science-machine-learning-ai/

What are Expert Systems in Artificial Intelligence? 2023

Expert systems are a prominent domain for research in AI. The concept was initially introduced by researchers at Stanford University and was developed to solve complex problems in a particular domain. This blog on Expert Systems in Artificial Intelligence covers the topic in more detail.

To know more about it, visit:
https://www.mygreatlearning.com/blog/expert-systems-in-artificial-intelligence/

A Complete understanding of LASSO Regression

In this blog, we will see the techniques used to overcome overfitting for a lasso regression model. Regularization is one of the methods widely used to make your model more generalized.

Lasso regression is a regularization technique. It is used over regression methods for a more accurate prediction. This model uses shrinkage. Shrinkage is where data values are shrunk towards a central point, such as the mean. The lasso procedure encourages simple, sparse models (i.e. models with fewer parameters). This particular type of regression is well-suited for models showing high levels of multicollinearity or when you want to automate certain parts of model selection, like variable selection/parameter elimination.

To know more about it, please visit:
https://www.mygreatlearning.com/blog/understanding-of-lasso-regression/
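To make the shrinkage idea concrete, here is a minimal sketch of lasso fitted by cyclic coordinate descent in plain Python. It is an illustration only, not the method used in the linked blog: the function names, the data, and the choice of penalty strength (`alpha`) are all assumptions made for this example, and no intercept is fitted.

```python
import random


def soft_threshold(value, threshold):
    """Shrink value towards zero; anything inside the band becomes exactly 0.0."""
    if value > threshold:
        return value - threshold
    if value < -threshold:
        return value + threshold
    return 0.0


def lasso_coordinate_descent(X, y, alpha, n_iters=100):
    """Minimise (1/2n)*||y - Xw||^2 + alpha*||w||_1 (no intercept)."""
    n, p = len(X), len(X[0])
    w = [0.0] * p
    for _ in range(n_iters):
        for j in range(p):
            rho = 0.0   # feature j against the residual that ignores feature j
            norm = 0.0  # squared norm of feature j
            for i in range(n):
                partial = sum(X[i][k] * w[k] for k in range(p) if k != j)
                rho += X[i][j] * (y[i] - partial)
                norm += X[i][j] ** 2
            w[j] = soft_threshold(rho / n, alpha) / (norm / n)
    return w


# Hypothetical data: only the first two of four features drive the target.
random.seed(0)
X = [[random.gauss(0, 1) for _ in range(4)] for _ in range(200)]
y = [3 * row[0] - 2 * row[1] + random.gauss(0, 0.1) for row in X]

w_unpenalised = lasso_coordinate_descent(X, y, alpha=0.0)
w_lasso = lasso_coordinate_descent(X, y, alpha=0.5)
print("alpha=0.0:", [round(v, 3) for v in w_unpenalised])
print("alpha=0.5:", [round(v, 3) for v in w_lasso])
```

With the penalty switched on, the coefficients of the two irrelevant features are set to exactly zero and the remaining ones are shrunk, which is the variable-selection behaviour described above.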

THAT’S A GOOD QUESTION!

In this edition, our question will be -
“What is so extreme about extreme gradient boosting?”

Thenkavi says - XGBoost is an abbreviation for Extreme Gradient Boosting. It is a tree-based algorithm in the supervised branch of Machine Learning and is applicable to both classification and regression problems. It is a distributed gradient boosting library that has been optimised to be highly efficient, flexible, and portable. It implements machine learning algorithms under the Gradient Boosting framework.

Boosting is an ensemble learning technique that combines multiple weak learners sequentially, with each learner improving on the predictions of the ones before it. This method aids in reducing the high bias that is common in machine learning models. Gradient boosting, which can be viewed as a generalization of AdaBoost, is one of the most powerful techniques for developing predictive models. Its main goal is to minimise the loss function by adding weak learners using a gradient descent optimization algorithm.

XGBoost is a more regularised version of Gradient Boosting. It employs advanced regularisation (L1 & L2) to improve model generalisation. When compared to Gradient Boosting, XGBoost provides superior performance. It has a very fast training time and can be parallelized across clusters.

XGBoost optimizations: In addition to its unique method of generating and pruning trees, XGBoost includes several built-in optimizations to make training faster when working with large datasets. Here are a few of the most important:

• Approximate Greedy Algorithm: Instead of evaluating each candidate split, this algorithm uses weighted quantiles to determine the best node split.

• Parallel Learning: It can divide the data into smaller chunks and run processes in parallel.

• Sparsity-Aware Split Finding: When there is some missing data, Sparsity-Aware Split Finding calculates Gain with the observations that have missing values placed in the left leaf. The process is then repeated with them placed in the right leaf, and the placement with the highest Gain is selected.

• Cache-Aware Access: XGBoost uses the CPU’s cache memory to store gradients so it can calculate similarity scores faster.

Goals of XGBoost:

• Execution Speed: XGBoost was almost always faster than the other benchmarked R, Python, Spark, and H2O implementations, and it is significantly faster when compared to the other algorithms.

• Model Performance: XGBoost dominates structured or tabular datasets on classification and regression predictive modelling problems.

When to use XGBoost?

• When a large number of training samples is available. Ideally, more than 1000 training samples and fewer than 100 features, or more generally when the number of features is smaller than the number of training samples.

• When there is a mixture of categorical and numeric features or just numeric features.
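The core boosting loop described above (add weak learners one after another, each fitted to the gradient of the loss) can be sketched in a few lines of plain Python. This is a simplified illustration for squared-error loss with one-split trees (stumps), not XGBoost itself; the helper names, the toy data, and the learning rate are assumptions chosen for the example.

```python
def fit_stump(x, residuals):
    """Return (threshold, left_value, right_value) minimising squared error."""
    best = None
    for threshold in sorted(set(x))[:-1]:  # skip the max so the right side is never empty
        left = [r for xi, r in zip(x, residuals) if xi <= threshold]
        right = [r for xi, r in zip(x, residuals) if xi > threshold]
        left_value = sum(left) / len(left)
        right_value = sum(right) / len(right)
        error = (sum((r - left_value) ** 2 for r in left)
                 + sum((r - right_value) ** 2 for r in right))
        if best is None or error < best[0]:
            best = (error, threshold, left_value, right_value)
    return best[1:]


def stump_predict(stump, xi):
    threshold, left_value, right_value = stump
    return left_value if xi <= threshold else right_value


def mean_squared_error(y, predictions):
    return sum((yi - pi) ** 2 for yi, pi in zip(y, predictions)) / len(y)


def gradient_boost(x, y, n_rounds=50, learning_rate=0.3):
    """Start from the mean, then repeatedly fit a stump to the current residuals."""
    base = sum(y) / len(y)
    predictions = [base] * len(y)
    stumps = []
    for _ in range(n_rounds):
        # For squared-error loss, the negative gradient is simply the residual.
        residuals = [yi - pi for yi, pi in zip(y, predictions)]
        stump = fit_stump(x, residuals)
        stumps.append(stump)
        predictions = [pi + learning_rate * stump_predict(stump, xi)
                       for xi, pi in zip(x, predictions)]
    return base, stumps, predictions


# Hypothetical one-feature dataset: y = x^2 on a small grid.
x = [i / 10 for i in range(40)]
y = [xi ** 2 for xi in x]

base, stumps, boosted = gradient_boost(x, y)
baseline_mse = mean_squared_error(y, [base] * len(y))
boosted_mse = mean_squared_error(y, boosted)
print("MSE of the mean-only model:", round(baseline_mse, 4))
print("MSE after boosting:", round(boosted_mse, 4))
```

XGBoost builds on the same loop but adds regularisation, second-order gradient information, and the optimizations listed above.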

THAT’S A GOOD QUESTION!

Gaurav Das says - Extreme Gradient Boosting (XGBoost) is a tree-based ensemble algorithm that provides faster output than existing gradient boosting frameworks. It has both a linear model solver and tree learning algorithms, so it can be used for both regression and classification. Extreme Gradient Boosting is faster because it supports parallel computation on a single machine. This is a definite advantage over existing implementations which support only single-threaded processing. The additional factor which makes it extreme and different from traditional boosting trees is that XGBoost uses a more regularized model formalization to lower the likelihood of overfitting the model.

The versatility and accuracy of this algorithm can be attributed to its focus on model complexity, which is a departure from previous gradient boosting algorithms that only focused on improving impurity/gain. It applies a variety of regularization techniques to avoid overfitting.

In a departure from classical regression trees, L1 regularization is applied to leaf scores rather than directly to features as in regression. The L1 regularization reduces the impact of less-predictive features, but it is not as severe as in regression, where Lasso can set the contribution of features to zero. XGBoost can use both L1 and L2, which serves the purpose of Elastic Net regularization.

This algorithm also provides an additional set of features for performing cross-validation and calculating the importance of features, based on impurity reduction. XGBoost provides a wide range of hyperparameters, which gives the user a good opportunity for better optimization and for producing robust, efficient predictions. So, it becomes imperative that we tune these hyperparameters using either Grid Search Cross Validation or a Randomised Search tuner. It is also important to know the range of values which these hyperparameters can take, since erroneous values will not serve the purpose of tuning the model effectively.

In addition to this, the XGBoost algorithm automatically learns how to treat missing values during training. This means that it can handle missing values by default. However, it is always better to be aware of the restrictions on such features. For example, the gblinear booster treats missing values as zeros. This may not be a valid practice for most datasets.

Like most other algorithms, XGBoost works only with numeric values. So, object/string features need to be subjected to pre-processing tools like One Hot Encoding, Dummy Encoding, Label Encoding etc. High cardinality in features is also a major problem for most algorithms. This can be reduced by clubbing various labels/categories based on statistical analyses. Tree-based algorithms work faster if the data is discrete in nature because of the process used in creating the trees. Recursive binary splitting works slower in case of continuous data as Gini index computation becomes very tedious. Continuous data can be converted into categories/labels using the binning technique based on either quantiles (when the continuous data is skewed) or constant interval width (when the continuous data is nearly symmetrical). Note: XGBoost can handle outliers as well. However, this approach may not always lead to significant improvement in accuracy for certain datasets. So, we need to execute this method in an iterative process.
Let us go through some of the important hyperparameters in XGBoost. Although the ideal range of these parameters depends completely on the dataset we are working on, I have also mentioned the ideal range for some of them, based on values that have yielded good results over the years of my corporate experience.

1. booster: gbtree (tree based) or gblinear (linear function).
2. eta: The learning rate. Range is 0 to 1. A lower value indicates higher regularization.
3. gamma: Acceptable range is 0 to ∞. A higher value leads to a more conservative algorithm.
4. max_depth: Acceptable range is 0 to ∞. Ideal range is 5 to 25.
5. min_child_weight: Acceptable range is 0 to ∞. Ideal range is 2 to 8.
6. max_delta_step: Acceptable range is 0 to ∞. Ideal range is 1 to 10.
7. subsample: Range is 0 to 1. It represents the fraction of the data instances sampled to grow trees.
8. colsample_bytree: Range is 0 to 1. It represents the fraction of the feature columns sampled to grow trees.
9. lambda and alpha: The L2 and L1 regularization terms. Range is 0 to 5.
10. scale_pos_weight: It is the ratio of the count of the negative class to that of the positive class. This is used for handling imbalanced binary data.
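As a rough sketch of how the ranges above could feed a randomised search, the snippet below draws candidate parameter sets in plain Python and computes `scale_pos_weight` from class counts. The dictionary keys are the parameter names from the list; the helper names, the narrowed bounds for `eta`, `gamma`, `subsample` and `colsample_bytree`, and the label counts are assumptions made for this example. Passing each candidate to an actual XGBoost model and scoring it with cross-validation is left out.

```python
import random

# Ranges taken from the list above; bounds marked "assumed" are narrowed
# for practicality and are not from the article.
SEARCH_SPACE = {
    "booster": ["gbtree", "gblinear"],
    "eta": (0.01, 1.0),              # assumed lower bound above zero
    "gamma": (0.0, 5.0),             # assumed cap; the acceptable range is unbounded
    "max_depth": (5, 25),
    "min_child_weight": (2, 8),
    "max_delta_step": (1, 10),
    "subsample": (0.5, 1.0),         # assumed lower bound
    "colsample_bytree": (0.5, 1.0),  # assumed lower bound
    "lambda": (0.0, 5.0),
    "alpha": (0.0, 5.0),
}


def sample_params(rng):
    """Draw one candidate: lists are choices, int pairs and float pairs are ranges."""
    params = {}
    for name, spec in SEARCH_SPACE.items():
        if isinstance(spec, list):
            params[name] = rng.choice(spec)
        elif all(isinstance(bound, int) for bound in spec):
            params[name] = rng.randint(*spec)
        else:
            params[name] = rng.uniform(*spec)
    return params


def scale_pos_weight(labels):
    """Ratio of negative (0) to positive (1) examples."""
    positives = sum(1 for label in labels if label == 1)
    negatives = sum(1 for label in labels if label == 0)
    return negatives / positives


rng = random.Random(42)
labels = [0] * 90 + [1] * 10  # hypothetical imbalanced target
candidates = [sample_params(rng) for _ in range(5)]
for candidate in candidates:
    candidate["scale_pos_weight"] = scale_pos_weight(labels)
    print(candidate)
```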

WHAT’S NEW
What is ChatGPT and generative AI?

ChatGPT is a free chatbot that can generate responses to almost any question. It was developed by OpenAI and was released for public testing in November 2022. It is already widely regarded as the best AI chatbot ever created. According to ecstatic fans who posted examples online, the chatbot has been known to generate computer code, college-level essays, poems, and even half-decent jokes. Others, from tenured professors to advertising copywriters, among the diverse range of professionals who make a living by creating content, are trembling.

Despite the reservations that many people have expressed about ChatGPT (and about AI and machine learning more generally), machine learning has undeniably positive potential. Since its widespread adoption, machine learning has had an impact on a variety of industries, enabling tasks such as high-resolution weather forecasting and medical imaging analysis. It is obvious that generative AI tools like ChatGPT and DALL-E (a tool for AI-generated art) have the potential to change the way many professions are carried out. However, the full scope of that impact and its consequences remain unknown.

What is generative AI?

Generative AI is artificial intelligence (AI) that creates new content such as audio, code, images, texts, simulations, and videos. This includes the algorithms behind ChatGPT. Recent industry developments may fundamentally alter how we think about content creation.

NXP Protects Machine Learning IP with eIQ® Model Watermarking

NXP® Semiconductors has added the eIQ Model Watermarking tool to its eIQ Toolkit for machine learning development. eIQ Model Watermarking is the market’s first practical tool for protecting developers’ machine learning investments. Developers can use the tool to demonstrate that a machine learning model is a replica or clone of their intellectual property without having access to the model’s source code, and so to assert copyright ownership of the model.

Why is it significant? The adage “data is the new gold” has never been more true than in the field of machine learning, where developing highly effective models is critically dependent on domain expertise and good training data. Despite the fact that machine learning models are a significant and differentiating asset to a firm, they typically lack the copyright protection that prevents unauthorised copying or cloning of regular software. Developers can use eIQ Model Watermarking to protect their proprietary intellectual property (IP) and copyright their machine learning models while also detecting illegal use.

INDUSTRY TRENDS
In the last edition, we read about the importance of Large Language Models. Let us read about the challenges faced in developing Large Language Models.

The major challenges can be divided into three parts:

1. Proper understanding of the limitations of the model developed:
The statistical relationships in the dataset used for training the models can be biased, as the dataset itself may contain discriminatory texts, historical bias etc. This will lead to incorrect associations between groups of the population and their attributes (sentiments, culture, custom etc.). The key here is to understand these underlying characteristics and formulate proper methods to curate the datasets used. Full-fledged research is under way on embeddings for Language Models so that they can understand social bias.

2. Model fine-tuning:
A. Choice of the right dataset (domain-specific) for fine-tuning
B. Sufficient amount of data to fine-tune the model
C. Formulating fine-tuning guidelines to get the best performance of the model

3. Choice of the correct set of parameters:
It has been observed that the bigger the model, the better its performance. But at the same time, a large model becomes more and more expensive and time-consuming to build. Hence striking the right balance between quality and cost becomes essential.

Anticipating the potential of such Large Language Models, a few pertinent questions come to mind:

1. How much are they going to impact human lives, given their enormous and unknown possible uses (or misuses)?

2. What will be the impact on the labour market (what should be automated vs what should not be)?

3. Misinformation or disinformation can be a real concern. Incorrect narratives may be generated far more cheaply than man-made propaganda and can easily be used for false propaganda.

4. Would formulating principles/norms help in restricting, or for that matter making correct use of, models with such enormous potential?

5. Should academia get involved in research to develop tools and metrics to curb the potential misuse and harm?

DATA SCIENCE AT WORK

KRISHNA PRASAD D
PGPDSBA ALUMNUS

Krishnaprasad D is currently employed as a Technical Data Specialist with a well-known energy company. His responsibilities include acting as a business analyst to understand the needs of his end users, who are mostly senior management, and maintaining BI-based dashboards for tracking financial audits of petroleum assets. He has recently developed a strong interest in employing methods to ensure the accuracy and accessibility of private data. He believes that the impact of data alone in making critical business decisions is gradually increasing in today’s business ecosystem.

He believed that disclosing data on greenhouse gas emissions was critical to establishing investor trust. His company’s data, on the other hand, was manually entered into Excel sheets from various offices around the world. As a business analyst, he was also tasked with working with international offices to find a single BI platform that could meet all of his organization’s KPIs or requirements.

He made use of Python, SQL, and Power BI. Python simplified the analysis process, while SQL helped him with initial data storage in tables that could later be saved in Azure databases and accessed by Power BI for visualisation. Dealing with duplicate rows and null values presented additional challenges because each asset had its own opinion on whether data should be kept or deleted. He was processing more than a million rows of data. However, because of the tool’s design, he was able to automate laborious work that often required months of effort.

He also made the reporting process easier for top management by using a Python script to clean up the data with pandas and other tools. This data was then used to constantly update and feed dashboards with various visualisations. End users could use the dashboards’ clear data to determine which nation or asset was producing the most greenhouse emissions while it was operational.

The timely collection of such information aided in the auditing of his organization’s compliance with environmental laws. Even better, he was able to automate and reduce manual labour by approximately 50%. When compared to manual labour, there was a reported cost savings of up to 30%.

When his BI solution with near-real-time data was implemented, process efficiency was a critical side effect. Upper management in his organisation was able to make sound decisions without being overly reliant. The entire exercise validated and reinforced his belief that analytics can improve organisational effectiveness. Because proactive reporting of greenhouse gas emissions data is uncommon among oil companies, his company has received public praise.
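The clean-up steps mentioned above (removing duplicate rows, dealing with null values, then feeding totals to a dashboard) can be sketched as follows. The article's script used pandas; this illustration deliberately sticks to the Python standard library, and the column names, sample rows, and the rule of dropping rows with a missing emissions value are all assumptions, not the actual rules used in the project.

```python
from collections import defaultdict

# Hypothetical records as they might arrive from different offices.
raw_rows = [
    {"asset": "A1", "country": "IN", "emissions": 120.0},
    {"asset": "A1", "country": "IN", "emissions": 120.0},  # exact duplicate
    {"asset": "B7", "country": "UK", "emissions": None},   # null value
    {"asset": "C3", "country": "UK", "emissions": 80.5},
    {"asset": "D9", "country": "IN", "emissions": 40.0},
]


def clean_rows(rows):
    """Drop exact duplicates (keeping the first) and rows with no emissions value."""
    seen = set()
    cleaned = []
    for row in rows:
        key = tuple(sorted(row.items()))
        if key in seen:
            continue
        seen.add(key)
        if row["emissions"] is None:
            continue
        cleaned.append(row)
    return cleaned


def emissions_by_country(rows):
    """Aggregate emissions per country, ready to feed a dashboard."""
    totals = defaultdict(float)
    for row in rows:
        totals[row["country"]] += row["emissions"]
    return dict(totals)


cleaned = clean_rows(raw_rows)
totals = emissions_by_country(cleaned)
print(len(raw_rows), "raw rows ->", len(cleaned), "clean rows")
print(totals)
```

In pandas, the same two clean-up steps would typically be expressed with `drop_duplicates` and `dropna` on a DataFrame.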

AI AT WORK
VISHNU KP
PGPAIML ALUMNUS

My name is Vishnu KP, and I have completed my B.E. in Information Science & Engineering. For the past seven years, I’ve worked as a Program Development Manager for Mansion Ecommerce. To put the concepts I learned in Great Learning to the test, I created a “Ship Rate Prediction” Model that predicts ship rates for an item based on a given country and zip code. Currently, we have a module that does the same thing through several different tries; when all tries fail to get a ship rate, the module calls a shipping rates API to get the product’s ship rates. We reduced the shipping rate API calls by 98% after developing the “Ship Rate Prediction” Model.

LinearRegression, SVR, RandomForestRegressor, XGBRegressor, and TensorFlow Neural Network regression models were tried while creating this model. Of these, the NN regression model gave good accuracy, so I used the NN Model for “Ship Rate Prediction.”

I compare the “Ship Rate Prediction” Model to the “Boston House Pricing” Model, which is explained in one of our Great Learning course videos. I followed this case study and modified it to meet my needs when developing the “Ship Rate Prediction” Model. The current model has a 95% accuracy rate when compared to test data. In practice, the predicted ship rate is off by ±$0.50 to ±$0.95, which is acceptable.

We started this only out of curiosity, to try out AIML concepts in our application. However, after implementing this solution in one of our modules, we were blown away by the model’s efficiency and accuracy. We reduced the use of the Shipping Rates API in the Awards feed module. The Shipping Rates API is now only used for Real-Time Checkout processes.

What began as a curiosity has led to the implementation of AIML in our application; we have listed a few more modules in our application where AIML can assist in increasing performance and producing better results.
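One simple way to check a claim like "predictions are within about a dollar" is to measure the error of a regression model directly in dollars. The sketch below computes the mean absolute error and the share of predictions within a tolerance; the function names and all the numbers are made up for illustration and are not the project's data.

```python
# Hypothetical actual vs predicted ship rates, in dollars.
actual = [12.40, 8.75, 20.10, 15.00, 9.95]
predicted = [12.90, 8.10, 19.30, 15.60, 10.40]


def mean_absolute_error(actual, predicted):
    """Average size of the prediction error, in the same unit as the target."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def share_within(actual, predicted, tolerance):
    """Fraction of predictions whose error is no larger than the tolerance."""
    hits = sum(1 for a, p in zip(actual, predicted) if abs(a - p) <= tolerance)
    return hits / len(actual)


mae = mean_absolute_error(actual, predicted)
within_tolerance = share_within(actual, predicted, tolerance=0.95)
print("Mean absolute error in dollars:", round(mae, 2))
print("Share of predictions within $0.95:", within_tolerance)
```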

MENTOR SPEAKS
In this edition, we will hear about our mentor Udaykumar D’s journey of becoming a data science industry expert.

Q1. Describe your current role.

I am currently working as Deputy Manager in Data Science (Retail Domain), a role similar to that of a Lead Data Scientist. Primary responsibilities of my role include identifying solutions for business problems using ML/DL techniques, locating opportunities to support business objectives using DS tools & techniques, and guiding a team of data scientists with solution design, execution and delivery.

Q2. How did you decide you want to be a Data scientist?

It was not really a decision or a desire. I love taking decisions backed by evidence & probable patterns that lead to an outcome that is most likely to happen. In my early career days, I was working in the Stock market (a place where uncertainty would find its Mother) and witnessed people with experience always sail safely. All I wanted to figure out was how someone who just came into the Stock market would gain that experience. Here is where data science helped me. Decisions like what to buy, when to buy, how much to buy and how long to hold can be made precisely using Stat and ML techniques on historical data. I learned many techniques on the internet to support the above decisions and found impactful results. This was the biggest driver for me to pursue a career in Data Science.

Q3. What preparations did you make to achieve your goal?

From my experience, I can surely say that the only way to learn Data Science is by doing it. Results can surely motivate anyone working with data. I would always apply whatever stat & ML techniques I learned in the domain I understood well (Stock Market), and the insights, predictions, and forecasts I was able to get from the data helped me take many right decisions (more importantly, they stopped me from making bigger losses). In the process, I felt the need for very structured & well-guided learning. The PG course offered by Great Learning really played a pivotal role in my career in terms of widening the applications of DS to other domains and knowing the right practices.

While Hackathons are a great place to learn how to apply techniques, I would suggest aspirants focus on working with real-time data (like Cricket, FIFA, Stock Market, Weather, Government data etc). That is where one can learn the challenges and patterns and become a master in handling them well.

Q4. How did you get your first job? Describe your journey (the difficulties you faced and how you overcame them).

My first ever DS role was within my own organization, where I found some key areas where DS could help and came up with a proper solution architecture to achieve the same. I would always encourage DS aspirants to at least consider this route. Domain knowledge (being a Subject Matter Expert) is one of the most important skill sets for a Data Scientist, and if someone has worked in an industry then they must possess this already. Why is this important? While Data Scientist is the hottest job, it is also the role where much attrition happens. A very recent survey found that 80%+ of DS projects are failing, mainly because the models are not able to make any business impact and are not explainable. For example, we may build a highly accurate model that predicts which customers will churn out of our company, but what good is this model if it cannot say why these customers would churn and what measures the company has to take to prevent it? Data Scientists without domain knowledge can never answer this question.

Q5. Advice to budding Data Scientists and Business Analysts?

What matters is the impactful insights that one brings to the table. More than the accuracy of the model, focus on the business impact your model would create. If we are predicting whether a customer is going to churn or not, whatever factor affects a customer’s decision should be brought into the input data. Without such factors, no model can perform well; as they say, Quality data - Quality Results.

In terms of ML algorithms, if you are learning one algorithm, learn it completely. I have used Linear regression to find whether a laptop is costly or not, to find whether a product is seasonal or not, and to forecast stock market prices. While Linear regression is designed for regression problems, it can also be used for classification and forecasting problems. Rather than learning n number of algorithms, what matters is how well we learn the algorithm.
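As a small illustration of using linear regression for a yes/no question such as "is this laptop costly?", one option is to fit an ordinary least-squares line and then apply a cut-off to the predicted price. This is only one possible reading of the mentor's remark; the feature (RAM), the prices, the cut-off, and the function names below are all invented for the example.

```python
def fit_simple_linear_regression(x, y):
    """Ordinary least squares for one feature: returns (slope, intercept)."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    covariance = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    variance = sum((xi - mean_x) ** 2 for xi in x)
    slope = covariance / variance
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical laptops: RAM in GB and price in dollars.
ram_gb = [4, 8, 16, 32]
price = [300, 500, 900, 1700]

slope, intercept = fit_simple_linear_regression(ram_gb, price)


def predict_price(ram):
    return intercept + slope * ram


def is_costly(ram, cutoff=1000):
    """Turn the regression output into a class label with a price cut-off (assumed)."""
    return predict_price(ram) > cutoff


print("slope:", round(slope, 2), "intercept:", round(intercept, 2))
print("24 GB laptop costly?", is_costly(24))
print("8 GB laptop costly?", is_costly(8))
```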

CROSSWORD SOLUTION
ACROSS
1. The _______ keyword defines a function with no name whose body is a single expression that is returned as its result - LAMBDA
2. With the use of ______, programmers can divide or decompose a problem into smaller parts, each of which can carry out a specific task - FUNCTIONS
6. PEP 8 is a manual for Python code that contains ______ outlining how to write better-looking Python code - RULES
7. An ______ is a container that can hold multiple items simultaneously in Python - ARRAY
8. An ordered changeable collection of objects is referred to as a ______ in Python - LIST

DOWN
1. In Python, ________ can be string, numeric, or Boolean - LITERALS
3. A _______ serves as a blueprint or template from which objects can be built - CLASS
4. A _______ is a sequence of characters and is written within single or double quotes - STRING
5. A _______ is an ordered immutable collection of data - TUPLE
9. A ________ is a selection of observations from a population - SAMPLE

LEARNING BIRD CHIRPS:
Learning is not attained by chance. It must be sought for with ardour and attended to with diligence.
- Abigail Adams
THE EDITORIAL TEAM:

Surbhi Bhandari, Mugdha Deepala, Anamika Singhal

