ChatGPT for Data Analytics Beginner Tutorial

Data nerds, welcome to this tutorial on how to use ChatGPT for data analytics. We're going to be covering all the tips and tricks I've been using in my job as a data analyst over the past year; they've helped automate a lot of my job, saving me up to 20 hours a week. So what are we going to be covering? In the first portion, we'll cover the basics of ChatGPT, from understanding your options and setting it up to best practices for prompting it properly so it actually does your job. We'll even be using it to read graphs. Yeah, it can read graphs; kind of crazy when I found this out. In the second portion of this video, we're going to focus on Advanced Data Analysis, a powerful feature within ChatGPT that lets you write and read code without having to code yourself. We'll go through all the steps in the data pipeline, from importing and exploring a data set to cleaning it up and creating beautiful visualizations. We'll even have a bonus of using some machine learning to predict values. By the end of this, you'll have a full project you can showcase on how you use this tool for data analytics. And don't worry if you don't have any prior data analytics or coding experience; none of this is required for this video. One last…

Setting up ChatGPT

All right, let's get into the options you have available for using ChatGPT in this course, and then we'll go into how to set up one of them, which I think will apply to most people taking this course. The first option is ChatGPT Plus, and this is the option I'm using. Freelancers, contractors, and even job seekers use it, and some people within companies may be using it too, so this is going to be the choice of most people. ChatGPT Plus here in the United States is about $20 a month, and with it you get access to OpenAI's newest and most capable model, in this case GPT-4. This may change to a higher-numbered model depending on when you take the course, and we'll update the course if that affects its contents, but overall you have access to the newest and greatest model. It also has faster response speeds, and you get access to plugins and Advanced Data Analysis. Both of these are at the core of what this course takes advantage of to make sure you're doing data analytics correctly in ChatGPT.

The other option is ChatGPT Enterprise. It has a similar interface to ChatGPT Plus, but it's a separate service: your company pays for the ChatGPT Enterprise edition, and you, as an employee, have access to it. ChatGPT Enterprise solves a lot of problems when dealing with secure data, specifically HIPAA data or confidential and proprietary data; it keeps all of that safe. ChatGPT Plus doesn't necessarily do this, but we'll go over in this course how to safeguard your data if you have concerns. So if you already have either of these options, ChatGPT Plus or ChatGPT Enterprise, this is the end of the video for you; your task is to open up a browser and get ChatGPT loaded. For everybody else, we're going to continue on and set up ChatGPT Plus.

The first thing to do is go to openai.com and select Try ChatGPT. From there, select Sign up. I use my Google credentials because I feel it's easier and I don't have to remember a password, so log in with your Google credentials. It will send you an email to verify that it's actually you, and after that you'll be directed back into the chat we'll be operating in for basically the rest of this course. I'll go ahead and accept these terms and agreements and also these tips. Right now we're using the free version of ChatGPT, which is this model right here, GPT-3.5, but we need the newest and greatest model to get all the advanced capabilities, including Advanced Data Analysis, so we need to upgrade to Plus. We can either do it right here or select it in the menu on the left-hand side, and we can see the Plus version is $20 a month. Right now they have a sign-up waitlist. I don't think you'll have to wait that long, but they've paused sign-ups because there's been a big influx of users after these new ChatGPT upgrades, and apparently everybody wants to get in. Whether you're on the waitlist or able to sign up immediately (which hopefully you can), you'll then be directed to this screen, where you put in your payment information. They're accepting credit cards right now, and you'll be subscribing for that $20 a month, so make sure you're comfortable paying that before proceeding. But just to reiterate, you do need ChatGPT Plus for this course. After that, you'll be directed back into the chat, and now you'll have all models available. In our case, at the time of recording, I have GPT-4 and GPT-3.5. We're going to use GPT-4 for this course because it has browsing and analysis built in. The home of this chat is chat.openai.com, and I'd save it to your bookmarks or favorites so you can access it easily. All right, now it's your turn to set up ChatGPT Plus if you don't have it already, and after that we'll jump into some more examples on how to use it.

All right, in this video we're going to go over the layout of ChatGPT and all its functionality to get you up and running with your first prompt. ChatGPT went through a layout change in November 2023, and unfortunately I had already filmed this entire course, so I'm going back and refilming some of these videos. You'll notice that the old layout sometimes appears in some of these videos; don't be alarmed by this. I'm going through and correcting any that need to be updated, but if you do notice differences between my ChatGPT and yours, don't be concerned.

Anyway, let's go through the layout. On the left-hand side we have our sidebar, and on the right-hand side we have the actual chat where we'll interact with our GPT model. You can close the sidebar or bring it back in. At the top it lists all the different GPTs; you probably only have one right now, ChatGPT. Below that is our chat history, underneath that is the option to refer a friend, and next is Settings. Settings is a whole other video because there's a lot to go into, so stay tuned for that one.

Back to the GPTs at the top: GPTs, which you can browse by clicking the Explore menu right here, are custom-built models built on top of ChatGPT to perform specific functions. I actually built a GPT for this course called Data Analytics; I'll link it below and in the exercise, and you can go into it and quiz it on the contents of the course. There's a whole host of other GPTs as well, but besides that chatbot for this course, the one we're primarily going to focus on is the one at the top that you should already have, plain ChatGPT. With it, we can go to the top left-hand corner and select the newest, greatest model, which I recommend doing; as of filming, it includes DALL-E, browsing, and analysis. This model is great because it includes everything we'll need in this course, from browsing the internet to Advanced Data Analysis, which gets a complete chapter. GPT-3.5, as of filming, is the free version, and we're not really going to mess with it. We'll also be jumping into plugins in the future, specifically the Noteable plugin, but for the time being let's stick with the GPT-4 model.

So let's prompt ChatGPT with our first prompt, asking it who the heck it is and what it can do, to find out some of its limitations. It tells you a lot of the stuff I've already told you. A few things to note once it provides a response: you can copy the response, and you can like or dislike it to give feedback on whether it's performing well. You can also click Regenerate, which is great if you're not getting a good response or it's getting held up and you want a new response from a different angle. As you can see, the new one is completely different, even with a different layout from what we got before. I'll be honest, I like this one a little bit more, so I'm going to say it was better.

At the top right we have a share icon, so you can take the link ChatGPT provides; I'll paste it into a new browser window so you can see it. Even people without a ChatGPT account can open it and view the results you got. In the bottom right-hand corner there's a question mark with Help & FAQ, release notes, and terms and policies. Honestly, I don't use those much. The one I do use is Keyboard shortcuts; specifically, I'd commit two of them to memory: Copy last code block and Copy last response. These are great for grabbing things from ChatGPT and pasting them wherever I'm working.

The last thing to note is that we can rename chats. This is our chat history, and right now we're in a chat titled "Data Content Wizard". I don't really like that name, so I can select Rename. For important chats, I like to start the title with an emoji so they're easy to recognize, and then give them an appropriate title.

All right, now it's your turn to perform some tasks. I want you to go into the base ChatGPT model and prompt it, like I did, asking who the heck it is and what it can do. Additionally, I want you to load the chatbot for this course into your menu. I'll include a link in the exercise that takes you right here, and it should add the chatbot to your sidebar. Feel free to ask it any questions about the course; there are some suggested prompts right here. I'm going to ask it "Hey, what's Luke's course about?", and because I built this bot on top of the course transcripts, it covers a lot of the areas we're going into. This is pretty cool; it'll be a great tool for quizzing yourself and asking questions if you get stuck. All right, with that, see you in the next one.

Let's now get into the basic prompting techniques you need to take advantage of in order to maximize ChatGPT's capabilities. As you found out from the exercise in the last video, ChatGPT's knowledge has a cutoff date; as we're filming this, it's April 2023, which is about 6 months ago. Not too bad. So let's quiz it on something that happened recently: Sam Altman, the CEO of OpenAI, was recently ousted, and they have someone new in. Let's see if ChatGPT knows this. I ask it who the CEO of OpenAI is, and it tells me it thinks it's still Sam Altman. This doesn't mean the model is useless, because it can actually browse the internet. Yet if I ask it "Can you access the internet?", it tells me no, it can't, which is really confusing. ChatGPT will sometimes hallucinate and make things up, including about what it is and isn't capable of, so sometimes you just have to tell ChatGPT that it can do something. If you see me do something in any of these videos that ChatGPT claims it can't do, reprompt it until it does. So I'm going to prompt ChatGPT to use a specific feature: "Search the internet and find out who the CEO of OpenAI is." A little status bar shows it visiting different websites trying to figure out who it actually is, and we finally get this update on OpenAI: yep, Mira Murati has taken over as interim CEO. This model is really nice because it has not only internet browsing, like we just used, but also analysis, which we'll get to in a future chapter.

Now let's get into the core of this video: what is a prompt? We need to understand this to know how best to use ChatGPT, and it answers: "A prompt is a message or instruction that guides or initiates a response or action." We're going to work on improving our prompts a lot in this course, because if you don't, you won't realize how many of the tasks in your job it can actually automate. Let's get into some examples. I tell ChatGPT: "I'm a 5-year-old. Explain what prompting is to me in the style of Dr. Seuss." It gives me a pretty nice nursery rhyme about what prompting is, and I think it does a pretty good job of explaining it. This would work well if I wanted to share it with somebody like my 5-year-old niece.

One button to notice here is Regenerate: I can regenerate a response if I don't like it or want to try a different style. I'll do that, and it gives me new results. I like this one a little better because it's shorter and easier to read; this line sums it up pretty well: "With prompts you guide what I will say, like colors on a bright sunshine day."

So why is this prompt so much more successful, in my opinion? It has two parts: the first is the context, and the second is the task. Context is your background; in this case, "I am a 5-year-old" is the context. The task is "Explain what prompting is in the style of Dr. Seuss." From now on, always write your ChatGPT prompts with not only a task but also context. We'll be able to automate the context with custom instructions, but we'll get to that in a bit. Let's take this to a more extreme example to show how much detail it can provide when we need it: "I am a distinguished professor with many academic achievements in the field of AI and machine learning. Explain to me what prompting is in a format similar to an academic research paper." With this prompt, it goes into a lot more detail than in our last example when defining what a prompt is, and if I were an academic professor, this would be far better suited to my needs than that Dr. Seuss nursery rhyme. So context matters: we need to give ChatGPT context so it can frame the answer for us. That's your next task. Come up with a context statement that best describes you so you get the results you want out of ChatGPT. Use a similar example, "Explain to me what a prompt is", and test different ways of using your context statement. All right, I'll see you in the next one.

All right, in this video we're going to go over the settings I've set up in ChatGPT to maximize its capabilities and get the results I need. In the previous exercise, you should have developed a personal context statement that best describes you and how ChatGPT should perceive you in order to give you the best results. For me, I have this one: "I'm a YouTuber that makes entertaining videos for those that work with data, AKA data nerds. Give me concise answers and ignore all the niceties that OpenAI programmed you with. Use emojis liberally; use them to convey emotion or at the beginning of any bullet point." Basically, I don't like ChatGPT rambling, so I use this to get concise answers quickly.

Instead of providing this context every single time I start a new chat, I use a ChatGPT feature called custom instructions. Go to Settings in the bottom left-hand corner and click Custom instructions. There are two dialog boxes. The first is "What would you like ChatGPT to know about you to provide better responses?" This is where the context goes, and I have things in here like "I'm a YouTuber" and "I prefer direct responses." The second is "How would you like ChatGPT to respond?", which is aimed more at getting the format and tone of the replies right, so this has my instructions about giving concise answers and using emojis. Make sure "Enable for new chats" at the bottom is turned on, so these instructions are loaded into every chat you start. You'll add your own custom instructions in the exercise for this video, but let's keep going.

Back in the settings menu, there are a few more things you can do. First, you can access your plan; right now we have ChatGPT Plus, as expected. Next, you can access your GPTs, which I have a whole video on; it takes you to this menu, which you can also reach by clicking Explore right here. The last thing to go over is Settings & Beta. First is the General tab, where you can set the theme to dark or light mode and clear your chats. Under the Beta features tab, you want everything enabled; specifically, at the time of filming, you want Plugins and Advanced Data Analysis. When ChatGPT releases new features it wants to beta test, check back here and enable them so they show up in your chats, but these are the two you definitely need for this course.

Next is Data controls, where you choose whether to keep Chat history & training turned on. If you don't want OpenAI to use the contents of your chats to train its models, turn this off. The one drawback is that it won't keep chats for longer than 30 days. One note on security: if you're working with confidential or proprietary data, specifically things like HIPAA data, you don't want to put it into ChatGPT Plus; I don't feel it's secure enough for that type of data. A workaround is ChatGPT Enterprise, which your company would purchase so you can put secure and confidential data into ChatGPT. The Enterprise edition is SOC 2 compliant, the same security compliance as many cloud providers like Google Cloud and Amazon Web Services, so if your data is good enough to go into the cloud there, it's probably good enough to go in here. But that applies to Enterprise, not necessarily ChatGPT Plus.
Nothing from this course is proprietary or confidential, so I'm leaving this box unchecked. Next is Shared links, where you can see all the links you've shared before. There are also options to export your data and to delete your account; I probably wouldn't touch that. The last thing is Builder profile, which is used when you build a GPT: basically, it has your name, and if you have a special domain, you can set it up here. We're not going to mess with any of that.

All right, now it's your turn. You have three things to do. First, update your custom instructions. Second, go into Settings & Beta and, under Beta features, enable both Plugins and Advanced Data Analysis. Last, decide whether you're going to keep Chat history & training on; if you're not comfortable with it, turn it off. All right, with that, I'll see you in the next one.

All right, in this video we're going to talk about how ChatGPT can now see images, which has a very unique use case for data analytics. We're not just going to use it to analyze cute pictures; instead, we're going to use this vision capability to analyze data. So let's jump in. Here I am in ChatGPT, using the most advanced model at the time, GPT-4. Because we're on this model, there's a little attachment icon at the bottom that we can click to upload a file. If I switch to GPT-3.5, that icon goes away and you can't upload, so we need to be on the highest and greatest model. In addition, this model has DALL-E, web browsing, and Advanced Data Analysis built in, so there are a lot of features packed into it. Anyway, I have some images I want to analyze. Instead of using the attachment button, I'm just going to drag one right in here, and once it's done loading, all I do is press Enter, and ChatGPT analyzes it. It's pretty interesting: it says it looks like a panda coding in Python, which means it's able to not only look at the image but apparently also read it, either from the laptop or from the Python logo in the top left-hand corner.

Now, we're not going to be looking at cute panda pics in this course; we have a more useful data analytics use case. I prompted ChatGPT, "Hey, make me a graph in Python", and it asked me for more context. I said, "Make it a bar chart with random numbers and make it about something funny." It gave me this graph right here. Now I want ChatGPT to look at this graph and analyze it, so I prompted it: "Sweet, I want you to read this graph and tell me the insights from it." Remember, it looked at that panda pic, so it should be able to look at this. At first, it gave generic results without any actual insights from the graph. I kept prompting it and eventually asked "Can you actually view this graph?", and it said, "Since I'm unable to visually interpret images or graphs, I can't directly read or analyze the specific details." Once again, this is a limitation of ChatGPT you have to be aware of: it can read this graph. I can copy the image, come down into the chat, press Ctrl+V, and press Enter to upload it so it can interpret it. In this example, the graph is about superheroes, ranked from Superman down to Spider-Man, and it pinpoints where each superhero falls on the graph.

So let's get into a more realistic data analytics use case. I have a graph with four bar charts, one for each of the four major roles in data science: data engineers, data scientists, data analysts, and even business analysts. Each chart shows the top 10 most in-demand skills for that role and gives the percentage of job postings in which each skill appears. This graph is great, but it's a little hard to interpret. I'm trying to understand how these skills relate across the different roles, and I could go through and compare them one by one, but that would take me quite a bit of time. So I just paste this image into ChatGPT, like I did with the panda pic, and it goes to town analyzing it. It identifies four main types of skills. First, for Python, it identifies that data engineers and data scientists have similar demand for it. For SQL, it says all the roles are requesting it. Cloud platforms, once again, show up mostly for the data science and engineering roles. Finally, it wraps up with data visualization tools, where it says tools like Tableau and Power BI are most prominent for data analysts and business analysts. Then it gives me the summary I was actually looking for: data engineers and data scientists are the most similar when it comes to skills.
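ChatGPT reached that "most similar roles" conclusion by reading the chart. If you want to sanity-check a claim like that yourself, one simple option (my own swap-in, not what ChatGPT did) is the Jaccard similarity between each role's set of top skills. A minimal sketch, assuming the skill sets below, which are hypothetical toy values and not the ones in the graph:

```python
from itertools import combinations

# Hypothetical toy skill sets per role; NOT the values from the graph in the video.
role_skills = {
    "data engineer": {"python", "sql", "aws", "spark", "azure"},
    "data scientist": {"python", "sql", "aws", "r", "spark"},
    "data analyst": {"sql", "excel", "tableau", "power bi", "python"},
    "business analyst": {"sql", "excel", "tableau", "word", "jira"},
}


def jaccard(a: set, b: set) -> float:
    """Share of skills two roles have in common: |A & B| / |A | B|."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def rank_role_pairs(skills: dict) -> list:
    """Return (score, role_a, role_b) tuples, most similar pair first."""
    pairs = [
        (jaccard(skills[a], skills[b]), a, b)
        for a, b in combinations(sorted(skills), 2)
    ]
    return sorted(pairs, reverse=True)


for score, a, b in rank_role_pairs(role_skills):
    print(f"{a} vs {b}: {score:.2f}")
```

With these made-up sets, data engineer vs data scientist comes out on top; with the real chart you would plug in each role's actual top-10 list instead.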
Data analysts and business analysts also share some similarities. This analysis would normally have taken me minutes, if not hours, and I got it in a matter of seconds, so I'm really blown away by this feature of ChatGPT. There's also another unique use case: interpreting graphs you may not understand or be familiar with. Take this one, for example; it's a box plot of different data science salaries. Not everybody can read this, and you yourself may not be able to, so you can just feed it in. In this case, I prompted it "Explain this graph to me like I'm 5 years old", and it explains it using a colored-box analogy.
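If you'd rather understand the box plot itself than rely on an analogy: each box is drawn from a five-number summary (minimum, first quartile, median, third quartile, maximum), and points beyond 1.5 × the interquartile range are commonly drawn as outliers. A minimal sketch using Python's `statistics` module; the salaries are made-up placeholders, and plotting libraries may use slightly different quartile or whisker conventions:

```python
import statistics

# Hypothetical salaries (USD) for illustration only; not the data in the video's plot.
salaries = [62_000, 75_000, 80_000, 88_000, 95_000,
            101_000, 110_000, 125_000, 140_000, 210_000]


def box_plot_summary(values):
    """Five-number summary plus the 1.5 * IQR outlier fences a box plot usually draws."""
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    low_fence, high_fence = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    return {
        "min": min(values),
        "q1": q1,
        "median": median,
        "q3": q3,
        "max": max(values),
        "outliers": [v for v in values if v < low_fence or v > high_fence],
    }


print(box_plot_summary(salaries))
```

Here the box would span roughly 82,000 to 121,250 with the line at 98,000, and the 210,000 salary would be plotted as a lone outlier dot.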
You could change the analogy or how you want it explained, but I think this is a great use case: anytime you're going through this course or working in the real world and you're not sure how to read a visualization or what to take from it, just feed it in and get the insights back from ChatGPT. We're also not limited to graphs and visualizations; we can use it to interpret data models too. Here's a screenshot of a data model in Power BI showing how all these tables are related. Let's say I needed to run a SQL query against this database, querying across the sales territory, sales order, and date tables. I could just drop this image into ChatGPT with the prompt "I want to analyze the sales orders across different sales territories on a monthly basis", and it goes to town, giving me a SQL query with the table and column names needed to get the results I want. This is just mind-blowing to me. All right, now it's your turn. I've included a bunch of images below; go through and upload each one into ChatGPT and see what results you get when analyzing data and even these data models. All right, see you in the next one.
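To make the data-model example above concrete, here is a sketch of the kind of monthly-sales-by-territory query that prompt asks for, run against a tiny in-memory SQLite database. The table and column names (`sales_territory`, `sales_order`, `date_dim`, `amount`, and so on) and the sample rows are hypothetical stand-ins, since the actual Power BI model isn't reproduced here:

```python
import sqlite3

# Hypothetical schema loosely mirroring the territory -> sales order <- date tables
# in the data model; real table and column names will differ.
SCHEMA = """
CREATE TABLE sales_territory (territory_key INTEGER PRIMARY KEY, region TEXT);
CREATE TABLE date_dim (date_key INTEGER PRIMARY KEY, full_date TEXT);
CREATE TABLE sales_order (
    order_id INTEGER PRIMARY KEY,
    territory_key INTEGER REFERENCES sales_territory(territory_key),
    order_date_key INTEGER REFERENCES date_dim(date_key),
    amount REAL
);
"""

MONTHLY_SALES_BY_TERRITORY = """
SELECT t.region,
       strftime('%Y-%m', d.full_date) AS month,
       COUNT(o.order_id)              AS orders,
       SUM(o.amount)                  AS total_sales
FROM sales_order AS o
JOIN sales_territory AS t ON t.territory_key = o.territory_key
JOIN date_dim        AS d ON d.date_key      = o.order_date_key
GROUP BY t.region, strftime('%Y-%m', d.full_date)
ORDER BY t.region, month;
"""


def monthly_sales(conn):
    """Run the monthly aggregation and return (region, month, orders, total) rows."""
    return conn.execute(MONTHLY_SALES_BY_TERRITORY).fetchall()


conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
# Made-up sample rows so the query has something to aggregate.
conn.executemany("INSERT INTO sales_territory VALUES (?, ?)",
                 [(1, "Northwest"), (2, "Southeast")])
conn.executemany("INSERT INTO date_dim VALUES (?, ?)",
                 [(20230105, "2023-01-05"), (20230120, "2023-01-20"), (20230203, "2023-02-03")])
conn.executemany("INSERT INTO sales_order VALUES (?, ?, ?, ?)", [
    (1, 1, 20230105, 100.0),
    (2, 1, 20230120, 50.0),
    (3, 1, 20230203, 75.0),
    (4, 2, 20230105, 200.0),
])
for row in monthly_sales(conn):
    print(row)
```

Whatever query ChatGPT returns for your own model, it's worth running it against a small sample like this before trusting it on the real database.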

Data nerds, in this chapter we're going over the Advanced Data Analysis plugin. It's by far one of the most powerful features I've seen in ChatGPT, and one of its capabilities is that you can upload files to the chatbot so it can load them, analyze them, and provide insights. One minor bug I'm finding, though: the environment that runs the Python code and stores your uploaded files will sometimes time out, and you'll get a warning like "Advanced Data Analysis (Beta) chat has timed out. You may continue the conversation, but previous files, links, and code blocks below may not work as expected." I've found that all you have to do is upload whatever file we were using back into the chat, and it picks up where it left off, recalling all the analysis we did previously, so you don't have to worry about that. You will be prompted from time to time, especially if you leave the chat and come back to it later, to re-upload any files we were using. I do expect OpenAI to fix this, especially with ChatGPT's rising popularity. I'm not sure how or when they'll do it, and I don't have any information on that, but hopefully they do, and then I can remove this video from the chapter and you'll never see it again. All right, see you in the next one.

Data nerds, welcome to this chapter on the Advanced Data Analysis plugin. In this chapter, we're going to walk through a typical example of how I use this plugin in my job as a data analyst.
We'll explore a data set of data science job postings to extract insights from it. First, we'll download and import the data set and have ChatGPT read it. Next, we'll have it explore the data and find anything that needs to be cleaned up, and we'll have ChatGPT handle that too. From there, we'll dive into some basic statistics and exploratory data analysis, creating visualizations to help us learn more about the data set. Finally, we'll wrap up with my favorite part, machine learning: because the job postings include salary, we can use the other attributes of the data to predict salary. I'm really excited about this portion.

One quick disclaimer on the knowledge level required: don't worry if you don't know much about what EDA (exploratory data analysis) or machine learning is. We'll go deeper into these in another chapter, but for now I'll give you the basics you need to use this plugin. For each of these chapters, check below, because I'll have a link to the data set, and all the prompts will be in the description. I'll also include a link to my ChatGPT history so you can see how I went about analyzing this data set. One note: right now, ChatGPT can't share images in shared links, so you won't be able to see any graphs or images I generated in the links I share, but you'll be able to see my prompts and ChatGPT's responses, and I think that's good enough. All right, enough of me talking; let's dive into this chapter.

All right, in this video we're doing an intro to Advanced Data Analysis, starting with a comparison between ChatGPT without this functionality and ChatGPT with it, so you really understand how it works. One note about future videos: you may hear me call this the Advanced Data Analysis plugin. That's because, before ChatGPT's update, it was a separate feature you had to activate, and you could only use it on its own in a chat. Now it's pretty great, because you can use Advanced Data Analysis (labeled "Analysis" here, or data analysis) in a single chat alongside web browsing and generating images with DALL-E. So from time to time in upcoming videos, you may notice that your UI isn't the same as mine. I've gone through all the videos and verified that the same prompts still produce the same results in ChatGPT, so you should get the same results even if the UI looks different.
All right, let's get into it. One recap from the last video: make sure you have custom instructions set up for your context or use case. For me, my custom instructions say that I'm a YouTuber making entertaining videos for those who work with data, so ChatGPT understands what kind of results I want. An example for a business student might be something like "I'm a business student specializing in finance, and I'm interested in finding insights within the financial industry." That would better shape the responses the student gets to their prompts. So make sure that's filled in, because this is the context that gets provided to ChatGPT, and we need it to get the best results. Be as specific as you can with these instructions; right now there's about a 1,500-character limit, so feel free to go wild and fill it with as many details as possible. I've found that you only get better results with more context.

Now let's get into performing some data analysis. For this we're going to compare the GPT-4 model, which currently has analysis included, with GPT-3.5, which doesn't have data analysis.
Starting with GPT-3.5, I prompted it with this analytical question: "10 data nerds are on LinkedIn. 50% of them are unemployed. Each applies to approximately two jobs. How many jobs were applied to?" Doing the mental math in my head, we know that about 10 jobs should have been applied to (10 × 50% × 2).
Let's check it out: ChatGPT gets it right. So you're probably thinking, "Luke, this base model without Advanced Data Analysis can do math." Well, not so fast. Let's try a more complex problem: a similar word problem, but this time with much bigger and more complex numbers. I don't know why ChatGPT added all these emojis; it's getting a little crazy, and I'm hoping it stops soon. Okay, it stopped. It says that about 57 million jobs were applied to, and if you didn't know any better, that probably looks correct. But let's double-check it. Using a calculator, we can see that although ChatGPT was close, it's not correct; it looks like it's off by close to 100,000. So why did ChatGPT come up with a value that was pretty close to, but not exactly, the right one? With ChatGPT we're working with a large language model, and these types of models are great at predicting the next word in a sentence.
Take this example: I have ChatGPT fill in the blank for "Jack and Jill went up the ___." You can probably guess what it's going to be if you're from America and you know nursery rhymes: it's going to say "hill." It first showed an emoji, so let's ask it for the word. Okay, that confirms the word to fill in the blank is "hill."
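To make the fill-in-the-blank idea concrete, here is a deliberately tiny sketch that "predicts" the next word by counting which word most often follows the previous one in a sample sentence. It is only an analogy, assuming a simple word-pair counter: ChatGPT uses a large neural network trained on vastly more text, not a lookup table like this, and the function names are made up for illustration.

```python
from collections import Counter, defaultdict


def build_next_word_counts(text):
    """Count, for every word, which words follow it (a toy stand-in for a language model)."""
    words = text.lower().split()
    followers = defaultdict(Counter)
    for current, following in zip(words, words[1:]):
        followers[current][following] += 1
    return followers


def predict_next(followers, word):
    """Return the word seen most often after `word`, or None if `word` never appeared."""
    counts = followers.get(word.lower())
    if not counts:
        return None
    return counts.most_common(1)[0][0]


# Illustrative "training text": the nursery-rhyme line from the example
followers = build_next_word_counts("Jack and Jill went up the hill to fetch a pail of water")
print(predict_next(followers, "the"))  # hill
print(predict_next(followers, "up"))   # the
```

Nothing in this lookup computes anything; it returns whatever most often comes next, which is the intuition behind a model producing a plausible-looking number for a word problem without doing the arithmetic.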
Similarly, it can do this fill-in-the-blank of the next word with math problems as well. Look at this one: fill in the blank of the sentence "2 plus ___ equals ___." I already have a feeling it's going to say 2 + 2 = 4. Let's try it out. Yep, it did 2 + 2 = 4. So that's what the GPT-3.5 model is doing here: it's using its general knowledge to predict the best next word in a sentence, and using that to give us a value, which in this case is not accurate enough for data analytics. That's why any time we're doing any type of analysis, we want to make sure we're using a model that has Advanced Data Analysis.

Let's see how to make sure you're using it. First, go to the beta features and make sure Advanced Data Analysis is turned on. From there, there are multiple ways to access it. I can start a new chat by clicking ChatGPT and then select the GPT-4 model, which right now includes DALL-E, browsing, and Analysis, so I can just click it to enable it. There's also a GPT called Data Analysis; if you don't have it in your menu, you can go to Explore, find it there, and add it. However, this GPT only includes the Advanced Data Analysis functionality. It doesn't include web browsing, DALL-E image generation, and all that kind of stuff, so I think it's kind of limited and I don't recommend using it. Instead, I recommend going to ChatGPT and selecting the most advanced model, the one with analysis.

So let's plug in the same complex word problem from before and see what ChatGPT does. First it identifies all the variables it needs, and then it starts analyzing. What it showed just there is when it's using the Advanced Data Analysis functionality. Now it tells us the value is 57.6 million, which according to the calculator is exactly correct. So how did it get this result? I can click at the end of this sentence and go to "View analysis," and it shows me the Python code it executed. Let's walk through this code quickly. First, it defines all the variables we need: the total data nerds, the unemployment rate, and the applications per person.
Underneath that, it gets to work calculating the total applications, which is the total data nerds times the unemployment rate times the applications per person, and we can see the result right down at the bottom. If I wanted to, I could even copy the code, put it into my own Python environment, and execute it. But I like this because the Python is executed right here inside ChatGPT: you get your results, and you know they're accurate because you can see the code that produced them. So what can be done with Advanced Data Analysis? Let's ask it. It goes into a lot of the things we're going to cover in this chapter: data analysis, statistical analysis, data processing, predictive modeling, and even data interpretation and custom queries. So many of the core things I do as a data analyst, this functionality of ChatGPT can also do.
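The code behind "View analysis" follows the pattern described in the walkthrough: define the inputs as variables, multiply them, and print the result. Below is a minimal sketch of that pattern. Since the exact figures of the larger word problem aren't shown here, it uses the numbers from the simple prompt (10 data nerds, 50% unemployed, about two applications each), and the variable names are illustrative rather than the exact code ChatGPT generated.

```python
# Inputs from the simple word problem (variable names are illustrative)
total_data_nerds = 10         # data nerds on LinkedIn
unemployment_rate = 0.5       # 50% of them are unemployed
applications_per_person = 2   # each unemployed data nerd applies to ~2 jobs

# Total applications = people x share unemployed x applications per person
total_applications = total_data_nerds * unemployment_rate * applications_per_person
print(f"Total applications: {total_applications:,.0f}")  # Total applications: 10
```

Swapping in larger numbers still gives an exact product rather than a predicted one, which is the point of letting Advanced Data Analysis run code instead of guessing the next word.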
All right, I'm excited to jump in and explore how we're going to use this in this chapter. For your task, quiz ChatGPT with the same prompt, asking what it can do with this feature. And because in the next video we're diving into importing data, also ask it what types of files you can import and use inside it. All right, with that, I'll see you in the next one.

In this video we're going over connecting to data sources. Specifically, we're going to import a data set that we get from online and then do some brief analysis of it. For your homework, you should have prompted ChatGPT to find out what file types it accepts. When I did this initially, it only listed three: CSV, Excel, and JSON. It's pretty neat that it handles all of these, but I knew it could import more. You always have to be very specific, so I gave it another prompt asking for a more thorough list of file types, and it listed a lot more, such as databases, SPSS and SAS files, and HTML. It takes a lot of different file types, which is great for us data analysts. So let's get into uploading some data and then analyzing it. I think I have the perfect data set for this.
If you go to the link below, it takes you to my Kaggle page, where I've hosted a data set of data analyst job postings. Kaggle is a great site for getting data sets: you can search through different ones, and for each it gives you a description and shows some overall summary statistics about the data set itself, so it's really useful. You can also see what other people are doing with it.
So we're going to download this data set, and when we do, we'll find that it downloads as a zip file. A zip file is just a file that has been compressed, and that's fine; it's actually better, because it makes the upload smaller. We're going to upload this file into the Advanced Data Analysis plugin. I'm not even going to provide any instructions; I'm just going to press Enter, let it upload, and see what ChatGPT says back. It identified that it's a zip file, as it should, and extracted the contents. Inside, it found a CSV, which is basically a text file where everything is separated by commas. Now it's asking what we want to do next for data analysis. I want to find out more about this data set: specifically, what the columns are, and maybe a description of each one. Because we've already provided context via our custom instructions, I gave it the task "Tell me more about this data set. For each column, give a brief description." Now it's providing each of these columns along with a brief description.
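ChatGPT handles the extraction itself, but it can help to see roughly what "unzip it and read the CSV" involves. A minimal standard-library sketch, assuming the archive contains at least one UTF-8 CSV with a header row; the file name and columns in the usage example are hypothetical, not the real Kaggle file.

```python
import csv
import io
import zipfile


def read_first_csv_from_zip(zip_source):
    """Open a .zip archive (a path or a binary file object) and parse its first CSV.

    Returns (column_names, rows), where each row is a dict keyed by column name.
    """
    with zipfile.ZipFile(zip_source) as archive:
        csv_names = [name for name in archive.namelist() if name.lower().endswith(".csv")]
        if not csv_names:
            raise ValueError("no CSV file found in the archive")
        # ZipFile.open yields bytes, so wrap it in a text layer for the csv module
        with archive.open(csv_names[0]) as raw, io.TextIOWrapper(raw, encoding="utf-8", newline="") as text:
            reader = csv.DictReader(text)
            rows = list(reader)
            return reader.fieldnames, rows


# Usage with a tiny in-memory archive (hypothetical file name and rows)
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as archive:
    archive.writestr("job_postings.csv", "title,company_name,location\nData Analyst,Acme,Anywhere\n")
buffer.seek(0)

columns, rows = read_first_csv_from_zip(buffer)
print(columns)  # ['title', 'company_name', 'location']
print(len(rows), "row(s)")  # 1 row(s)
```

The list of column names is the starting point for the per-column descriptions requested in the prompt.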
As mentioned before, this is job postings data, so it has a lot of key information from each posting, such as the company name, the location, the job description, and, most notably, salary. For salary we have hourly and yearly values, plus the min, max, and average; we'll get into all of that in a little bit. So your task now is to go to Kaggle, download the data set, and upload it into the Advanced Data Analysis plugin. From there, ask it about the columns and the data set. We're jumping into descriptive statistics next, so feel free to get a head start and look around at different statistics for the columns. All right, see you in the next one.

In this video we're going to explore the data set that you downloaded from Kaggle and uploaded into ChatGPT via the Advanced Data Analysis plugin. We're going to analyze it with descriptive statistics and then with exploratory data
analysis. I'm going to start with a simple prompt: "Perform descriptive statistics on each column." In my case, it initially provided some of these descriptive statistics, meaning things like the count (how many rows have a value), the mean or average, the standard deviation, and the minimum and maximum values. That's for numerical columns. For categorical columns, such as the job title, it shows things like how many values are unique: there are 11,000 unique titles, with "Data Analyst" as the top result, as we'd expect from this data. But it was only able to do a few columns, so I prompted it to do the entire data set, and it said it needed to work in smaller parts for easier viewing.

So I'm going to refine this prompt to get the output how I want. Right now it's providing results in bullet format, and I don't really like that; I think a table format would be better. So I prompted it to still perform descriptive statistics on each column, but to group numeric and non-numeric (categorical) columns into separate tables, with each column as a row. This trick of getting the values into a table makes the results much easier to see and understand, and it's the kind of output I'd expect as a data analyst. For the numerical columns, we have quite a few, and we can see there's a lot of data around salary: average, min, max, hourly, and yearly. We'll dive into that further, but first I want to call something out if you're not familiar with Python. The first column is called "Unnamed: 0." Whenever a column has no title, Python (specifically the pandas library) gives it a name like "Unnamed: 0," so it's basically an index column. We already have an index column in the data, so neither of those columns is really useful for us in this case.
For the non-numerical columns, it covered a lot of the ones I really care about, such as title, company name, job platform, and description, but it didn't do all of them, so I prompted it to continue. Now I can go through each of the non-numerical columns and get a better idea of their counts and whether they have missing values. For example, the salary column only has about 5,000 values, while there are around 29,500 job postings in total, so that's something to note about this data set.
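The grouped, one-row-per-column tables from the refined prompt are similar in spirit to what pandas' `describe()` reports. Here is a standard-library sketch of the same idea, assuming rows are dicts of strings (as `csv.DictReader` produces) and treating a column as numeric only when every non-empty value parses as a number; the column names and values are hypothetical. Counting missing values per column is what surfaces issues like the sparse salary column.

```python
import statistics
from collections import Counter


def _to_float(value):
    """Return value as a float, or None if it isn't numeric text."""
    try:
        return float(value)
    except (TypeError, ValueError):
        return None


def describe(rows):
    """Summarize each column, splitting numeric and non-numeric columns into two tables."""
    numeric, categorical = {}, {}
    columns = list(rows[0].keys()) if rows else []
    for col in columns:
        present = [row[col] for row in rows if row.get(col) not in (None, "")]
        missing = len(rows) - len(present)
        numbers = [_to_float(v) for v in present]
        if present and all(n is not None for n in numbers):
            numeric[col] = {
                "count": len(numbers),
                "missing": missing,
                "mean": statistics.mean(numbers),
                "std": statistics.stdev(numbers) if len(numbers) > 1 else 0.0,
                "min": min(numbers),
                "max": max(numbers),
            }
        else:
            top, freq = Counter(present).most_common(1)[0] if present else (None, 0)
            categorical[col] = {
                "count": len(present),
                "missing": missing,
                "unique": len(set(present)),
                "top": top,
                "freq": freq,
            }
    return numeric, categorical


def print_table(summary):
    """Print one row per column and one field per statistic."""
    if not summary:
        return
    stats = list(next(iter(summary.values())))
    print(f"{'column':<16}" + "".join(f"{s:>14}" for s in stats))
    for col, values in summary.items():
        cells = (f"{v:,.1f}" if isinstance(v, float) else str(v) for v in values.values())
        print(f"{col:<16}" + "".join(f"{c:>14}" for c in cells))


# Usage with hypothetical rows (csv.DictReader yields every value as a string)
rows = [
    {"title": "Data Analyst", "salary_yearly": "90000"},
    {"title": "Senior Data Analyst", "salary_yearly": "120000"},
    {"title": "Data Analyst", "salary_yearly": ""},
]
numeric, categorical = describe(rows)
print_table(numeric)
print_table(categorical)
```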
We can also see the top values and their frequencies, so these are some really good descriptive statistics, provided in a very convenient format. After descriptive statistics, the next thing I'd like to get into is exploratory data analysis (EDA). EDA is a way to visualize a lot of these descriptive statistics so I can see them via graphs such as histograms or bar charts.
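ChatGPT draws these charts for you with Python. To show what a histogram and a bar chart actually summarize, here is a dependency-free sketch that bins numeric values into equal-width ranges and counts categories, printing text bars instead of images. The sample values are made up, and a real analysis would normally use a plotting library.

```python
from collections import Counter


def histogram_counts(values, bins=5):
    """Split numeric values into equal-width bins and count how many land in each."""
    low, high = min(values), max(values)
    width = (high - low) / bins or 1  # fall back to width 1 if all values are equal
    counts = [0] * bins
    for v in values:
        index = min(int((v - low) / width), bins - 1)  # the maximum goes in the last bin
        counts[index] += 1
    edges = [low + i * width for i in range(bins + 1)]
    return edges, counts


def text_bar_chart(pairs, width=30):
    """Print a horizontal bar per (label, count) pair, scaled to the largest count."""
    largest = max((count for _, count in pairs), default=0)
    for label, count in pairs:
        bar = "#" * round(count / largest * width) if largest else ""
        print(f"{label:>12} | {bar} {count}")


# Made-up sample values, not taken from the real data set
salaries = [52000, 61000, 75000, 80000, 88000, 95000, 110000, 150000]
edges, counts = histogram_counts(salaries, bins=4)
text_bar_chart([(f"{edges[i]:,.0f}+", c) for i, c in enumerate(counts)])

platforms = ["LinkedIn", "Upwork", "LinkedIn", "BB", "LinkedIn", "Upwork"]
text_bar_chart(Counter(platforms).most_common(10))
```

A histogram answers "how are the numbers spread out?", while a bar chart of counts answers "which categories appear most?", which is why the prompt asks for a different visualization per column type.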
So I'm going to prompt ChatGPT to perform some EDA: "Perform exploratory data analysis on each of these columns. Provide an appropriate visualization to represent the content of each column; for example, use a histogram for numerical columns." The results are really interesting, because now we get to dive into what's in the data set itself. The first chart is for the title, the job title presented in each posting. For data analyst postings in the United States, we expect to see "Data Analyst" as number one, but also some data scientists, and it looks like data engineers show up as well. For company, Upwork looks like it's going crazy with job postings. For job location, "Anywhere" looks to be a very common value, along with "United States," and it looks like we'll probably need to do some data cleaning on this location column. Then the "via" column, which is the job platform, shows that LinkedIn is the major provider of job postings for this data set, followed by Upwork and BB. Then it asks whether we want to dive deeper into more columns.

All right, now it's time for your task. Similar to me, perform those descriptive statistics; I recommend having it output in that table format. Then move into exploratory data analysis. It's probably going to do the same thing where it only provides a few charts at a time, but keep iterating to get more familiar with the data set and understand what we're working with. In the next video we're
going to clean up these values before we do further visualization. All right, see you in the next one.

Hey, hope you're liking the tutorial! If so, you may like my cheat sheet on using ChatGPT for data analytics, which you can get by going to my website and signing up for my newsletter. All right, let's get back into it.

In this video we're going over data cleanup. Previously, you should have run descriptive statistics to learn more about the data set itself, and then done exploratory data analysis of each column to understand what's actually in it. With that in mind, we wanted to find which columns to focus on for data cleanup. There are two main ones we identified in the last video that we're going to clean up here. The first is job location, which has random extra spaces: after "United States," for example, there are sometimes multiple spaces, while "Anywhere" sometimes has a single extra space. So we're going to have ChatGPT remove these spaces. I prompted: "For the location column, it appears that some values have unnecessary spaces. We need to remove these spaces to better categorize this data." Nice, it went through
and did it on its own, even generating a new, updated bar graph of the locations once it cleaned them. Now we don't have any duplicate "Anywhere" or "United States" entries, which is pretty awesome. The next column I want to clean up is the "via" column, which is really the job platform column. You can see from its values, like "via LinkedIn" and "via Upwork," that the "via" prefix is unnecessary, so I want to remove "via " from the beginning and rename the column. I prompted: "Let's clean up this column by removing the 'via' and rename the column to job platform." Once again, it did it flawlessly. So now we have the cleaned-up data we need, and we're going to move into visualizing it. Your task is to clean these things up, specifically the job platform and location columns. If you find any other columns to clean up, feel free to jump into those as well. All right, see you in the next one.

In this video we're going to do more complex visualizations, specifically looking at the salary column and analyzing how it relates to other columns in the data set.
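Since this video builds on the cleaned columns, here is a sketch of the two cleanups from the previous video. It assumes each row is a dict whose location and platform values live under keys named "location" and "via" (hypothetical names matching how the columns are described here); the regular expressions trim stray spaces and drop a leading "via " before the column is renamed to job_platform.

```python
import re


def clean_location(value):
    """Trim leading/trailing whitespace and collapse runs of inner whitespace to one space."""
    return re.sub(r"\s+", " ", value).strip()


def clean_platform(value):
    """Remove a leading 'via ' (any capitalization) from values such as 'via LinkedIn'."""
    return re.sub(r"^\s*via\s+", "", value, flags=re.IGNORECASE).strip()


def clean_rows(rows):
    """Return new rows with a tidy location and the 'via' column renamed to 'job_platform'."""
    cleaned = []
    for row in rows:
        new_row = {key: value for key, value in row.items() if key != "via"}
        new_row["location"] = clean_location(row.get("location") or "")
        new_row["job_platform"] = clean_platform(row.get("via") or "")
        cleaned.append(new_row)
    return cleaned


# Usage with hypothetical rows showing the problems described above
rows = [
    {"location": "  United States   ", "via": "via LinkedIn"},
    {"location": " Anywhere", "via": "via Upwork"},
]
print(clean_rows(rows))
# [{'location': 'United States', 'job_platform': 'LinkedIn'}, {'location': 'Anywhere', 'job_platform': 'Upwork'}]
```

Returning new rows instead of editing in place keeps the raw values available if a cleanup rule turns out to be too aggressive.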
Previously we cleaned up both the job location and job platform columns. We're going to combine them with the salary data, which is why we needed them cleaned up. So let's look at the salary data. Going back to the descriptive statistics, we can see there are about six columns for salary. Salary average provides the average salary; salary min is the minimum value when a job posting gives a range, and salary max is the high end of that range. Salary hourly and salary yearly hold hourly and yearly rates, which we put into separate columns. Salary standardized combines them by converting the hourly rate to a yearly one. Don't worry too much if you don't understand what's going on with the standardized column; we're going to focus on the salary yearly column. One thing to note is that there's also a salary rate column saying whether a posting's pay is hourly or yearly, and we even have a few values with monthly pay. But like I said, we're going to focus on yearly salary.

To better understand the salary yearly column visually, here's its histogram: it's distributed between around $50,000 and $150,000, which is what we'd expect for a data analyst salary. The hourly rate ranges from a low of maybe around $10 up to around $100. The standardized salary column combines the hourly values with the annual ones, converting hourly pay to a yearly rate based on how many hours are in a year, so we get a distribution that's very similar to the yearly salary distribution, just with more values. But again, don't worry if you don't understand the standardized salary; we'll focus on yearly salary for now. Specifically, we're
going to plot the top 10 job platforms based on average yearly salary, and that's why we needed this column cleaned. This is where you have to be very careful about what you tell ChatGPT. Based on what I said, it plotted exactly what it should have: the top 10 job platforms by average yearly salary. But what I was really looking for was the 10 most common job platforms and their average salaries, not just the platforms with the highest averages, because some of those won't have many values. I know this because when I go back to the top 10 job platforms from the EDA, LinkedIn, Upwork, and BB are the top three, and when I scroll down here, they're not even in this chart. So I knew it didn't plot what I wanted. I updated my prompt: "Plot the top 10 most common job platforms that include yearly salary data. Plot this as a bar graph of the average salary." This time I'm being very specific that I want the top 10 most common job platforms, and we get a visualization showing the salaries for those 10 platforms. Now, you may look at it and ask: we have LinkedIn, but what about Upwork and BB? Both are more freelance-oriented websites, so I'd expect hourly rates there. I'm assuming BB is a freelance site because it's not on here, though I'd probably need to Google that. We do see LinkedIn, which, as I'd expect, has yearly salaries, and it ranks in the middle. It looks like ai-jobs.net pays a lot higher, so AI jobs are paying the bills a little bit more. All right, it's your turn now to perform the same analysis on these job platforms.
But don't stop there: also visualize this for both job titles and job locations, with similar results for the top 10 most common job titles and the top 10 most common job locations. All right, see you in the next one.
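The difference between "top 10 by average salary" and "top 10 most common, then averaged" is easy to miss in a prompt, so here is a sketch of the second version in Python. It assumes rows are dicts with a group column (such as the cleaned job_platform) and a yearly-salary column named salary_yearly; both names are assumptions, and the sample rows are invented.

```python
from collections import defaultdict
from statistics import mean


def avg_salary_for_most_common(rows, group_key, salary_key="salary_yearly", top_n=10):
    """Average yearly salary for the `top_n` most common groups that have salary data.

    "Most common" is counted only over rows with a yearly salary, mirroring the prompt
    "top 10 most common job platforms that include yearly salary data".
    """
    salaries = defaultdict(list)
    for row in rows:
        raw = row.get(salary_key)
        if raw in (None, ""):
            continue  # skip postings without a yearly salary
        salaries[row[group_key]].append(float(raw))
    # Rank by number of salaried postings, not by average salary
    ranked = sorted(salaries.items(), key=lambda item: len(item[1]), reverse=True)[:top_n]
    return [(group, mean(values), len(values)) for group, values in ranked]


# Hypothetical rows: SiteB has the highest pay but only one posting
rows = [
    {"job_platform": "SiteA", "salary_yearly": "90000"},
    {"job_platform": "SiteA", "salary_yearly": "110000"},
    {"job_platform": "SiteB", "salary_yearly": "150000"},
    {"job_platform": "SiteC", "salary_yearly": ""},
]
for platform, average, n in avg_salary_for_most_common(rows, "job_platform", top_n=2):
    print(f"{platform:<8} {average:>10,.0f}  ({n} postings)")
```

Sorting the same groups by average salary instead would put SiteB first on the strength of a single posting, which is the mistake the first chart made. Passing a title or location column as `group_key` covers the homework.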
All right, in this video we're going to get into predicting data, specifically the salary column. Let's quickly recap the visualizations you should have built. First, you should have analyzed the top 10 most common job titles. Here we can see that lead data analysts and data scientists have some of the highest salaries, along with senior data analysts, which I expected. "Data Analyst" itself is at the lowest point of the list, which makes sense because most of the others are senior positions.
As for the top 10 locations, "United States" and "Anywhere" look like the highest, and beyond those we have a lot of locations in Kansas, Oklahoma, and Missouri. Once again, this data set covers the United States only, so I expected this. But since these are the most common locations, the chart doesn't include places like New York and California, and ChatGPT does note below the chart that those locations have higher salaries. It's good that it adds notes like this to let you know. I could take this visualization a step further and explore which locations are highest regardless of how common they are, but we'll do that another time.

One quick note: I took a break during this, and you may find that ChatGPT has a problem completing your request when you come back. When I initially prompted it for the top 10 job title visualizations, it got stuck, and I had to reload the data. Once I reloaded it, it got right back to plotting the job titles, along with the location chart using the cleaned-up locations, so it kept track of the previous work we did.

So, counting them up, we have three visualizations showing how salary varies: one by job platform, one by job title, and one by location. That isn't very convenient if we want multiple conditions; say we wanted to specify both location and job title, we can't really extract that from the visualizations. This is where predicting data, or machine learning, comes in. We could use a machine learning model to predict the salary based on all of this data, then give it inputs in ChatGPT and get a prediction.
Let's build something for this. I'm going to prompt ChatGPT: "Build a machine learning model to predict yearly salary. Use job title, job platform, and location as inputs into this model. What models do you suggest using for this?" It suggests three models: random forest, gradient boosting, and linear regression. I'm comfortable using any of these, but I'm curious which one ChatGPT recommends based on its knowledge of the data set, so I asked, "Which one do you recommend for this data?" It suggests random forest and makes a lot of good points: it works well with both numerical and categorical values, and we have a lot of categorical values here; and it's less sensitive to outliers, which matters because with salary we're going to see outliers, such as a salary of $900,000. So I think this is a great model to go with, and we're going to proceed with it.
The model is built, and it's providing some statistics on its errors. I especially like looking at the root mean squared error (RMSE), which it says is around 22,000. If you're unfamiliar with metrics like this, we'll go into more detail in a follow-on chapter, but you can also just ask ChatGPT. I asked it, "How would you judge these errors?" and it explains that for RMSE, this means the model's predictions are typically off by about $22,000 from the actual yearly salary. So there's roughly a $22,000 swing in its predictions, which is really good to know when judging how accurate this model is. We could go forward with fine-tuning the model, but I'd rather jump into testing it by using ChatGPT to run it.
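Before testing, a note on that RMSE figure. RMSE (root mean squared error) is the square root of the average squared difference between actual and predicted values, so it's in the same units as the target, dollars here. Because errors are squared before averaging, one large miss counts more than several small ones, which is why RMSE is better read as a typical error size than as a literal average. A minimal sketch with made-up numbers, alongside mean absolute error (MAE) for comparison:

```python
import math


def rmse(actual, predicted):
    """Root mean squared error: square each miss, average the squares, take the square root."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("need two non-empty sequences of the same length")
    squared_errors = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


def mae(actual, predicted):
    """Mean absolute error: the plain average size of a miss, for comparison."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


# Made-up salaries: three perfect predictions and one $40,000 miss
actual = [80_000, 95_000, 110_000, 120_000]
predicted = [80_000, 95_000, 110_000, 160_000]
print(rmse(actual, predicted))  # 20000.0
print(mae(actual, predicted))   # 10000.0
```

The single $40,000 miss pulls RMSE to twice the MAE, so a model with an RMSE near $22,000 may be much closer than that on most postings and further off on a few.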
Let's run this model within ChatGPT. I asked it how, and it said to just provide the location, title, and platform, so that's what I did. We'll start with a data analyst in the United States on LinkedIn to see what salary we'd expect. The predicted yearly salary is around $94,000, which isn't too bad: on Glassdoor, a website that aggregates salaries, the expected annual salary is around $80,000, so the $94,000 prediction is within the roughly $22,000 RMSE. That's pretty cool. Now I want to see how it trends for more senior roles. Remember from our previous visualization that we'd expect data analysts at the lower end of pay and senior data analysts at the higher end. Providing updated details, still the United States and LinkedIn but for a senior data analyst, it predicts a salary of around $117,000, which is higher, and that's pretty awesome. On Glassdoor, senior data analyst salaries line up a lot more closely: around $121,000, which is really close to the $117,000 from our model. This is all pretty amazing. Whether or not you're familiar with machine learning, you just used it to predict salary, and you used metrics like RMSE to check how accurate the model is.

What we're finding is that the data analyst prediction is not as accurate as the senior data analyst prediction. Given that there are significantly more data analyst roles than any other title, I think there are problems with how these jobs are classified: many positions labeled simply "Data Analyst" probably include senior roles as well, which skews their salaries up. We could build out the model further to correct for this, but I think this is good for now.
All right, it's your turn to give it a try. Prompt ChatGPT to build a model similar to this one. You can use the three attributes I used (location, job title, and platform), or feel free to choose your own. Once the model is built, test it out by giving it the inputs you specified, then go to sites like Glassdoor and see how accurate your model is by comparison. So those are the major steps for this chapter, and after you do this, you should be pretty proud of yourself. We went through a complete data analytics pipeline, from collecting data, through performing EDA and cleaning it up, to analyzing it and building a model to predict some data. That's a lot of work, and we did it without writing a single line of code, which is pretty awesome. All right, with that, I'll see you in the next one.

In this video we're going to talk about three major limitations of ChatGPT. As with my last limitations video, I expect this one will eventually be removed from the course because ChatGPT and OpenAI should fix these. The three are connecting to the internet; data limits, meaning how much data we can import into ChatGPT; and security concerns. The first limitation is internet access: for security reasons, Advanced Data Analysis isn't allowed to connect to any online data sources. In my work, I'm usually connecting to things like databases in the cloud, APIs that stream data, or online data sources like Google Sheets, and it can't connect to any of these three. If I wanted to use data from any of these locations, I'd have to download it and then import it into ChatGPT. This actually brings us to our second
limitation. Say I have data in a database and I've downloaded it to a CSV file, which I have right here. Depending on the size of the data, it may not fit into ChatGPT. When I try to upload this file, I get a message saying the file is too large and the maximum file size is 512 megabytes; this file was around 250,000 rows of data. One trick, if you're really close to 512 megabytes, is to compress the file into a zip file. In my case the zip came to 545 megabytes, so it just missed the limit and I couldn't upload it. The other option is to split your data into smaller files, because although each file is limited to 512 megabytes, you can have a total data set size of 2 GB. In our case, if I break it up into five separate CSVs, I can then import them. For both of these limitations, internet access and file size, I have a workaround in a future chapter, where we'll talk about the Noteable plugin. It's super powerful at connecting to online data and at uploading or connecting to large data sets. So we have a workaround, but I wanted to make the limitations of the Advanced Data Analysis plugin clear. The final thing to note is data
security. We talked previously about how you can turn off chat history within ChatGPT so your data isn't used to train ChatGPT models, and I think that's a good way to protect yourself if you're unsure whether data can go into ChatGPT. Beyond that, recall that there are multiple versions of ChatGPT: Free, Plus, and Enterprise. If you're using Plus, your data isn't necessarily fully secure, in that it can be used to train the models. So if you have proprietary or confidential data, or you work in the healthcare industry and the data may be HIPAA-protected, I do not suggest uploading it under the ChatGPT Plus plan; it may not be the most secure option. Even if you go through the steps of turning off your chat history and opting out of model training, it's still unclear whether your data is actually safe, so I would not upload any sensitive data. However, ChatGPT now has an Enterprise edition, which is meant for secure, confidential, or even HIPAA-protected data: none of this data is used for training, and it has certifications like SOC 2 to help ensure your data stays secure. So if you're on Enterprise and your company has provided guidance that it's okay to put your data into it, I think you're fine. And if you think it would be valuable, reach out to your manager or supervisor and see whether they'd be open to implementing a tool like this. I find it super powerful and more than worth the money. All right, I'll see you in the next one.

Data nerds, awesome
job wrapping up this chapter on the Advanced Data Analysis plugin. You should be super proud of yourself, especially of the project we just completed. You could turn the work we just did into a portfolio project and present it to an employer as evidence that you have experience using this tool in your job, so I think you should be super excited about that. I use all these tricks routinely, especially when coworkers or friends give me data they want me to explore quickly. In the past, something like this would usually have taken me all day; now you've seen that we did it in a matter of minutes, jumping into the data set, getting visualizations, and even making predictions. So I think this is a powerful tool to add to your workflow. I do want to stress that I mainly use it for ad hoc analysis, meaning quick insights. If I need to do ongoing or deeper analysis, I use different plugins within ChatGPT. They still capture a lot of the value of ChatGPT but add extra value, like the Noteable plugin we're going to use, which lets us connect to even larger data sets and also provides an environment to store all of our
different analysis and results to then share with others de
