1/2/2021 How To Load Machine Learning Data in Python

How To Load Machine Learning Data in Python


by Jason Brownlee on May 11, 2016 in Python Machine Learning


Last Updated on August 21, 2019

You must be able to load your data before you can start your machine learning project.

The most common format for machine learning data is CSV files. There are a number of ways to load a
CSV file in Python.

In this post you will discover several ways to load your machine learning data in Python.

Kick-start your project with my new book Machine Learning Mastery With Python, including step-by-step tutorials and the Python source code files for all examples.

Let’s get started.

Update March/2017: Changed loading from binary (‘rb’) to text (‘rt’) mode.


Update March/2018: Added an alternate link to download the dataset, as the original appears to have been taken down.
Update March/2018: Updated the NumPy load-from-URL example to work with Python 3.

https://machinelearningmastery.com/load-machine-learning-data-python/ 1/25


Photo by Ann Larie Valentine, some rights reserved.

Considerations When Loading CSV Data


There are a number of considerations when loading your machine learning data from CSV files.

For reference, you can learn a lot about the expectations for CSV files by reviewing the CSV Request for Comments (RFC 4180), titled Common Format and MIME Type for Comma-Separated Values (CSV) Files.

CSV File Header


Does your data have a file header?

If so, the header can be used to automatically assign names to each column of data. If not, you may need to name your attributes manually.

Either way, you should explicitly specify whether or not your CSV file has a header when loading your data.
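With the standard library csv module, one way to honor a header row is to read the first row separately and keep it as the column names. The sketch below is illustrative only: it uses an in-memory io.StringIO object in place of a real file, and the column names and values are made up.

```python
import csv
import io

# Hypothetical CSV content with a header row; a real file would be opened with open()
text = "preg,plas,class\n6,148,1\n1,85,0\n"

reader = csv.reader(io.StringIO(text))
header = next(reader)  # the first row holds the column names
rows = [[float(value) for value in row] for row in reader]  # the remaining rows are data

print(header)  # ['preg', 'plas', 'class']
print(rows)    # [[6.0, 148.0, 1.0], [1.0, 85.0, 0.0]]
```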

Comments
Does your data have comments?

Comments in a CSV file are conventionally indicated by a hash (“#”) at the start of a line.

If your file contains comments, then, depending on the method used to load your data, you may need to indicate whether to expect comments and which character marks a comment line.
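As an illustration, numpy.loadtxt() takes a comments argument (its default is already '#') and ignores everything after that character on a line; pandas.read_csv() has a similar comment parameter. A small sketch with made-up, in-memory data:

```python
import io
import numpy

# Hypothetical file content: the first line is a comment marked with '#'
text = "# sensor readings\n1.0,2.0\n3.0,4.0\n"

# comments='#' is already the loadtxt default; it is spelled out here for clarity
data = numpy.loadtxt(io.StringIO(text), delimiter=",", comments="#")
print(data.shape)  # (2, 2)
```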

Delimiter
The standard delimiter that separates field values is the comma (“,”) character.

Your file could use a different delimiter, such as tab (“\t”), in which case you must specify it explicitly.
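For example, with the standard library csv module the delimiter is passed to csv.reader(). A sketch using hypothetical tab-separated values held in memory:

```python
import csv
import io

# Hypothetical tab-separated content
text = "1\t85\t0\n8\t183\t1\n"

# The delimiter must be given explicitly; csv.reader assumes ',' by default
reader = csv.reader(io.StringIO(text), delimiter="\t")
rows = list(reader)
print(rows)  # [['1', '85', '0'], ['8', '183', '1']]
```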

Quotes
Sometimes field values contain spaces or commas. In these CSV files the values are often quoted.

The default quote character is the double quotation mark ("). Other characters can be used, in which case you must specify the quote character used in your file.
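A quick sketch of how csv.reader() handles a quoted field that contains a comma. The input is made up, and quotechar is shown explicitly even though the double quotation mark is already its default.

```python
import csv
import io

# Hypothetical content: the first field contains a space and a comma, so it is quoted
text = '"New York, NY",8.4\n"Los Angeles",3.9\n'

# quotechar='"' is the default; pass another character if your file uses one
reader = csv.reader(io.StringIO(text), quotechar='"')
rows = list(reader)
print(rows)  # [['New York, NY', '8.4'], ['Los Angeles', '3.9']]
```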

Machine Learning Data Loading Recipes


Each recipe is standalone.

This means that you can copy and paste it into your project and use it immediately.

If you have any questions about these recipes or suggested improvements, please leave a comment
and I will do my best to answer.



Load CSV with Python Standard Library
The Python standard library provides the csv module and its reader() function, which can be used to load CSV files.

Once loaded, you can convert the CSV data to a NumPy array and use it for machine learning.

For example, you can download the Pima Indians dataset into your current working directory (download from here).

All fields are numeric and there is no header line. Running the recipe below will load the CSV file and
convert it to a NumPy array.

# Load CSV using the Python standard library
import csv
import numpy
filename = 'pima-indians-diabetes.data.csv'
# open in text mode ('rt'); newline='' is recommended when reading with the csv module
with open(filename, 'rt', newline='') as raw_data:
    reader = csv.reader(raw_data, delimiter=',', quoting=csv.QUOTE_NONE)
    x = list(reader)
# every value is read as a string, so convert the whole array to float
data = numpy.array(x).astype('float')
print(data.shape)

csv.reader() returns an object that iterates over the rows of the data; collecting the rows into a list makes them easy to convert into a NumPy array. Running the example prints the shape of the array.

(768, 9)

For more information on the csv.reader() function, see CSV File Reading and Writing in the Python API
documentation.
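The recipe above passes quoting=csv.QUOTE_NONE, which tells the reader to treat quote characters as ordinary text. That is harmless for the all-numeric Pima file, but the sketch below (with a made-up input line) shows how it changes the parsing of a quoted field compared with the default, csv.QUOTE_MINIMAL.

```python
import csv

line = '"x, y",1'

# Default quoting: the quoted field stays together and the quotes are removed
default_rows = list(csv.reader([line]))
# QUOTE_NONE: quotes are kept as literal characters and the comma splits the field
none_rows = list(csv.reader([line], quoting=csv.QUOTE_NONE))

print(default_rows)  # [['x, y', '1']]
print(none_rows)     # [['"x', ' y"', '1']]
```

If your own file contains quoted fields, dropping the quoting argument is usually the safer choice.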

Load CSV File With NumPy
You can load your CSV data using NumPy and the numpy.loadtxt() function.

This function assumes no header row and that all data has the same format. The example below assumes that the file pima-indians-diabetes.data.csv is in your current working directory.

# Load CSV using NumPy
import numpy
filename = 'pima-indians-diabetes.data.csv'
# open in text mode and close the file automatically when done
with open(filename, 'rt') as raw_data:
    data = numpy.loadtxt(raw_data, delimiter=",")
print(data.shape)

Running the example will load the file as a numpy.ndarray and print the shape of the data:

(768, 9)

This example can be modified to load the same dataset directly from a URL as follows:

Note: This example assumes you are using Python 3.

# Load CSV from URL using NumPy
from numpy import loadtxt
from urllib.request import urlopen
url = 'https://raw.githubusercontent.com/jbrownlee/Datasets/master/pima-indians-diabetes.data.csv'
raw_data = urlopen(url)
dataset = loadtxt(raw_data, delimiter=",")
print(dataset.shape)

Again, running the example produces the same resulting shape of the data.

(768, 9)

For more information on the numpy.loadtxt() function, see the API documentation (version 1.10 of NumPy).

Load CSV File With Pandas


You can load your CSV data using Pandas and the pandas.read_csv() function.

This function is very flexible and is perhaps my recommended approach for loading your machine learning data. The function returns a pandas.DataFrame that you can immediately start summarizing and plotting.

The example below assumes that the 'pima-indians-diabetes.data.csv' file is in the current working directory.

# Load CSV using Pandas
import pandas
filename = 'pima-indians-diabetes.data.csv'
names = ['preg', 'plas', 'pres', 'skin', 'test', 'mass', 'pedi', 'age', 'class']
data = pandas.read_csv(filename, names=names)
print(data.shape)

Note that in this example we explicitly specify the names of each attribute to the DataFrame. Running the example displays the shape of the data:

(768, 9)

We can also modify this example to load CSV data directly from a URL.

# Load CSV using Pandas from URL
import pandas
url = "https://raw.githubusercontent.com/jbrownlee/Datasets/master/pima-indians-diabetes.data.csv"
names = ['preg', 'plas', 'pres', 'skin', 'test', 'mass', 'pedi', 'age', 'class']
data = pandas.read_csv(url, names=names)
print(data.shape)

Again, running the example downloads the CSV file, parses it and displays the shape of the loaded
DataFrame.

(768, 9)

To learn more about the pandas.read_csv() function you can refer to the API documentation.
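Most scikit-learn style workflows expect NumPy arrays, so a common next step is to pull the values out of the DataFrame and separate the input columns from the class column. The sketch below assumes, as in the Pima file, that the class is the last column; the data and the shortened column list are illustrative only.

```python
import io
import pandas

# Hypothetical rows in the same layout as the Pima file: inputs first, class label last
text = "6,148,72,1\n1,85,66,0\n8,183,64,1\n"
names = ["preg", "plas", "pres", "class"]  # a shortened, illustrative column list

data = pandas.read_csv(io.StringIO(text), names=names)

# Convert the DataFrame to a NumPy array and split inputs (X) from the output (y)
array = data.to_numpy()
X = array[:, :-1]
y = array[:, -1]
print(X.shape, y.shape)  # (3, 3) (3,)
```

On pandas versions older than 0.24, data.values can be used in place of data.to_numpy().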

Summary
In this post you discovered how to load your machine learning data in Python.

You learned three specific techniques that you can use:

Load CSV with Python Standard Library.
Load CSV File With NumPy.
Load CSV File With Pandas.

Your action step for this post is to type or copy-and-paste each recipe and get familiar with the different
ways that you can load machine learning data in Python.

Do you have any questions about loading machine learning data in Python or about this post? Ask your
question in the comments and I will do my best to answer it.

About Jason Brownlee
Jason Brownlee, PhD is a machine learning specialist who teaches developers how to get results with modern machine learning methods via hands-on tutorials.


105 Responses to How To Load Machine Learning Data in Python

ML704 January 17, 2017 at 7:17 pm

Hi!
What is meant in the section Load CSV with Python Standard Library by "You can download the Pima Indians dataset into your local directory"?
Where is my local directory?
I tried several ways, but it did not work.

Jason Brownlee January 18, 2017 at 10:13 am

It means to download the CSV file to the directory where you are writing Python code: your project’s current working directory.

ML704 January 18, 2017 at 2:56 pm

Thank you, I got it now!

V ABISHEK HEYER KRUPALIN October 21, 2020 at 4:17 pm

thanks buddy

Jason Brownlee October 22, 2020 at 6:36 am

You’re welcome.

ruby July 17, 2017 at 2:19 pm

hi
How can I load a video dataset in Python, without TensorFlow, Keras, …?

Jason Brownlee July 18, 2017 at 8:40 am

I googled “python load video” and found this:

http://opencv-python-tutroals.readthedocs.io/en/latest/py_tutorials/py_gui/py_video_display/py_video_display.html

Priyanshi December 15, 2020 at 3:22 pm

Is it possible to store the dataset on the E drive while my Python files are on the C drive?

Jason Brownlee December 16, 2020 at 7:43 am

I don’t think Python cares where you store files.

constantine July 30, 2017 at 4:23 am

Hello,

I want to keep only two columns from a CSV file and use these numbers as x-y points for a k-means implementation that I am doing.

What I do now to generate my points is this:

points = np.vstack(((np.random.randn(150, 2) * 0.75 + np.array([1, 0])),
                    (np.random.randn(50, 2) * 0.25 + np.array([-0.5, 0.5])),
                    (np.random.randn(50, 2) * 0.5 + np.array([-0.5, -0.5]))))

but I want to apply my code to actual data.

Any help?

Jason Brownlee July 30, 2017 at 7:52 am

Sorry, I don’t have any k-means tutorials in Python. I may not be the best person to give you advice.

constantine July 30, 2017 at 7:51 pm

I don’t want anything about k-means; I have the code (computations and all) sorted out.
I just want some help with the CSV files.
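One way to keep only two columns, sketched under the assumption that the wanted x and y values sit in the first two columns of a comma-separated file (the sample data is made up), is the usecols argument of numpy.loadtxt(). The result has the same (n, 2) shape as the randomly generated points above, so it could stand in for them.

```python
import io
import numpy

# Hypothetical CSV with four columns; only the first two are wanted as x-y points
text = "1.0,0.5,7,a\n-0.5,0.5,3,b\n-0.4,-0.6,2,c\n"

# usecols picks columns by index, so the non-numeric last column is never converted
points = numpy.loadtxt(io.StringIO(text), delimiter=",", usecols=(0, 1))
print(points.shape)  # (3, 2)
```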

Steve August 3, 2017 at 11:54 am

Thank you for explaining how to load data in detail.

Steve August 3, 2017 at 11:55 am

They work perfectly.

Jason Brownlee August 4, 2017 at 6:47 am

I’m glad to hear it!

Jason Brownlee August 4, 2017 at 6:47 am

I’m glad it helped Steve.

Fawad August 8, 2017 at 6:20 pm

Thank you very much… really helpful…

Jason Brownlee August 9, 2017 at 6:24 am

I’m glad to hear that Fawad.

komal September 5, 2017 at 7:18 pm

How do I load a text attribute? I got the error: could not convert string to float: b'Iris-setosa'

Jason Brownlee September 7, 2017 at 12:43 pm

You will need to load the data using Pandas, then convert it to numbers.

I give examples of this.
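A minimal sketch of the approach described in the reply above: load with pandas, then replace each distinct text label with an integer code using pandas.factorize(). The rows are illustrative, in the style of the Iris data; scikit-learn's LabelEncoder is a common alternative.

```python
import io
import pandas

# Hypothetical rows in the style of the Iris dataset: four measurements and a text label
text = ("5.1,3.5,1.4,0.2,Iris-setosa\n"
        "7.0,3.2,4.7,1.4,Iris-versicolor\n"
        "4.9,3.0,1.4,0.2,Iris-setosa\n")
names = ["sepal_length", "sepal_width", "petal_length", "petal_width", "class"]

data = pandas.read_csv(io.StringIO(text), names=names)

# factorize assigns an integer code to each distinct label, in order of first appearance
codes, labels = pandas.factorize(data["class"])
data["class"] = codes
print(list(labels))            # ['Iris-setosa', 'Iris-versicolor']
print(data["class"].tolist())  # [0, 1, 0]
```

Keeping the labels object around makes it possible to map predicted codes back to the original class names.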


R October 10, 2017 at 3:21 am

I was just wondering what the best practices are for converting something in a relational database model to an optimal ML format for fields that could be redundant. Ideally the export would be in CSV, but I know it won’t be as simple as an export every time. Here is a simple example to illustrate my question. Say I have a table where I attribute things to an animal. The structure could be set up similarly to this:

ID,Animal,Color,Continent
1,Zebra,Black,Africa
2,Zebra,White,Africa

The goal is to be able to say “If the color is black and white and it lives in Africa, it’s probably a zebra.” So each line represents the animal with a single color associated with it, plus the other fields. Would this type of format be a best practice to feed into the model as is? Or would it make more sense to concatenate the colors into one line with a delimiter? In other words, it may not always be a 1:1 relationship, and in cases where the dataset is like that, what’s the best way of formatting?
Thanks for your time.

Jason Brownlee October 10, 2017 at 7:50 am

Great question. There are no hard rules; generally, I would recommend exploring as many representations as you can think of and seeing what works best.

This post might help to give you some ideas:

http://machinelearningmastery.com/discover-feature-engineering-how-to-engineer-features-and-how-to-get-good-at-it/

Hemalatha S November 17, 2017 at 6:52 pm

Can you tell me how to select features from a CSV file?

Jason Brownlee November 18, 2017 at 10:14 am

Load the file and use feature selection algorithms:

https://machinelearningmastery.com/feature-selection-in-python-with-scikit-learn/

Disha Umarwani November 28, 2017 at 12:41 pm

Hey,
I am trying to load line-separated data:

name:disha
gender:female
majors:computer science

name:
gender:
majors:

Any advice on this?

Jason Brownlee November 29, 2017 at 8:13 am

Ouch, looks like you might need to write some custom code to load each “line” or entity.
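A minimal sketch of such custom code, assuming each record is a block of key:value lines and records are separated by blank lines, as in the question above. The function name parse_records is hypothetical.

```python
def parse_records(text):
    """Parse 'key:value' lines into dicts; blank lines separate records."""
    records = []
    current = {}
    for line in text.splitlines():
        line = line.strip()
        if not line:
            # a blank line ends the current record
            if current:
                records.append(current)
                current = {}
            continue
        # split on the first ':' only, so values may themselves contain ':'
        key, _, value = line.partition(":")
        current[key.strip()] = value.strip()
    if current:
        records.append(current)
    return records


sample = """name:disha
gender:female
majors:computer science

name:
gender:
majors:
"""

records = parse_records(sample)
print(records)
```

The resulting list of dictionaries can be passed to pandas.DataFrame() if a table is needed.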

Hemalatha S December 1, 2017 at 2:17 am

Can you tell me how to load a CSV file and apply feature selection methods? Can you post code for the grey wolf optimizer algorithm?

Jason Brownlee December 1, 2017 at 7:40 am

Yes, see this post:

https://machinelearningmastery.com/feature-selection-in-python-with-scikit-learn/

fxdingscxr January 17, 2018 at 4:42 pm

I have loaded the data into a NumPy array. What should I do next to train my model?

Jason Brownlee January 18, 2018 at 10:04 am

Follow this process:

https://machinelearningmastery.com/start-here/#process

Ajinkya January 30, 2018 at 6:29 pm

Hey,

I want to use the KDD Cup 99 dataset for an intrusion detection project. The dataset consists of string and numerical data. Should I convert the entire dataset into numeric data, or should I use it as it is?

Jason Brownlee January 31, 2018 at 9:39 am

Eventually all data will need to be numbers.

Bipin February 2, 2018 at 5:11 pm

Hey Jason,
I have a dataset in CSV which has a header, and the columns have different data types. Which one would be better to use in this scenario: loadtxt() or genfromtxt()?
Also, is there any major performance difference between these two methods?

Jason Brownlee February 3, 2018 at 8:34 am

Use whatever you can; consider benchmarking the approaches with your data if speed is an issue.
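As one illustration of the genfromtxt() route: with names=True it reads the header line as field names, and with dtype=None it infers a type for each column, returning a NumPy structured array. The data below is made up, and the encoding argument assumes NumPy 1.14 or newer.

```python
import io
import numpy

# Hypothetical CSV with a header row and mixed column types
text = "name,age,score\nann,34,0.5\nbob,41,0.75\n"

# names=True reads field names from the first line; dtype=None infers a type per column
data = numpy.genfromtxt(io.StringIO(text), delimiter=",", names=True,
                        dtype=None, encoding="utf-8")

print(data.dtype.names)  # ('name', 'age', 'score')
print(data["age"])       # [34 41]
```

numpy.loadtxt() can skip the header with skiprows=1, but it expects a single dtype for all columns unless you build a structured dtype yourself.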

ML Beginer February 15, 2018 at 3:41 pm

I got a ValueError: could not convert string to float

while reading this data:

http://archive.ics.uci.edu/ml/machine-learning-databases/breast-cancer-wisconsin/breast-cancer-wisconsin.data

Can you please tell me where I am going wrong?

ML Beginer February 15, 2018 at 3:45 pm

from urllib.request import urlopen
import numpy as np
url = 'http://archive.ics.uci.edu/ml/machine-learning-databases/breast-cancer-wisconsin/breast-cancer-wisconsin.data'
raw_data = urlopen(url)
ds = np.loadtxt(raw_data, delimiter=',')
print(ds.shape)

Jason Brownlee February 16, 2018 at 8:31 am

You might have some “?” values. Convert them to 0 or nan first.
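A minimal sketch of that conversion with pandas, assuming '?' is the only missing-value marker in the file (the rows below are made up): na_values='?' makes read_csv() parse each '?' as NaN, after which the rows can be dropped or filled.

```python
import io
import pandas

# Hypothetical rows in which '?' marks a missing value
text = "5,1,1\n8,?,2\n"

# na_values='?' turns every '?' into NaN, so the column can still be parsed as numbers
data = pandas.read_csv(io.StringIO(text), header=None, na_values="?")
print(data)

# Option 1: drop rows with missing values; option 2: replace NaN with a fixed value such as 0
dropped = data.dropna()
filled = data.fillna(0)
```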

ro May 8, 2018 at 4:25 am

import numpy
# a raw string keeps the backslashes in the Windows path from being read as escape sequences
filename = r'C:\Users\user\Desktop\python.data.csv'
names = ['pixle1', 'pixle2', 'pixle3', 'pixle4', 'pixle5', 'pixle6', 'pixle7', 'pixle8', 'pixle9', 'pixle10', 'pixle11',
         'pixle12', 'pixle13', 'pixle14', 'pixle15', 'pixle16', 'pixle17', 'pixle18', 'pixle19', 'pixle20', 'pixle21', 'pixle22',
         'pixle23', 'pixle24', 'pixle25', 'pixle26', 'pixle27', 'pixle28', 'pixle29', 'pixle30', 'class']
# numpy.loadtxt() has no names argument; numpy.genfromtxt() accepts one
# (delimiter=',' assumes the file is comma-separated)
data = numpy.genfromtxt(filename, delimiter=',', names=names)

Jason Brownlee May 8, 2018 at 6:16 am

Well done!

AJS June 1, 2018 at 1:22 pm

I have multiple CSV files of varying sizes that I want to use for training my neural network. I have around 1000 files ranging from about 15000 to 65000 rows of data. After I preprocess some of this data, one CSV may be around a 65000 rows by 20 columns array. My computer starts running out of memory very quickly on just one of the 65000 by 20 arrays, so I cannot combine all 1000 files into one large CSV file. Is there a way using Keras to load one of the CSV files, have the model learn on that data, then load the next file, have the model learn on that, and so on? Is there a better way to learn on so much data?

Jason Brownlee June 1, 2018 at 2:47 pm

I have a few ideas here:

https://machinelearningmastery.com/faq/single-faq/how-to-i-work-with-a-very-large-dataset

Hemant June 17, 2018 at 2:32 pm

I have 200 CSV files and label files that contain 200 rows as output. I want to train, but I am unable to load the dataset.

Jason Brownlee June 18, 2018 at 6:39 am

You may have to write some custom code to load each CSV in turn, e.g. in a loop over the files in the directory.
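A minimal sketch of such a loop, assuming every file is a numeric, comma-separated file without a header. The function name load_csv_directory is hypothetical, and the demo writes two small made-up files to a temporary directory so it can run anywhere.

```python
import glob
import os
import tempfile

import numpy


def load_csv_directory(directory):
    """Load every .csv file in a directory (sorted by name) and stack the rows."""
    arrays = []
    for path in sorted(glob.glob(os.path.join(directory, "*.csv"))):
        # ndmin=2 keeps single-row files two-dimensional so they stack cleanly
        arrays.append(numpy.loadtxt(path, delimiter=",", ndmin=2))
    return numpy.vstack(arrays)


# Demonstration with two small, made-up files in a temporary directory
with tempfile.TemporaryDirectory() as tmp:
    with open(os.path.join(tmp, "part1.csv"), "w") as f:
        f.write("1,2,0\n3,4,1\n")
    with open(os.path.join(tmp, "part2.csv"), "w") as f:
        f.write("5,6,1\n")
    combined = load_csv_directory(tmp)

print(combined.shape)  # (3, 3)
```

If the combined data does not fit in memory, as in AJS's case, the loop could instead hand one array at a time to the training code rather than stacking them all.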

Aman July 12, 2018 at 4:10 am

I got the error:

Traceback (most recent call last):
File "sum.py", line 8, in
data = numpy.array(x).astype(float)
ValueError: setting an array element with a sequence.

Why?

Jason Brownlee July 12, 2018 at 6:29 am

It suggests that x is not an array or a list.
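Another common cause of this error is a list of rows that do not all have the same length (for example a short line or a blank line in the CSV file), so NumPy cannot build a rectangular array. That is an assumption about Aman's data, not a diagnosis; the sketch below only shows a quick way to check for ragged rows before calling numpy.array().

```python
import csv
import io
from collections import Counter

# Hypothetical content where the second row is missing a value
text = "6,148,72\n1,85\n8,183,64\n"
rows = list(csv.reader(io.StringIO(text)))

# Find the most common row length and report rows that differ from it
expected = Counter(len(row) for row in rows).most_common(1)[0][0]
bad = [(i, len(row)) for i, row in enumerate(rows) if len(row) != expected]
print(expected, bad)  # 3 [(1, 2)]
```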

Kikio January 17, 2019 at 1:19 pm

Hello,
I have a dataset which contains numbers like this: 3,6e+12, 2.5e-3…
When reading this dataset as a CSV file, I get the error: “ValueError: cannot convert string to float”

Any solution please?

Jason Brownlee January 17, 2019 at 1:47 pm

The numbers are in scientific notation and will be read correctly.

Perhaps there are other non-number fields in the file?

Kikio January 17, 2019 at 11:54 pm

No, there aren’t, and the error says: “cannot convert string to float in 3.6e+12”
Thank you

Jason Brownlee January 18, 2019 at 5:40 am

That is surprising. Perhaps try a different approach to loading, e.g. NumPy or Pandas?

Perhaps try posting to Stack Overflow?

Kikio January 19, 2019 at 11:14 am

I’ll try, thanks

Sandeep Nithyanandan January 23, 2019 at 6:00 pm

Sir,

Suppose I have 3 CSV files, each holding a particular attribute, so that a single row across the 3 CSV files corresponds to one instance. During loading, can I load all the CSV files together and convert each row into a NumPy array?
Thanks

Jason Brownlee January 24, 2019 at 6:41 am

I recommend loading all data into memory, then perhaps concatenating the NumPy arrays (e.g. hstack).
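A minimal sketch of the hstack approach, assuming each file holds a single column and all files list the instances in the same order (the file contents below are made up and held in memory):

```python
import io
import numpy

# Hypothetical contents of three files, each holding one attribute for the same 3 instances
file_a = io.StringIO("6\n1\n8\n")
file_b = io.StringIO("148\n85\n183\n")
file_c = io.StringIO("72\n66\n64\n")

# ndmin=2 loads each single-column file as shape (3, 1) rather than (3,)
columns = [numpy.loadtxt(f, delimiter=",", ndmin=2) for f in (file_a, file_b, file_c)]

# hstack joins the arrays side by side, so row i combines row i of every file
data = numpy.hstack(columns)
print(data.shape)  # (3, 3)
```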

Sara January 26, 2019 at 7:11 am

If I have a dataset with a .data file extension, how can I deal with it in Python?

Please help.

Jason Brownlee January 27, 2019 at 7:36 am

Perhaps use a text editor to open it and confirm it is in CSV format, then open it in Python as though it were a CSV file.

francistien January 27, 2019 at 9:05 am

I copied your code as follows:

# Load CSV using NumPy
# You can load your CSV data using NumPy and the numpy.loadtxt() function.
import numpy
filename = 'pima-indians-diabetes.csv'
raw_data = open(filename, 'rt')
data = numpy.loadtxt(raw_data, delimiter=",")
print(data.shape)

===============

However, I got an error message:

ValueError Traceback (most recent call last)
in
5 filename = 'pima-indians-diabetes.csv'
6 raw_data = open(filename, 'rt')
----> 7 data = numpy.loadtxt(raw_data, delimiter=",")
8 print(data.shape)

~\Anaconda3\lib\site-packages\numpy\lib\npyio.py in loadtxt(fname, dtype, comments, delimiter, converters, skiprows, usecols, unpack, ndmin, encoding)
1099 # converting the data
1100 X = None
-> 1101 for x in read_data(_loadtxt_chunksize):
1102 if X is None:
1103 X = np.array(x, dtype)

~\Anaconda3\lib\site-packages\numpy\lib\npyio.py in read_data(chunk_size)
1026
1027 # Convert each value according to its column and store
-> 1028 items = [conv(val) for (conv, val) in zip(converters, vals)]
1029
1030 # Then pack it according to the dtype's nesting

~\Anaconda3\lib\site-packages\numpy\lib\npyio.py in (.0)
1026
1027 # Convert each value according to its column and store
-> 1028 items = [conv(val) for (conv, val) in zip(converters, vals)]
1029
1030 # Then pack it according to the dtype's nesting

~\Anaconda3\lib\site-packages\numpy\lib\npyio.py in floatconv(x)
744 if '0x' in x:
745 return float.fromhex(x)
--> 746 return float(x)
747
748 typ = dtype.type

ValueError: could not convert string to float: 'Pregnancies'

========

I do not know what is wrong.

Jason Brownlee January 28, 2019 at 7:10 am

I’m sorry to hear that, I have some suggestions for you here:
https://machinelearningmastery.com/faq/single-faq/why-does-the-code-in-the-tutorial-not-work-for-me
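The traceback ends with could not convert string to float: 'Pregnancies', which suggests the file being loaded (pima-indians-diabetes.csv, not the pima-indians-diabetes.data.csv file used in the post) starts with a header row of column names, while numpy.loadtxt() assumes there is none. Assuming that is the cause, a minimal sketch of two possible fixes, with made-up in-memory data:

```python
import io
import numpy
import pandas

# Hypothetical file content with a header row, similar to the one in the traceback
text = "Pregnancies,Glucose,Outcome\n6,148,1\n1,85,0\n"

# Option 1: NumPy, skipping the header line
data = numpy.loadtxt(io.StringIO(text), delimiter=",", skiprows=1)
print(data.shape)  # (2, 3)

# Option 2: pandas, which treats the first line as column names by default
frame = pandas.read_csv(io.StringIO(text))
print(list(frame.columns))  # ['Pregnancies', 'Glucose', 'Outcome']
```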

vinayak mr March 31, 2019 at 11:29 pm

How do I load the dataset from the working directory into Colab?

Jason Brownlee April 1, 2019 at 7:48 am

Sorry, I have not used Colab.

Jackson April 3, 2019 at 5:52 am

When I click the “update: download from here” link to download the CSV file, it takes me to a white page with numbers on the left side, which looks to be the data. How do I get / download this data into a CSV file? Thanks!

Jason Brownlee April 3, 2019 at 6:50 am

Here is the direct link:

https://raw.githubusercontent.com/jbrownlee/Datasets/master/pima-indians-diabetes.data.csv

Jackson April 3, 2019 at 2:27 pm

Thank you!

Oscar April 8, 2019 at 4:43 am

Hi Jason,

I hope you can help me with the following preprocessed dataset.txt file. How can I load this dataset in Python? It contains a total of 54,256 rows and 28 columns. Can I use pandas?

[0.08148002361739815, 3.446134970078908e-05, 4.747197881944017e-05, 0.0034219001610305954, 0.047596616392169624, 0.11278174138979659, 0.0011501307441196414, 1.0, 0.09648950774661698, 0.09152382450070766, 0.0032736389720705384, 0.02231715511892242, 0.0, -1.0, 0.0, -1.0, -1.0, -1.0, 0.0, -1.0, -1.0, -1.0, 0.0, 0.0, 0.0, -1.0, 1.0, -1.0]

[0.0816768352686479, 2.929466010613462e-05, 1.2086789450560964e-06, 0.6246987951807229, 0.04743433880824845, 0.11350265074251698, 0.0011614423285977043, 1.0, 0.0965330892767645, 0.0914339631118999, 0.003190342698832632, 0.022268885790504313, 0.0, -1.0, 0.0, -1.0, -1.0, -1.0, 0.0, -1.0, -1.0, -1.0, 0.0, 0.0, 0.0, -1.0, 1.0, -1.0]

[0.08226727022239716, 2.987144231823633e-05, 2.2329338947249727e-06, 0.047448165869218496, 0.04753095407349041, 0.11459941368369171, 0.0011702815567795678, 1.0, 0.0969906953433135, 0.09170354727832318, 0.003358412434012629, 0.022329898179060795, 0.0, -1.0, 0.0, -1.0, -1.0, -1.0, 0.0, -1.0, -1.0, -1.0, 0.0, 0.0, 0.0, -1.0, 1.0, -1.0]

...

Jason Brownlee April 8, 2019 at 5:59 am

You can load it as a DataFrame or a NumPy array directly.

What problem are you having exactly?

Oscar April 8, 2019 at 6:54 am

When I try to load it as a NumPy array, it returns the list again.

I am using the following code after loading the dataset.txt file into memory:

import numpy as np
dataset = load_doc('dataset.txt')
x = np.asarray(dataset)
print(x)

Jason Brownlee April 8, 2019 at 1:55 pm

Try:

print(type(x))
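Oscar's file appears to hold one Python-style list per line (an assumption based on the excerpt above). Calling numpy.asarray() on the raw text gives back the text rather than numbers, so each line has to be parsed first. A minimal sketch using the standard library's ast.literal_eval, which safely evaluates Python literals; the helper name load_list_lines is hypothetical.

```python
import ast
import io
import numpy


def load_list_lines(file_obj):
    """Parse a file holding one Python list literal per line into a 2D array."""
    rows = []
    for line in file_obj:
        line = line.strip()
        if line:  # skip blank lines
            rows.append(ast.literal_eval(line))  # '[0.1, -1.0]' -> [0.1, -1.0]
    return numpy.array(rows, dtype=float)


# Hypothetical, shortened stand-in for dataset.txt (3 columns instead of 28)
sample = io.StringIO("[0.08, 3.4e-05, -1.0]\n[0.08, 2.9e-05, 1.0]\n")
x = load_list_lines(sample)
print(x.shape)  # (2, 3)
```

For the real file, the same function could be called as load_list_lines(f) inside a with open('dataset.txt') as f: block.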

Oscar April 9, 2019 at 6:51 am

Thank you so much!

So my last question (hopefully) is this: I have the dataset, the labels, and a list of 28 titles for the columns. I am trying to load them in Python so I can split them and create my training and testing datasets. I am not sure what to do with the titles. Do I need to load them as well?

Jason Brownlee April 9, 2019 at 2:36 pm

You can use the column headings as the first line in the CSV file and load them automatically with pandas.

Alternately, you can specify them as the columns in Python, if needed.

Or discard them completely.


Shakil Ahmed May 4, 2019 at 9:39 pm

hi
I am new.
Please help me to convert an image dataset to CSV.

Jason Brownlee May 5, 2019 at 6:27 am

You don’t; instead, you load images as arrays:

https://machinelearningmastery.com/how-to-load-and-manipulate-images-for-deep-learning-in-python-with-pil-pillow/

sahil May 23, 2019 at 2:49 am

How can I load data from parser?

from parser import load_data #dataloading

Jason Brownlee May 23, 2019 at 6:07 am

I don’t understand, sorry. Perhaps try posting to Stack Overflow?

Akshay Varshney June 23, 2019 at 11:27 pm

Hi Jason, the dataset has been removed from the above link. I want to check it because your whole book is based on that dataset, so please provide it; it would make it easier for us to understand the concepts in your book.
Thank You

Jason Brownlee June 24, 2019 at 6:33 am

I provided an updated link directly in the post, here it is again:

https://raw.githubusercontent.com/jbrownlee/Datasets/master/pima-indians-diabetes.data.csv

S M Abdullah Al Shuaeb August 31, 2019 at 4:30 am

Sir, please help me.

I just want to know how to classify categorical images with the SVM and KNN algorithms using Python.

REPLY 
Jason Brownlee August 31, 2019 at 6:12 am #

Perhaps start here:


https://machinelearningmastery.com/spot-check-classification-machine-learning-algorithms-python-
scikit-learn/

REPLY 
Araz Sami September 6, 2019 at 6:00 am #

Hello,

Thank you so much for all the great Tutorials. I would like to use a multivariate time series dataset and at
first I need to make a similar format as of load_basic_motion data in Python. I have several text files
Start Machine
each representing one feature and each file has time series Learning
data for each observation. Do you have any
suggestions for preparing the data in the required format?

Thanks!

REPLY 
Jason Brownlee September 6, 2019 at 1:54 pm #

Perhaps this tutorial will provide a useful starting point and adapted to your needs:
https://machinelearningmastery.com/how-to-model-human-activity-from-smartphone-data/

https://machinelearningmastery.com/load-machine-learning-data-python/ 19/25
1/2/2021 How To Load Machine Learning Data in Python

Sudhanshu varun September 7, 2019 at 5:00 pm #

Hello,
I successfully loaded my CSV file dataset. It's basically a letter dataset, and now I want to train a model in Python on this loaded dataset so that I can use it to recognise words later. Can you help me with this?
Thank you

Jason Brownlee September 8, 2019 at 5:14 am #

Yes, you can get started with text data in Python here:
https://machinelearningmastery.com/start-here/#nlp

JJ October 6, 2019 at 7:07 pm #

Hi Jason,

One question: how can I load my non-CSV data (a normal file instead) in Spyder Python without converting it to a CSV file?

Jason Brownlee October 7, 2019 at 8:28 am #

Yes, you can customize the call to the read_csv() function for your dataset.
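As a sketch of what customizing the call can look like: assuming (hypothetically) that the file is whitespace-separated text with no header row and uses # for comment lines, pandas' read_csv() can be told the separator, the header handling, and the comment character. The wrapper name load_whitespace_table is made up for illustration.

```python
import pandas as pd


def load_whitespace_table(source):
    """Read a whitespace-separated text file with no header into a DataFrame."""
    return pd.read_csv(
        source,
        sep=r"\s+",    # any run of spaces or tabs separates columns
        header=None,   # no header row; columns are numbered 0..n-1
        comment="#",   # ignore everything after a '#' on a line
    )
```

The source can be a file path or an open file object; other formats usually need only different sep, header, or skiprows arguments.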

Nauman October 10, 2019 at 7:35 am #

X = list(map(lambda x: np.array(x), X))

X = list(map(lambda x: x.reshape(1, x.shape[0], x.shape[1]), X))


y = np.expand_dims(y, axis=-1)

I used a TCN model. When I run it, I get this error: Index out of Range. Please help me solve this error; I also searched Stack Overflow but found nothing.

Jason Brownlee October 10, 2019 at 2:15 pm #

This is a common question that I answer here:


https://machinelearningmastery.com/faq/single-faq/can-you-read-review-or-debug-my-code
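One possible cause, offered only as a guess because the full code and traceback are not shown: x.shape[1] raises IndexError: tuple index out of range whenever an element of X is one-dimensional, since a 1D array's shape has a single entry. The sketch below reshapes each sample to (1, timesteps, features) either way, assuming that a 1D sample means one feature per time step; the helper name to_batched_3d is hypothetical.

```python
import numpy as np


def to_batched_3d(samples):
    """Reshape each sample to (1, timesteps, features) for a sequence model."""
    batched = []
    for x in samples:
        x = np.asarray(x, dtype=float)
        if x.ndim == 1:
            # a 1D sample has no shape[1]; treat it as one feature per time step
            x = x[:, np.newaxis]
        if x.ndim != 2:
            raise ValueError(f"expected a 1D or 2D sample, got shape {x.shape}")
        batched.append(x.reshape(1, x.shape[0], x.shape[1]))
    return batched
```

If this is the cause, X = to_batched_3d(X) would stand in for the two map() lines above.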

Ipsita Dalai November 30, 2019 at 5:35 pm #


Thanks for this nice article. I want to know: if we have a digit classification problem and the last column contains the class, how do we load and print the digits while ignoring the last column?
I tried it, and it shows:

ValueError: cannot reshape array of size 257 into shape (16,16)

Jason Brownlee December 1, 2019 at 5:41 am #

This tutorial will show you how to load and show image data:
https://machinelearningmastery.com/how-to-load-and-manipulate-images-for-deep-learning-in-python-with-pil-pillow/

Ipsita Dalai December 2, 2019 at 3:39 pm #

Thanks. But the pixels of the image are in CSV format, and the last column of the dataset contains the label, which I want to ignore. The dataset I am using is usps.csv, for classifying digits. Thanks in advance.


Jason Brownlee December 3, 2019 at 4:46 am #

That is very strange. Typically pixels are stored in an image format.

I’m not sure I have a tutorial that can help directly; you may have to write some custom code
to load the CSV and convert it to an appropriate 3D NumPy array.
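A minimal NumPy sketch of that custom code. It assumes each row holds 256 pixel values followed by the class label, which is consistent with the "array of size 257 into shape (16,16)" error above. Slicing off the last column before reshaping avoids that mismatch. The function name load_digit_images is hypothetical.

```python
import numpy as np


def load_digit_images(path, side=16):
    """Load a CSV of flattened images with the label in the last column."""
    data = np.loadtxt(path, delimiter=",", ndmin=2)
    pixels = data[:, :-1]             # every column except the last one
    labels = data[:, -1].astype(int)  # the last column holds the class
    # (rows, 256) -> (rows, 16, 16); raises ValueError if the pixel count is wrong
    images = pixels.reshape(-1, side, side)
    return images, labels
```

print(images[0]) would then show the first digit as a 16x16 grid of numbers. If the file starts with a header row, adding skiprows=1 to the loadtxt() call should skip it.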

Ipsita Dalai December 2, 2019 at 5:32 pm #

Hi. I got my work done by loading the CSV data into NumPy arrays and then slicing the array. However, your tutorials are very nice and helpful. Thanks.


Jason Brownlee December 3, 2019 at 4:48 am #

Well done!

Ipsita Dalai December 3, 2019 at 5:50 pm #

Thanks


Jason Brownlee December 4, 2019 at 5:33 am #

You’re welcome.

Alam Noor December 4, 2019 at 3:30 am #

Dear Jason,
How can I load a .rek dataset in Python? Please comment if possible. Thanks

Jason Brownlee December 4, 2019 at 5:46 am #

I am not familiar with that file type, sorry.

Alam Noor December 5, 2019 at 1:30 am #

Thanks Jason

Jason Brownlee December 5, 2019 at 6:42 am #

You’re welcome.

hima December 26, 2019 at 3:33 pm #

How do I load an image dataset in Python code?

Jason Brownlee December 27, 2019 at 6:30 am #

Perhaps start here:


https://machinelearningmastery.com/how-to-load-and-manipulate-images-for-deep-learning-in-python-with-pil-pillow/

And here:
https://machinelearningmastery.com/how-to-load-convert-and-save-images-with-the-keras-api/

nazm May 21, 2020 at 12:06 am #


Hi Jason, I am a fresher with no experience. How can I learn data science? Can you suggest a
roadmap? That would be helpful for me.

Jason Brownlee May 21, 2020 at 6:20 am #

Right here:
https://machinelearningmastery.com/start-here/

Aanya October 4, 2020 at 12:11 am #

Hey Jason,

I actually wanted to use some specific columns in a CSV file for loading the data into a machine learning model. Can you help me out?

Jason Brownlee October 4, 2020 at 6:53 am #

Yes, load the data as normal, then select the columns you want to use, or delete the columns you do not want to use.

If you are new to numpy arrays, this will help:


https://machinelearningmastery.com/gentle-introduction-n-dimensional-arrays-python-numpy/

And this:
https://machinelearningmastery.com/index-slice-reshape-numpy-arrays-machine-learning-python/
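As a small illustration of the "select the columns" approach with NumPy, assuming a header-less numeric CSV (the inline data and the column indices below are arbitrary examples): loadtxt() can read only the wanted columns through its usecols argument, or you can load everything and slice afterwards.

```python
import io

import numpy as np

# hypothetical header-less numeric CSV with five columns
CSV_TEXT = "1,2,3,4,0\n5,6,7,8,1\n"

# Option 1: read only columns 0, 2 and 4 from the file
subset = np.loadtxt(io.StringIO(CSV_TEXT), delimiter=",", usecols=(0, 2, 4))

# Option 2: load all columns, then select with NumPy indexing
data = np.loadtxt(io.StringIO(CSV_TEXT), delimiter=",")
X = data[:, [0, 2]]  # chosen input columns
y = data[:, -1]      # last column as the target
```

With a real file, replace io.StringIO(CSV_TEXT) with the file path.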

Aanya October 4, 2020 at 6:22 pm #

Actually, the dataset I am using has data for two types of signals. I don't want to delete the columns. I want to use the columns of one type of signal in one model and the other type in the second model.
Please do tell me if you can help me out.
Thank you though

Jason Brownlee October 5, 2020 at 6:51 am #

You can use the ColumnTransformer, for an example see this tutorial:
https://machinelearningmastery.com/columntransformer-for-numerical-and-categorical-data/
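If each model simply needs its own signal's columns, plain column selection may be enough, without a ColumnTransformer. This is a pandas sketch that assumes, hypothetically, that the column names start with sigA_ or sigB_ and that the target sits in a column named label. Adjust the prefixes to match your own file.

```python
import pandas as pd


def split_by_signal(df, prefixes=("sigA_", "sigB_"), target="label"):
    """Return one (X, y) pair per signal type, selected by column-name prefix."""
    y = df[target]
    splits = {}
    for prefix in prefixes:
        cols = [c for c in df.columns if str(c).startswith(prefix)]
        splits[prefix] = (df[cols], y)
    return splits
```

You could then fit one model on splits["sigA_"] and a second model on splits["sigB_"]. The original DataFrame keeps every column.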

Dan October 7, 2020 at 4:50 pm #

Hi! Is it possible to cluster similar rows of a CSV file (2 columns) together using NLP? If yes,
could you please point me to a post that would help with the code?

Jason Brownlee October 8, 2020 at 8:28 am #

Yes, sorry, I don’t have an example of clustering for text data.



© 2020 Machine Learning Mastery Pty. Ltd. All Rights Reserved.

