Lecture6 DataV

Acquiring Data

By Dr. Shaheera Rashwan


Where to Find Data

• The first tool in seeking data should be a good search engine.
• Effective searching is a matter of using the proper keywords. Think about
terms that will help specify the information you’re looking for, and the
format that it might be in.
• Government web sites are often useful sources of data because the
information collection is paid for by the public (by way of federal taxes),
and is therefore owned by the public and freely available, often without
copyright.
• Some organizations also make their data available through publicly
documented and supported APIs. The big search engines, for example, offer
SDKs for running queries from all manner of programming languages.

Tools for Acquiring Data from the Internet: Wget and cURL
• Processing provides the loadStrings( ), loadBytes( ), and loadImage( ) methods, all of which can
handle either local files or data from http:// addresses.
• Two common utilities associated with grabbing data from the Web are GNU Wget
(http://www.gnu.org/software/wget) and cURL (http://curl.haxx.se).
• At the most basic level, each can be used to download the contents of a web page (or other online
objects, such as jpg or swf data) and save it to a file.
• The following commands download the cover image of the course book using each application:
• wget http://www.oreilly.com/catalog/covers/9780596515935_cat.gif
• curl http://www.oreilly.com/catalog/covers/9780596515935_cat.gif > image.gif
• By default, Wget saves the file under its original name, whereas cURL writes to standard output,
so here the output is redirected into image.gif.
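The same kind of download can also be scripted. The following is a minimal sketch in plain Java (the language Processing is built on), not code from the lecture: it copies the bytes behind a URL into a local file, roughly what the curl command with a redirect does. The class name UrlToFile and the method name download are hypothetical names chosen for illustration.

```java
import java.io.IOException;
import java.io.InputStream;
import java.net.URI;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardCopyOption;

// Hypothetical helper (not from the lecture): saves the content of a URL to a file.
class UrlToFile {

  // Copies every byte served at 'address' into 'target' and returns the byte count.
  static long download(String address, Path target) throws IOException {
    try (InputStream in = URI.create(address).toURL().openStream()) {
      // REPLACE_EXISTING mirrors the shell redirect, which overwrites the file.
      return Files.copy(in, target, StandardCopyOption.REPLACE_EXISTING);
    }
  }

  public static void main(String[] args) throws IOException {
    if (args.length != 2) {
      System.err.println("usage: java UrlToFile <url> <output file>");
      return;
    }
    long bytes = download(args[0], Paths.get(args[1]));
    System.out.println("Saved " + bytes + " bytes to " + args[1]);
  }
}
```

Because the method takes the output path as a parameter, it behaves like cURL with a redirect rather than like Wget, which picks the file name itself.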

NcFTP and Links
• Other utilities such as NcFTP (http://www.ncftp.com/ncftp)
and Links (http://links.sourceforge.net) can be used to
download streams from URLs, or in the case of NcFTP,
efficiently download entire directories from an FTP server.
• Links is primarily a text web browser but can be used from
the command line as a replacement for Wget or cURL if
neither is available.

Locating Files for Use with Processing
• The most common data source is a file placed in the data
folder of a Processing sketch.
• For example:
• String[] lines = loadStrings("blah.txt");
• where blah.txt is a file that has been added to the data
folder.
• Files can also be located at specific Uniform Resource
Locators (URLs), for instance:
• loadStrings("http://benfry.com/writing/blah.txt");
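• Loading from URLs is less useful when running as an applet, because
applet security restrictions typically limit network access to the
server the applet was loaded from.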
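To make the "local file or URL" idea concrete, here is a small sketch in plain Java of a loader that accepts either kind of location. It is not how Processing implements loadStrings( ); the class name LineLoader, the method name loadLines, and the rule "anything containing :// is a URL" are assumptions made for this example.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.net.URI;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical helper (not Processing's implementation of loadStrings).
class LineLoader {

  // Returns all lines of text found at 'location', which may be a URL or a file path.
  static List<String> loadLines(String location) throws IOException {
    // Assumption for this sketch: a location containing "://" is a URL.
    if (location.contains("://")) {
      try (BufferedReader reader = new BufferedReader(new InputStreamReader(
          URI.create(location).toURL().openStream(), StandardCharsets.UTF_8))) {
        return reader.lines().collect(Collectors.toList());
      }
    }
    // Otherwise treat the location as a path on the local file system.
    return Files.readAllLines(Paths.get(location), StandardCharsets.UTF_8);
  }

  public static void main(String[] args) throws IOException {
    if (args.length != 1) {
      System.err.println("usage: java LineLoader <file or url>");
      return;
    }
    for (String line : loadLines(args[0])) {
      System.out.println(line);
    }
  }
}
```

The caller does not need to know where the text lives, which is the convenience the Processing methods offer as well.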
Specifying Output Locations
• The global String variable sketchPath holds the absolute path to the
sketch folder (in Processing 3 and later it is the method sketchPath( )).
Like dataPath( ), which returns the absolute path to an item in the data
folder, it can be used to interface with other methods that require a full path.
• The savePath( ) method operates like dataPath( ) and prepends the sketchPath
value to a filename or path supplied as a parameter. It also creates any
intermediate folders that do not exist. The following example uses all
three:
• println(sketchPath);
• println(dataPath("filename.txt"));
• println(savePath("path/to/subfolder/item.txt"));
• which outputs:
• /Users/fry/sketchbook/path_example
• /Users/fry/sketchbook/path_example/data/filename.txt
• /Users/fry/sketchbook/path_example/path/to/subfolder/item.txt
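The behaviour described for savePath( ) can be sketched in plain Java as follows. This is an illustration under assumptions, not Processing's source: the class name OutputPaths is hypothetical, and the sketch folder is passed in as a parameter because plain Java has no sketchPath.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical helper (not Processing's implementation of savePath).
class OutputPaths {

  // Resolves 'relative' against 'sketchFolder' and creates any missing parent folders.
  // The file itself is not created; only the folders leading to it are.
  static Path savePath(Path sketchFolder, String relative) throws IOException {
    Path full = sketchFolder.resolve(relative).toAbsolutePath();
    Path parent = full.getParent();
    if (parent != null) {
      Files.createDirectories(parent); // no error if the folders already exist
    }
    return full;
  }

  public static void main(String[] args) throws IOException {
    // A temporary folder stands in for the sketch folder in this demo.
    Path sketchFolder = Files.createTempDirectory("path_example");
    System.out.println(savePath(sketchFolder, "path/to/subfolder/item.txt"));
  }
}
```

Creating the intermediate folders up front means a later write to the returned path does not fail just because a subfolder is missing.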
Loading Text Data
• To read a file as lines of text, use the following:
• String[] lines = loadStrings("beverages.tsv");
• Because the loadStrings( ) method also automatically
handles loading files from URLs, the file could be
loaded directly online via:
• String[] lines = loadStrings("http://benfry.com/writing/series/beverages.tsv");

For Large Files
• When files are very large, it may be more useful to read one line at a
time from the file so that the data can be processed into a more useful
intermediate format.
• The Processing createReader( ) function creates a BufferedReader object
from a file in the data folder, an absolute path to a local file, or from
a URL.
• This example loads a file named toobig.txt and reads it one line at a time:
try {
  // Get the file from the data folder.
  BufferedReader reader = createReader("toobig.txt");
  // Loop to read the file one line at a time.
  String line = null;
  while ((line = reader.readLine( )) != null) {
    println(line); // Just print each line of the file
  }
} catch (IOException e) {
  e.printStackTrace( );
}
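The same line-at-a-time pattern works in plain Java outside Processing. The sketch below is an illustration, not lecture code: it counts the lines that contain a given word without ever holding the whole file in memory. The class name LineByLine and the method name countLinesContaining are hypothetical.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical example (not from the lecture): process a file one line at a time.
class LineByLine {

  // Counts the lines of 'file' that contain 'needle'; only one line is in memory at once.
  static long countLinesContaining(Path file, String needle) throws IOException {
    long count = 0;
    // try-with-resources closes the reader even if reading fails part-way.
    try (BufferedReader reader = Files.newBufferedReader(file, StandardCharsets.UTF_8)) {
      String line;
      while ((line = reader.readLine()) != null) {
        if (line.contains(needle)) {
          count++;
        }
      }
    }
    return count;
  }

  public static void main(String[] args) throws IOException {
    if (args.length != 2) {
      System.err.println("usage: java LineByLine <file> <word>");
      return;
    }
    System.out.println(countLinesContaining(Paths.get(args[0]), args[1]));
  }
}
```

The running count is the "intermediate format" here; each line can be discarded as soon as it has been inspected.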
Parsing Large Files As They Are Acquired
• Rather than read a large file into memory and then parse it, it’s often
better to parse the data while it’s being read.
• In such cases, you can collapse the Acquire and Parse steps of the
process together for greater efficiency.
• For instance, if a line of data is made up of a few dozen columns of
numbers with decimals, each line can be read (the Acquire step) and
converted immediately to a float array (the Parse step), allowing you to
discard the String for the line itself:
try {
  // Get the file from the data folder.
  BufferedReader reader = createReader("manyfloats.txt");
  // Loop to read the file one line at a time.
  String line = null;
  while ((line = reader.readLine( )) != null) {
    // Split the line at TAB characters.
    String[] columns = split(line, TAB);
    // Convert the String array to a float array.
    float[] numbers = float(columns);
    // ... do something here with the numbers array.
  }
} catch (IOException e) {
  e.printStackTrace( );
}
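As one possible way to fill in the "do something here" step, the following plain Java sketch parses each tab-separated line into a float array as it is read and keeps only a running sum per column. It is an illustration under assumptions: the names FloatRows, parseLine, and columnSums are hypothetical, the column count is taken from the first non-empty row, and malformed numbers are left to raise NumberFormatException.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical example (not from the lecture): acquire and parse in one pass.
class FloatRows {

  // Parse step: splits one line at TAB characters and converts each column to a float.
  static float[] parseLine(String line) {
    String[] columns = line.split("\t");
    float[] numbers = new float[columns.length];
    for (int i = 0; i < columns.length; i++) {
      numbers[i] = Float.parseFloat(columns[i].trim());
    }
    return numbers;
  }

  // Acquire step: reads the file line by line and accumulates per-column sums.
  static double[] columnSums(Path file) throws IOException {
    double[] sums = null;
    try (BufferedReader reader = Files.newBufferedReader(file, StandardCharsets.UTF_8)) {
      String line;
      while ((line = reader.readLine()) != null) {
        if (line.isEmpty()) {
          continue; // skip blank lines
        }
        float[] numbers = parseLine(line);
        if (sums == null) {
          sums = new double[numbers.length]; // sized from the first data row
        }
        int n = Math.min(sums.length, numbers.length);
        for (int i = 0; i < n; i++) {
          sums[i] += numbers[i];
        }
        // 'line' and 'numbers' are no longer referenced after this iteration.
      }
    }
    return sums == null ? new double[0] : sums;
  }
}
```

Only the sums array survives the loop, so memory use stays roughly constant however many rows the file contains.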
Listing Files in a Folder

• A common use of the File object is to list files in a directory, which
is handled with the list( ) method of the File class:
File folder = new File("/path/to/folder");
String[] names = folder.list( );
if (names != null) { // will be null if inaccessible
println(names);
}
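A common next step is to keep only files of one type. The sketch below, in plain Java, uses the list( ) overload of java.io.File that takes a FilenameFilter. The class name FolderListing and the method name listByExtension are hypothetical, and returning an empty array for an unreadable folder is a design choice of this example rather than something the lecture prescribes.

```java
import java.io.File;
import java.util.Arrays;

// Hypothetical example (not from the lecture): list files with a given extension.
class FolderListing {

  // Returns the sorted names in 'folder' that end with 'extension',
  // or an empty array if the folder does not exist or cannot be read.
  static String[] listByExtension(File folder, String extension) {
    // The lambda implements FilenameFilter.accept(File dir, String name).
    String[] names = folder.list((dir, name) -> name.endsWith(extension));
    if (names == null) {
      return new String[0];
    }
    Arrays.sort(names); // list( ) makes no promise about ordering
    return names;
  }

  public static void main(String[] args) {
    if (args.length != 2) {
      System.err.println("usage: java FolderListing <folder> <extension>");
      return;
    }
    for (String name : listByExtension(new File(args[0]), args[1])) {
      System.out.println(name);
    }
  }
}
```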
References
• Book of the Course – Chapter 9

End of Lecture
