Impure Guide


a guide to impure

the human-internet interface


Unlocking, transforming, storing, distributing, and switching about are ways of revealing. From The Question Concerning Technology, Martin Heidegger

Impure is an internet application that allows people to be part of the information revolution. With impure it is possible to get information from very different sources: from user-owned data to diverse feeds on the internet, including social media data, real-time or historical financial information, images, news, search queries and many more. Impure is a tool to be in touch with data around the internet, and to deeply understand it. Within a modular logic interface you can quickly link information to operators, controls and visualization methods, bringing all the power of the comprehension of information and knowledge to non-programmers who want to work with information in a professional way. Among other possibilities, impure allows you to:
- easily read data from diverse sources and repositories
- load your own data (text files, excel, xml...)
- visualize it in a wide range of ways (more than 100 visualization methods so far)
- process it, compare it, mix it, filter it... (more than 300 controls and operations so far)
- publish and share your projects

impure is intuitive; you don't have to type any code. Instead, you link modules to create information flows that begin with feeds or inputs and end with processed data or visualizations. In the middle, interactive controls allow you to choose or modify parameters and to see the results immediately.

impure makes sense of internet / impure is your brain inside internet / the internet tool / talk with internet / the knowledge tool

Basic functionality
Impure is about modules that can be connected. Each module has receptors and some have an emitter. Each receptor or emitter receives or sends a specific type of data, represented by an icon. Data types are also called data structures.

Among the simplest data structures that can be shared between modules are String, Number, List and NumberList. These are also the most common. There are different types of modules: data structures, operators, controls and visualizators. You can search for them in the library and drag them to the impure space.

You can search modules in the library by selecting tags. Modules that are more related to the selection (because they have all or some of the selected tags) will appear in the first positions.

data structures modules Modules that contain information. They only have an emitter.

Though many data structures exist, only some basic data structures can be placed as modules on an impure space. In the library you will recognize them by a small arrow on their icons.

operators Modules with one or more receptors that perform some operation and return the result on one emitter.

controls (under the tag control) Modules that allow some kind of interaction or execute some complex task (such as downloading data). Among controls, a very useful type of module is the apis, which search for information in different places on the internet and create data structures.

visualizators Modules that receive information and visualize it. They can also allow interaction and some visualizators have an emitter.

An impure file has modules that bring information (loaders, apis, data structures with typed information), modules that process information (operators, controls), and modules that visualize information (visualizators).

inheritance

Some data structures inherit from others. For instance, NumberList inherits from List. That means that a NumberList is also a List, and each receptor that works with List will also accept NumberList (and StringList, DateList, Table, ...). The example in the image shows how a NumberList is reversed. The operator reverseList receives a List and of course it can reverse any kind of List, including a NumberList, a StringList or a Table. That's why the sender and receiver icons don't match.
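Impure itself needs no code, but the inheritance rule above can be pictured with a short Python sketch. The class names mirror the data structures named in this guide; the function reverse_list is only a hypothetical stand-in for the reverseList operator, not Impure's implementation.

```python
class List(list):
    """Stand-in for Impure's generic List structure (illustrative only)."""


class NumberList(List):
    """A List whose elements are numbers."""


class StringList(List):
    """A List whose elements are strings."""


def reverse_list(lst: List) -> List:
    # The receptor is declared as List, so any subclass is accepted.
    # The result is built with the same class as the input.
    return type(lst)(reversed(lst))


numbers = NumberList([1, 2, 3])
result = reverse_list(numbers)
print(type(result).__name__, list(result))
```

The receptor asks for a List and the emitter sends a NumberList, which is the code equivalent of the icons not matching.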

impure code Each impure file has a code that you can copy to the clipboard by clicking the blue C button (top right). You can also open the code editor with the red C button. This is useful to easily share impure files: you can send the code by mail. To compile a code, paste it into the code editor and click the compile arrow at the top. Some examples presented here come with code, so you can compile them easily.

universal operators (aka polymorphism)

Some operators receive and give Object as data structure. This actually means that the operator allows different types of data structures, and the data structure it receives determines the data structure it returns. The best example is the operator addition, which is able to operate on a lot of different combinations of data structures. Among them:
- Number, Number -> Number
- Number, NumberList -> NumberList (adds the first number to all values in the NumberList)
- String, String -> String (concatenation)
- String, StringList -> StringList (concatenates the first String to all the Strings in the StringList)
- Point, Polygon2D -> Polygon2D (vector sum of the Point to all the Points in the Polygon2D)
- etc...
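As a rough illustration of how one operator can serve several combinations, here is a minimal Python sketch. It is an assumption about the behavior described above, not Impure's code: plain Python lists stand in for NumberList and StringList, and only the number and string cases are covered.

```python
def addition(a, b):
    """Hypothetical model of a 'universal' addition operator."""
    if isinstance(b, list):
        # Number + NumberList, or String + StringList:
        # combine the first value with every element of the list.
        return [addition(a, item) for item in b]
    # Number + Number (sum) or String + String (concatenation).
    return a + b


print(addition(2, 3))
print(addition(10, [1, 2, 3]))
print(addition("impure ", "guide"))
print(addition("x is ", ["great", "falling"]))
```

The type of the second argument decides the type of the result, which is the sense in which the operator receives and gives Object.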

Another interesting universal operator is interpolation, which tries to create an interpolated object from two objects and a number between 0 and 1. The data structure of the obtained object is of course the same as that of the two given objects.
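A minimal Python sketch of this idea, assuming linear interpolation (the guide does not say which formula Impure uses) and using plain numbers, lists and tuples as stand-ins for Impure's structures:

```python
def interpolation(a, b, t):
    """Linear interpolation between a and b; t is expected in [0, 1]."""
    if isinstance(a, (list, tuple)):
        # Interpolate element by element and keep the input's type.
        return type(a)(interpolation(x, y, t) for x, y in zip(a, b))
    return a + (b - a) * t


print(interpolation(0, 10, 0.25))
print(interpolation([0, 10], [10, 30], 0.5))
```

With t = 0 the result equals the first object, with t = 1 the second, and values in between give intermediate objects of the same structure.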

required and optional parameters

Modules with receptors may have required and optional parameters. A module can only work when its required parameters are fulfilled. Otherwise operators will return a null value, and visualizators will not depict anything. Optional parameters have grey icons.

remarkable tags

Some tags group interesting and useful modules and are worth a look:

api

Modules under this tag allow communication with internet services such as google, twitter, wikipedia, xignite (finance and market info), delicious, etc. These modules are crucial because they are the main interface between impure and the internet.

generator

Operators with this tag create complex structures from simple parameters. They are useful to quickly introduce data to the space without needing to load it.

coder and decoder

filter

Operators that take a structure and create a new one that contains less information, without transformation.

conversion

Operators that transform structures while preserving information.

advanced

Operators that perform advanced tasks.

universal

Operators, controls and visualizators that deal with any (or many) data structures.

vector

Data structures under this tag are vectors. Under this tag you find operators and controls that deal with vectors. Vectors are a special type of data structure, often numerical, which allow operations such as multiplication of a vector by a number, or addition between vectors. Some vectorial operators, such as interpolation, are very useful. If a structure is a vector (a Rectangle or NumberList, for instance) you can take two of them and calculate a data model in between. You can also use the ExponentialConvergence to create continual transformation over data. That's particularly useful when applied to visualizators that read vectors.

How to quickly bring data to impure


There are three main ways to bring data to an impure space:

- using an api (under the tag api) Using an api is one of the fastest ways to obtain rich structured information from the internet. For instance, you can download a delicious account, all the images (with their tags) from a flickr set, the historical market behavior of a company, the occurrences of some word on twitter accounts last month, etc.

- generating data (under the tag generator) Some operators under the tag generator build data structures, such as a random network (you choose the number of nodes and relations) or a NumberList with the same repeated number. Generators are useful to do quick tests.

- loading data from files (FileLoader module) Perhaps you have tables in excel files or csv format, or some text you want to analyze. Impure allows you to load the file using the FileLoader module, and then you might decode it. If the text file you loaded is in csv format, you have to link the FileLoader module to the decoder operator.
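To make the load-then-decode step concrete, here is a small Python sketch of what decoding csv text into a table might look like. The function csv_to_table and its parameters are hypothetical; the column-oriented result follows the convention explained in Working with tables, where a Table is a List of columns.

```python
import csv
import io


def csv_to_table(text, delimiter=",", has_header=True):
    """Decode CSV text into (headers, columns); columns is a list of lists."""
    rows = [row for row in csv.reader(io.StringIO(text), delimiter=delimiter) if row]
    headers = rows[0] if has_header else []
    body = rows[1:] if has_header else rows
    # Transpose rows into columns to match a column-oriented table.
    columns = [list(col) for col in zip(*body)]
    return headers, columns


sample = "name,value\nA,1\nB,2\n"
headers, columns = csv_to_table(sample)
print(headers)
print(columns)
```

Note that every cell is still a string after decoding; turning numeric columns into numbers is a separate step.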

How to quickly visualize data


If you have already placed data into an impure space, chances are that you can visualize it immediately. You only have to put the appropriate visualizator in the space and link the data to it. Of course the visualizations you can choose depend on the type of data you bring to the impure space. Some visualizators require more than one data structure. Let's see some common information structures and some linkable visualizators.

NumberList -> Histogram
two NumberLists -> ColorScatter
NumberTable -> SimpleNumberTableVisualizator
Network -> Oracle
Tree -> ZoomableTreeMap

How to process information


Sometimes you want to go further than picking a data file and visualizing it using a specific visualization method. Impure allows much more complex processes with more interesting results:
- to transform data (by operations, normalizations, filtering, parsing, comparing or combining different sources...)
- to use interactive controls that allow parameter or data changes
- to combine controls and visualizators to build an integral interactive space

Working with numbers


basic numeric structures: Number, NumberList, NumberTable

Numbers, besides being the most common raw material for data, are highly important because they are used as parameters for many operators, controls and visualizators. You will often use the Number module.

NumberLists are Lists (see inheritance, above), which means they have a lot of available operators besides the ones specific to this data structure. Among the most important operators you can apply to NumberLists are the normalizations (to maximum, to sum...), which create a new NumberList, and average, mean and standard deviation, whose results are numbers. You might want to compare two NumberLists (the market behaviors of two companies, trends, country indexes) in order to try to figure out if they are correlated. You can then use a Scatter or a ColoredScatter to take a glance at the correlation. You can also measure their Pearson correlation value, the most common index of statistical correlation.

NumberTables are very common; it's easy to obtain one from an excel file or a .csv file (this file format is pretty common nowadays because it is often the one used to share table data in open data repositories). But not every Table is made of numbers; some of them are a mix of different kinds of data. You can choose the numerical lists and assemble a NumberTable, but it is much easier to use the NumberTableFromTable operator, which filters out all the non-NumberLists of the Table. A NumberTable is depictable in several ways.
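For readers who want to see the arithmetic behind these operators, the following Python sketch shows two normalizations and the Pearson correlation. The function names are illustrative and the formulas are the standard textbook ones; the guide does not document how Impure computes them.

```python
import math


def normalize_to_max(values):
    """Divide every value by the largest one."""
    largest = max(values)
    return [v / largest for v in values]


def normalize_to_sum(values):
    """Divide every value by the total, so the results add up to 1."""
    total = sum(values)
    return [v / total for v in values]


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long number lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


a = [1, 2, 3, 4]
b = [2, 4, 6, 8]
print(normalize_to_max(a))
print(normalize_to_sum(a))
print(pearson(a, b))
```

A Pearson value close to 1 means the two lists rise together, close to -1 means one rises while the other falls, and close to 0 means no linear relation. The sketch does not handle constant lists, for which the coefficient is undefined.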

Working with texts


basic string structures: String, StringList
basic operators: splitString, occurrences

Handling text operations is quite useful, because a lot of queries are made of text. On the other hand, text analysis is the basis for social media analysis. What are people saying about something? How often? How can you know which assets are being related to some product in the social media ecology? After reading contents from blogs, news, twitter and so on, you might want to analyze the obtained result. Some options are: count the occurrences of a word, count how often two words occur in the same sentence or paragraph, compare the entire lexicon used by two different sources, visualize the common words used in different resources... Text analysis is also the key for data mining, parsing and filtering. And when it comes to parsing data, these operators become very useful:

stringTransformations firstTextBetweenStrings getAllStringsBetweenTwoStrings replaceSubstring
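The Python sketch below suggests what operators of this kind do. The functions are hypothetical equivalents written for illustration (their names only echo occurrences, firstTextBetweenStrings and getAllStringsBetweenTwoStrings); details such as case handling are assumptions.

```python
def occurrences(text, word):
    """Count non-overlapping, case-insensitive occurrences of word in text."""
    return text.lower().count(word.lower())


def first_text_between(text, start, end):
    """Return the text between the first start marker and the next end marker."""
    i = text.find(start)
    if i == -1:
        return None
    i += len(start)
    j = text.find(end, i)
    return None if j == -1 else text[i:j]


def all_texts_between(text, start, end):
    """Return every piece of text enclosed by the start and end markers."""
    found, pos = [], 0
    while True:
        i = text.find(start, pos)
        if i == -1:
            break
        i += len(start)
        j = text.find(end, i)
        if j == -1:
            break
        found.append(text[i:j])
        pos = j + len(end)
    return found


html = "<b>one</b> and <b>two</b>"
print(occurrences("Data about data", "data"))
print(first_text_between(html, "<b>", "</b>"))
print(all_texts_between(html, "<b>", "</b>"))
```

Extracting the text between two known markers is a common way to parse values out of downloaded pages or feeds.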

...building a network from a StringList... ...remarkable works...

Working with lists


basic list structures: List, Table, NumberList, StringList

A lot of data structures inherit from List, which means that all these structures can be operated on as Lists. That's why Lists are so important. For almost every data structure there exists a type of List that contains that kind of element. Different types of List have different functionalities. For instance, it's possible to search Nodes in a ListNode using the id of the Node. NumberLists have a lot of statistical and math operators. Things you will want to do with Lists:
- assemble a List (that is: take some data structures and create a List with them)
- add elements to a List
- remove elements from a List
- concatenate two Lists
- obtain a single element from a List
- obtain a group of elements from a List
- sort a List (using different sorting criteria)

Some of the basic operators for a List are: getElementFromList getSubList getSubListFromPositions pushToList

Working with tables


basic table structures: Table, NumberTable

Tables are Lists of Lists. By convention, the lists of a Table are interpreted as columns. The fact that a Table is a List might be confusing for those who are used to working with tables as excel sheets. In excel, columns and rows are treated as equals. In Impure Tables, rows don't exist a priori! (Of course they can be obtained, in fact generated.) So, if you want to obtain a specific list (a column) there's no getListFromTable operator; instead you have to use getElementFromList, because Tables are Lists! This scheme reveals the structure of a Table, and makes it clear that a Table is a List of Lists:

The image clearly shows that a Table may have Lists of different lengths. This might not be the most common case, but it could happen. Use the TableAtAGlance to quickly visualize the structure of a Table and its contents.

Some basic and useful operators for Tables:
- getElementFromList to obtain a List (column) from a Table
- getSubList to obtain a sub-Table
- and any List operator
More specific to Tables:
- getRow
- getElementFromTable
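The column-first convention is easy to see in a short Python sketch. The two functions are hypothetical stand-ins named after getElementFromList and getRow; returning None for a column that is too short is an assumption made here, not documented Impure behavior.

```python
# A Table as a list of columns (values taken from the csv sample in exercise 4).
table = [
    ["Aliens", "Ghost", "Jaws"],  # column 0: titles
    [1986, 1990, 1975],           # column 1: years
    [18.5, 22, 12],               # column 2: budgets
]


def get_element_from_list(lst, index):
    """A Table is a List, so picking element i yields column i."""
    return lst[index]


def get_row(table, index):
    """Rows are generated on demand by reading position index of each column."""
    return [col[index] if index < len(col) else None for col in table]


years = get_element_from_list(table, 1)
row = get_row(table, 2)
print(years)
print(row)
```

Reading a column is a single lookup, while a row has to be assembled from every column, which is why rows do not exist a priori.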

Working with networks


basic network structures: Node, NodeList, Network, Tree

Networks are complex structures, at least more complex than lists or tables. But they are extremely important. Neurons, genes, phones and people are structured by networks, not by lists. One issue when working with networks is that it's not easy to find network data, as opposed to numerical or text data. Major data repositories (see Where to find interesting and specific data? below) offer data in table formats such as .csv. Formats for networks, instead, are rare. We expect that the rise of social networks and apis for services such as Facebook will make network data standards more common. In the meantime, impure allows you to load the most common and simple network data formats: .gdf and GraphML.

But what is much more interesting is that impure includes several ways to create networks. First of all, when you download an RSS feed, the generated data structure is a Network instead of a table or a list. The nodes of this Network are contents and tags. From this Network you can create a new Network of contents (creating relations whenever two contents share tags) or a Network of tags (creating relations whenever two tags share contents). With a table with two lists and a NumberList that somehow expresses the proximity between the pairs of the lists, you can build a Network. From NumberLists it's also possible to create Networks: comparing each pair of NumberLists, calculating its statistical correlation or its distance, and creating relations when these values surpass a threshold. In general terms, when you have a distance criterion for a certain type of element you can build a Network from a List of these elements. See Working with texts for an example of how to build a Network from a StringList.
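The general recipe (a List of elements plus a distance criterion gives a Network) can be sketched in Python as follows. Everything here is an illustrative assumption: the function names, the choice of a word-based Jaccard distance for strings, and the rule that a relation is created when the distance falls below the threshold.

```python
from itertools import combinations


def jaccard_distance(a, b):
    """Distance between two texts based on the words they share (0 = same words)."""
    words_a, words_b = set(a.lower().split()), set(b.lower().split())
    union = words_a | words_b
    if not union:
        return 0.0
    return 1 - len(words_a & words_b) / len(union)


def network_from_list(elements, distance, threshold):
    """Return (nodes, relations); a relation joins each pair closer than threshold."""
    relations = [
        (a, b) for a, b in combinations(elements, 2) if distance(a, b) < threshold
    ]
    return list(elements), relations


texts = ["open data tools", "open data portals", "network visualization"]
nodes, relations = network_from_list(texts, jaccard_distance, 0.6)
print(relations)
```

Swapping jaccard_distance for a correlation or numeric distance gives the NumberList variant; with a correlation, the comparison would be reversed so that highly correlated pairs are related.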

Working with graphic elements


basic graphic structures: BitmapData, ColorList, ColorScale

Among graphic elements, colors are the most important, because they are useful to represent quantities. Some visualizators, such as SimpleMap, use colors to represent values. In many cases this is optional. Colors also help to differentiate elements. It's easy to create a ColorList or a ColorScale for a NumberList.
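One simple way to derive colors from a NumberList is to rescale the values to the 0 to 1 range and blend between two colors. The Python sketch below assumes this approach, with RGB tuples and a white-to-red scale chosen arbitrarily; it is not a description of how Impure builds its ColorList or ColorScale.

```python
def color_list_for(values, low=(255, 255, 255), high=(255, 0, 0)):
    """Map each number to an RGB color between low (minimum) and high (maximum)."""
    smallest, largest = min(values), max(values)
    span = (largest - smallest) or 1  # avoid dividing by zero for constant lists
    colors = []
    for v in values:
        t = (v - smallest) / span
        colors.append(tuple(round(a + (b - a) * t) for a, b in zip(low, high)))
    return colors


print(color_list_for([0, 5, 10]))
```

The smallest value gets the low color, the largest gets the high color, and everything else falls proportionally in between.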

Working with geographic information


basic geographic structures: PointGeo, Region, RegionList, FrameGeo, Country

Working with time information


basic date structures: Date, IntervalDate, DateList, IntervalDateList

How to use internet search to obtain valuable data


Multi-search is a powerful technique to obtain information from the internet. Search engines such as google give you a list of webpages with occurrences of the string you typed. Sometimes you want to obtain more information, and it's also possible that you don't want to find information about a specific concept, word or idea, but about a set of related concepts. Imagine you want to know about the similarities and differences between each pair of internet browser applications. The conventional way is to search for a webpage in which someone else already did the task and made this comparison. You have to be lucky. On the other hand, it is much more likely that a lot of people made specific pair comparisons. If we can ask google to return all the pages in which two browsers are compared, we will obtain much more valuable information. But how to do that? It would be great if we could type something like this: [browser_brand_here] compared with [browser_brand_here] or: [browser_brand_here] is faster than [browser_brand_here]

Or, you can type all the possible combinations. With 5 browsers it's not a huge task (only 20 possible pairs, without comparing browsers with themselves). But results come in separate windows... and what if you want to compare 50 car models (2450 different search operations)? Of course you can do that automatically with impure. First you load or type the list of browsers. Then you combine this List, creating a table with two Lists that contains all possible combinations of pairs. You take the first StringList and add it (addition operation) to the String "compared to"; you obtain a new StringList that you will add to the second StringList of the Table. You add a quotation mark to the StringList, and you add the result to another quotation mark. Now you have a StringList with all the sentences of the kind: [browser_brand_here] compared to [browser_brand_here]. Quotation marks are important because they mean that the search will be strict, and google will return webpages in which the complete sentences are found. You link this StringList to the InternetMultiNSearchResults.
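The pair-building steps above can be sketched in Python to check the counts and see the resulting queries. The function name and the browser names are illustrative assumptions; only the query construction is shown, not the search itself.

```python
from itertools import permutations


def comparison_queries(items, connector):
    """Build one quoted query per ordered pair of different items."""
    return [f'"{a} {connector} {b}"' for a, b in permutations(items, 2)]


# Example names chosen for illustration only.
browsers = ["Firefox", "Chrome", "Safari", "Opera", "Internet Explorer"]
queries = comparison_queries(browsers, "compared to")

print(len(queries))
print(queries[0])
print(len(list(permutations(range(50), 2))))
```

Ordered pairs are used because "A compared to B" and "B compared to A" are different sentences: n items give n * (n - 1) queries, which is 20 for 5 browsers and 2450 for 50 car models.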

advanced concepts: internet perception, internet distance, externality, remarkable words

This great cartoon by http://xkcd.com depicts some of the questions it's possible to ask the internet using google search.

Working with urls


urls are strings, so you can work with all the string operators.

Where to find interesting and specific data?


Available well-structured and commented data is increasing dramatically. 2010 saw the appearance of many great data repositories, most of them being part of transparency policies led by some governments. This is the open data revolution we are witnessing. Although it's easy to find tons of interesting data, if you have a specific need it could be very hard, even impossible. In the coming years we will see the appearance, growth and enhancement of data repositories with relevant information about many realities (not only country-specific information). This is a list of increasing factors, with positive interaction between them:
- people producing information
- people needing to share information
- people and institutions needing information
- people and institutions needing tools to understand and add value and knowledge to information
- people aware of the relevance of information being public, and of their rights to it
- people learning new tools to understand, process, visualize and publish information
- institutions sharing and publishing information (by ethical or political means, by necessity and strategy, or by social pressure)

This is a list of great data repositories: http://delicious.com/moebio/+%5BdataRepository%5D
A good place to explore when it comes to finding interesting data to play with on impure is the Guardian data blog: http://www.guardian.co.uk/news/datablog
At Bestiario we are creating a modest but diverse and well organized collection of data files: http://delicious.com/datarepository Some of these are used in the impure examples.

understanding examples 1

You always have access to a list of examples. Some are very simple; others are quite hard to understand at first glance and require following the information flow very attentively. You will open and try to understand all the examples except for the ones contained in the advanced folder, following this order:

- basic
- visualizators
- apis
- controls
- operators

In each case it will be easy to follow the data flow, from the loaders, apis or generators, continuing with operators and controls, and finishing with visualizators.

Exercises

exercise 1: from an excel file

Open an excel file. It should have a simple structure: a table with rows and columns. For the purposes of this exercise it should contain at least two columns filled with numbers (except for the first cell, which could be the header).

Export this table to a CSV file, and put it on a server or in a folder. You have to identify the path to this file. Create a new impure space, add a String module and paste the path to the .csv file.

Now place a FileLoader (tags: control, loader), and link the String module to it. Once the file is loaded, plug in a csvToTable operator (tags: decoder, tables). If the first row of the table contains headers, link a Boolean module to the second parameter with true as value. So far the impure space looks something like this:

In the near future there will be table visualizators; so far you can only visualize a NumberTable, which is a Table filled with numbers. So, you have to filter the table, taking only the number lists, and build a new table. There is an operator that does the whole job: getNumberTableFromTable (tags: tables, numeric, filter). Finally, place a SimpleNumberTableVisualizator (tags: tables, numeric) and link getNumberTableFromTable to it. Place Boolean modules to change the optional parameters of the visualizator.
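The filtering step can be pictured with a few lines of Python. This is a sketch under assumptions, not the getNumberTableFromTable implementation: the table is a list of columns whose cells are still strings, and a column counts as numeric only if every cell parses as a number.

```python
def to_number_list(column):
    """Return the column as floats, or None if any cell is not numeric."""
    try:
        return [float(cell) for cell in column]
    except ValueError:
        return None


def number_table_from_table(table):
    """Keep only the columns that are entirely numeric."""
    converted = (to_number_list(col) for col in table)
    return [col for col in converted if col is not None]


# Columns: titles, budgets, gross (values from the csv sample in exercise 4).
table = [["Aliens", "Ghost"], ["18.5", "22"], ["81.843", "217.631"]]
number_table = number_table_from_table(table)
print(number_table)
```

The title column is dropped because its cells cannot be read as numbers, leaving a table that a numeric visualizator can depict.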

exercise 2: your facebook friends graph

Follow these directions in order to obtain a .gdf file that contains your facebook graph (friends and relations): http://thepoliticsofsystems.net/2010/03/22/netvizz-facebook-to-gephi/. You can load a .gdf of a facebook group too.

(about the .gdf standard: http://gephi.org/users/supported-graph-formats/gdf-format/ and by the way, you can also download the gephi application, which is great for network visualization, and is free)

Simply load the file to the impure space and visualize it with some of the network visualizations impure has. The most impressive is the ForceDirectedNetworkVisuallizator, although it only allows representation of up to 300 nodes.

exercise 3: assets perception on blogs

As commented above, impure can be very useful to read the web, in particular to understand what people's perception is about a lot of things. This exercise is an example of how to read and understand social media. Choose two comparable concepts, object names, or people names. Choose a list of adjectives you expect to fit these concepts, objects or persons (called items from now on). We are going to ask the internet what people are saying about these items, which of the adjectives are being used with each of them and how much each is used. Place a String module with the name of the item, a blank space, the word is and a final blank space. Something like this:

(be aware of the space after the is)

In another String module write a list of adjectives, or text that completes the sentence xxx is . For instance: greed, an inspiration, great, the best company, the future, innovative, falling, growing

Each part is separated by a comma and a blank space. Split this string using the splitString operator (tags: basic, strings, lists). By default the separator string is ",", which is the one we need here. The result is a StringList with all the adjectives or objects.

Then use the addition operator (tags: numeric, universal, basic), and add the String xxx is to the new StringList. This operator will concatenate the first string to each string in the StringList, creating a new StringList in which every string has the structure x is y. This StringList is a list of queries; for each query we want to know how many occurrences there are on the internet. The proportion of these occurrences will give you an idea of what people (at least internet content-publishing people) think about something. Of course it depends a lot on which adjectives and objects you have chosen. Connect the StringList to the InternetMultiNSearchResult control (tags: control, loader, search, strings). It will search each string and create a NumberList with the occurrences of each one. These numbers are the same ones you see as search results below the search bar when you perform a search with google.

Finally use a CirclesTagCloud or a Quadrification to visualize the results (link the StringList and the NumberList).

exercise 4: number lists comparison

A very common structure is a Table with a first column (list) filled with names or objects, and the others with numbers. If the Table has only two lists, names and values, you can use the Histogram, the CirclesTagCloud or the Quadrification to visualize it; in this case you have to use getListElement (remember that a Table is a List of Lists, so if you read the first element of the Table it will be the first List, and so on). Once you have a Table with objects or names in the first List, and two or more NumberLists, you may want to use some kind of comparative visualization method. In order to perform this exercise you have to find a table with a list of names and two or more number lists. This data could be interesting: http://www.guardian.co.uk/news/datablog/2010/apr/12/maternal-mortality-rates-millenniumdevelopment-goals

The main idea in this case is to try to figure out if budget and revenues are correlated or not. You load and decode the table. The result should be a Table with at least three Lists. Then you place three getElementFromList operators on the space, each one picking a different List, the first one being a StringList and the other two NumberLists. Now you have two interesting and immediate options. If values determine a sorting, you may be interested in seeing how each NumberList defines a different sorting, and in comparing the two sorted lists. In the example of maternal mortality it's interesting to see the list of countries sorted from less to more mortality in each year, and to try to visualize which countries remained in the same positions and which ones changed. For that purpose use the ComparativeSort visualizator.
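The comparison of two sortings described above amounts to ranking the same names by two different NumberLists and looking at how positions change. The Python sketch below is an illustration with made-up names and values (A, B, C are placeholders, not data from the guide) and hypothetical function names; it is not how the ComparativeSort visualizator is implemented.

```python
def ranking(names, values):
    """Names sorted from the lowest to the highest value."""
    return [name for _, name in sorted(zip(values, names))]


def rank_changes(names, first, second):
    """Positions gained (positive) or lost (negative) between two sortings."""
    before = ranking(names, first)
    after = ranking(names, second)
    return {name: before.index(name) - after.index(name) for name in names}


# Placeholder data for illustration only.
names = ["A", "B", "C"]
first_year = [3, 1, 2]
second_year = [1, 2, 3]

print(ranking(names, first_year))
print(ranking(names, second_year))
print(rank_changes(names, first_year, second_year))
```

A value of 0 means the element kept its position in both sortings; any other value shows how far and in which direction it moved.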

For instance, from a csv le:

title;classification;year;budget;gross;duration;audience
Aliens;1;1986;18.5;81.843;137;8.2
Armageddon;2;1998;140;194.125;144;6.7
As Good As It Gets;2;1997;50;147.54;138;8.1
Braveheart;1;1995;72;75.6;177;8.3
Chasing Amy;1;1997;0.25;12.006;105;7.9
Contact;2;1997;90;100.853;153;8.3
Dante's Peak;2;1997;104;67.155;112;6.7
Deep Impact;2;1998;75;140.424;120;6.4
Executive Decision;1;1996;55;68.75;129;7.3
Forrest Gump;3;1994;55;329.691;142;7.7
Ghost;3;1990;22;217.631;128;7.1
Gone with the Wind;3;1939;3.9;198.571;222;8
Good Will Hunting;1;1997;10;138.339;126;8.5
Grease;3;1978;6;181.28;110;7.3
Halloween;1;1978;0.325;47;93;7.7
Hard Rain;1;1998;70;19.819;95;5.2
I Know What You Did Last Summer;1;1997;17;72.219;100;6.5
Independence Day;2;1996;75;306.124;142;6.6
Indiana Jones and the Last Crusade;2;1989;39;197.171;127;7.8
Jaws;2;1975;12;260;124;7.8
Men in Black;2;1997;90;250.147;98;7.4
Multiplicity;2;1996;45;20.1;117;6.8
Pulp Fiction;1;1994;8;107.93;154;8.3
Raiders of the Lost Ark;2;1981;20;242.374;115;8.3
Saving Private ryan;1;1998;70;178.091;170;9.1
Schindler's List;1;1993;25;96.067;197;8.6
Scream;1;1996;15;103.001;111;7.7
Speed 2:Cruise Control;2;1997;110;48.068;121;4.3
Terminator;1;1984;6.4;36.9;108;7.7
The American President;2;1995;62;65;114;7.6
The Fifth Element;2;1997;90;63.54;126;7.8
The Game;1;1997;50;48.265;128;7.6
The Man in the Iron Mask;3;1998;35;56.876;132;6.5
Titanic;2;1997;200;600.743;195;8.4
True Lies;1;1994;100;146.261;144;7.2
Volcano;2;1997;90;47.474;102;5.8

I took the first, the fourth and the fifth lists, and the result was:

Another approach is to try to depict whether or not two NumberLists are correlated. A classic method to do that is the Scatter. But if the two NumberLists are sorted (chronologically, for instance) you can visualize the relation between values with ColoredScatter.

exercise 5: comparing twitter trends [!] twitter apis don't work yet

understanding examples 2
Now you are prepared to understand the examples in the advanced folder. Some of them have tricky operations, but in general terms you will be able to understand the general structure, and to see how complex operations can be performed by combining the existing ones. Reverse engineering is not that easy with impure, but you can always save the examples as new files and modify them. You will recognize some of these examples because they are evolutions of some of the exercises you already did.
