Digital Assignment1 - Openrefine DC


Digital Assignment – 1

Distributed Computing

Name: M. Ramya
Reg No: 17mis1148
Course: Distributed Computing
Course Code: SWE4003
Faculty: Prof. Maheshwari N
Semester: WIN2020

Choose a tool relevant to distributed systems, in any component of a domain such as networking or analytics. The tool should be open source, and the procedure for downloading and installing it should be documented step by step.

OpenRefine :

OpenRefine (previously Google Refine) is a powerful tool for working with messy
data: cleaning it; transforming it from one format into another; and extending it
with web services and external data.

OpenRefine keeps your data private on your own computer until you want to share or collaborate; your private data never leaves your computer unless you want it to. (It works by running a small server on your computer, and you use your web browser to interact with it.)

Big Idea of OpenRefine:

What - A messy, unstructured, inconsistent dataset can be explored using OpenRefine. In general, it is very difficult to explore data that contains redundancies and inconsistencies, but OpenRefine provides several functions through which one can filter the data, edit the inconsistencies, and view the data. It is a tool for cleaning data.

Why - Spreadsheets can also refine a dataset, but they are not the best tool for it, because OpenRefine cleans data in a more systematic, controlled manner. When working with historical data, we come across issues such as blank fields, duplicate records, and inconsistent formats, and OpenRefine can help to resolve such issues.
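The problems named above (blank fields, duplicate records, inconsistent formats) can be illustrated with a small Python sketch. It is not OpenRefine code. It assumes the rows are already loaded as a list of dicts, and the helper names `normalize_cell` and `profile_rows` are hypothetical. The sketch trims and collapses whitespace in every cell, roughly what OpenRefine's common whitespace transforms do, and then lists blank cells and rows that duplicate an earlier row.

```python
import re

def normalize_cell(value):
    """Trim leading/trailing whitespace and collapse internal runs of whitespace to one space."""
    if value is None:
        return None
    return re.sub(r"\s+", " ", value.strip())

def profile_rows(rows):
    """Normalize every cell, then report blank cells and duplicate records.

    rows: list of dicts mapping column name -> str or None (a hypothetical in-memory layout).
    Returns (cleaned_rows, blank_cells, duplicate_row_indexes).
    """
    cleaned = [{col: normalize_cell(val) for col, val in row.items()} for row in rows]
    # A cell is "blank" if it is None or empty after trimming.
    blank_cells = [(i, col) for i, row in enumerate(cleaned)
                   for col, val in row.items() if not val]
    seen = set()
    duplicates = []
    for i, row in enumerate(cleaned):
        key = tuple(sorted(row.items()))  # column names are unique, so values are never compared
        if key in seen:
            duplicates.append(i)  # same values as an earlier row once normalized
        else:
            seen.add(key)
    return cleaned, blank_cells, duplicates

rows = [
    {"name": "  Ada   Lovelace ", "city": "London"},
    {"name": "Ada Lovelace", "city": "London"},
    {"name": "Alan Turing", "city": ""},
]
cleaned, blanks, dups = profile_rows(rows)
print(blanks)  # [(2, 'city')]
print(dups)    # [1]
```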

When - Data analysis now plays an important role in business: data analysts improve decision making, cut costs, and identify new business opportunities. Data analysis is the process of inspecting, cleaning, transforming, and modelling data with the goal of discovering useful information, suggesting conclusions, and supporting decision making. So, to ensure the accuracy of our analysis, we have to clean our data.

What can OpenRefine do:

• Cleaning messy data: for example, when working with a text file containing semi-structured data, you can edit it using transformations, facets and clustering to make the data cleanly structured.[8]
• Transformation of data: converting values to other formats, normalizing and denormalizing.
• Parsing data from web sites: OpenRefine has a URL-fetch feature and includes the jsoup HTML parser and DOM engine.[9]
• Adding data to a dataset by fetching it from web services (e.g. services returning JSON).[10] For example, it can be used for geocoding addresses to geographic coordinates.[11]
• Aligning to Wikidata (formerly Freebase[12]): this involves reconciliation, i.e. mapping string values in cells to entities in Wikidata.
• Data Normalization
• Column Reorganization
• Faceting and Clustering
• Tracking Operations
• Exporting Data
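OpenRefine's clustering includes a "key collision" method whose default keying function is called fingerprint. The sketch below is a simplified Python re-implementation of the fingerprint steps as OpenRefine documents them, not OpenRefine's actual code. The function names are hypothetical, and details such as exactly which characters count as punctuation may differ from the real implementation.

```python
import string
import unicodedata
from collections import defaultdict

def fingerprint(value):
    """Simplified fingerprint key: trim, lowercase, strip punctuation/control chars,
    fold accents to ASCII, then sort and de-duplicate the tokens."""
    s = value.strip().lower()
    # Remove punctuation (ASCII and Unicode) and control characters other than whitespace.
    s = "".join(
        ch for ch in s
        if not (ch in string.punctuation
                or unicodedata.category(ch).startswith("P")
                or (unicodedata.category(ch) == "Cc" and not ch.isspace()))
    )
    # Fold accented letters to their base form, e.g. "ü" -> "u".
    s = "".join(ch for ch in unicodedata.normalize("NFKD", s) if not unicodedata.combining(ch))
    # Split on whitespace, drop duplicate tokens, sort, and join back together.
    return " ".join(sorted(set(s.split())))

def key_collision_clusters(values):
    """Group values whose fingerprints collide; return only groups with more than one distinct value."""
    groups = defaultdict(set)
    for v in values:
        groups[fingerprint(v)].add(v)
    return [sorted(g) for g in groups.values() if len(g) > 1]

print(key_collision_clusters(["New York", "new york ", "York, New", "Boston", "Zürich", "Zurich"]))
# [['New York', 'York, New', 'new york '], ['Zurich', 'Zürich']]
```

In OpenRefine itself the clusters are shown in a dialog where you choose which merged value to keep. The sketch stops at finding the groups.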
Why is OpenRefine a better tool?

OpenRefine Strengths and Weaknesses:

Strengths:

1. OpenRefine is a desktop application that opens in the browser through a local web server, so the data stays on your computer and is not uploaded to Google's servers.

2. It has facets, which filter the data into subsets, and clustering, which groups similar values so that they can be reviewed and merged into consistent, meaningful entries.

3. It has a browser-based interface backed by the local server, and it can handle larger datasets than are comfortable to work with in a spreadsheet, limited mainly by the memory available to Java.

4. OpenRefine is strong at extending data: users can fetch additional metadata from web services or reconcile values against external databases, and correlate it with their own data.
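A text facet can be pictured as a count of the distinct values in one column. Selecting a value then narrows the rows you work on. This Python sketch only illustrates that idea and assumes rows are dicts. `text_facet` and `select` are hypothetical names, and OpenRefine computes its facets inside the application.

```python
from collections import Counter

def text_facet(rows, column):
    """Count how many rows hold each distinct value of `column`, most common first
    (ties keep the order in which values were first seen)."""
    return Counter(row.get(column) for row in rows).most_common()

def select(rows, column, chosen_values):
    """Keep only the rows whose value in `column` is one of the selected facet choices."""
    chosen = set(chosen_values)
    return [row for row in rows if row.get(column) in chosen]

rows = [{"city": "London"}, {"city": "Paris"}, {"city": "London"}, {"city": None}]
print(text_facet(rows, "city"))         # [('London', 2), ('Paris', 1), (None, 1)]
print(select(rows, "city", ["Paris"]))  # [{'city': 'Paris'}]
```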

Weakness:

1. The UI of OpenRefine is not user friendly. Although its features and functions are strong, the UI makes OpenRefine look dull. In addition, its visualization does not scale: for instance, OpenRefine gives the user a view of the data, but the display is not large enough to reveal complex distributions.

2. Google no longer supports the tool (it is now maintained by an open-source community as OpenRefine), and features that depended on Google services, such as reconciliation against Freebase, no longer work.

How to install and run Refine

OpenRefine is a desktop application in that you download it, install it, and run it on
your own computer. However, unlike most other desktop applications, it runs as a
small web server on your own computer and you point your web browser at that
web server in order to use Refine. So, think of Refine as a personal and private
web application.

Requirements

1. Java JRE/JDK installed (if you are running a 64-bit operating system, it is recommended that you install 64-bit Java)
2. A supported OS: Windows, Linux, macOS

NOTE: On Windows, running OpenRefine from Cygwin, MSYS2, or Git Bash is NOT supported; use Windows Terminal instead.
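OpenRefine will not start without a working Java, so it can help to check the installation first. The Python sketch below runs `java -version`, which prints its banner on stderr, and parses the result. The helper names are hypothetical. It assumes the usual banner formats, e.g. `1.8.0_241` for Java 8 and `11.0.2` for Java 11, and assumes that 64-bit JVMs mention "64-Bit" in the banner. It does not check any particular minimum version.

```python
import re
import shutil
import subprocess

def java_major_version(banner):
    """Return the major Java version found in `java -version` output, or None.

    Handles the legacy scheme ("1.8.0_241" -> 8) and the newer one ("11.0.2" -> 11, "17" -> 17).
    """
    match = re.search(r'version "(\d+)(?:\.(\d+))?', banner)
    if match is None:
        return None
    major = int(match.group(1))
    if major == 1 and match.group(2) is not None:
        return int(match.group(2))
    return major

def is_64_bit(banner):
    """64-bit JVMs normally say "64-Bit" in their banner, e.g. "OpenJDK 64-Bit Server VM"."""
    return "64-Bit" in banner

def installed_java_banner():
    """Run `java -version` if java is on PATH; the banner is printed on stderr."""
    if shutil.which("java") is None:
        return None
    result = subprocess.run(["java", "-version"], capture_output=True, text=True)
    return result.stderr

banner = installed_java_banner()
if banner is None:
    print("java was not found on PATH: install a JRE/JDK before starting OpenRefine")
else:
    print("Java major version:", java_major_version(banner), "| 64-bit:", is_64_bit(banner))
```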

Release Version

• OpenRefine requires a working Java JRE; otherwise it will not start (the command window will just open and close quickly after you double-click OpenRefine.exe).
• Download OpenRefine from https://openrefine.org.
• Install it as detailed below for your operating system:
o Windows
o macOS
o Linux
• As long as OpenRefine is running, you can point your browser at http://127.0.0.1:3333/ to use it, and you can even use it in several browser tabs and windows.
• If you're running a proxy or get a BindException, you can change the IP and port configuration with -i and -p (see Running & Configuration below), or use refine -help for options.
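A BindException usually means another program is already listening on the address and port OpenRefine wants, which is 127.0.0.1:3333 by default. This Python sketch, with a hypothetical `port_in_use` helper, checks that before you start OpenRefine. It can only tell whether something accepts connections, not whether that something is OpenRefine.

```python
import socket

def port_in_use(host="127.0.0.1", port=3333):
    """Return True if something already accepts TCP connections on host:port.

    This cannot tell whether the listener is OpenRefine or some other program.
    """
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sock.settimeout(1.0)
        return sock.connect_ex((host, port)) == 0

if port_in_use():
    print("Port 3333 is taken: open http://127.0.0.1:3333/ if it is OpenRefine, "
          "or start OpenRefine on another port with -p")
else:
    print("Port 3333 is free")
```

If another program holds the port, starting OpenRefine with -p and a different port number avoids the clash.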

Windows

Install: Once you have downloaded the .zip file, uncompress it into a folder
wherever you want (such as in C:\Open-Refine).

Run: Run the .exe file in that folder. You should see the Command window in
which OpenRefine runs. By default, the Command window has a black
background and text in monospace font in it.

Shut down: When you need to shut down OpenRefine, switch to that Command
window, and press Ctrl-C. Wait until there's a message that says the shutdown is
complete. That window might close automatically, or you can close it yourself. If
you get asked, "Terminate all batch processes? Y/N", just press Y.

Upgrading: If you upgrade to a new version of OpenRefine, you may need to update your workspace (for example, to update reconciliation links). To do this, remove the workspace.json file located in your data storage (workspace) directory; the file will be regenerated when OpenRefine starts.
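The workspace.json step can be scripted. In this Python sketch the workspace directory is a parameter, because its location depends on your operating system and settings. `reset_workspace_index` is a hypothetical helper. Run it while OpenRefine is shut down, since a running instance may write the file again.

```python
from pathlib import Path

def reset_workspace_index(workspace_dir):
    """Delete workspace.json from an OpenRefine data storage directory.

    workspace_dir is whatever data storage location your installation uses (not fixed here).
    OpenRefine regenerates workspace.json the next time it starts.
    Returns True if a file was removed, False if there was none.
    """
    target = Path(workspace_dir) / "workspace.json"
    if target.is_file():
        target.unlink()
        return True
    return False

# Example with a placeholder path; substitute your own data storage location:
# reset_workspace_index("/path/to/openrefine/workspace")
```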

macOS

Install via Disk Image: Once you have downloaded the .dmg file, open it and drag the OpenRefine icon into the Applications folder icon (just as you would normally install Mac applications). If you get a message saying "Open Refine can't be opened because it is from an unidentified developer", open System Preferences, go to "Security and Privacy", and select the General tab. There you will see a message indicating that "OpenRefine was blocked from opening because it is not from an identified developer". Click the "Open Anyway" button to complete the OpenRefine installation. (For details on why you have to do this, see Issue #2191. Note that in macOS Catalina the message has the additional text "macOS cannot verify that this app is free from malware", but the reason for the message and the solution are the same.)

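Install via Homebrew: Follow our detailed Homebrew installation guide, or follow the quick steps below:

1. Install Homebrew from https://brew.sh
2. In Terminal, enter

brew install --cask openrefine

(Older Homebrew versions used the command brew cask install openrefine, which newer versions no longer accept.)
3. Then find OpenRefine in your Applications folder.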

Run: To launch OpenRefine, go to the Applications folder and double click the
OpenRefine app. You'll see the OpenRefine app appear in your dock.

Shut down: You can switch to the OpenRefine app (clicking on its icon in the
dock) and invoke its Quit command.

See also: Cannot install on Mac OS X 10.8 (Mountain Lion) - "Google Refine" is
damaged and can't be opened. You should move it to the Trash

If you use Yosemite you will need to install Java for OS X 2014-001 first.

Obtaining server logs on Mac

Sometimes it is useful to access the OpenRefine server logs to understand the cause of an issue. Here are the steps to run OpenRefine in a terminal on macOS:

• Find the OpenRefine app/icon in Finder.
• Ctrl+Click on the icon and select "Show Package Contents" from the context menu that appears.
• This should open a new Finder window showing a folder called "Contents"; navigate into this folder, then into the "MacOS" folder.
• Ctrl+Click on "JavaAppLauncher".
• Choose "Open With" from the menu, and select "Terminal".
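The same launcher can also be started from a script, so that the server output is both shown and saved to a file. This Python sketch assumes the app is installed in /Applications and that running `JavaAppLauncher` directly behaves as in the Finder steps above. `run_and_log` and the log file name are hypothetical.

```python
import subprocess
import sys

# Launcher location taken from the steps above (Contents/MacOS/JavaAppLauncher inside the
# app bundle); adjust it if OpenRefine is not installed in /Applications.
LAUNCHER = "/Applications/OpenRefine.app/Contents/MacOS/JavaAppLauncher"

def run_and_log(command, log_path):
    """Run `command`, echo each line of its output to this terminal, and append it to log_path.

    stderr is merged into stdout so that server errors end up in the same log.
    Returns the process exit code once it finishes (e.g. after you quit OpenRefine).
    """
    with open(log_path, "a", encoding="utf-8") as log, subprocess.Popen(
        command, stdout=subprocess.PIPE, stderr=subprocess.STDOUT, text=True
    ) as proc:
        for line in proc.stdout:
            sys.stdout.write(line)
            log.write(line)
    return proc.returncode

# Usage on a Mac (blocks until OpenRefine exits):
# run_and_log([LAUNCHER], "openrefine-server.log")
```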

Linux

Install / Run: Once you have downloaded the tar.gz file, open a shell and type

tar xzf openrefine-linux-2.7.tar.gz
cd openrefine-2.7
./refine

This will start OpenRefine and open your browser to its starting page.
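If you prefer to script the Linux steps, here is a Python sketch of the `tar xzf` step that also locates the `refine` script. `extract_release` is a hypothetical helper. It assumes the archive contains one top-level folder holding `refine`, as with openrefine-2.7 above, and it guards against archive entries that would be written outside the target folder.

```python
import tarfile
from pathlib import Path

def extract_release(archive_path, dest_dir="."):
    """Extract an OpenRefine .tar.gz into dest_dir and return the path of its `refine` script.

    Refuses members whose names would land outside dest_dir, and uses the safer
    "data" extraction filter when this Python version provides it.
    Returns None if no <top-level folder>/refine script was found.
    """
    dest = Path(dest_dir).resolve()
    with tarfile.open(archive_path, "r:gz") as tar:
        for member in tar.getmembers():
            target = (dest / member.name).resolve()
            if target != dest and dest not in target.parents:
                raise ValueError(f"unsafe path in archive: {member.name}")
        if hasattr(tarfile, "data_filter"):
            tar.extractall(dest, filter="data")
        else:
            tar.extractall(dest)
    matches = sorted(dest.glob("*/refine"))
    return matches[0] if matches else None

# Equivalent of the shell steps above (archive name as in this guide):
# refine = extract_release("openrefine-linux-2.7.tar.gz")
# then run it from its folder, e.g. subprocess.run(["./refine"], cwd=refine.parent)
```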

Shut down: Press Ctrl-C in the shell.

Running & Configuration


By default (and for security reasons) Refine only listens to TCP requests coming
from localhost (127.0.0.1 on port 3333). If you want to respond to TCP requests
coming to any IP address the machine has, run refine like this from the command
line:

./refine -i 0.0.0.0
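When OpenRefine is launched from a script rather than typed by hand, the -i and -p options can be passed as an argument list. This Python sketch uses hypothetical helper names and assumes the `refine` script sits directly in the given folder. It only assembles the options documented here, so check `refine -help` for anything else.

```python
import subprocess
from pathlib import Path

def refine_command(refine_dir, host="127.0.0.1", port=3333):
    """Build the argument list that starts OpenRefine's `refine` script with -i (host) and -p (port).

    host="0.0.0.0" makes OpenRefine answer on every network address of the machine,
    so other computers can reach it; keep the 127.0.0.1 default unless you need that.
    """
    # Absolute path, because Popen resolves a relative program path against cwd.
    return [str(Path(refine_dir).resolve() / "refine"), "-i", host, "-p", str(port)]

def start_refine(refine_dir, host="127.0.0.1", port=3333):
    """Start OpenRefine as a child process from its own folder and return the Popen handle."""
    return subprocess.Popen(refine_command(refine_dir, host, port), cwd=refine_dir)

print(refine_command("openrefine-2.7", host="0.0.0.0"))
```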

On macOS, you can get the same effect by adding a specific entry to the Info.plist file located within the app bundle (/Applications/OpenRefine.app/Contents/Info.plist):

<key>JVMOptions</key>
<array>
    <string>-Drefine.host=0.0.0.0</string>
</array>
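Editing the XML by hand works, but Python's standard plistlib module can make the same change. This sketch assumes, as the snippet above shows, that JVMOptions is a top-level key holding an array of strings. `add_jvm_option` is a hypothetical helper that appends the option only if it is not already present. Keep a backup of the original Info.plist before changing it.

```python
import plistlib

HOST_OPTION = "-Drefine.host=0.0.0.0"

def add_jvm_option(plist_path, option=HOST_OPTION):
    """Append `option` to the top-level JVMOptions array of an Info.plist.

    Creates the array if it is missing. Returns True if the file was changed,
    False if the option was already there.
    """
    with open(plist_path, "rb") as f:
        data = plistlib.load(f)  # reads both XML and binary plists
    options = data.setdefault("JVMOptions", [])
    if option in options:
        return False
    options.append(option)
    with open(plist_path, "wb") as f:
        plistlib.dump(data, f)  # written back in XML format
    return True

# Usage (writing inside /Applications may need administrator rights):
# add_jvm_option("/Applications/OpenRefine.app/Contents/Info.plist")
```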

Windows Installation Procedure :

1. Make sure that Java (JDK or JRE) is installed on a supported OS such as Windows, Linux or macOS.
2. Open https://openrefine.org in a browser such as Firefox or Chrome.
3. Download the Windows kit.
4. Once you have downloaded the .zip file, uncompress it into a folder wherever you want (such as C:\Open-Refine).
5. Run the .exe file in that folder. You should see the Command window in which OpenRefine runs. By default, the Command window has a black background and monospace text.
6. OpenRefine is then served by the local web server, and you can use it in your browser at http://127.0.0.1:3333/.
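Step 4 can also be done with Python's zipfile module. In this sketch `unpack_windows_kit` is a hypothetical helper, and C:\Open-Refine is just the example folder from the steps above. The sketch returns whatever .exe files the archive contained rather than assuming a particular file name.

```python
import zipfile
from pathlib import Path

def unpack_windows_kit(zip_path, dest_dir=r"C:\Open-Refine"):
    """Uncompress the downloaded OpenRefine .zip into dest_dir and return the .exe files found.

    zipfile's extractall strips absolute paths and ".." components from member names.
    """
    dest = Path(dest_dir)
    with zipfile.ZipFile(zip_path) as archive:
        archive.extractall(dest)
    return sorted(dest.rglob("*.exe"))

# Usage (file name of the download is whatever you saved):
# exes = unpack_windows_kit(r"C:\Users\me\Downloads\openrefine-win.zip")
# print(exes)  # run the listed .exe to start OpenRefine
```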

We then create a project by importing a dataset, apply a text filter, delete all unnecessary data (or hide data for security purposes), and export the cleaned data so that it is useful to the user.
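In OpenRefine this workflow is done through the browser interface. Purely to illustrate the steps, here is a Python sketch that applies the same idea to CSV text. A case-insensitive substring match, similar to a text filter with default settings, selects rows to delete. Selected columns are then hidden from the output, and the result is exported as CSV. The function name, column names and sample data are all made up for illustration.

```python
import csv
import io

def clean_csv(csv_text, column, needle, hide_columns=()):
    """Filter, delete and hide, then export, loosely mirroring the workflow above.

    - Text filter: rows whose `column` contains `needle` (case-insensitive) are matched.
    - Delete: matched rows are treated as unnecessary and dropped.
    - Hide: columns listed in hide_columns are left out of the export.
    Returns the cleaned data as CSV text.
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    keep_cols = [c for c in reader.fieldnames if c not in hide_columns]
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=keep_cols, extrasaction="ignore", lineterminator="\n")
    writer.writeheader()
    for row in reader:
        if needle.lower() in (row[column] or "").lower():
            continue  # matched by the text filter, so the row is deleted
        writer.writerow(row)  # hidden columns are ignored by extrasaction="ignore"
    return out.getvalue()

sample = ("name,email,status\n"
          "Ada,ada@example.com,active\n"
          "Bob,bob@example.com,TEST account\n"
          "Cy,cy@example.com,active\n")
print(clean_csv(sample, "status", "test", hide_columns=("email",)))
```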
Conclusion: OpenRefine can be installed easily. When you need to shut it down, switch to the Command window and press Ctrl-C, then wait until a message says the shutdown is complete; the window might close automatically, or you can close it yourself. If you are asked "Terminate all batch processes? Y/N", press Y. By following this procedure, we can download and install OpenRefine successfully.
