INFS3603 Assignment Stuff

Data Scientist profile results place Victor Dang and Rishikesh Akolkar in the Technologist
category, Shaira Rahman in the Modeller category and Gerry Zhang in the Data Wrangler
category. A closer look at each of these categories:

- A Technologist could leverage programming knowledge and skills to collaborate with
other programmers to develop AI and predictive analysis systems. They could also
draw on the wide pool of resources and technologies at their disposal to mine for useful
data; the insights from that data could then inform important strategic decisions by
the executive and higher-level teams of an organisation. Technologists could also help
the team to understand what different datasets mean, so that other members can apply
their own expertise to those datasets. For example, technologists can use their
advanced Java or Python programming skills to create data pipelines (Anderson
2018), which amalgamate data from different sources and make it available to
multiple users for analytic purposes as well as for the creation of visualisations
(Alley 2018).
- A Modeller could use their attention to detail to read and analyse data
extracted by the Technologists, and then apply their expertise in
mathematical/graphical techniques and various software (e.g. SAS Visual Analytics) to
develop visualisations. Insights extracted from these visualisations can then be
used to analyse consumer behaviour and inform strategic decisions. One
challenge that might arise for Modellers is how to assess the volume, variety,
velocity and veracity of the data extracted. Within a vast pool of data, a lot of
time may be spent cleaning and maintaining data rather than producing
valuable solutions (IBM 2019).
- A Data Wrangler can contribute to the data analytics team by gathering
requirements, defining the dataset and making sure the data is set out in a
format that is readable and accessible to all other users. This is where expert
problem solving skills come in handy, to generate and combine readable data from
many different sources in a way that satisfies the “veracity” of the end product
(IBM 2019). As with the Modellers, assessing the volume, variety, velocity and
veracity of the data might pose a challenge within the data analytics team.
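
The amalgamation of data from different sources described above can be illustrated with a minimal Python sketch. The two inline sources, their field names and the helper functions (to_utc, from_csv, from_json, run_pipeline) are hypothetical and chosen only for illustration; they are not taken from Anderson (2018) or Alley (2018). The sketch assumes each source carries ISO 8601 timestamps with a UTC offset.

```python
import csv
import io
import json
from datetime import datetime, timezone

# Hypothetical inputs: the same kind of record arriving in two different formats.
CSV_SOURCE = """listing_id,price,reviewed_at
1,120,2019-03-01T10:00:00+11:00
2,95,2019-03-02T08:30:00+00:00
"""

JSON_SOURCE = '[{"id": 3, "price_usd": "210", "reviewed": "2019-03-03T22:15:00-05:00"}]'


def to_utc(timestamp):
    """Parse an ISO 8601 timestamp with an offset and return it in UTC."""
    return datetime.fromisoformat(timestamp).astimezone(timezone.utc).isoformat()


def from_csv(text):
    """Map CSV rows onto the shared record layout."""
    return [
        {
            "listing_id": int(row["listing_id"]),
            "price": float(row["price"]),
            "reviewed_at": to_utc(row["reviewed_at"]),
        }
        for row in csv.DictReader(io.StringIO(text))
    ]


def from_json(text):
    """Map JSON objects (with different field names) onto the same layout."""
    return [
        {
            "listing_id": int(item["id"]),
            "price": float(item["price_usd"]),
            "reviewed_at": to_utc(item["reviewed"]),
        }
        for item in json.loads(text)
    ]


def run_pipeline():
    """Combine both sources into one list that downstream users can analyse."""
    combined = from_csv(CSV_SOURCE) + from_json(JSON_SOURCE)
    return sorted(combined, key=lambda record: record["listing_id"])


if __name__ == "__main__":
    for record in run_pipeline():
        print(record)
```

Normalising field names, types and timezones in one place is what lets a Modeller treat the combined output as a single dataset.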

(Anderson 2018) - https://www.oreilly.com/ideas/data-engineers-vs-data-scientists
(Alley 2018) - https://www.alooma.com/blog/what-is-a-data-pipeline
(IBM 2019) - https://www.ibmbigdatahub.com/infographic/extracting-business-value-4-vs-big-data

Alley, G 2018, ‘What is a Data Pipeline’, Alooma, accessed 30 March 2019,
<https://www.alooma.com/blog/what-is-a-data-pipeline>.

Anderson, J 2018, ‘Data engineers vs. data scientists’, O’Reilly, accessed 30 March 2019,
<https://www.oreilly.com/ideas/data-engineers-vs-data-scientists>.

IBM 2019, ‘Extracting business value from the 4 V’s of big data’, IBM: Big Data &
Analytics Hub, accessed 30 March 2019,
<https://www.ibmbigdatahub.com/infographic/extracting-business-value-4-vs-big-data>.

Cite the following information:

- author(s) name and initials
- title of the article (between single quotation marks)
- title of the journal (in italics)
- available publication information (volume number, issue number)
- accessed day month year (the date you last viewed the article)
- URL or Internet address (between pointed brackets)

Gerry Zhang
You are a Data Wrangler. All great analysis starts with a dataset. Or rather, it
starts with data in multiple locations, in different formats, languages and timezones. You
understand that defining the question and the approach to creating insight stems from
getting the data into a useable format. You extract, manage and combine data from a
variety of sources in a highly efficient manner. It is a bottom-up approach that fully
immerses you in problem solving, as it is in the detail where understanding of a system can
be gained.

Shaira Rahman
You are a Modeller. By creating quantitative descriptions of your data, you create insight
that is a key deliverable for your team. You interpret the meaningful reasons for features in
a dataset. You also pay attention to the detail of underlying assumptions, limits and
exceptions when describing a system. You are familiar with a variety of mathematical
methods for describing dynamic systems and are highly skilled in using software that
implements these. You use a variety of graphical and numeric techniques to verify that you
are delivering a high-quality result that can be used to predict and optimise future
performance.

Convert data gathered to insights which can then be used for strategic decision making

Rish Akolkar and Victor Dang
Data science profile: Technologist. Never satisfied with good enough, you find the best tool
to aid with every challenge. Since every challenge is different, it is often faster and more
efficient to use technologies that have been created elsewhere, rather than reinventing the
wheel. You are continually interested in exploring how evolving tools and techniques can
add value to the data science workflow. Modern multi-purpose programming languages
provide the perfect environment to stand on the shoulders of giants and truly see further
than others. You know how to use many different technologies, which allows you to educate
your team on possible ways to interrogate a dataset. It is the value this approach provides
to the team that really allows you to focus on using novel approaches to problem solving
that would be impossible if you were building tools from the ground up.

The relationship between various variables such as price and host review rating, i.e. what
factors most influence the quality of the customer's stay as measured through stay reviews.

The above Geo Map visualisation displays a concentration of higher star reviews/ratings in
the Dupont Circle, Langston and Capitol Hill vicinities and a concentration of lower star
reviews in areas that are further away from the DC capital region. This suggests that Airbnb
customers gravitate towards, and prefer to stay in, areas that are closer to the
Washington DC tourist attractions, e.g. the United States Capitol, Lincoln Memorial,
Washington Monument and the White House itself. From a strategic point
of view, this can enable Airbnb to target and invest more marketing funds in the areas
highlighted by the green bubbles on the visualisation.

From the above visualisation, it can be seen that review ratings are “hottest” for
properties that are priced between $900 and $1100. This could be because most
tourists who rent properties through Airbnb have a budget within that range, and the
more highly priced properties may not be worth the extra cost if properties in the
mid-range ($600-$1200) already offer the same amenities. From a
strategic point of view, this can allow Airbnb to understand customer behaviour and quality
of stay relative to price, provided that it also accounts for the value of the
different amenities on offer, e.g. air conditioning, heating, and the view and
elevation of the property.
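
The comparison of review ratings across price ranges can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the (price, rating) pairs are hypothetical rather than taken from the Airbnb dataset, the 0-100 rating scale and the $300 band width are arbitrary choices, and mean_rating_by_price_band is a made-up helper name.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical (nightly price in USD, review rating out of 100) pairs.
LISTINGS = [
    (150, 88),
    (450, 90),
    (950, 97),
    (1000, 95),
    (1050, 96),
    (1500, 91),
    (2000, 89),
]


def mean_rating_by_price_band(listings, band_width=300):
    """Group listings into fixed-width price bands and average their ratings."""
    bands = defaultdict(list)
    for price, rating in listings:
        lower = (price // band_width) * band_width  # lower edge of the band
        bands[(lower, lower + band_width)].append(rating)
    return {band: mean(ratings) for band, ratings in sorted(bands.items())}


if __name__ == "__main__":
    for (low, high), rating in mean_rating_by_price_band(LISTINGS).items():
        print(f"${low}-${high}: mean rating {rating}")
```

With real data, the band whose mean rating is highest would correspond to the “hot” region of the visualisation, although a mean on its own says nothing about how many listings sit in each band.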

From the above visualisation, it can be seen that “Apartment” is the most popular property
type in the “entire home” room type category and “House” is the most popular choice for the
“private room” room type, followed closely by “Townhouse” in both sub-categories. The
reasons for these results could relate to the higher elevation (and therefore the view) of
apartment-type properties and their provision of sufficient amenities at a more affordable
price than houses. From a strategic point of view, this can enable Airbnb to invest
marketing funds to attract more customers towards Apartments, Townhouses and Houses.
This insight could also prove useful for incentivising more Apartment
hosts/owners to rent their properties to Airbnb tenants/tourists.

A closer look at the “Rating Based on Property Type grouped by Room Type”
visualisation (reference marker set at 500 for the Review Rating score variable) shows that
“Condominium” and “Guest Suite” are mid-tier choices for properties.
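
A grouping of this kind can be sketched in Python as follows. The listings and ratings are hypothetical, and the sketch assumes that the visualisation ranks property types by the total review rating score within each room type; the original tool's actual aggregation is not stated, and rating_totals_by_room_type is an illustrative name only.

```python
from collections import defaultdict

# Hypothetical (room type, property type, review rating) records.
LISTINGS = [
    ("Entire home", "Apartment", 95),
    ("Entire home", "Apartment", 90),
    ("Entire home", "Apartment", 92),
    ("Entire home", "Townhouse", 94),
    ("Entire home", "Townhouse", 91),
    ("Entire home", "Condominium", 89),
    ("Private room", "House", 93),
    ("Private room", "House", 96),
    ("Private room", "House", 90),
    ("Private room", "Townhouse", 92),
    ("Private room", "Townhouse", 95),
    ("Private room", "Guest suite", 97),
]


def rating_totals_by_room_type(listings):
    """Sum ratings per property type within each room type, highest total first."""
    totals = defaultdict(lambda: defaultdict(int))
    for room_type, property_type, rating in listings:
        totals[room_type][property_type] += rating
    return {
        room_type: sorted(props.items(), key=lambda item: item[1], reverse=True)
        for room_type, props in totals.items()
    }


if __name__ == "__main__":
    for room_type, ranking in rating_totals_by_room_type(LISTINGS).items():
        print(room_type, ranking)
```

A reference marker such as the 500 used in the visualisation could then be applied by comparing each total against that threshold.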
