
COMP1682: Final Year Project

An investigation into whether a Company that is more active on Social Media is more
‘successful’ than one with less of an online presence

Student Name

000111111

Supervisor: Supervisor's Name

Word count: 12,104

BSc Business Computing

A dissertation submitted in partial fulfilment of the University of Greenwich

Abstract

The Project aims to discover whether Companies that use Social Networking as a marketing tool for connecting
with Customers have a better reputation and/or are more profitable than Companies with little or no online
presence.

The report researches the development of Customer Relationship Marketing over the last forty years, which
has led to Sentiment Analysis through Data Scraping and Text Mining, and acknowledges the various programs
a Developer could use in the 21st Century to visualise the Data collected through these methods. Moreover,
the report examines the methodologies used in the planning and development stages of the Product Lifecycle.

Acknowledgements

This project wouldn’t have been possible without the many great people who have encouraged me
throughout the last four years. Thank you.

Special thanks to my supervisors for encouraging and helping me throughout the development of this
project. You are both very powerful female role models, and I am grateful to have been in your class this
last year.

Also a ginormous thank you to my family for bearing with me as the first person in our family to go to
University: my Mum for always ensuring I was okay and offering snacks where necessary, and my Dad for
believing in me from the moment I told him I wanted to go into Computer Science, and for telling me I could
do anything from the moment I started school.

My sisters also require extra recognition for providing trips to Byron Burger and McDonalds, and
grammar/spelling checks whenever I’ve needed them.

Table of Contents
1 SECTION 1: INTRODUCTION
1.1 BACKGROUND
1.2 CLIENT INFORMATION
1.3 APPROACH
2 SECTION 2: LITERATURE REVIEW
2.1 APPROACH TO LITERATURE SEARCHING
2.2 IDENTIFYING THE PROBLEM
2.3 INITIAL DISCUSSION
2.4 CUSTOMER RELATIONSHIP MANAGEMENT
2.4.1 Benefits of CRM with Social Media Marketing
2.5 SOCIAL CRM
2.6 CURRENT LEADERS WITHIN SOCIAL MEDIA MARKETING
2.7 SOCIAL MEDIAS
2.8 BEST POTENTIAL TIMES OF YEAR TO INVESTIGATE
2.9 TEXT MINING AND SENTIMENT ANALYSIS
2.10 CONCLUSION
3 SECTION 3: PRODUCT RESEARCH
3.1 TWITTER
3.2 HOOTSUITE INSIGHTS
3.3 SEMANTRIA FOR EXCEL BY LEXALYTICS
3.4 WEIGHTED SCORING MODEL
4 SECTION 4: LEGAL, SOCIAL, ETHICAL AND PROFESSIONAL ISSUES AND CONSIDERATIONS
5 SECTION 5: REQUIREMENTS AND METHODOLOGY
5.1 REQUIREMENTS ANALYSIS
5.2 COMPARISON OF SYSTEMS
5.3 FUNCTIONAL REQUIREMENTS
5.4 NON-FUNCTIONAL REQUIREMENTS
5.5 METHODOLOGY
5.6 JUSTIFICATION OF THE SUITABILITY OF A METHODOLOGY OR A FRAMEWORK FOLLOWED
6 DESIGN
6.1 UML USE CASE DIAGRAM
7 DEVELOPMENT PROCESS
7.1 STAGE 1: INITIAL FETCHING PYTHON CODE
7.2 STAGE 2: IMPLEMENTING THE SEMANTIC ANALYSIS
7.2.1 Iteration 1
7.2.2 Iteration 2
7.2.3 Iteration 3
7.3 STAGE 3: COLLATING THE DATA IN EXCEL
7.4 STAGE 4: CREATING DATA VISUALISATIONS
7.5 TIME BOX
7.6 POTENTIAL ALTERNATE APPROACHES
8 TESTING
9 EVALUATION
9.1 EVALUATION OF PRODUCT
9.2 SELF-EVALUATION
10 CONCLUSION
10.1 FINDINGS REGARDING REPUTATION
10.2 FINDINGS REGARDING PROFITABILITY
11 BIBLIOGRAPHY
12 APPENDICES
12.1 APPENDIX A: FIGURES
12.2 APPENDIX B: CONTEXTUAL REPORT
12.3 APPENDIX C: PYTHON JUPYTER NOTEBOOK CODE
12.4 APPENDIX D: SEMANTIC ANALYSIS CODE
12.5 APPENDIX E: EXCEL SPREADSHEET FORMULAE
12.6 APPENDIX F: TABLEAU DATA VISUALISATIONS
12.7 APPENDIX G: TESTING

Tableau Public of Dashboards:


https://public.tableau.com/profile/georgia6424#!/vizhome/DataVisualisations/GatheringStats

1 Section 1: Introduction
1.1 Background

It has always been important for businesses to market their products, and within the last decade the public have
begun to realise how important social media has become in advertising businesses. People use Social Media to
express any opinion, emotion, concern or complaint they have regarding the products and services they use.
Most of the time, consumers post these because they want a response of some kind from the general public or
from the company itself.

This project will attempt to identify whether there is a link between a business having a good online strategy
and its overall reputation and profits, in comparison to businesses that use social media less often, or perhaps
not at all.

In the modern digital age, with many online shoppers belonging to the Millennial or Gen-Z generations,
businesses are attempting to improve their Social Media standing, as they believe a good social media presence
leads to an increase in rapport and reputation, and therefore an increase in profits. The project will create a data
set from posts on social media, analyse the positivity or negativity of each post, and transform this data into
visualisations to display the emotions of various companies’ audiences.

1.2 Client Information

The client for this project and report is an up-and-coming high street clothing retailer. They are deciding
whether to invest in an online presence in a bid to increase their profitability. This would involve hiring
customer service staff to monitor social media accounts and photographers to produce high quality posts, and
would therefore require a lot of time and money.

1.3 Approach

This report begins with a literature review in Section 2, and proceeds to investigate similar products that exist
on the market in Section 3. This is followed by the possible Legal, Social, Ethical, and Professional
considerations in Section 4, and a discussion of requirements and the methodology used for development in
Section 5. Section 6 concentrates on the design plans for the project, and Section 7 demonstrates the steps taken
during each stage of the development process. Section 8 reviews the testing and verification of the project’s
capabilities, and Section 9 evaluates the project as well as the Developer. Section 10 then concludes with the
findings of the overall project.

2 Section 2: Literature Review
2.1 Approach to literature searching

The research undertaken for the project combined various online and physical sources to identify any trends
in Social Media Marketing/Social Customer Relationship Management, or assumptions that already exist
within the market sector. As social media marketing is a relatively new topic within technology, there is not an
abundance of physical texts supporting or opposing it. In fact, many books/physical texts on the topic focus on
the formulae used to analyse the data found, rather than arguing for or against its use. Online sources, such as
journals, articles, and other websites, allow much more modern material to be sourced, covering a wider variety
of opinions and studies from recent years; hence, the majority of the sources and materials throughout the
literature review are online-based.

2.2 Identifying the problem

Within modern-day society, it has become a monthly occurrence to witness a brand make a business faux pas.
Following this:
1) Most news outlets would discuss it in detail.
2) The public, customers or not, then share their opinions and outrage on their personal social media, or on
the social media of the company.
3) The business would most likely issue an apology and delete all evidence of said faux pas.
4) The world would move on.

In 2018 alone, Snapchat lost almost $2 billion after celebrities shared their newfound disdain for the company
over some UI changes, as well as some unethical adverts being approved (Bullock, 2018). IHOP in the US lost
millions of dollars changing their branding back from IHOB to IHOP after the name change to “International
House of Burgers” was received very poorly worldwide, resulting in customers and numerous companies
mocking them for attempting the change (Tobin, 2018) (Roberts, 2018). There was outrage in January towards
H&M’s model choices regarding animal-themed sweatshirts in their kids’ collection, resulting in a loss of
profit, and protests/riots in some of their stores throughout South Africa (Fortin, 2018).

Owing to the rapid development of technology, information is significantly easier to discover in the modern
day than it was as little as a decade ago, which has enabled the global phenomenon of mob mentality to spread.
This has meant that a marketing mistake which may have been glossed over in the past is placed under the
microscope and analysed in the present day. This mob mentality or outrage tends to have an ongoing effect on
a business’s reputation, and therefore all businesses must be especially cautious with their marketing and
social media.

Figure 1. (YPulse, 2018)

Notably, according to a Survey on Brand Trust by YPulse, the brands that have the highest loyalty ratings
among Millennials/Gen-Z are ones that tend to advertise on social media and have a large online presence,
such as Nike (Joseph, 2017), Oreo (O'flynn, 2017), and M&M’s (Marketing Week, 2018), as well as typical
household brands such as Amazon and Apple.

2.3 Initial Discussion

According to Edosomwan, Kalangot Prakasan, Kouame, Watson, & Seymour (2011), “social media can be
called a strategy and an outlet for broadcasting, while Social Networking is a tool and a utility for connecting
with others”. This aligns with the common business view that “If the goal for a business is to reach customers
where they are, a social media presence seems necessary.” (Adams-Mott, 2018).

This general consensus is what has led businesses to target their customers online, using social media
applications rather than traditional methods of advertising such as in a Newspaper, on the Radio, or on
Television. For example, research has shown that 17 to 35 year olds pick up their phones 50-75 times a day
(Eadicicco, 2015), while a study released by the Music Business Association found that only 12% of young
Millennials listened to the radio, whereas 51% listened to streaming services such as Spotify or Apple Music
(McIntyre, 2016).

As well as this, 67% of Millennials pay for 1-3 streaming services as an alternative to paying for cable
television or watching TV channels which require a TV licence. This is due to the freedom they have over what
they watch and the lack of advert breaks, all for a vastly lower price than the typical household would pay for
cable television (Arnold, 2017).

2.4 Customer Relationship Management

Customer Relationship Management (CRM) is an approach to managing a Company’s exchanges with current,
as well as potential, customers using data analysis. Though the idea of customer-based marketing systems was
discussed throughout the 1980s, largely brought about by Robert and Kate Kestnbaum’s database marketing
system (CRM Switch, 2013), the first system with a specialised CRM focus did not arrive until Siebel Systems
in 1993.

This new system and its ideals were well received by innovators; as Gates (1999) once said, “how you
gather, manage and use information will determine whether you win or lose.” This idea of collecting and
analysing information, such as customer-based trends, paved the way for the Customer Relationship
Management we see today, e.g. “Customers engaging in a conversation on the social media brand page of an
enterprise expect attention and resolution to their concerns from the enterprise just as they would on a
traditional CRM channel such as phone or email.” (Ajmera, Ahn, Nagarajan, Verma, & Contractor, 2013).
Due to this, it could be argued that the social media interactions customers have every day with businesses
online are one of the most effective uses of CRM in the modern day.

2.4.1 Benefits of CRM with Social Media Marketing

There are several beneficial aspects of Customer Relationship Management:


1) It enables businesses to treat each customer individually, as opposed to as a group, due to the collection
of personal data unique to each person.
2) It helps to manage and centralise customer data to one place.
3) It helps determine what aspects of the business are working well and what aspects have room for
improvements to be made.
4) It increases employee productivity, as the system gathers the information on customers itself, and
therefore all the employee has to do is utilise this information during the interaction.

2.5 Social CRM

Social Customer Relationship Management involves the use of social media services and technology to enable
businesses to better communicate with their customer base, such as discovering the amount of web traffic, the
number of “followers” on specific social media platforms, as well as the number of mentions a company may
receive on these sites (Castronovo & Huang, 2012). While traditional CRM was developed in the 1980s, Social
CRM has only started to become a topic of conversation in the last decade, due to the increase in social media
users throughout this period.

It’s important to note, however, that while traditional CRM is based around “collecting and managing current
customer data”, Social CRM is “more of a strategy for customer engagement”, as it enables businesses to track
sales communications, as well as interactions on social media. Social CRM has helped to create a path to
potential customers by talking to them on their own preferred social media platform, and assists them in sharing
their own experiences with thousands of others online (Rouse, 2017).

Figure 2. Google Trends

2.6 Current Leaders within Social Media Marketing

As of October 2018, the ‘High Street’ brands that have been classed as “the most successful on UK social
media”, according to Red Hot Penny, are more expensive retailers, such as Nike, Adidas, Tommy Hilfiger,
Doc Martens, and Marks & Spencer. These brands “demonstrate they know their audience across all channels
and can actively engage them in a natural way.” (Red Hot Penny, 2018). This may, however, stem from the
“bandwagon effect” (Coleman, 2003): as the brands are all long-term household names, there may be a slight
bias. This leaves a gap in the market, i.e. for cheaper alternatives, that digital companies can engage with.

If the two companies with the highest price tags (Tommy Hilfiger and Doc Martens) are excluded, the
remaining three brands each have over 30,000 tweets on their respective company Twitter accounts. Nike
(@nike), for example, have over 35,500 tweets, even though, as of January 2019, their four most recent tweets
were posted two months earlier, in November 2018. This is because their Twitter account spends hours replying
to customers’ tweets, negative or positive, to show their customer base that they are looking out for them, as a
way to build loyalty.

On a much larger scale, Marks & Spencer (@marksandspencer) have over 416,000 tweets, and throughout
their working day (8am until 10pm) reply to hundreds of customers: directing complaints to their direct
messaging inbox or email addresses so that customers’ concerns don’t go unnoticed, replying to comments of
praise, and answering general queries such as store opening times and when products are likely to be back in
stock.

These interactions are what enable customers to trust a company and inspire them to include the business’s
social media in their day-to-day lives, despite never having met the person on the other side of the screen. As
the Economic Times put it simply, “your audience does care about you if you are of any help to them”
(Thiagarajan, 2018).

Figures 3&4. M&S Twitter

2.7 Social Medias

The most imperative part of Social CRM is ensuring that the business is using the correct social media to reach
their target audience: “Several years ago, the idea of social media marketing was mostly limited to Facebook
and Twitter. In recent years, this type of marketing has expanded to include popular sites such as Pinterest,
Instagram, Snapchat, YouTube and Tumblr.” (Adams-Mott, 2018).

For the purpose of this project, three social media platforms were analysed for their potential: Instagram,
Facebook, and Twitter. Instagram comments tend to be short, sometimes consisting only of emojis, which are
encouraged in its comment section (Fingas, 2018), and therefore do not give enough data for sentiment analysis.
Facebook contains developer features that allow data to be recorded through its Graph API; however, since the
Cambridge Analytica/Facebook outrage in 2018, Facebook has revoked access to all tools used for scraping data
from groups and pages (Y, 2019). This has made it incredibly difficult to gain access, as a working prototype of
the program must be provided before Facebook will grant it. Due to this, Facebook will not be used for
analysis.

Twitter has always been used by businesses for its analytics, so much so that Twitter developed its own
“Analytics” website in 2014 (Edwards, 2014). It is also the simplest to scrape data from, as it allows web
developer-based Apps, and the easiest on which to view users’ opinions through the UI of the website itself.
With its recent increase to 280 characters, Twitter is well suited to analysing consumer opinions, and hence
will be the sole Social Media platform for this project.

2.8 Best Potential Times of Year to Investigate

There are several times in a year when customers are more likely to shop, and therefore businesses are more
active on social media and put out marketing strategies in an attempt to entice business. It could be said that
the best time of year to investigate customers’ feelings towards a business is the festive period, as it is one of
the key times of year that businesses compete for British consumers’ spending, with the average Christmas
spend increasing year on year in the UK, and the country’s yearly spend reaching far above the European
average (Clarke, 2017).

Around the festive period, social media is used to share a company’s Christmas campaign or advert, and it can
become an annual event for a business, e.g. John Lewis’s annual Christmas advert, which begins to gain
momentum every year at least a month before the advert is released (Google Trends, 2008). Other times of
year when engagement is raised include end of season sales, Black Friday, Valentine’s Day, Easter Weekend,
and the Summer Holidays.

Businesses must consider the impact of these events in order to achieve optimum social media reach within
their market sector. In fact, social media analyst James Lovejoy stated, “by being quiet or not paying attention
to what’s happening on social, many brands are becoming blind to how they’re being discussed online and the
way social affects [them].” (Ilyashov, 2015).

2.9 Text Mining and Sentiment Analysis

The project will be using text mining to scrape information from Twitter for analysis. As there is limited time
for the project to be created, Machine Learning cannot be used, as this requires more data as well as training
data; therefore Linguistic Rules are the more sensible choice. There are many benefits to Linguistic Rules,
such as fast analysis, easy spotting of irregular data, and granular analysis that breaks the data up into smaller
sections such as the phrases or words/emojis used, all of which are beneficial for a project of this scale
(Huddy, 2017).
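
The granular analysis described above can be illustrated with a minimal Python sketch, assuming a simple
regular-expression tokeniser. The pattern, the function name `tokenise`, and the example tweet are illustrative
assumptions rather than the code used in the project.

```python
import re
from typing import List

# Hypothetical pattern: @mentions, #hashtags, words (allowing apostrophes),
# or any single non-space symbol, which covers single-character emojis.
TOKEN_PATTERN = re.compile(r"@\w+|#\w+|\w+(?:'\w+)*|[^\w\s]")


def tokenise(tweet: str) -> List[str]:
    """Split a tweet into lower-cased tokens for granular analysis."""
    return TOKEN_PATTERN.findall(tweet.lower())


if __name__ == "__main__":
    print(tokenise("Love my new @Nike trainers 😍 #happy"))
```

Each token can then be checked individually against a set of linguistic rules. Emojis built from several code
points would be split into separate tokens by this simple pattern.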

After mining the data, sentiment analysis will be used, in Python, to determine whether the overall feel of each
message is positive, neutral, or negative. This would then enable a conclusion on the overall opinion of a
company held by its consumers, which in turn could inform business/marketing decisions. Pak and Paroubek
explained this well, stating “As more and more users post about products and services they use, or
express their political and religious views, microblogging web-sites become valuable sources of people’s
opinions and sentiments. Such data can be efficiently used for marketing or social studies.” (Pak & Paroubek,
2010).
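
As a rough illustration of rule-based sentiment analysis, the sketch below labels a message by counting matches
against small word banks. The word lists and the function name `classify_sentiment` are hypothetical and far
smaller than any lexicon a real analysis would need; this is not the project's actual implementation.

```python
# Hypothetical word banks; a real lexicon would be much larger.
POSITIVE = {"love", "great", "good", "happy", "excellent"}
NEGATIVE = {"hate", "bad", "awful", "poor", "terrible"}


def classify_sentiment(text: str) -> str:
    """Label text 'positive', 'negative' or 'neutral' by counting lexicon hits."""
    # Lower-case, split on whitespace and trim surrounding punctuation.
    words = [w.strip(".,!?;:") for w in text.lower().split()]
    score = sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


if __name__ == "__main__":
    for tweet in (
        "Love the new range, great service!",
        "Awful delivery, terrible support.",
        "Store opens at 9am.",
    ):
        print(classify_sentiment(tweet), "-", tweet)
```

A simple count like this cannot recognise negation or sarcasm, which is one reason larger rule sets or trained
models are used where time and data allow.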

Figure 5. YouGov “How Good Is Good?”

2.10 Conclusion

After reviewing various sources such as websites and journals, an overwhelming majority of the material
found supports the credibility of the project, which will be taken forward into its development stages.
Throughout the research, information was discovered surrounding CRM and its modern version, Social CRM,
as well as their benefits. The companies with the best online marketing strategies in recent years, and the
advantages of using different Social Media platforms for marketing based on their customer base, were also
examined.

The research confirmed that the ideal social media platform to take forward for analysis is Twitter, as it gives
the widest consumer reach, and confirmed the use of linguistic rules to analyse the sentiment of the data
collected. Research also identified the best times of the year to collect the data set for this project; however, it
may not be possible to collect data at these times, as the project schedule is January to April, and if the project
were to be created using data from the Festive Period 2018, this information may be outdated by the deadline of
the project. Therefore the data will be collected when the project reaches that particular point in development,
estimated to be around March or April.

3 Section 3: Product Research

Although social media marketing is still an emerging market, various products already exist which enable
a business to review their social media reach, including how many people look at, engage with, and talk about
their posts.

The Usability Criteria for the Project state:


1) User must be able to access posts from Twitter.
2) User must be able to see the Sentiment Analysis regarding these posts.
3) User must be able to view Data Visualisations of these findings.

3.1 Twitter

One example of this is Twitter’s own ‘Twitter Analytics’, which everyone has access to within the Twitter
website. This enables Users to investigate the “number of impressions”, “number of engagements”, and
“engagement rate” (engagements divided by impressions) per Tweet sent, as well as how many people clicked
the Tweet itself, clicked on the profile after seeing the Tweet, and ‘Retweeted’, replied to, and ‘liked’ it.
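
To make the engagement rate metric concrete, the short sketch below computes it for a single Tweet, assuming
the usual definition of engagements divided by impressions. The function name and the figures are hypothetical
and are not taken from any real account.

```python
def engagement_rate(engagements: int, impressions: int) -> float:
    """Return engagements as a fraction of impressions (0.0 if no impressions)."""
    if impressions == 0:
        return 0.0
    return engagements / impressions


if __name__ == "__main__":
    # Hypothetical Tweet: 50 engagements from 2,000 impressions.
    rate = engagement_rate(50, 2000)
    print(f"{rate:.1%}")  # prints "2.5%"
```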

Figure 6. Twitter Analytics

While this gives detailed analysis of individual Tweets, and if the User has a Business account activated
allows them to see the engagement for all their tweets combined, Twitter Analytics doesn’t allow any form
of sentiment analysis, as it is limited to quantitative metrics as opposed to qualitative ones.

Whilst it is possible to use Twitter’s “advanced search” in this case to look up key words for sentiment analysis,
this is cumbersome and tedious as the developer would have to comb through every possible word they would
want to search for, and manually form the overall result from the different outcomes.

3.2 Hootsuite Insights

Another product that could be perceived as similar to the Project is Hootsuite Insights, an additional function
of the analysis tool Hootsuite. This analysis tool gives a breakdown of different measures, such as number of
mentions per social networking site, geo-distribution, language, gender, and the average sentiment of the
messages towards the company.

Hootsuite also allows the business to connect their accounts for over 35 social networks, such as Facebook,
Twitter, YouTube, Google+, LinkedIn, Instagram, and Pinterest; there’s little doubt that Hootsuite are the
market leaders in Social Networking Analytics.

Despite this, there are three conceivable disadvantages to Hootsuite’s product that mean it doesn’t fit the
usability requirements for the Project in Development. The first is that it doesn’t allow viewing of other
companies’ statistics; the User must be signed in to view the majority of the figures offered. The second is the
cost of the marketing system; the ability to look at ‘Custom analytics’, i.e. sentiment analysis, causes the cost
per month to rise steeply from £25 to £99, which, depending on the liquidity of the business and their
expenditure, may be out of the question. Finally, the third disadvantage is similar to that of Twitter Advanced
Search: if the business wanted to investigate Sentiment Analysis, they must input the words, phrases, and
emojis that they wish to search for themselves, as opposed to the product having a word-bank of positive,
neutral, and negative words that the system could immediately fetch from.

Figure 7. Hootsuite Insights

3.3 Semantria for Excel by Lexalytics

Semantria was founded in 2011 in an endeavour to make Sentiment Analysis available to wider audiences.
The idea of the business is to analyse the polarity of social media posts towards the company using entity
extraction and categorisation, meaning the company can get an in-depth display of customers’ emotions towards
them.

Figure 8. Lexalytics

The benefit of using Semantria as opposed to Hootsuite or Twitter Analytics is that Semantria is said to
analyse not only the individual words in a sentence, but also the overall gist of the sentence as a whole to
enable a deeper analysis. This is useful for Sentiment Analysis as it catches occasions when posts on Social
Media may have been written in a sarcastic way, which is typical in the modern day and age.

While Semantria has many beneficial qualities as a Social Media Analytics product, the layout of the tool is
particularly basic, only offering Category names and the number of responses related to it. For this reason,
Semantria does not meet the minimum Usability Criteria for the Project as it doesn’t provide any form of data
visualisation, such as graphs, or charts, which would make the information vastly easier to read and
comprehend.

3.4 Weighted Scoring Model

A weighted scoring model was developed to investigate the overall usability of these three products given the
Usability Criteria. The most important criteria were deemed to be browsing the data set and viewing data
visualisations, as these enable the most contact with the data; however, filtering and sorting the data have
also been given weight within the model.

Requirement Score

Criteria                         Weight    Twitter    Hootsuite Insights    Semantria
Browse data set                  30%       100        50                    0
Sort data by emotion             10%       30         50                    50
Sort data by time                5%        0          60                    0
Sort data by gender              5%        0          50                    0
Sort data by company             10%       100        0                     25
Filter data by word or phrase    10%       100        100                   60
View data visualisations         30%       0          60                    0
Weighted Scores                  100%      53         53.5                  13.5

To conclude, while the aforementioned programs all have their own benefits, none of them covers all of the
usability criteria. This is shown in the weighted scoring model, in which none of the products found during
the product research scores over 55/100, hence why the Project being created stands alone in its field.
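
As a check on the arithmetic, the weighted totals can be recomputed from the weights and scores in the table. The sketch below is illustrative only and was not part of the Project; the names WEIGHTS, SCORES, and weighted_score are assumptions introduced for this example.

```python
# Weights and per-product scores copied from the weighted scoring model.
WEIGHTS = {
    "Browse data set": 0.30,
    "Sort data by emotion": 0.10,
    "Sort data by time": 0.05,
    "Sort data by gender": 0.05,
    "Sort data by company": 0.10,
    "Filter data by word or phrase": 0.10,
    "View data visualisations": 0.30,
}

# Scores are listed in the same order as the criteria in WEIGHTS.
SCORES = {
    "Twitter": [100, 30, 0, 0, 100, 100, 0],
    "Hootsuite Insights": [50, 50, 60, 50, 0, 100, 60],
    "Semantria": [0, 50, 0, 0, 25, 60, 0],
}


def weighted_score(scores, weights=WEIGHTS):
    """Return the sum of score * weight over all criteria."""
    return sum(s * w for s, w in zip(scores, weights.values()))


for product, scores in SCORES.items():
    print(product, round(weighted_score(scores), 1))
```

Each total is simply the sum of every score multiplied by its weight, so a product can only approach 100 by scoring highly on the two 30% criteria.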

4 Section 4: Legal, Social, Ethical and Professional Issues and Considerations

As with any project being created online, the Developer must give serious consideration to any possible
issues that could come about from its inception. The General Data Protection Regulation (GDPR) that came
into force in May 2018 must be taken into account, as the purpose of the project is to collect data from
individuals without informing them of it. However, as Social Networking sites are aware that many companies
collect data from their sites for marketing and other purposes, it is written into the Terms and Conditions of
these sites that Users’ data may be collected. Despite this, it is still socially and ethically ambiguous to
collect people’s data for these purposes without informing them about it, as the vast majority of the public
will never read the Terms and Conditions (Cakebread, 2017).

Due to the recent scandal involving Facebook and Cambridge Analytica, Facebook and many other social
media sites have tightened their security around applications made by third parties. This move was made in
good faith to protect their consumers’ data; however, it has made it far more difficult to mine data from
these sites, with Facebook having the lengthiest authorisation process for collecting data, even if the
application is being created for scientific or educational purposes.

The Data Protection Act 2018 is the UK’s specific implementation of GDPR; it controls how a person’s
personal information is used by organisations or businesses. For example, a business must ensure that personal
information is “used fairly, lawfully, and transparently”, “used for specified, explicit purposes”, and “kept
for no longer than necessary” (Gov.uk, 2018).

This project could be considered legally, socially, ethically and professionally ambiguous, as it is highly
unlikely that the developer will contact the owner of every account to receive permission for using their
social media posts. If a person were to contact a social media site to have their data erased, which is well
within their rights since GDPR was implemented, they would not know whether their data had been used by any
other party without their permission; their information may therefore still exist despite the person
exercising their “right to be forgotten” (Art. 17 GDPR).

There are also professional considerations that should be taken into account, as the specific data collected
holds customers’ opinions on various Competitors of the User’s brand. This information should not be viewed
by any employee that has no need to view it, as the raw data collected using Python code can include
usernames, names, locations, and other personal information that social media sites collect on their Users.

5 Section 5: Requirements and Methodology

5.1 Requirements Analysis

As the Project does not involve third-party Stakeholders, the Developer was unable to establish its
requirements through surveys and questionnaires, as another system might have done. Due to this, Requirements
have stemmed primarily from discussions with the clients, as well as from investigating Competitors’ Systems
and identifying the aspects that the Developer believes could have been added to improve them.

5.2 Comparison of systems

To collect the data from Twitter, the best solution is to use a combination of third party applications and
Python, which allows the Developer to collect the exact data that they need and discard any parts of the file
they deem unnecessary.

After collecting the data, it needs to be exported; this enables the Developer to look over the data set, find
any anomalies, and ensure the data has been collected correctly. The best possible solution for this is to use
Excel to create the formulas for Sentiment Analysis, before uploading the data into a data visualisation tool
such as Tableau.

5.3 Functional Requirements

The functional requirements for this Project are as follows:


1) Users are able to easily browse the complete Data set (i.e. social media posts)
2) Users are able to sort Data by time
3) Users are able to sort Data by emotion
4) Users are able to sort Data by gender
5) Users are able to sort Data by Company
6) Users are able to filter data by specific words
7) Users are able to edit and remove Data
8) Users are able to view a variety of pre-made data visualisations for Data
9) Users are able to create their own data visualisations for Data
10) Users are able to export their Data Visualisations individually
11) Users are able to export their Data Visualisations as a whole
12) Users are able to upload Data sets for analysis
13) The Developer is able to upload Data sets for analysis

5.4 Non-functional Requirements

The non-functional requirements for this Project are as follows:


1) The Data must be kept securely
2) The Project must run smoothly with minimal interruptions
3) The Project must update itself if any new data is added to the Data set
4) A person’s individual Data must be able to be deleted if requested
5) The Project must be user friendly

A MoSCoW table is provided below in order to set the order of priority for the established
requirements. These are divided into “Must have”, “Should have”, “Could have”, and “Won’t have
(this time)”.

No.   Requirement                                                                         MoSCoW
1)    Users are able to easily browse the complete Data set (i.e. social media posts)     Must have
2)    Users are able to sort Data by time                                                 Should have
3)    Users are able to sort Data by emotion                                              Must have
4)    Users are able to sort Data by gender                                               Could have
5)    Users are able to sort Data by Company                                              Must have
6)    Users are able to filter data by specific words                                     Should have
7)    Users are able to edit and remove Data                                              Could have
8)    Users are able to view a variety of pre-made data visualisations for Data           Must have
9)    Users are able to create their own data visualisations for Data                     Should have
10)   Users are able to export their Data Visualisations individually                     Must have
11)   Users are able to export their Data Visualisations as a whole                       Must have
12)   Users are able to upload Data sets for analysis                                     Won’t have
13)   The Developer is able to upload Data sets for analysis                              Could have

The proportion of “Must have” requirements should conceivably never be over 60%. Within this project
there are 6 “Must have” requirements out of 13 in total, which gives a percentage of 46%, with the
“Should have” requirements comprising another 23%. Consequently, there shouldn’t be any
concerns regarding the timing of this Project.
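
The percentages quoted above can be reproduced from the priorities in the MoSCoW table. The following is a minimal sketch rather than part of the Project; PRIORITIES and priority_shares are hypothetical names used only for this illustration.

```python
from collections import Counter

# MoSCoW priority of requirements 1 to 13, in the order given in the table.
PRIORITIES = [
    "Must", "Should", "Must", "Could", "Must", "Should", "Could",
    "Must", "Should", "Must", "Must", "Won't", "Could",
]


def priority_shares(priorities):
    """Return each priority's share of all requirements as a rounded percentage."""
    counts = Counter(priorities)
    total = len(priorities)
    return {name: round(100 * n / total) for name, n in counts.items()}


shares = priority_shares(PRIORITIES)
print(shares)

# The guideline quoted above: "Must have" should not exceed 60% of requirements.
print("Within guideline:", shares["Must"] <= 60)
```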

5.5 Methodology

In order for the majority of projects to be successful, the project team should follow a suitable methodology
throughout its creation. The project will be developed using an iterative DSDM Atern methodology; this is
largely due to DSDM Atern’s eight principles that tie in with the ideals of the program:

Principle 1. “Focus on the Business Need”

As the project specifically regards a business and what they want to discover about a connection between social
media use and reputation/profits, it relates greatly to the first principle of DSDM “Focus on the Business
Need”. The programmers will need constant communication and feedback between themselves and the
company to assess the requirements, as well as establishing a deeper understanding into what the business
needs, predominantly through the MoSCoW technique.

Principle 2. “Deliver on Time”

Another principle of Atern, “Deliver on Time”, is relatable to the project, as there is a strict deadline of the
29th of April to have the project and reports written up and concluded. This can primarily be achieved through
setting conscientious deadlines to keep the project progressing fluidly and focusing on the business priorities
throughout.

Furthermore, another aspect of DSDM that regards time is the “80:20 rule”. As the project centres around
collecting data from social media, this could be regarded as a large percentage of the overall project, therefore
it is likely that 80% of the project can be completed in 20% of the time, and the remaining 80% of the time
can be used to add fine details and increase the functionality.

Principle 3. “Collaborate”

Working as a team throughout the project, and including the input and feedback of the client, ensures that the
best possible project is created. This enables:

1) An increased understanding of the task at hand and the expected outcome.
2) Greater speed in which the outcome is achieved.
3) A shared ownership between the creators of the project and the clients.

To ensure this is achieved, the project leader must actively involve everyone in the team to form a “one-team”
mentality, as well as empower them to make decisions on the project.

Principle 4. “Never compromise on Quality”

This principle states that the level of quality the project is to be completed to should be agreed upon at the start
of the planning, and all work completed ensures that level is met.

The Project Manager must consequently set the level of quality anticipated at the outset of the project’s
formation and ensure that quality does not become a variable later on. As well as this, they must test the
project early in development, and continuously afterwards, which may be accomplished through several
methods such as functionality, usability, or performance testing.

Principle 5. “Develop Iteratively”

Similarly to principle four, principle five focuses on developing and testing the project through an iterative
development style. To achieve this the project manager, whilst iteratively developing, must ensure and
continually confirm that the correct system is being built. If necessary, they must then change the
development of the project, continuing to experiment with and evolve it, to ensure it achieves the best
possible result.

Principle 6. “Build incrementally from firm foundations”

Principle six refers directly to the benefits to the business: building incrementally allows the project to be
released sooner for the Client, which in turn enables developers to understand the scope of the business and
relate potential updates to its operational requirements.

This is possible by establishing a large amount of the design up front before development begins, alongside
striving for an early delivery of the final product. It’s essential for the Developer to focus on what they want
to be produced – as opposed to how they plan on getting to it – throughout this.

Principle 7. “Communicate Continuously and Clearly”

As poor communication is often cited as the biggest single cause of project failure, the techniques and
principles associated with DSDM Atern are devised to improve communication between members of the
development team to ensure a successful business model.

The best way for the Project Manager to maintain this principle is to put activities in place which encourage
effective communication between the team, such as arranging workshops and favouring face to face
communication. Furthermore, interaction between the team and the stakeholders, both formal and informal, is
imperative. This can be supported by keeping documentation, such as itineraries.

Principle 8. “Demonstrate Control”

The final principle entails that the team are in control of the project at all times. This is achieved by being
proactive when monitoring and controlling the progress made with the project, as well as being able to prove
at any time that they are in control. This should be established throughout the team by using appropriate levels
of formality on a day-to-day basis to ensure procedure, as well as managing contracts and other business needs
accordingly.

5.6 Justification of the suitability of a Methodology or a Framework followed

The primary reason DSDM Atern was chosen as the methodology for this operation is that the project conforms
to many of its principles, such as “focus on the business need” and “deliver on time”. The utilisation of this
methodology ensures that a specific product is created, which is established by the business’s needs, as well
as precise dates for completion of each aspect of the project. The vast amount of planning and detail that
DSDM encourages helps to ensure that the project is completed on time.

6 Design
6.1 UML Use Case Diagram

The use case diagram illustrated below (Figure 9.) demonstrates the various actions a User can complete
through the code, as well as the limitations. As shown below, an Actor (User) is able to ‘Input Query’ for the
code to search for, ‘List’ the data, which includes being able to view a list of Tweets, as well as allowing
exportation of the data out of the program to a .csv file. They must also be able to terminate the program.

Figure 9. Draw.io

7 Development Process

7.1 Stage 1: Initial Fetching Python Code

During the first stage of development, the key task was to write the code which would allow the Client to
retrieve a set number of tweets from Twitter that include a certain word or phrase, decided by the Client. To
do this, a Twitter App is needed. As of November 2017, Twitter required all Users to create a Twitter Developer
account in order to create an App, in which they must agree to various legal and ethical requirements, and
specify the nature of the Application they are creating (Roth, 2018).

Figure 10. Twitter Developers, ‘Application Details’

Throughout the application process it was repeatedly stated that the Application was going to be developed for
educational purposes, as this increases its chances of being approved. Pleasingly, the Application was
approved within five minutes of its submission and was then assigned various API keys and Access tokens to
enable a piece of code to access Twitter through a third-party system, in this case, Jupyter Notebook.

Figure 11. Twitter Developer, ‘Keys and tokens’ (Keys and tokens redacted)

Once the App had been approved and could then harvest data from Twitter, the keys and tokens could be
implemented within code. In keeping with DSDM’s sixth principle, “Build incrementally from firm foundations”,
the basis for the project was to write the code that fetched the data from Twitter, ensure that it worked and
retrieved the types of data needed (such as whole tweet, time, and username), and then implement the
sentiment analysis factor.

To assist with the creation of this project, a Twitter-based Python Library, Tweepy ("bliti", 2019), was
imported, which enables easy access to Tweets through Twitter’s API. Tweepy offers a vast number of code
snippets and tutorials in their documentation, which helps thousands to use their system to simplify the code
in which data is requested from Twitter, as well as specifying the exact variables required. Code by github user
ritvikmath was also used as the structure for fetching the Tweets (ritvikmath, 2018).

The initial code for this stage of the project was designed to check the Developer’s timeline as a precaution
to ensure the code was working, which is possible through Jupyter Notebook’s ‘cell’ feature, before enabling
the User to enter a word or phrase. The program then searches through the most recent 100 tweets that
include this query and outputs them on the screen for the user to browse, before exporting them to a .csv file
with a name of the User’s choosing.

Iterations

Each excerpt below relies on tweepy having been imported and on auth, query, and language having been defined
in earlier notebook cells.

Iteration 1 (24/2/19). Result: Fail. Infinite loop of the 17 most recent tweets.

    import csv

    # Creating the API object while passing in auth information
    api = tweepy.API(auth)

    # Open/Create a file to append data
    csvFile = open('asos3.csv', 'a')

    # Use csv Writer
    csvWriter = csv.writer(csvFile)

    # Calling the search function with our parameters
    results = api.search(q=query, lang=language)

    counter = 0

    while counter != 500:
        for tweet in results:
            if (not tweet.retweeted) and ('RT @' not in tweet.text):
                # Write a row to the csv file
                csvWriter.writerow([tweet.created_at, tweet.user.screen_name, tweet.text])
                counter += 1

    csvFile.close()

Iteration 2 (25/2/19). Result: Success, however, removes emojis.

    import csv

    # Creating the API object while passing in auth information
    api = tweepy.API(auth)

    # Open/Create a file to append data
    csvFile = open('asos5.csv', 'a')

    # Use csv Writer
    csvWriter = csv.writer(csvFile)

    # Calling the search function with our parameters
    results = api.search(q=query, lang=language, count=100)

    for tweet in results:
        if (not tweet.retweeted) and ('RT @' not in tweet.text):
            # Write a row to the csv file, encoding the tweet text as utf-8
            csvWriter.writerow([tweet.created_at, tweet.user.screen_name,
                                tweet.text.encode('utf-8')])

    csvFile.close()

Iteration 3 (25/2/19). Result: Success.

    import csv

    # Creating the API object while passing in auth information
    api = tweepy.API(auth)

    # Open/Create a file to append data
    csvFile = open('asos3.csv', 'a')

    # Use csv Writer
    csvWriter = csv.writer(csvFile)

    # Calling the search function with our parameters
    results = api.search(q=query, lang=language, count=500)

    for tweet in results:
        if (not tweet.retweeted) and ('RT @' not in tweet.text):
            # Write a row to the csv file
            csvWriter.writerow([tweet.created_at, tweet.user.screen_name, tweet.text])

    csvFile.close()

Although the code was partially based on existing Tweepy examples, there were several lines that had to be
changed, added, or removed as they did not fit the specification for the project being created. The most
essential part of this involved the export of the tweets. It is believed that, especially in the modern day on
social media, a large part of semantic analysis is based around emojis and emoticons used by the public. As
such, the code regarding the export of tweets to a .csv file had to be edited to ensure that nothing in the
process removed the emojis, or rendered them unreadable by a spreadsheet.

There was an initial issue when opening the documents created by the program. Excel was needed to create
the formulas required to analyse the tweets appropriately; however, it did not display the emojis correctly
when opening the .csv file directly. As the program was developed using a MacBook, the Developer also had
access to Numbers from the iWork package by Apple. Opening the .csv files in Numbers, then exporting them to
a .xlsx file to open in Excel, was an effective workaround for this, as .xlsx files can also be used in Data
Visualisation software.
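
An alternative that was not used in this project, shown here only as a sketch, is to write the .csv file as UTF-8 with a byte order mark, which Excel generally uses to recognise the encoding. The function name export_tweets, the file name, and the sample row are hypothetical. It is also worth noting that in Python 3, passing the result of encode('utf-8') to csv.writer stores the text representation of the bytes rather than the characters themselves, which may explain the loss of emojis in Iteration 2.

```python
import csv


def export_tweets(rows, path):
    """Write (created_at, screen_name, text) rows to a CSV file.

    The 'utf-8-sig' codec prepends a byte order mark (BOM) so that
    spreadsheet software can detect UTF-8; newline='' is the setting
    recommended by the csv module documentation.
    """
    with open(path, "w", encoding="utf-8-sig", newline="") as csv_file:
        writer = csv.writer(csv_file)
        writer.writerows(rows)


# Hypothetical sample row standing in for data fetched through Tweepy.
sample = [("2019-02-25 10:15:00", "example_user", "Totally in love ♥")]
export_tweets(sample, "tweets_sample.csv")
```

Using a with block also closes the file automatically, even if writing a row raises an error.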

Figure 12. Opening a .csv file in Excel

Figure 13. Opening the same .csv file in Numbers

7.2 Stage 2: Implementing the Semantic Analysis

7.2.1 Iteration 1

The next stage of Development was to write the necessary code to scan through each individual tweet and give
it a rating out of ten, depending on how positive or negative the tweet seemed. Using Python for coding quickly
became a nuisance due to the lack of a “case” or “switch” feature that coding in Java or C++ would have
possessed.

Although not particularly aesthetically attractive, the decision was taken to utilise various “if” statements that
would check each tweet individually within the .csv “for loop” and look for different common words that may
appear. These words were established initially through the YouGov “How Good is Good?” Diagram (Figure
5.), referenced in Section 2 of the report, then built upon with several articles on the sentiment of emojis
(Novak, Smailovic, Sluban, & Mozetic, 2015), (Lim, 2018), and (Brandwatch, 2019).

7.2.2 Iteration 2

After creating the initial groundwork for the code, it was run several times. Any tweet that was not giving a
semantic rating was analysed to see what language was used, and the context it was used in, which enabled
the Developer to increase the language to be analysed in the future.

Another issue highlighted was that a lot of emojis that were not mentioned in any of the research articles
were not being rated whatsoever. Using a website that counts every time an emoji is used in a tweet
(emojitracker, 2019), the Developer looked at the top thirty emojis being used worldwide, and viewed
several hundred tweets per emoji to look at the general context of when that emoji is used, before
implementing them in the code.

Once implemented, a large number of sampled Tweets would return with a semantic value attached.
However, when reflecting on the sampled data of the .csv files, it came to light that the true context of
Tweets often became misconstrued as the code could not always determine the implied message outcome.

Examples include the code classing a Tweet as negative due to the emoji chosen by the User, even though
the language used alongside it was in fact positive. In another instance, the code struggled to determine
whether the feedback was positive or negative, due to conflicting language used. The code valued the emojis
used as a 5/10, an indifferent unconcerned score, whilst the language chosen later in the Tweet scored a
much more positive 8/10 or 9/10.

7.2.3 Iteration 3

Due to these tweets with various potential meanings ‘slipping through the cracks’, it was decided that the code
should be edited to sum up the value of every word, phrase, or emoji/emoticon used.

For example, with the tweet:

“Thank you @hm for selling this amazing T-shirt 😩♥♥♥ Totally in love https://t.co/50tfmtlwNU”

The emoji ‘😩’ has a value of 5 as it can be used in either a positive or a negative sense, the heart emoji
has a value of 8, and the word ‘love’ has a value of 9. Within iterations 1 and 2, the value of this tweet
would have been recorded as 5, due to the emoji being the first thing mentioned in the tweet; once the code
was edited with a ‘Counter’ and ‘Sum’ of the values, this tweet then returned a semantic value of (5+8+9)/3,
for an overall value of 7.3 recurring. After further runs on .csv files with this new version of the code to
look for any language omissions, it was approved as the final version.
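
The difference between the first-match behaviour of iterations 1 and 2 and the averaging of iteration 3 can be sketched as follows. This is an illustration under stated assumptions, not the project's code: LEXICON is a hypothetical three-entry word-bank containing only the values quoted above, and the function names are invented for this example.

```python
# Hypothetical word-bank holding only the three values quoted in the text.
LEXICON = {"😩": 5, "♥": 8, "love": 9}

EXAMPLE_TWEET = (
    "Thank you @hm for selling this amazing T-shirt "
    "😩♥♥♥ Totally in love https://t.co/50tfmtlwNU"
)


def matched_values(text, lexicon=LEXICON):
    """Return the value of every lexicon term found in the text, once per term."""
    lowered = text.lower()
    return [value for term, value in lexicon.items() if term in lowered]


def first_match_rating(text, lexicon=LEXICON):
    """Iterations 1 and 2: the first check that matches decides the rating."""
    values = matched_values(text, lexicon)
    return values[0] if values else None


def average_rating(text, lexicon=LEXICON):
    """Iteration 3: sum the matched values and divide by how many matched."""
    values = matched_values(text, lexicon)
    return sum(values) / len(values) if values else None


print(first_match_rating(EXAMPLE_TWEET))         # 5
print(round(average_rating(EXAMPLE_TWEET), 1))   # 7.3
```

Tweets with no matching term return None, which corresponds to the unrated tweets later removed from the sample. The simple substring test is a deliberate simplification: it would also match ‘love’ inside a word such as ‘glove’.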

7.3 Stage 3: Collating the Data in Excel

The Excel file was designed to hold five columns of data that were deemed most important for analysis:
- The name of the Company being discussed
- The time of Tweet
- The username of the Tweeter
- The Tweet itself
- The semantic rating

As previously explained, due to emojis not formatting correctly within Excel, the data set was initially
compiled within Numbers, and was then exported as a .xlsx file to be opened and sorted within Excel. The
sorting and filtering functions within the application meant that any tweets with no semantic value could be
removed from the data sample, as they held no purpose in the analysis and would also have interfered with the
calculations made within the document.

It was at this point that the data visualisations had to be planned, as the formulae created had to help visualise
the data. The chosen calculations were:
1) Average semantic rating of Tweets per Company.
2) Number of Semantic Tweets in the sample.
3) Number of Tweets per hour of the day.
4) Number of Tweets per Company per rating from 1-10.

These calculations were conveyed through a combination of “COUNT”, “COUNTIF”, “COUNTIFS”, “SUM”,
and “SUMIF” formulae that presented the data in numerical form within the Excel spreadsheet, next to the
data itself. These formulae are presented in Appendix E.
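
For readers more familiar with code than spreadsheet formulae, the four calculations can be approximated in Python. This is a sketch only and does not reproduce the formulae in Appendix E; the sample rows, the summarise function, and the rounding of ratings into whole-number buckets are all assumptions made for illustration.

```python
from collections import Counter, defaultdict
from datetime import datetime

# Hypothetical rows: (company, time of Tweet, username, Tweet, rating or None).
ROWS = [
    ("ASOS", "2019-02-25 09:05:00", "user_a", "love it", 9),
    ("ASOS", "2019-02-25 09:40:00", "user_b", "where is my order", None),
    ("H&M", "2019-02-25 14:10:00", "user_c", "good", 7),
    ("H&M", "2019-02-25 14:55:00", "user_d", "so cute", 8),
]


def summarise(rows):
    """Compute the four summary calculations over rated Tweets only."""
    # Tweets without a rating are dropped, as they were in the spreadsheet.
    rated = [r for r in rows if r[4] is not None]

    ratings_by_company = defaultdict(list)
    for company, _, _, _, rating in rated:
        ratings_by_company[company].append(rating)

    return {
        # 1) Average rating per company (like SUMIF divided by COUNTIF).
        "average_per_company": {
            c: sum(v) / len(v) for c, v in ratings_by_company.items()
        },
        # 2) Number of rated Tweets in the sample (like COUNT).
        "rated_count": len(rated),
        # 3) Number of Tweets per hour of the day (like COUNTIF on the hour).
        "tweets_per_hour": Counter(
            datetime.strptime(r[1], "%Y-%m-%d %H:%M:%S").hour for r in rated
        ),
        # 4) Number of Tweets per company per rating (like COUNTIFS).
        "per_company_rating": Counter((r[0], round(r[4])) for r in rated),
    }


summary = summarise(ROWS)
print(summary["average_per_company"])
```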

Figure 14. Screenshot of numerical data next to tweet data in Excel Spreadsheet

Using Excel to be able to view the data set was imperative as it relates to several of the thirteen requirements
found in Section 5 of the report, such as:
- Users are able to easily browse the complete Data set (i.e. social media posts) (Req. Number 1)
- Users are able to sort Data by time (Req. Number 2)
- Users are able to sort Data by emotion (Req. Number 3)
- Users are able to sort by Company (Req. Number 5)
- Users are able to filter data by specific words (Req. Number 6)
- Users are able to edit and remove Data (Req. Number 7)

While half of these requirements were “Must haves” on the MoSCoW table, two of them were “Should haves” and
one was a “Could have”, and the project therefore exceeds the customer’s expectations by including them.

7.4 Stage 4: Creating Data Visualisations

The software decided upon for the project was Tableau, due to its vast capabilities which enabled the creation
of six different visualisations of the data to present to the client. Having a wide variety of visualisations that
can be easily compacted into one or two dashboards for readability means the Client can make an informed
decision regarding how much to invest in their Social CRM. The simplicity of Tableau also ensures that once
the project is handed over to the Client, they can view, edit, and add further data to the data set, with the correct
training.

Though Tableau have developed several versions of their software for different levels of Data Analysis, the
standard product, and therefore the one used for this project, was “Tableau Desktop 2019.1”. On opening,
Tableau immediately asks which data source to ‘Connect’ to in order to import the data. This makes the process
extremely easy, as “Microsoft Excel” is at the top of the list of options and is labelled with the name of the
software, as opposed to the file name extension, ‘.xlsx’. After this, it is straightforward to choose the
columns of data to be used, and Tableau responds by sorting the columns into what it believes are “dimensions”
and “measures” in order to create the correct type of visualisation.

Figure 15. Tableau splitting the Columns into ‘Dimensions’ and ‘Measures’

The first visualisation demonstrates a ‘Word Cloud’ of the most used language in the Tweets collected. This
was difficult to construct at first, as Tableau does not have the capability to split sentences into individual
words. Therefore, the Tweets had to be split using the ‘Text to Columns’ feature within Excel that takes a data
sample and splits it into individual words, depositing each one into a different cell. While useful, it did make
the document look untidy, and therefore was completed at the end of the process so it could be put out of sight,
to the right of the tables shown in Figure 14.

After this operation had been completed, the twenty-seven columns the Tweets were split into had to be merged
together to enable the creation of the Word Cloud. This is where the Tableau feature ‘Pivot’ is used. This
creates two columns out of any number of columns selected by the User: one combining the column headers, and
one combining the column values.

Subsequently, the ‘Filter’ function had to be used to sift out the generic words that lack any semantic value.
The benefit of this feature is to bring to light the descriptive data that will show the intended context of the
customer, the creator of the Tweet.

On this occasion, once this process was complete, a total of 236 values had been removed from the sample,
such as ‘also’, ‘does’, ‘hi’, ‘I’, etc. The sample was then left with words that explained why the Tweets
existed; for example, ‘delivery’ and ‘returns’ demonstrate that a lot of the sample had a question or
statement regarding those topics. Various words demonstrating emotions also remained, such as ‘like’, ‘good’,
‘loving’, ‘cute’, etc.
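
The split, pivot, and filter steps described above can be mirrored in a few lines of Python. This is a rough sketch rather than the Tableau workflow itself; STOP_WORDS is a hypothetical, much shorter stand-in for the 236 values removed in the project, and the sample Tweets are invented.

```python
from collections import Counter

# Hypothetical stop-word list standing in for the values filtered out in Tableau.
STOP_WORDS = {"also", "does", "hi", "i", "my", "is", "the", "where"}


def word_frequencies(tweets, stop_words=STOP_WORDS):
    """Split Tweets into words, stack them into one list, filter, and count."""
    # Step 1 (Excel 'Text to Columns'): one list of words per Tweet.
    columns = [tweet.lower().split() for tweet in tweets]
    # Step 2 (Tableau 'Pivot'): stack every column into a single long list.
    stacked = [word.strip(".,!?") for row in columns for word in row]
    # Step 3 (Tableau 'Filter'): remove generic words and empty strings.
    kept = [word for word in stacked if word and word not in stop_words]
    return Counter(kept)


sample_tweets = ["Hi where is my delivery?", "Loving the delivery, also cute!"]
frequencies = word_frequencies(sample_tweets)
print(frequencies.most_common(3))
```

The resulting counts correspond to the numbers Tableau discloses when hovering over a word in the Word Cloud.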

It was important to include a visual representation of the sample data as well as the figures behind the
analysis, as it enables the Client to view the emotions conveyed on a larger spectrum than
positivity/negativity. Hovering over the words in the visualisation discloses how many times that word was
stated within the sample, which, if filtered by the Top 50 used, is between 7 and 45 times per word.

The second visualisation created demonstrates the average semantic rating of each company in order from
highest to lowest. This was created through a table made within the Excel Spreadsheet that enabled a value
named “Semantic Rating” (per company) to be placed on one axis, and the corresponding “Company Name”
values to be placed on the other. It was important to create this as it gives the Client a quick glance at which
companies have the most positive or negative reactions from their customers. They can then focus on these
companies’ data when looking at the other visualisations or can choose to still look at the data as a whole.

This was one of the easier graphs to create as the information was simple to carry over from the Spreadsheet.
Furthermore, to make it easier for the Client to view this data in a user friendly manner, the semantic values
for each company were placed on top of the graph, which is useful for this scenario due to the average values
for the top three companies (Missguided, H&M, and Zara) only being 0.12 apart.

The third data visualisation utilises a pie chart displaying how many Tweets were sent overall, per hour of
the day. This was one of the harder visualisations to create: Tableau does not accept values that are simply
“HH:MM” as a valid format, and therefore generates its own date (01/01/1899) to use as a placeholder. Due
to this, when creating the pie chart, it had to be checked that the labels on the graph and keys did not
include the placeholder date, as this would have made the graph confusing and distracted from its data.

Similarly to the second visualisation, to make the information easier to read ‘at a glance’ for the Client,
labels were added that explain which hour each segment of the chart corresponds to. With this information it
is simple to perceive the most popular hours for Tweets to be sent, and if the Client would like the exact
number of Tweets per hour they can simply roll the mouse over the chart within Tableau and it will disclose
this information.

An additional pie chart, the fourth visualisation, was created to display the total number of Tweets per
Company collected in the sample. As the project is an investigation into not only the semantic value, but also
the popularity of the companies in question, this diagram easily exhibits the names of each company and the
exact number of Tweets in the sample they hold.

This diagram demonstrates that although a company may have more Tweets sent to it than any other, the
majority of them could carry no sentiment and simply be enquiries or mundane remarks, as opposed to
opinions that could influence other potential customers. This diagram was relatively easy to create
in Tableau, especially after creating a pie chart in the prior visualisation.

The penultimate visualisation is a breakdown of the number of Tweets per Rating – out of 10 – per company.
Each company has its own bar chart labelled from 1-10 that shows the number of Tweets it received within
each rating, which the Client can hover over to see the exact number of Tweets. However, this was not how
it was initially planned to be presented.

The original idea for this visualisation was to present a stacked bar chart showing the total number of
Tweets in the data set at each value out of 10, which would then show which companies received a
noticeable number of Tweets at that value. As the data was calculated in Excel using formulae, the data –
similarly to the individual words for the Word Cloud – would have had to be merged together with a Pivot.
However, this was problematic as each data set can only have one Pivot.

This meant that when attempting to Pivot the data together, it merged with the Word Cloud data, and rendered
both sets of values unreadable. While this was frustrating, the data still makes logical sense on its own and is
easily comprehensible as six separate bar charts.
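
For reference, the counts behind a stacked chart of this kind can be prepared outside the spreadsheet in a "long" layout, with one row per company and rating. The Python sketch below shows that reshaping under assumed names and made-up data; it was not part of the delivered product, and whether such a layout would have avoided the single-Pivot limitation described above is untested.

```python
from collections import Counter


def rating_breakdown(rows):
    """Return (company, rating, count) triples for every rating from 1 to 10.

    `rows` is an iterable of (company, rating) pairs; ratings with no
    Tweets are kept with a count of 0 so every company has ten rows.
    """
    rows = list(rows)
    counts = Counter(rows)
    companies = sorted({company for company, _ in rows})
    return [
        (company, rating, counts[(company, rating)])
        for company in companies
        for rating in range(1, 11)
    ]


# Made-up sample data for two hypothetical companies.
sample = [("Company A", 7), ("Company A", 7), ("Company A", 3), ("Company B", 10)]
long_rows = rating_breakdown(sample)

# Total Tweets at one rating across all companies (the height of one stacked bar).
total_at_seven = sum(count for _, rating, count in long_rows if rating == 7)
print(total_at_seven)
```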

The sixth and last visualisation was the only one not to come directly from the data collected, as it acts as
background information on the companies studied to provide the Client with additional information. It presents
a bar chart that displays the number of Tweets that each company has sent on their UK Twitter page, as well
as their ‘Help Page’ if they possess one, as of the 10th of April 2019. This is useful for comparing how active
a company is on social media with the data from the other visualisations.

After these visualisations were constructed, two Dashboards were generated and linked together to create a
medium to easily view the results.

These Dashboards were split into “Gathering Statistics” which contains:


- Tweets per hour
- Number of Tweets per company
- Company’s overall Tweet count on their Twitter pages

As well as “Analysis Statistics” which holds:


- The Word Cloud
- Average semantic rating per Company
- Breakdown of number of Tweets per rating per Company.

As all the information would not fit on one dashboard, it was imperative to ensure that the way the data is
presented to the Client is user friendly. This was achieved by splitting the visualisations into two groups of
three: those covering the basic statistics of the data gathered, and those covering the analysis that delves
deeper into the data, as this seemed like the most logical approach.

A key aspect of the presentation of these diagrams was to ensure they were easy to interpret. For this reason
each diagram follows the same colour scheme of twenty different colours – named Tableau 20 in Tableau’s
settings – to keep it uniform and aesthetically acceptable for the Client. This had to be altered slightly for
visualisation three due to there being more than 20 inputs. Where it made sense and did not make the
visualisation too crowded, labels were placed on the chart itself, which eliminated the need for a ‘Key’ above,
to the side of, or below each diagram, as these took up valuable space.

7.5 Time Box

| Task Number | Task | Task Status |
|---|---|---|
| 1 | Initial Research | Completed on time |
| 2 | Requirements Gathering | Completed on time |
| 3 | Code writing | Completed on time |
| 4 | Run code every two hours from 12pm to 12am for each company (7 times per company) | Ran the code at least 5 times per company, however some Tweets appeared in more than one sample due to a lack of Tweets regarding that company in the 2 hours. Completed on time |
| 5 | Excel Spreadsheet Design and Development | Completed on time |
| 6 | Data compilation and exportation to Excel | Completed on time |
| 7 | Formulae created within Excel | Completed on time |
| 8 | Data Visualisations created | All visualisations created according to plan apart from the stacked bar chart, which was adapted. Completed on time |
| 9 | Testing of Function | Completed on time |
| 10 | Fix any Errors found from Testing period | Completed on time |
| 11 | Presentation of Product to Client | Completed on time |
| 12 | Final Report | Completed on time |
| 13 | Final Product | Completed on time |
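
Task 4 notes that some Tweets appeared in more than one sample. One way such overlap could be handled when compiling the samples is to keep only the first copy of each Tweet, keyed on its ID. The sketch below assumes each Tweet is held as a dictionary with an `id` field; that representation and the function name are assumptions for illustration, not a description of how the samples were actually merged.

```python
def merge_samples(samples):
    """Combine several samples of Tweets, keeping the first copy of each ID."""
    seen_ids = set()
    merged = []
    for sample in samples:
        for tweet in sample:
            if tweet["id"] not in seen_ids:
                seen_ids.add(tweet["id"])
                merged.append(tweet)
    return merged


# Two made-up samples taken two hours apart; Tweet 102 appears in both.
sample_12pm = [{"id": 101, "text": "first"}, {"id": 102, "text": "second"}]
sample_2pm = [{"id": 102, "text": "second"}, {"id": 103, "text": "third"}]

combined = merge_samples([sample_12pm, sample_2pm])
print([tweet["id"] for tweet in combined])
```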

7.6 Potential Alternate Approaches

A number of methods were considered during different stages of development of the project. An example of
this was during the preliminary design process, when there was debate surrounding whether the data set should
be presented as a SQL database, or within a spreadsheet such as Excel or Numbers.

The main advantage of holding the data within a database is that it is generally more secure than a
spreadsheet: it is more difficult to accidentally delete records, and a database can typically be encrypted to a
higher standard than a typical spreadsheet. However, it could be argued that far more of the general
public can view and edit spreadsheets than know SQL. Due to this, mistakes could be made if
an inexperienced person attempts to edit the data within the database.

After collating the data, a further dilemma surrounded the decision of which visualisation tool to use, as each
of the market leaders has their own strengths and weaknesses. One possible option was Power BI, a Microsoft
owned tool that connects directly to Excel to create graphs and dashboards from the data. Another was SiSense,
which enables data visualisation, but considers itself a ‘big data analytics solution’ to both analyse and visualise
large volumes of data.

After reviewing the positives and negatives of the three options, Tableau was chosen due to its wealth of
features regarding data visualisation – which is all that was needed at that stage of the development process –
as well as being provided with a complimentary license key for Tableau Desktop.

8 Testing
Following the development of the two Dashboards to present the data to the Client, the entire Tableau Project
underwent thorough and comprehensive Testing to ensure it was working as expected. This can be found in
Appendix G. Test 9 demonstrated that a formula created to tally the number of Tweets per hour was not
behaving correctly. Finding this enabled me to fix the formula and force an update to the visualisations to
ensure they held the correct data. All 26 other tests passed first time.

9 Evaluation

9.1 Evaluation of Product

To analyse whether the project has been a success I feel it is imperative to look back at the requirements
gathered and see if they have been met in accordance with the MoSCoW prioritisation.

| No. | Requirement | MoSCoW? | Final Product? |
|---|---|---|---|
| 1 | Users are able to easily browse the complete Data set (i.e. social media posts) | Must have | Yes |
| 2 | Users are able to sort Data by time | Should have | Yes |
| 3 | Users are able to sort Data by emotion | Must have | Yes |
| 4 | Users are able to sort Data by gender | Could have | No |
| 5 | Users are able to sort Data by Company | Must have | Yes |
| 6 | Users are able to filter data by specific words | Should have | Yes |
| 7 | Users are able to edit and remove Data | Could have | Yes/No |
| 8 | Users are able to view a variety of pre-made data visualisations for Data | Must have | Yes |
| 9 | Users are able to create their own data visualisations for Data | Should have | Yes |
| 10 | Users are able to export their Data Visualisations individually | Must have | Yes |
| 11 | Users are able to export their Data Visualisations as a whole | Must have | Yes |
| 12 | Users are able to upload Data sets for analysis | Won’t have | Possible |
| 13 | The Developer is able to upload Data sets for analysis | Could have | Possible |

Out of the thirteen requirements gathered, every requirement that was initially classed “must have” or “should
have” has been successfully implemented. The only definite ‘no’ within the requirements is in regard to sorting
data by gender. This is because Twitter does not enable this option natively, and it would involve a deep level
of machine learning to look at usernames, display names, and tweets to analyse whether the system believes
an account is male or female. Every other requirement could be a ‘yes’ with the correct training on the system.

Despite the vast majority of the requirements being met, I still believe that there is a lot that can be improved
regarding the system I’ve created. Firstly, the initial plan was to analyse data from Facebook and Instagram as
well as Twitter. However, due to GDPR, the lengthy process of applying for permission to build an App on
Facebook in 2019 required more time and budget than I had as a University student. This is due to them
requiring an entire working prototype of the App you are building before they approve it.

Furthermore, as Facebook own Instagram and they do not have a platform for third party Apps within the
Instagram App itself, a different type of data scraping tool would have had to be coded and tested, which
again required vastly more time than I had to work on the project.

I also believe that given more time the project could have collated a lot more data than it currently holds, which
would give a more accurate representation of the public’s opinions on the companies. This would also have
allowed me to analyse more semantics and further improve the particular area of the code that detects the
sentiment of said data.

However, I do feel that this project contains a well-rounded representation of the online public’s opinions
regarding particular companies. The code concerning the semantic analysis also gives an accurate reading of
a consumer’s thoughts, and the data visualisations created through Tableau show a variety of interesting ways
to look at the data collected. This enables a user-friendly analysis for a deeper meaning once compared to the
overall success of a company.

If this project were to have future iterations, it would benefit from being adapted into a database for security
and speed of data collection and recall, as this would give the project more structure. The number of
companies monitored could be increased, and I also believe it would be a good idea to have ‘stream listeners’
implemented that would scrape each Tweet as it is posted and load it into the database as opposed to having a
person manually run the code every one to two hours.
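
As a rough illustration of the storage half of that idea, the sketch below uses Python’s built-in `sqlite3` module to hold Tweets in a table keyed on the Tweet ID, so that a Tweet delivered twice is only stored once. The table layout, column names, and function names are hypothetical, and the stream listener that would call `store_tweet` is deliberately left out, as its details depend on the Twitter client library in use.

```python
import sqlite3


def open_store(path=":memory:"):
    """Open (or create) a SQLite database with a single table of Tweets."""
    conn = sqlite3.connect(path)
    conn.execute(
        """CREATE TABLE IF NOT EXISTS tweets (
               tweet_id   TEXT PRIMARY KEY,
               company    TEXT NOT NULL,
               created_at TEXT NOT NULL,
               text       TEXT NOT NULL,
               rating     INTEGER
           )"""
    )
    return conn


def store_tweet(conn, tweet_id, company, created_at, text, rating=None):
    """Insert one Tweet; return True if it was new, False if already stored."""
    with conn:  # commits on success, rolls back on error
        cursor = conn.execute(
            "INSERT OR IGNORE INTO tweets VALUES (?, ?, ?, ?, ?)",
            (tweet_id, company, created_at, text, rating),
        )
    return cursor.rowcount == 1


# Made-up example: the same Tweet arrives twice.
conn = open_store()
first = store_tweet(conn, "1", "Company A", "2019-04-10T12:05:00", "love the new range", 8)
again = store_tweet(conn, "1", "Company A", "2019-04-10T12:05:00", "love the new range", 8)
total = conn.execute("SELECT COUNT(*) FROM tweets").fetchone()[0]
print(first, again, total)
```

Using the Tweet ID as the primary key means the database itself guards against the duplicate samples seen during manual collection.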

In future versions of the project I would also ensure there are weeks’ and months’ worth of data, as this would
unlock the possibility of analysing the average sentiment of Tweets month by month as well as the volume of
Tweets collected. This could then be visualised through Tableau as a comparison of each individual company’s
data, as well as an evaluation of the market as a whole.
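
A month-by-month summary of this kind could be produced along the following lines. The sketch assumes each record carries a company name, an ISO-format timestamp, and a rating out of 10; the function name and sample values are invented for illustration.

```python
from collections import defaultdict
from datetime import datetime


def monthly_summary(rows):
    """Return {(company, 'YYYY-MM'): (tweet count, mean rating)}.

    `rows` is an iterable of (company, iso_timestamp, rating) triples.
    """
    buckets = defaultdict(list)
    for company, timestamp, rating in rows:
        month = datetime.fromisoformat(timestamp).strftime("%Y-%m")
        buckets[(company, month)].append(rating)
    return {
        key: (len(ratings), sum(ratings) / len(ratings))
        for key, ratings in buckets.items()
    }


# Made-up records spanning two months for one hypothetical company.
sample = [
    ("Company A", "2019-03-30T18:00:00", 6),
    ("Company A", "2019-04-02T09:15:00", 8),
    ("Company A", "2019-04-10T12:05:00", 4),
]
summary = monthly_summary(sample)
for (company, month), (count, mean) in sorted(summary.items()):
    print(company, month, count, round(mean, 2))
```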

9.2 Self-Evaluation

Considering my skill level as a University Student with simplistic coding knowledge, I believe I created this
project to the best of my abilities.

When beginning to research this project I knew I was going to have to learn a coding language that I had
minimal, if any, experience in, as the four or five coding languages I know well I have been learning since
Secondary School. Because of this, when I decided on Python for my coding language I knew the project was
going to be a struggle to create, but I was eager for the challenge as I found my project topic fascinating, and
there was no way to create the project I wanted without giving learning Python a go.

A downside of this, however, was that I was perhaps putting too much pressure on myself to learn an entirely
new coding language in my last year of University. Several times I became stuck on a relatively simple piece
of code that I believe I would have known the solution to had my education in the language matched that of
Java, HTML, or CSS. Due to this, I feel as though I wasted several days changing small parts of my code to
investigate where I had gone wrong, which generally ended up being a problem with my syntax, and not with
any of the written code itself.

Given the chance to create the whole project again, I believe a lot of the improvements I would make would
be time based. I would begin learning Python and completing tutorials earlier in the development cycle so that
I could spend less time resolving my code. I could then spend more time developing the project and increasing
the scope: be it the size of the data set, the number of companies analysed, or the reliability of the semantic
analysis.

If given the possibility I would also spend more time developing my skill set with Tableau, as it is such a
powerful and capable system, and I feel as though I have only grazed the surface of its capabilities.

Despite this, I began this project wanting to analyse how positive or negative people’s opinions on social media
were and display the results, and have developed a program that does exactly that. I have succeeded in fulfilling
every one of the requirements I initially set out to complete within Section 5 of the report, and I have learnt a
new coding language that will aid me greatly post-university in the Data Analysis sector.

It is for these reasons that I believe I created a good, working model, which fulfils the aims and objectives I
initially set out to achieve, and I am happy with my work.

10 Conclusion
This report initially set out to investigate whether there is a connection between how active a company is on
social media, their reputation online with their customers, and their profits, through semantic analysis and data
visualisation of the results.

10.1 Findings regarding Reputation

The data visualisations created highlight the potential connection between how proactive a company is online,
and the average reputation they have with their customers or potential customers.

Figure 16. Total Number of Tweets sent from Company’s (UK if possible) Twitter Page, and Help Page (if applicable)

As is shown, the number of Tweets sent by each company varies drastically between the six companies chosen
for the study. In order from most to least:
1) ASOS
2) Boohoo
3) Missguided
4) H&M
5) Zara
6) Nasty Gal

Figure 17. Average Semantic Rating per Company

The picture develops when this is compared with the average semantic rating of each company from the Tweets
collected, in which the order from highest to lowest is:
1) Missguided
2) H&M
3) Zara
4) Boohoo
5) Nasty Gal
6) ASOS

The first interesting comparison is that the company that has sent the most Tweets by almost 800,000, i.e.
ASOS, has the lowest semantic rating, while H&M and Zara, who have sent some of the lowest numbers of
Tweets from their Corporate Twitter accounts and have less of an online presence, have higher semantic
ratings. At first glance, it appears as though there is little correlation between the two graphs shown. However,
when the data is reviewed without the inclusion of one of the companies, the graphs have a clear correlation
between them.

It could be argued that customer interaction (i.e. Social CRM) on Twitter inspires Customers to consciously
think about the brand, and therefore they are more likely to want to shop there. Several brands, usually ones
targeted towards the younger generations, will upload funny pictures, quotes, quizzes, or other posts that
encourage communication between themselves and their customers.

One of these brands, Missguided, post pictures or videos that they believe their user base find ‘relatable’ every
day. Customers then react to these posts and treat them the same way they would if a non-corporate account
had sent them, replying to the Tweet with their reaction, or tagging a friend in it with a comment. Posting this
type of content then floods the Company’s feed with Customers replying, as well as the “@Missguided” tag.

Figure 18. Missguided Tweet

Figure 19. Excerpt from Data Set
This excerpt of a fraction of the data demonstrates how Missguided flood their Twitter stream with a large
number of people simply having a conversation with the company, as opposed to asking questions or sharing
views on it. Each of these Tweets has a rating of 7 or above due to the positive language used in the Tweet,
despite the Tweet realistically having nothing to do with the company itself.
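
The effect can be illustrated with a deliberately simplified word-list scorer. The sketch below is not the rating code used in this project: the word lists, the neutral starting point of 5, and the one-point adjustments are all assumptions chosen only to show how a chatty reply containing positive words can score highly without saying anything about the company.

```python
import re

# Hypothetical word lists; the project's real indexing references differ.
POSITIVE = {"love", "great", "cute", "obsessed", "amazing"}
NEGATIVE = {"late", "refund", "awful", "broken", "worst"}


def rate_tweet(text):
    """Return a rating from 1 to 10: start at 5, +1 per positive word, -1 per negative."""
    words = re.findall(r"[a-z']+", text.lower())
    score = 5
    score += sum(word in POSITIVE for word in words)
    score -= sum(word in NEGATIVE for word in words)
    return max(1, min(10, score))


# A conversational reply to a 'relatable' post, and a genuine complaint.
chatty = rate_tweet("omg love this, so cute @friend")
complaint = rate_tweet("worst delivery, still no refund")
print(chatty, complaint)
```

Under these assumptions the conversational reply reaches 7 even though it carries no opinion of the brand, which is the pattern visible in the excerpt.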

Due to this, if Missguided is excluded from the results and the data visualisations, a much clearer pattern is
revealed.

Figure 20. Missguided Negated

Figure 21. Missguided Negated

If Missguided is removed, looking at the semantic ratings versus the activity of the company it is shown that,
amusingly, the more active a company is on social media, the worse a reputation they have online from their
customers. The companies ASOS, Boohoo, and Nasty Gal – all of which are based solely online – have the
three worst semantic average ratings, whereas H&M and Zara, who have both a steady online and offline
presence, despite fewer Tweets, have a higher overall rating.

It could be argued this is because the more active a company is on social media the more comments they tend
to inspire, as well as the idea that people are less likely to give positive feedback about an interaction or product
if it is good, but they are more likely to give negative feedback if it is bad (Thomas, 2018). This could give an
insight into the reason why the online companies have the worst ratings, and therefore it could be said that the
more of an impact a company makes on social media, the worse a reputation they could have online.
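
Since only the two orderings are listed in this section, one way the comparison could be quantified is with a Spearman rank correlation between the order by Tweets sent and the order by average semantic rating. The sketch below uses the two lists exactly as given in Section 10.1; the function names are illustrative, the formula assumes no tied ranks, and this calculation was not part of the original analysis.

```python
def spearman_rho(order_a, order_b):
    """Spearman rank correlation for two orderings of the same items (no ties)."""
    if set(order_a) != set(order_b) or len(order_a) != len(set(order_a)):
        raise ValueError("both orderings must contain the same unique items")
    n = len(order_a)
    rank_b = {name: position for position, name in enumerate(order_b)}
    d_squared = sum((position - rank_b[name]) ** 2 for position, name in enumerate(order_a))
    return 1 - 6 * d_squared / (n * (n * n - 1))


def without(order, name):
    """Return the ordering with one company removed."""
    return [item for item in order if item != name]


# Orderings as listed in Section 10.1 (most Tweets sent first; highest rating first).
by_tweets_sent = ["ASOS", "Boohoo", "Missguided", "H&M", "Zara", "Nasty Gal"]
by_rating = ["Missguided", "H&M", "Zara", "Boohoo", "Nasty Gal", "ASOS"]

rho_all = spearman_rho(by_tweets_sent, by_rating)
rho_excluding = spearman_rho(
    without(by_tweets_sent, "Missguided"), without(by_rating, "Missguided")
)
print(round(rho_all, 2), round(rho_excluding, 2))
```

A negative value means that companies ranked higher for activity tend to rank lower for rating. With these orderings the coefficient is negative in both cases and slightly stronger once Missguided is excluded, although with only five or six companies any such figure is indicative at best.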

10.2 Findings regarding Profitability

From research found regarding each company’s profitability, the majority of which discusses the recently
ended tax year, a number of statistics were discovered and analysed in comparison to the data found in Section
10.1:
- The company with the highest semantic rating once Missguided is excluded, H&M, announced that profits had
fallen for the seventh consecutive quarter from December 2018 to February 2019, down 1.5% from a year
prior (Irish Times, 2019).
- 2nd highest semantic rating, Zara’s profits posted in January 2018 rose 30% to £40.86 million from the
year prior (Companies House, 2018).
- 3rd, Boohoo’s revenues rose by 48%, and profits rose by 38% to £59.9 million (Kinder, 2019).
- 4th, Nasty Gal, owned by Boohoo, sales increased by 96% to £47.9 million (Kinder, 2019), with
revenues absorbed into Boohoo.
- 5th, and the lowest semantic rating, ASOS reported that within the 6 month period of 28th of August 2018
to the 28th of February 2019, profits fell 87% in comparison with the year prior (BBC News,
2019).
- Missguided suffered a pre-tax loss of £46 million from April 2017 to April 2018, in comparison to a
£1.6 million loss the year prior, which they have blamed on “premature investment” (BBC News,
2019).

Once again there is little correlation between the figures shown. Despite this it could be said that there is a
slight association between how active a company is on social media, and their related incomes. Boohoo, the
second most active corporate account, has had an increase of almost 50%, ASOS, while down 87% in pre-tax
profits, gained an increase of 14% in their sales year on year, and Missguided’s group turnover for the year
increased 4.9% to £215.91 million (Nazir, 2019).

There is something to be said regarding the profitability of the online markets in the 21st Century, which can
be analysed through their social media responses using the systems suggested in this report.
Whilst it does seem that the more active a company is on social media, the more likely they are to receive
negative comments, this shouldn’t necessarily be seen as a negative thing: being more active and
communicating more with customers does appear to increase sales.

11 Bibliography

"bliti". (2019, 4). Tweepy. Retrieved from Tweepy: https://www.tweepy.org/


Adams-Mott, A. (2018, 6). Advantages and Disadvantages of Social Media Marketing. Retrieved 1 9, 2019,
from Small Business Chronicles: http://smallbusiness.chron.com/advantages-disadvantages-
social-media-marketing-21890.html
Šilingas, D., & Butleris, R. (2015). Towards implementing a framework for modeling software
requirements in MagicDraw UML. Information Technology and Control, 38(2).
Ajmera, J., Ahn, H.-i., Nagarajan, M., Verma, A., & Contractor, D. (2013). A CRM System for Social
Media. WWW '13 Proceedings of the 22nd international conference on World Wide Web, 49-58.
Arnold, A. (2017, 10). Convenience Vs. Experience: Millennials Love Streaming But Aren't Ready To Dump
Cinema Just Yet. Retrieved from Forbes:
https://www.forbes.com/sites/andrewarnold/2017/10/26/millennials-love-streaming-but-
arent-ready-to-dump-cinema-just-yet/#2d11e7b56311
BBC News. (2019, 4). ASOS profits plunge 87% after difficult year. Retrieved from BBC News:
https://www.bbc.co.uk/news/business-47877688
BBC News. (2019, 1). Missguided fashion chain sees losses widen as costs rise. Retrieved from BBC News:
https://www.bbc.co.uk/news/business-46783114
Brandwatch. (2019, 3 29). The Most Popular Emojis. Retrieved from Brandwatch:
https://www.brandwatch.com/blog/the-most-popular-emojis/
Bullock, L. (2018, 11 27). The Biggest Social Media Fails of 2018. Retrieved from Forbes:
https://www.forbes.com/sites/lilachbullock/2018/11/27/biggest-social-media-fails-
2018/#da9943518f8c
Cakebread, C. (2017, 11 15). You're not alone, no one reads terms of service agreements. Retrieved from Business
Insider: https://www.businessinsider.com/deloitte-study-91-percent-agree-terms-of-service-
without-reading-2017-11?r=US&IR=T
Castronovo, C., & Huang, L. (2012). Social Media in an Alternative Marketing Communication Model.
Journal of Marketing Development and Competitiveness, 117-131.
Clarke, J. (2017, 11). UK shoppers set to spend more this Christmas compared to last year. Retrieved from
Independent: https://www.independent.co.uk/news/business/news/uk-shopping-christmas-
forecast-british-consumers-brexit-a8031606.html
Coleman, A. (2003). Oxford Dictionary of Psychology. New York: Oxford University Press, p.77.
Companies House. (2018, 1). Annual Report and Financial Statements for the year ended 31 January 201.
Retrieved from Companies House: https://s3.eu-west-2.amazonaws.com/document-api-
images-live.ch.gov.uk/docs/sdOxhBqVYBtgmcjTTqR2-
Tm72MSHyljBSTk8S7U_Z9g/application-pdf ?X-Amz-Algorithm=AWS4-HMAC-
SHA256&X-Amz-Content-Sha256=UNSIGNED-PAYLOAD&X-Amz-
Credential=ASIAWRGBDBV3LGD5ORE6%2F20190425%
CRM Switch. (2013). A Brief History of Customer Relationship Management. Retrieved from CRM Switch:
https://www.crmswitch.com/crm-industry/crm-industry-history/
Day, T. (2013). Success in Academic Writing. Palgrave.
Eadicicco, L. (2015, 12). Americans Check Their Phones 8 Billion Times a Day. Retrieved from Time:
http://time.com/4147614/smartphone-usage-us-2015/
Edosomwan, S., Kalangot Prakasan, S., Kouame, D., Watson, J., & Seymour, T. (2011). The History of
Social Media and its Impact on Business. The Journal of Applied Management and Entrepreneurship,
79-91.
Edwards, J. (2014, 08 28). Twitter Now Shows You Exactly How Many People See Your Tweets — And It's
Mesmerizing. Retrieved from Business Insider: https://www.businessinsider.com/twitter-
analytics-dashboard-launched-2014-8?r=US&IR=T
emojitracker. (2019, 4 3). Emoji Tracker. Retrieved from Emoji Tracker: http://emojitracker.com
Fingas, J. (2018, 09 09). Instagram's emoji shortcuts help you comment in record time. Retrieved from Engadget:
https://www.engadget.com/2018/09/09/instagram-emoji-
shortcuts/?guccounter=1&guce_referrer_us=aHR0cHM6Ly93d3cuZ29vZ2xlLmNvbS8&guce
_referrer_cs=LoLm0cHSgd1SMKMPFZcxtQ
Fortin, J. (2018, 1 13). H&M Closes Stores in South Africa Amid Protests Over 'Monkey' Shirt. Retrieved from
The New York Times: https://www.nytimes.com/2018/01/13/world/africa/hm-south-africa-
protest.html
Gates, B. (1999). Business at the Speed of Thought: Using a Digital Nervous System. Penguin.
Google Trends. (2008). "John Lewis christmas" trend. Retrieved from Google:
https://trends.google.com/trends/explore?geo=GB&q=John%20Lewis%20christmas
Gov.uk. (2018). Data Protection. Retrieved from GOV.UK: https://www.gov.uk/data-protection
Huddy, G. (2017, 10 19). How Text Analytics Works for Social Media. Retrieved from Crimson Hexagon:
https://www.crimsonhexagon.com/blog/how-text-analytics-works-for-social-media/
Ilyashov, A. (2015). Here’s How Luxury Brands Are Doing Social Media Very Wrong (& The Few Who Break
The Mold). Retrieved from https://www.refinery29.com/en-us/2015/10/95018/luxury-fashion-
brands-social-media
Irish Times. (2019, 3). Retrieved from https://www.irishtimes.com/business/retail-and-services/h-m-
profit-falls-less-than-expected-after-it-curbs-discounts-1.3842804
Joseph, S. (2017). How Nike is using digital channels to drive sales. Retrieved from Digiday UK:
https://digiday.com/marketing/nike-using-digital-channels-drive-sales/
Kinder, T. (2019, 4 25). Instagram helps Boohoo to snap up a sales increase. Retrieved from The Times:
https://www.thetimes.co.uk/article/instagram-set-helps-boohoo-to-snap-up-a-sales-increase-
5ffdssrbh
Lim, K. H. (2018, 04). Positive and Negative Emojis used for the Sentiment Analysis. Retrieved from
ResearchGate: https://www.researchgate.net/figure/Positive-and-negative-emojis-used-for-the-
sentiment-analysis_fig3_324639092
Marketing Week. (2018, 9 13). How One Facebook Campaign Changed M&M's Approach to Mobile Ads.
Retrieved from Marketing Week: https://www.marketingweek.com/2018/09/13/facebook-
mms-mobile-ads/
McIntyre, H. (2016, 7). Millennials Aren't Very Interested In Traditional Radio Any More. Retrieved from
Forbes: https://www.forbes.com/sites/hughmcintyre/2016/07/12/millennials-arent-very-
interested-in-traditional-radio-any-more/#b7e7bf37c4e4
Nazir, S. (2019, 1). Missguided reports £46.7m loss. Retrieved from Retail Gazette:
https://www.retailgazette.co.uk/blog/2019/01/missguided-reports-46-7m-loss/
Novak, P. K., Smailovic, J., Sluban, B., & Mozetic, I. (2015). Sentiment of Emojis. Plos One.
O'flynn, R. (2017, 7 17). The Way The Social Cookie Crumbles: The Genius Of Oreo’s Social Media Marketing.
Retrieved from 201digital: https://www.201digital.co.uk/way-social-cookie-crumbles-genius-
oreos-social-media-marketing-can-learn/
Pak, A., & Paroubek, P. (2010). Twitter as a corpus for sentiment analysis and opinion mining. LREC,
1320-1326.
Portocarrero, J. M., Delicato, F. C., Pires, P. F., Gámez, N., Fuentes, L., Ludovino, D., & Ferreira, P.
(2014). Autonomic Wireless Sensor Networks: A Systematic Literature Review. Journal of Sensors.
Red Hot Penny. (2018, August 16). The Social Scorecard - Fashion & Accessories 2018. Retrieved November
8, 2018, from https://www.redhotpenny.com/social-scorecard-fashion-accessories-
2018/#wpcf7-f4557-p4549-o1
ritvikmath. (2018, 6 18). Scraping Data. Retrieved from github:
https://github.com/ritvikmath/ScrapingData/blob/master/Scraping%20Twitter%20Data.ipyn
b
Roberts, M. (2018, 6 12). The Best Chain Restaurant Twitter Reactions to IHOP Changing Its Name to 'IHOb'.
Retrieved from People: https://people.com/food/ihop-name-change-ihob-twitter-reactions/
Roth, Y. (2018, 7). New developer requirements to protect our platform. Retrieved from Twitter:
https://blog.twitter.com/developer/en_us/topics/tools/2018/new-developer-requirements-to-
protect-our-platform.html
Rouse, M. (2017, 9). Definition: social CRM. Retrieved from WhatIs.com:
https://searchcrm.techtarget.com/definition/social-CRM
Rudestam, K. E., & Newton, R. R. (2007). Surviving Your Dissertation: A Comprehensive Guide to Content and
Process (3rd ed.). SAGE.
Thiagarajan, S. (2018, 11). Facebook, Instagram or Twitter? Social media strategy that you should follow. Retrieved
from Economic Times: https://economictimes.indiatimes.com/small-biz/marketing-
branding/marketing/facebook-instagram-or-twitter-social-media-strategy-that-you-should-
follow/articleshow/66471776.cms
Thomas, A. (2018, 2). The Secret Ratio That Proves Why Customer Reviews Are So Important. Retrieved from
Inc.: https://www.inc.com/andrew-thomas/the-hidden-ratio-that-could-make-or-break-your-
company.html
Tobin, B. (2018, 7 10). IHOP changes name back from IHOB. Retrieved from USA Today:
https://eu.usatoday.com/story/money/2018/07/09/ihop-changes-name-back-
ihob/769310002/
Y, E. (2019, 01 30). 5 Things You Need to Know Before Scraping Data From Facebook. Retrieved from
Octoparse: https://www.octoparse.com/blog/5-things-you-need-to-know-before-scraping-
data-from-facebook
YPulse. (2018, 6). THE 10 BRANDS GEN Z & MILLENNIALS TRUST MOST. Retrieved from
YPULSE: https://www.ypulse.com/post/view/the-10-brands-gen-z-millennials-trust-most

12 Appendices

12.1 Appendix A: Figures

Figure 1.
YPulse. (2018, 6). THE 10 BRANDS GEN Z & MILLENNIALS TRUST MOST. Retrieved from
YPULSE: https://www.ypulse.com/post/view/the-10-brands-gen-z-millennials-trust-most

Figure 2.
Google Trends (2019) GOOGLE TRENDS SOCIAL CRM. Retrieved from Google Trends:
https://trends.google.com/trends/explore?date=all&q=Social%20CRM

Figure 3&4.
Twitter (2019) Marks & Spencer Twitter Account. Retrieved from Twitter:
https://twitter.com/marksandspencer/with_replies
Figure 5.
Smith, M (2018) How good is “good”? Retrieved from YouGov:
https://yougov.co.uk/topics/lifestyle/articles-reports/2018/10/02/how-good-good

Figure 6.
Twitter Analytics (2018) How to use Twitter analytics. Retrieved from Business Twitter:
https://business.twitter.com/en/analytics.html

Figure 7.
Hootsuite (2018) Sentiment Analysis Tools for Social Media Marketers. Retrieved from Blog.Hootsuite:
https://blog.hootsuite.com/social-media-sentiment-analysis-tools/

Figure 8.
Lexalytics (2018) Semantria for Excel. Retrieved from Lexalytics:
https://www.lexalytics.com/semantria/excel

Figure 9.
Draw.io (2019) Diagram created by Developer. Developed and downloaded from Draw.io:
draw.io

Figure 10.
Twitter Developer (2019) App Details. Retrieved from Twitter:
https://developer.twitter.com/en/apps/16124184

Figure 11.
Twitter Developer (2019) Keys and tokens. Retrieved from Twitter:
https://developer.twitter.com/en/apps/16124184

Figure 12.
“Data -0 ratings.xlsx”

Figure 13.
“Data -0 ratings.xlsx”

Figure 14.
“Data -0 ratings.xlsx”

Figure 15.
Tableau, “Data Visualisations.twb”

Figure 16.
Tableau, “Data Visualisations.twb”

Figure 17.
Tableau, “Data Visualisations.twb”

Figure 18.
Missguided Twitter Account (April, 2019) Twitter. Retrieved from Twitter:
https://twitter.com/Missguided/status/1120779470191955970

Figure 19.
“Data -0 ratings.xlsx”

Figure 20.
Tableau, “Data Visualisations.twb”

Figure 21.
Tableau, “Data Visualisations.twb”

12.2 Appendix B: Contextual Report

‘Aims and objectives

To investigate this, the project will collect data from three of the most popular Social Medias (Twitter,
Facebook, and Instagram) for several retail companies, and index the results based on how positive/negative
the words/phrasing is. This information will then be displayed through a data visualisation software and
compared with the annual profits for each of the businesses to determine if a link exists between how active a
company is on Social Media, and the amount of profits they make per annum.

[n] = Days

1.1 Research Report


1.1.1 Write a contextual report regarding the project that will describe an outline of what the project
is hoping to create, and the timeline of how it is to be created, to allow for a better understanding.
1.1.2 Write Introduction [2.0]
1.1.3 Write Literature Review [6.0]
1.1.4 Write Product Research [10.0]
1.1.5 Write Plan for Term 2 [6.0]

1.2 Design Documentation


1.2.1 Write design documentation describing the designed project and the process used.
1.2.2 Write Requirement Specification [4.0]
1.2.3 Describe the current solution, as well as the proposed solution [2.0]
1.2.4 Create data design [2.0]
1.2.5 Create architecture design [3.0]
1.2.6 Create interface design [5.0]
1.2.7 Create procedural design [3.0]

1.3 Implementation
1.3.1 Collect and store data from social media
1.3.1.1 Data from the various Social Media platforms will be scraped from the sites using Python
scripts, so that it can be analysed later.
1.3.1.2 Scrape data from Twitter. [10.0]
1.3.1.3 Scrape data from Instagram. [10.0]
1.3.1.4 Scrape data from Facebook. [10.0]
1.3.2 Collect and store indexing references.
1.3.2.1 Store various words and phrases within a document and index them by how positive or
negative they appear, using a combination of research (80%) and the developer’s own judgement,
based on the relevance of the developer’s age group (20%).
1.3.2.2 Research and gather words and phrases for a positivity ratio. [5.0]
1.3.2.3 Input them into a document and establish the ratios for each data input. [5.0]
1.3.3 Create a visual representation of the data
1.3.3.1 Using a dashboard system such as Tableau, create a visualisation of the results of the
research to show to the client.
1.3.3.2 Import data into the dashboard system. [1.0]
1.3.3.3 Display the data within the system using various charts and graphs to make it easy for
the Customer to view. [10.0]

1.4 Evaluation Report


1.4.1 Write an evaluation report of the project to assess the positive and negative outcomes, as well
as establish whether the initial investigation was a success.
1.4.2 Write executive summary. [1.0]
1.4.3 Write introduction to the project. [2.0]
1.4.4 Write the purpose and objectives of the evaluation. [5.0]
1.4.5 Write an evaluation of the methodology. [4.0]
1.4.6 Evaluate the findings from the project [10.0]
1.4.7 Evaluate the areas of improvement [2.0]
1.4.8 Write conclusion and recommendation for the business [3.0]’

12.3 Appendix C: Python Jupyter Notebook Code

import tweepy

# Twitter API credentials
access_token = '312899257-ZjXYlHt8PQSXuUrNc2HunsANmP7eUi9Htt7crV1T'
access_token_secret = 'Otko0azkxW46bAki1ph1OPOWX4NhII7yi3ng41e5HGNdR'
consumer_key = 'iXsoJx26WfBePQscEMPMCaabg'
consumer_secret = 'fK0TWiME7CSRSV8CaoOIyUXwQ3VlRlP4NsAHcx2OCU86HyWm6B'

# Creating the authentication object
auth = tweepy.OAuthHandler(consumer_key, consumer_secret)
# Setting your access token and secret
auth.set_access_token(access_token, access_token_secret)
# Creating the API object while passing in auth information
api = tweepy.API(auth)

# Using the API object to get tweets from your timeline, and
# storing them in a variable called public_tweets
public_tweets = api.home_timeline()
# Loop through all tweets pulled
for tweet in public_tweets:
    # Print the author, text and location stored inside the tweet object
    print(tweet.user.screen_name)
    print(tweet.text)
    print(tweet.user.location)
    print()

# Creating the API object while passing in auth information
api = tweepy.API(auth)

# The search term you want to find
query = "University of Greenwich"
# Language code (follows ISO 639-1 standards)
language = "en"

# Calling the search function with our parameters
# (api.search in Tweepy 3.x; renamed api.search_tweets in Tweepy 4)
results = api.search(q=query, lang=language, count=100)

# Loop through all tweets pulled, skipping retweets
for tweet in results:
    if (not tweet.retweeted) and ('RT @' not in tweet.text):
        # Print the author and the text stored inside the tweet object
        print(tweet.user.screen_name, "Tweeted:", tweet.text)
        print()

import csv

# Creating the API object while passing in auth information
api = tweepy.API(auth)

# Open/Create a file to append data (UTF-8 so that emoji are written
# correctly; newline='' avoids blank rows on Windows)
csvFile = open('asos3.csv', 'a', newline='', encoding='utf-8')

# Use csv Writer
csvWriter = csv.writer(csvFile)

# Calling the search function with our parameters
# (the standard search endpoint returns at most 100 tweets per request)
results = api.search(q=query, lang=language, count=500)

for tweet in results:
    if (not tweet.retweeted) and ('RT @' not in tweet.text):
        # Write a row to the csv file
        csvWriter.writerow([tweet.created_at,
                            tweet.user.screen_name, tweet.text])

csvFile.close()
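
The retweet filter and CSV export in the notebook above can be tried without Twitter access. The following is a minimal sketch only: FakeTweet is a hypothetical stand-in for the object Tweepy returns (it flattens tweet.user.screen_name into a single field), the sample tweets are invented for illustration, and the rows are written to an in-memory buffer rather than to a file.

```python
import csv
import io
from dataclasses import dataclass
from datetime import datetime


@dataclass
class FakeTweet:
    """Hypothetical stand-in for a Tweepy tweet object (illustration only)."""
    created_at: datetime
    screen_name: str
    text: str
    retweeted: bool = False


def is_original(tweet):
    """Same test as the notebook: skip retweets and manual 'RT @' copies."""
    return (not tweet.retweeted) and ('RT @' not in tweet.text)


def write_tweets(tweets, handle):
    """Write one CSV row per original tweet and return the number written."""
    writer = csv.writer(handle)
    written = 0
    for tweet in tweets:
        if is_original(tweet):
            writer.writerow([tweet.created_at, tweet.screen_name, tweet.text])
            written += 1
    return written


# Invented sample data: one original tweet, one manual retweet, one flagged retweet
sample = [
    FakeTweet(datetime(2019, 4, 10, 11, 5), 'user_a', 'Love my new jacket'),
    FakeTweet(datetime(2019, 4, 10, 11, 7), 'user_b', 'RT @user_a: Love my new jacket'),
    FakeTweet(datetime(2019, 4, 10, 12, 1), 'user_c', 'Order delayed again', retweeted=True),
]

buffer = io.StringIO()
rows_written = write_tweets(sample, buffer)
print(rows_written)
print(buffer.getvalue())
```

Only the first sample tweet passes the filter, so a single row is written. Replacing the buffer with an opened file would give the same layout as the CSV files produced by the notebook.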

12.4 Appendix D: Semantic Analysis Code

import csv

# Creating the API object while passing in auth information
api = tweepy.API(auth)

# Open/Create a file to append data (UTF-8 so that emoji are written
# correctly; newline='' avoids blank rows on Windows)
csvFile = open('zara 10th april 23 00.csv', 'a', newline='', encoding='utf-8')

# Use csv Writer
csvWriter = csv.writer(csvFile)

# Calling the search function with our parameters
results = api.search(q=query, lang=language, count=100)

rating = 0
counter = 0
finalRating = 0

for tweet in results:
    if (not tweet.retweeted) and ('RT @' not in tweet.text):
        # Each block below adds its rating (1-10) once if any of its
        # words, phrases or emoji appears in the tweet text
        if ('abysmal' in tweet.text or 'appalling' in tweet.text
                or 'dreadful' in tweet.text or 'awful' in tweet.text
                or 'terrible' in tweet.text or 'very bad' in tweet.text
                or 'really bad' in tweet.text or '😡' in tweet.text
                or '😠' in tweet.text or '😷' in tweet.text
                or 'worst' in tweet.text or 'outraged' in tweet.text
                or 'disgusted' in tweet.text or 'hate' in tweet.text):
            rating = rating + 1
            counter += 1
        if ('rubbish' in tweet.text or 'unsatisfactory' in tweet.text
                or 'bad' in tweet.text or 'poor' in tweet.text
                or '🙁' in tweet.text or '😞' in tweet.text
                or ':(' in tweet.text or '):' in tweet.text
                or '💀' in tweet.text or 'annoy' in tweet.text
                or 'piss' in tweet.text or 'wrong' in tweet.text
                or 'ridiculous' in tweet.text or 'sucks' in tweet.text
                or 'waiting' in tweet.text):
            rating = rating + 2
            counter += 1
        if ('quite bad' in tweet.text or 'pretty bad' in tweet.text
                or 'somewhat bad' in tweet.text
                or 'below average' in tweet.text or '💔' in tweet.text
                or '😣' in tweet.text or '☹' in tweet.text
                or '😒' in tweet.text or '😢' in tweet.text
                or 'delay' in tweet.text or 'delayed' in tweet.text
                or 'laughable' in tweet.text):
            rating = rating + 3
            counter += 1
        if ('mediocre' in tweet.text or '🙃' in tweet.text
                or '👎' in tweet.text or '🙄' in tweet.text
                or '🤔' in tweet.text or '😪' in tweet.text
                or 'not shocked' in tweet.text or '🥺' in tweet.text):
            rating = rating + 4
            counter += 1
        if ('average' in tweet.text or 'not bad' in tweet.text
                or 'fair' in tweet.text or 'alright' in tweet.text
                or 'ok' in tweet.text or 'okay' in tweet.text
                or 'satisfactory' in tweet.text or 'fine' in tweet.text
                or 'somewhat good' in tweet.text or '😳' in tweet.text
                or '😭' in tweet.text or '😩' in tweet.text
                or '😫' in tweet.text or '👀' in tweet.text
                or '😱' in tweet.text or '😬' in tweet.text
                or 'omg' in tweet.text or 'but' in tweet.text
                or 'refund' in tweet.text):
            rating = rating + 5
            counter += 1
        if ('quite good' in tweet.text or 'decent' in tweet.text
                or 'above average' in tweet.text
                or 'pretty good' in tweet.text or 'good' in tweet.text
                or '🙂' in tweet.text or '💪' in tweet.text
                or '😅' in tweet.text or '😎' in tweet.text
                or '😈' in tweet.text or 'like' in tweet.text):
            rating = rating + 6
            counter += 1
        if ('great' in tweet.text or 'gr8' in tweet.text
                or 'really good' in tweet.text or 'rlly good' in tweet.text
                or 'very good' in tweet.text or 'v good' in tweet.text
                or '💖' in tweet.text or '☺' in tweet.text
                or '😘' in tweet.text or '😌' in tweet.text
                or '👍' in tweet.text or '👏' in tweet.text
                or '🙌' in tweet.text or ':)' in tweet.text
                or '(:' in tweet.text or '💥' in tweet.text
                or '💙' in tweet.text or '🤣' in tweet.text
                or '🖤' in tweet.text or '👌' in tweet.text
                or '😜' in tweet.text):
            rating = rating + 7
            counter += 1
        if ('awesome' in tweet.text or 'fantastic' in tweet.text
                or '😂' in tweet.text or '💕' in tweet.text
                or '😍' in tweet.text or '😊' in tweet.text
                or '❤' in tweet.text or '♥' in tweet.text
                or '💜' in tweet.text or '💛' in tweet.text
                or '✅' in tweet.text or '🎉' in tweet.text
                or '🤗' in tweet.text or '🙏' in tweet.text
                or '✨' in tweet.text or 'on point' in tweet.text
                or 'come through' in tweet.text
                or 'come thru' in tweet.text):
            rating = rating + 8
            counter += 1
        if ('superb' in tweet.text or 'brilliant' in tweet.text
                or 'incredible' in tweet.text or 'excellent' in tweet.text
                or 'outstanding' in tweet.text or '😁' in tweet.text
                or '😄' in tweet.text or '🥰' in tweet.text
                or '💯' in tweet.text or 'love' in tweet.text):
            rating = rating + 9
            counter += 1
        if 'perfect' in tweet.text:
            rating = rating + 10
            counter += 1

        # Average the matched ratings; a tweet with no matches is rated 0
        if counter == 0:
            finalRating = 0
        else:
            finalRating = rating/counter

        csvWriter.writerow([query, tweet.created_at,
                            tweet.user.screen_name, tweet.text, finalRating])

        # Reset the totals before rating the next tweet
        rating = 0
        counter = 0
        finalRating = 0

csvFile.close()
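
The rating logic above could also be expressed as a lookup table, which keeps the word lists separate from the scoring rule. The following is a minimal sketch, not the code used in the project: LEXICON and rate_text are hypothetical names, and the lexicon holds only a few of the terms from the listing above for brevity. As in the listing, matching is by case-sensitive substring, each rating band counts at most once per tweet, and a tweet with no matches is rated 0.

```python
# Hypothetical, shortened lexicon: rating -> terms taken from the listing above.
LEXICON = {
    1: ['abysmal', 'awful', 'terrible', 'hate'],
    2: ['rubbish', 'bad', 'poor'],
    5: ['average', 'okay', 'fine'],
    6: ['good', 'decent'],
    9: ['superb', 'brilliant', 'love'],
    10: ['perfect'],
}


def rate_text(text, lexicon=LEXICON):
    """Average the ratings of every band with at least one term in the text.

    Returns 0 when no term matches.
    """
    matched = [rating for rating, terms in lexicon.items()
               if any(term in text for term in terms)]
    if not matched:
        return 0
    return sum(matched) / len(matched)


# Bands 1 ('awful') and 10 ('perfect') match, so the result is (1 + 10) / 2
print(rate_text('The delivery was awful and the dress is perfect'))
# Only band 2 ('bad') matches with this shortened lexicon
print(rate_text('really bad service'))
# No term matches
print(rate_text('No keywords here'))
```

Because the test is a plain substring check, a short term also matches inside longer words (for example, 'ok' is found in 'book'), and a phrase such as 'very bad' triggers both its own band and the band containing 'bad'. The same applies to the listing above.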

12.5 Appendix E: Excel Spreadsheet Formulae

12.6 Appendix F: Tableau Data Visualisations

12.7 Appendix G: Testing

Test Number | Visualisation | Test Undertaken | Expected Result | Actual Result | Pass/Fail?
--- | --- | --- | --- | --- | ---
1 | Word Cloud | Ensure data is displayed accurately | The word “cute” should appear 8 times if entered in the Excel Spreadsheet | The word “cute” appears 8 times, 10 if you include duplicate Tweets, which Tableau negates when it imports the data | Pass
2 | Word Cloud | Clicking a word | Word should appear highlighted with number of times it appears, the rest of the words are greyed out | The expected result | Pass
3 | Word Cloud | Selecting multiple words | Words should all appear highlighted with number of times they appear, the rest of the words are greyed out | The expected result | Pass
4 | Word Cloud | Ensuring hover over feature works | Hovering mouse over word shows the number of times it is mentioned in the data set | The expected result | Pass
5 | Average Semantic Rating | Ensure data is displayed correctly | Check if results correlate with the “Averages Table” in the Spreadsheet | The expected result | Pass
6 | Average Semantic Rating | Clicking a bar | Bar should appear highlighted while the rest of the graph is greyed out | The expected result | Pass
7 | Average Semantic Rating | Selecting multiple bars | Bars should appear highlighted while the rest of the graph is greyed out | The expected result | Pass
8 | Average Semantic Rating | Ensuring hover over feature works | Hovering mouse over a bar shows the average semantic rating for that company | The expected result | Pass
9 | Tweets per hour Pie Chart | Ensure data is displayed correctly | Check 11am to see if Spreadsheet also gives a result of 38 | Spreadsheet sort and filter gives 39 | Fail
10 | Tweets per hour Pie Chart | Ensure data is displayed correctly (v2) | Check 11am to see if Spreadsheet also gives a result of 39 | The expected result | Pass
11 | Tweets per hour Pie Chart | Clicking a segment | Segment should appear highlighted while the rest of the pie chart is greyed out | The expected result | Pass
12 | Tweets per hour Pie Chart | Selecting multiple segments | Segments should appear highlighted while the rest of the pie chart is greyed out | The expected result | Pass
13 | Tweets per hour Pie Chart | Ensuring hover over feature works | Hovering mouse over a segment shows the number of Tweets corresponding to that hour | The expected result | Pass
14 | Semantic Tweets per Company | Ensure data is displayed correctly | Check if “ASOS” under filter in Excel Spreadsheet equals 128 | The expected result | Pass
15 | Semantic Tweets per Company | Clicking a segment | Segment should appear highlighted while the rest of the pie chart is greyed out | The expected result | Pass
16 | Semantic Tweets per Company | Selecting multiple segments | Segments should appear highlighted while the rest of the pie chart is greyed out | The expected result | Pass
17 | Semantic Tweets per Company | Ensuring hover over feature works | Hovering mouse over a segment shows the number of Tweets for that company in the data set | The expected result | Pass
18 | Tweets per rating per company | Ensure data is displayed correctly | Check if there are 5 tweets for Missguided with a rating between 3-3.9 | The expected result | Pass
19 | Tweets per rating per company | Clicking a bar | Bar should appear highlighted while the rest of the graph is greyed out | The expected result | Pass
20 | Tweets per rating per company | Selecting multiple bars | Bars should appear highlighted while the rest of the graph is greyed out | The expected result | Pass
21 | Tweets per rating per company | Ensuring hover over feature works | Hovering mouse over a bar shows the number of Tweets for that rating and company | The expected result | Pass
22 | Total number of tweets on company Twitter page | Clicking a bar | Bar should appear highlighted while the rest of the graph is greyed out | The expected result | Pass
23 | Total number of tweets on company Twitter page | Selecting multiple bars | Bars should appear highlighted while the rest of the graph is greyed out | The expected result | Pass
24 | Total number of tweets on company Twitter page | Ensuring hover over feature works | Hovering mouse over a bar shows the number of Tweets for that company’s Twitter pages | The expected result | Pass
25 | Gathering Statistics Dashboard | Ensure button linking to Analysis of Statistics works in Presentation Mode | Entering Presentation mode and clicking the button should transfer the User to the Analysis of Statistics Dashboard | The expected result | Pass
26 | Analysis of Statistics Dashboard | Ensure button linking to Gathering Statistics works in Presentation Mode | Entering Presentation mode and clicking button should transfer User to the Gathering Statistics Dashboard | The expected result | Pass
27 | Gathering Statistics Dashboard | Ensure button linking to Analysis of Statistics works on Tableau Public | Clicking button whilst on Tableau Public version of workbook should transfer the User to the Analysis of Statistics Dashboard | The expected result | Pass
28 | Analysis of Statistics Dashboard | Ensure button linking to Gathering Statistics works on Tableau Public | Clicking button whilst on Tableau Public version of workbook should transfer the User to the Gathering Statistics Dashboard | The expected result | Pass
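
Data checks such as Tests 1, 9 and 10, which compare counts in Tableau with counts in the Spreadsheet, could also be cross-checked in Python. The following is a minimal sketch under stated assumptions: the rows are invented for illustration, they follow the (created_at, screen_name, text) layout written by the code in Appendix C, and the timestamp is assumed to be stored in the form 'YYYY-MM-DD HH:MM:SS'. The function names tweets_per_hour and word_count are hypothetical.

```python
from collections import Counter
from datetime import datetime

# Invented rows in the exported layout: (created_at, screen_name, text)
rows = [
    ('2019-04-10 11:05:00', 'user_a', 'so cute'),
    ('2019-04-10 11:05:00', 'user_a', 'so cute'),   # exact duplicate row
    ('2019-04-10 11:40:00', 'user_b', 'cute dress'),
    ('2019-04-10 12:15:00', 'user_c', 'order delayed'),
]


def tweets_per_hour(rows):
    """Count rows by the hour of their created_at timestamp."""
    hours = (datetime.strptime(row[0], '%Y-%m-%d %H:%M:%S').hour for row in rows)
    return Counter(hours)


def word_count(rows, word):
    """Count rows whose text contains the word as a whitespace-separated token."""
    return sum(word in row[2].lower().split() for row in rows)


# Drop exact duplicate rows while keeping the original order
unique_rows = list(dict.fromkeys(rows))

# Tweets in the 11am hour, with and without duplicates
print(tweets_per_hour(rows)[11], tweets_per_hour(unique_rows)[11])
# Occurrences of "cute", with and without duplicates
print(word_count(rows, 'cute'), word_count(unique_rows, 'cute'))
```

Comparing the counts before and after removing duplicates shows how a duplicate row can make two tools disagree by a small amount, which is the kind of difference recorded in Test 1. Whether duplicates explain the difference in Test 9 is not established by the test log.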
