Text Miner Setup

Text-Miner Setup Process

Setting up Text Miner for particular brands/subcategories involves the following steps:

Use CPB-VoC as the organization when creating workspaces for the categories (pilot phase: 10 brands)
for the ML, since all members can then use it to view the progress.
a. On this page, https://heartbeat.amazon.com/textminer#/orgs/CPB-VoC, click on
create your workspace.
i. By convention, name the workspace with the subcategory followed by the username,
e.g. "Coffee Pods CBP by Sijinj".
b. Once the workspace is created, add the necessary filters:
i. Set the date filter to 3 months ago (the system can only pull data from the last 3 months).
ii. Set the next attribute to "product reviews", as our team focuses on customer ratings.
iii. In the filter tab, select "Manage my filters"; a new tab opens.
1. Click on New Filter
2. Provide a name for the filter. By convention: subcategory - username -
subcategory codes.
3. Add a description (generally the same as the filter name).
4. Click on Add fields
a. Select @customerinteractions.rating.overall under "field" and set the
values to 1 and 2, because these ratings are our primary pain points.
b. Select @product.productsubcategorycode under "field" and set the
values to the corresponding subcategories
c. Select @product.brand.raw under "field" and set the values to the
corresponding brand names
5. Click on the permissions button. In the pop-up that appears, choose "LDAP",
enter "CPB-VoC", select "view & edit" and click on "Add". Close the pop-up.
6. Click on save.
iv. The newly created filter is now available in the filter tab. Select it.
c. This populates the number of product reviews in the "Topic stats" tab.
d. Create a topic named "Test" and save it. This gives the total number of reviews, the % annotated
and the corresponding precision.
e. Create another topic named "Test - 2" and save it. Go through the comments one by one
and collate the list of words that need to be added in the "include" tab. The total count here
should eventually match that of the "Test" topic.
f. Once the topic "Test - 2" has been created, the inclusion words can be grouped and separate
topics can be created from the groups.
i. While creating a topic, be sure to click the save button after adding the inclusions and
exclusions.
ii. To add phrases, use double quotes.
iii. Append the "~" symbol followed by a number to a phrase if the words in the phrase are
expected to be spaced out in customer reviews.
g. Finally, go through the comments in each topic and annotate them, thereby generating the
precision.
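The steps above can be sketched offline in a few lines of Python. This is only an illustration under stated assumptions, not how Text Miner itself is implemented: the record keys (`rating`, `subcategory_code`, `brand`, `text`), the sample reviews, the function names, and the reading of "~N" as "at most N other words between consecutive phrase words" are all hypothetical. Precision is taken here to mean the share of annotated comments that the annotator confirmed as belonging to the topic.

```python
import re

def tokenize(text):
    """Lowercase a review and split it into word tokens."""
    return re.findall(r"[a-z0-9']+", text.lower())

def passes_filter(review, subcategory_codes, brands):
    """Mirror the saved filter: rating 1 or 2, plus subcategory and brand."""
    return (review["rating"] in (1, 2)
            and review["subcategory_code"] in subcategory_codes
            and review["brand"] in brands)

def phrase_matches(tokens, phrase, slop=0):
    """True if the phrase words occur in order, with at most `slop`
    other words between consecutive phrase words (assumed "~N" meaning)."""
    words = tokenize(phrase)

    def search(start, idx):
        if idx == len(words):
            return True
        # The first word may sit anywhere; later words must fall in the slop window.
        end = len(tokens) if idx == 0 else min(len(tokens), start + slop + 1)
        return any(tokens[pos] == words[idx] and search(pos + 1, idx + 1)
                   for pos in range(start, end))

    return search(0, 0)

def in_topic(text, include, exclude=()):
    """include/exclude are lists of (phrase, slop) pairs; exclusions win."""
    tokens = tokenize(text)
    if any(phrase_matches(tokens, p, s) for p, s in exclude):
        return False
    return any(phrase_matches(tokens, p, s) for p, s in include)

def precision(annotations):
    """Share of annotated comments confirmed as belonging to the topic."""
    return sum(annotations) / len(annotations) if annotations else None

# Made-up sample data, for illustration only.
reviews = [
    {"rating": 1, "subcategory_code": "SC1", "brand": "BrandA",
     "text": "The pods leak a lot of coffee"},
    {"rating": 2, "subcategory_code": "SC1", "brand": "BrandA",
     "text": "No leak but the coffee tastes burnt"},
    {"rating": 5, "subcategory_code": "SC1", "brand": "BrandA",
     "text": "Great taste"},
]
filtered = [r for r in reviews if passes_filter(r, {"SC1"}, {"BrandA"})]
include = [("leak coffee", 3)]   # written as "leak coffee"~3 in the include tab
exclude = [("no leak", 0)]       # written as "no leak" in the exclude tab
matched = [r["text"] for r in filtered if in_topic(r["text"], include, exclude)]
print(matched)                                 # ['The pods leak a lot of coffee']
print(precision([True, True, False, True]))    # 0.75
```

The second review shows why exclusions matter: it satisfies the inclusion phrase but is dropped because it also contains "no leak".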
 
The primary objective of this exercise is to give the operations team a tool they can use for the time
being, until we develop our own tool.
The exercise also creates the initial datasets for the ML to learn from.
 
Challenges faced:
• Known is a drop, unknown is an ocean. The inclusion/exclusion words we provide will stay
exhaustive only as long as this activity is done on a recurring basis. The synonyms and antonyms of
the words also need to be added to avoid irregularities (words such as could/should/would etc. are to
be used in combination with the primary keywords to form phrases).
• Handling exceptions is tricky, e.g. "Flushable".
• One should also become familiar with the slang and jargon of the respective region.
• The fewer the customer reviews, the harder it is to identify the defects that could be associated
with the products, and hence the poorer the quality of the sample set for the ML.
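As a rough illustration of the first challenge, combining modal words with primary keywords can be scripted rather than typed by hand. The sketch below is an assumption about how such a list might be prepared before pasting it into the "include" tab; the keywords and the helper name `build_phrases` are made up for the example.

```python
from itertools import product

def build_phrases(modals, keywords):
    """Combine each modal word with each primary keyword into a quoted phrase."""
    return [f'"{modal} {keyword}"' for modal, keyword in product(modals, keywords)]

# Hypothetical keywords, for illustration only.
phrases = build_phrases(["could", "should", "would"], ["be stronger", "last longer"])
for phrase in phrases:
    print(phrase)
```

Each phrase is wrapped in double quotes, following the convention for adding phrases described in the setup steps.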
