
September 22, 2021

SOFTWARE REQUIREMENT
SPECIFICATIONS FOR AI/ML
BASED CONTENT CREATION
AND DELIVERY

SOM 690 – Software Project Management

Group 9
Nitya Garg | 209278052
Pulkit Jindal | 209278077
Divya Chintapanti | 209278108
Kartik Ranjan | 209278109
Shreya Chakraborty | 209278111
Kapish Goyal | 209278113
Preetham Upadhya | 209278115
Apoorva Mishra | 209278116

Contents

Vision
Mission
1. Introduction
1.1 Purpose
1.2 Intended Audience
1.3 Intended Use
1.4 Key Stakeholders
1.5 Scope
2. Overall Description
2.1 Product Perspective
2.2 Product Functions
2.3 User Characteristics
2.4 Assumptions and Dependencies
3. System Features and Requirements
3.1 Functional Requirements
3.2 Key Functional Areas of Software
3.3 Infrastructure Requirements
3.4 System Features
3.5 Non-functional Requirements

Vision
To create a more optimized and efficient system for content creation and broadcast using AI/ML.

Mission
To explore and understand AI/ML functionalities to deploy an AI/ML-based system that enables better
content development and delivery for better content, ad placement, and viewership.

1. Introduction

1.1 Purpose
With customer preferences and demands changing so dynamically in the media industry, it is of
utmost importance for companies to stay a step ahead in predicting what the customer really wants,
how to increase viewership, and how to boost the revenue generated from it via ad sales,
syndication, or linear channel rates.

1.2 Intended Audience


The SRS document is intended for the CIO, the Head of Innovation, the Head of Digital Transformation,
domain experts, project managers and developers, the sales and marketing team, and the content creation team.

1.3 Intended Use


To give an idea of how this model would transform and digitalize the daily operations in the
content value chain.

1.4 Key Stakeholders


The key stakeholders in this project can be divided as follows:
▪ The Organization
▪ Internal to the Organization
✓ Leadership Level:
▪ CIO
▪ Head of Innovation
▪ Head of Digital Transformation
▪ Domain Experts

✓ Project Level:
▪ Departmental Head
▪ Project Manager
▪ Development Team
▪ Content Creation/Handling Teams
▪ Sales and Marketing Teams
▪ External to the Organization
✓ Customers/End-users
✓ Sales and Marketing Teams of Advertisers (those purchasing ad airtime)

1.5 Scope
The model would bring about operational changes that eliminate most routine daily work and
automate several processes, optimizing the content value chain from creation to delivery to
revenue generation. This would help business teams identify the KPIs that impact viewership
ratings and thereby support data-driven content writing, scheduling of content and ads, and
enhanced customer engagement and viewing experience.

2. Overall Description

2.1 Product Perspective


The product aims to optimize and automate the value chain of content creation and delivery using AI/ML.
The system will consist of the following parts:
I. Data Ingestion
II. Data Modelling
III. Model Preparation and Training
IV. Content Related Predictions

The first part of the project is data ingestion. Before building the predictive model,
the data needs to be collected and harmonized into a standard format. The data collected would
include the facial expressions of the characters on screen, the time at which each frame is aired,
viewership counts, etc. The collected data would then need to be analysed.

Next, the data would need to be cleaned and ordered in the most efficient manner. For example, each
audio file will be normalized so that all files have the same length. The data would need to be
modelled into the most usable format, in terms of structure, labels, etc. The idea is to
understand how, and to what extent, the various parameters affect the final outcome.
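The audio length normalization mentioned above can be sketched as follows. This is a minimal illustration assuming each clip is held as a plain list of samples; a real pipeline would more likely operate on NumPy arrays loaded with a library such as librosa.

```python
# Minimal sketch of the audio length-normalization step: pad shorter
# clips with silence, truncate longer ones, so every clip has the same
# number of samples before modelling.

def normalize_length(clip, target_len, pad_value=0.0):
    """Pad or truncate a sample list to exactly target_len samples."""
    if len(clip) >= target_len:
        return clip[:target_len]
    return clip + [pad_value] * (target_len - len(clip))

# Three hypothetical clips of unequal length, normalized to the longest.
clips = [[0.1, 0.2], [0.3, 0.4, 0.5, 0.6], [0.7]]
target = max(len(c) for c in clips)
normalized = [normalize_length(c, target) for c in clips]
assert all(len(c) == target for c in normalized)
```

The same pad-or-truncate idea applies whether the target length is the longest clip, a fixed window, or a model's expected input size.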

After the data sets are prepared and formatted properly for use, the AI/ML model will be built and
trained using the datasets. Then, new data will be supplied to it to test the accuracy
of its predictions.
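The train-then-test loop above can be illustrated with a deliberately trivial stand-in model. The majority-class "model" below is an assumption for demonstration only; the split between historical training labels and held-out data, and the accuracy bookkeeping, are what carry over to the real AI/ML system.

```python
# Hedged sketch of "train on datasets, then test on new data".
# A majority-class baseline stands in for the real predictive model.

def train_majority(labels):
    """'Training': remember the most frequent label in the training set."""
    return max(set(labels), key=labels.count)

def accuracy(predicted, actual):
    """Fraction of held-out items the model predicted correctly."""
    correct = sum(p == a for p, a in zip(predicted, actual))
    return correct / len(actual)

history = ["drama", "drama", "comedy", "drama"]   # hypothetical training labels
new_data = ["drama", "comedy", "drama"]           # hypothetical held-out labels
model = train_majority(history)
preds = [model] * len(new_data)
print(round(accuracy(preds, new_data), 2))        # prints 0.67
```

Any real model would replace `train_majority`, but the evaluation-on-unseen-data step stays structurally the same.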

Once the predictions are made, the system will indicate the kind of content to be produced, and
these models will be used to help develop the recommended content. The product will gradually be
extended with functionalities that prevent the creation of content that is harmful or offensive
to the audiences most likely to be viewing the program. The product will also include the ability
to strategically place ads that are in tune with what is being broadcast/viewed, so that the
brand paying for the airtime receives maximum retention from the audience. (In such cases, the
brand will be charged a premium for the airtime.)

2.2 Product Functions


With this product, content creators will be able to create more shows/productions that audiences
find relatable. The system would also correlate target audiences with the time of day.
The aim is to increase viewership and, by extension, revenue.
Marketers will also be able to buy time during specific shows/productions to run their brand's ads
where maximum retention of their product takes place. This feature would be offered to target
users at a premium and would also contribute to the increase in revenue.

2.3 User Characteristics


There are five types of users, each with slightly different uses of the system and their own
requirements.
The overall interface will be intuitive and user-friendly. The basic characteristics expected of
every user are computer literacy and the ability to search for and read data.
1. Digital Analytics Team – Should have access to the data repository and be familiar with
reading graphs to understand what type of content is preferred

2. Business Team – Should be capable of classifying emotions on the application and understanding
the classes of content in various categories

3. Digital Transformation Team – Should be familiar with machine learning methods and have
the previous data that is required for the same. They should also have ready access to content
and be capable of understanding it.

4. Marketing and Advertisement – Need to be capable of understanding graphs and relevant
data to identify the intervals in which ads should be placed.

5. Production Team – Should be able to select and view the best shots, including the start and
end of each shot, in order to identify the most marketable content.

2.4 Assumptions and Dependencies


▪ Weekly BARC data is mandatory for this model to give the desired output and help in business
operations.
▪ Integration with the existing cloud database is essential for seamless extraction and processing
of data.

3. System Features and Requirements

3.1 Functional Requirements

Name FR-1: Character Identification


Summary Identifying the characters in each frame with a desired confidence interval
Rationale Breaking down each content to a per frame (per second) basis and identifying
which characters are present in that given frame along with their facial
expression. This would then help us understand the customer preferences in
terms of characters and their storyline.
Requirements Character preference is very important for a consumer of OTT content, as
people watch content featuring their favorite characters and prefer them
over others. Applying analytics to this requires character identification. The
digital team needs a data repository from which they can understand
consumer preferences. Following are the requirements for the
digital team:
a) Breakdown content into frames
b) Identify characters in each frame
c) Build a data repository of characters and their occurrences in the
content
d) Digital team requires this data repository to perform further analytics
on content
References UC-1, UC-2, UC-3

Name FR-2: Emotion Recognition
Summary Classification of the emotion of the characters into pre-defined categories
Rationale Emotion recognition only from visual expression can’t be considered accurate
enough and hence, a speech emotion recognition system would enhance the
model and give more accurate results that would aid the business teams.
Requirements The aim of the business team is to make content that connects better with
consumers, and one part of that is checking what kinds of emotions users
prefer. To have this data, we need to classify emotions into pre-defined
categories. Building this requires a speech emotion recognition model that
helps the business team and digital analytics team fulfill the following
requirements:
a) Have a data repository with pre-defined emotion categories
b) Define multiple classes in the content corresponding to the above categories
c) Provide this data to the digital analytics team for further analysis
References UC-4

Name FR-3: Analysis of BARC data


Summary Analysis of the weekly ratings corresponding to the content
Rationale To get an idea of customer preferences and use the insights for data-driven
content creation, proper analysis needs to be done of what the customer likes
to watch and what they do not, at a very granular level.
Requirements One of the main aims of the project is to create a capability for user-preference
analytics. The users' choice of content needs to be analyzed to decide what
type of content to make in future. Following are the main requirements from
the business team:
a) Define KPIs that would help us in identifying user preference
b) Build analysis over and above the identified KPIs
c) Understand user pattern and consumer preference
d) Correlate user behaviour with their choice of content
e) Build analytics capabilities for continuous monitoring of KPIs
References UC-5

Name FR-4: FPC Planner


Summary Scheduling of content to create maximum impact on ratings and ad-conversion
Rationale Proper scheduling of content based on the customer demand and placement of
relevant ads when the audience engagement is highest will boost the customer
base as well as the revenue from ad sales.
Requirements The requirement is to know which content to schedule, and when, according
to consumer preferences. For this we need to know the following details:
a) Consumer preference of content with respect to time slots available
b) What kind of advertisements to show in breaks
c) What should be the length of an ideal break with respect to the content
References UC-6

Name FR-5: Performing Quality Checks
Summary Increasing customer engagement by performing the visual analysis automatically
Rationale Performing quality checks and identifying key frames to create trailers and
highlights takes a lot of manual effort and time; automating the process
enhances viewer engagement.
Requirements This is a post-production requirement. The business needs to know the content
quality as well as the regulatory compliances with respect to the content.
Following are some of the requirements:
a) Check content quality with respect to multiple parameters in content
like filming quality, frame rate, shot quality etc.
b) Keep checks with respect to regulatory compliances of the content and
flag non-compliant content automatically
c) Identify best shots from the content so that they can be used for trailers
as well as posters for customer engagement
d) Keep checks in the integration of audio and video data and flag any
unpleasant part of the content
e) Maintain a monitoring system for the above requirements
References UC-7

3.2 Key Functional Areas of Software


▪ Graphical interface for easy-to-read statistical changes
▪ Predictive component for new ideas to implement in content creation
▪ Ability to identify optimal placements of ads during created content
▪ Ability to identify unoptimized areas in the value chain

3.3 Infrastructure Requirements


The following elements will be needed in terms of infrastructure to carry out this project:
▪ Open-source libraries for AI/ML programming, such as NumPy, SciPy, and scikit-learn
▪ Cloud services such as AWS S3 storage and AWS Lambda
▪ Open-source platforms such as TensorFlow and Kafka for development and deployment

3.4 System Features


Use cases:
Name UC-1: Data repository
Summary: Creating a database of the content pipeline for various stakeholders to access
Rationale: To start the pre-processing of data, it is critical to have a collated database with
proper access rights for different stakeholders such as data scientists, in which every read or
write activity by a user is monitored and audited.
Users: Digital Transformation Team
Preconditions: The content pipeline is stored on native storage devices, and categorization and
sorting are quite difficult.
Basic course of events:
a) Logging in to the on-premises storage with admin accounts.
b) Initiating the transfer of data from native on-premises storage to cloud storage such as AWS S3.
c) Performing sanity checks on the data stored on the cloud.
d) Creating user profiles for accessing and modifying the data.
e) Defining rights for each of the user profiles created.
f) Creating a log of user activity to track the handling of data.
Alternative paths: Giving the data scientists access to the on-premises native storage, though
that is not advisable for security reasons.
Postconditions: A segregated database of all the content in the pipeline for the development
team to work on.
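The access-rights and audit-log requirements of UC-1 can be sketched in a few lines. The role names and rights below are hypothetical; a production deployment would use the cloud provider's IAM and audit services rather than hand-rolled code.

```python
# Illustrative sketch of UC-1 steps d)-f): user profiles with defined
# rights, and a log entry for every read/write attempt.

RIGHTS = {
    "admin": {"read", "write"},          # hypothetical profile
    "data_scientist": {"read"},          # hypothetical profile
}

audit_log = []

def access(user, role, action, item):
    """Allow the action only if the role grants it; log every attempt."""
    allowed = action in RIGHTS.get(role, set())
    audit_log.append((user, role, action, item, "ok" if allowed else "denied"))
    return allowed

assert access("asha", "admin", "write", "episode_12.mp4")
assert not access("ravi", "data_scientist", "write", "episode_12.mp4")
assert len(audit_log) == 2
```

Keeping the log append unconditional, before the permission check returns, is what makes every access attempt auditable, including denied ones.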

Name UC-2: Extraction of frames
Summary: Breaking the content down into frames at a per-second level
Rationale: Breaking each piece of content down to a per-frame (per-second) basis to perform data
segregation and normalization, which would then help in tagging each of these frames to a
particular character.
Users: Digital Transformation Team
Preconditions: A database of prior relevant content is available in cloud storage, and frames
need to be extracted and segregated.
Basic course of events:
a) Logging in to the cloud storage platform.
b) Collecting data from the storage buckets for the tool to use.
c) Feeding the collected data into the tool to extract one frame per second.
d) Identifying similar faces using an ML model to segregate the extracted frames.
e) Training the model with the segregated data.
Alternative paths: Frames in which no face is identified can be ignored, provided there is enough
data to train the tagging model; otherwise, manual classification is needed.
Postconditions: Automatic segregation of each frame of any relevant content, eliminating long
hours of manual work.
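The one-frame-per-second extraction in UC-2 reduces to simple index arithmetic. Real extraction would read the video with something like OpenCV's `cv2.VideoCapture`; the sketch below shows only which frame indices to keep, with a hypothetical 25 fps clip as the example.

```python
# Sketch of UC-2 step c): given a video's frame rate and total frame
# count, compute the indices of the frames to extract (one per second).

def per_second_frame_indices(fps, total_frames):
    """Indices of the first frame of each second of the video."""
    return list(range(0, total_frames, int(round(fps))))

# A hypothetical 5-second clip at 25 fps yields 5 frames to extract.
indices = per_second_frame_indices(fps=25, total_frames=125)
print(indices)  # prints [0, 25, 50, 75, 100]
```

In a real tool, each index would be passed to the video reader's seek-and-grab call; only the frames at these indices are decoded and stored.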

Name UC-3: Face tagging
Summary: Identifying the characters in each frame with a desired confidence interval
Rationale: Identifying the segregated frames from the content and tagging them to a particular
character or object, which the business teams can later use to formulate an analysis against
the viewership ratings.
Users: Digital Transformation Team
Preconditions: BARC data provides insights into variations in the ratings, but a proper relation
to segments and characters could not previously be derived.
Basic course of events:
a) The model is trained on the prepared data (segregated frames).
b) A desired level of accuracy is obtained from the model by retraining.
c) The model is then run on the frames extracted from new content in the pipeline.
d) Frames segregated by character and object are then created and stored.
e) The analytics team creates clusters on a per-minute basis using the segregated frames.
Alternative paths: Frames that do not give the desired results (identified using confidence
intervals) can be flagged, and manual tagging can be done.
Postconditions: Automatic identification of the characters or objects in each frame of any
relevant content, eliminating long hours of manual work and helping the teams form clusters
for further analysis.
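The alternative path of UC-3, routing low-confidence tags to manual review, is a simple threshold split. The (frame, character, confidence) tuples below are hypothetical model outputs, and the 0.9 threshold is an assumed value.

```python
# Sketch of UC-3's alternative path: tags at or above the confidence
# threshold are accepted automatically; the rest are flagged for
# manual tagging.

def split_by_confidence(tags, threshold=0.9):
    """Split (frame, character, confidence) tuples into auto/manual lists."""
    auto, manual = [], []
    for frame, character, conf in tags:
        (auto if conf >= threshold else manual).append((frame, character))
    return auto, manual

tags = [(1, "A", 0.97), (2, "B", 0.62), (3, "A", 0.91)]
auto, manual = split_by_confidence(tags)
print(auto)    # prints [(1, 'A'), (3, 'A')]
print(manual)  # prints [(2, 'B')]
```

The threshold trades manual effort against tagging errors: raising it sends more frames to human review but reduces mislabeled clusters downstream.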

Name UC-4: Speech Emotion Recognition (SER)
Summary: Classification of the characters' emotions into pre-defined categories
Rationale: Emotion recognition from visual expression alone cannot be considered accurate enough;
hence, a speech emotion recognition system would enhance the model and give more accurate results
to aid the business teams.
Users: Digital Transformation Team
Preconditions: Training data that has similar dialogues spoken by various characters in the
different classes of emotions into which we want to segregate the output.
Basic course of events:
a) Creating a database of the voices of the characters to be predicted, in different tones and
pitches, according to the emotion classes.
b) Extracting features using MFCCs in the LibROSA Python library, plotting spectrograms that
capture changes in the frequency of the voice.
c) Using a CNN classifier to segregate the voice clips into the pre-defined emotion classes.
d) Testing the accuracy of the model against the desired confidence interval.
e) Retraining the model if the desired accuracy is not met.
Alternative paths: Clips that do not give the desired results (identified using confidence
intervals) can be flagged, and manual recognition can be done.
Postconditions: Automatic identification of the characters' emotions throughout the content,
eliminating the need to watch and classify the genre of the content manually.
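MFCC extraction itself is a library call in the pipeline above (`librosa.feature.mfcc`). As a library-free sketch, the snippet below shows the framing step that underlies any such spectral feature: the audio signal is cut into fixed-size, overlapping windows before per-frame features are computed. The frame length and hop size are illustrative values.

```python
# Sketch of the windowing that precedes MFCC/spectrogram computation:
# split a signal into overlapping frames; a per-frame feature (MFCCs,
# spectral magnitudes, ...) would then be computed for each window.

def frame_signal(samples, frame_len, hop):
    """Split a sample list into overlapping frames of frame_len samples."""
    return [samples[i:i + frame_len]
            for i in range(0, len(samples) - frame_len + 1, hop)]

signal = [float(i) for i in range(10)]      # hypothetical 10-sample signal
frames = frame_signal(signal, frame_len=4, hop=2)
print(len(frames))   # prints 4
print(frames[0])     # prints [0.0, 1.0, 2.0, 3.0]
```

In practice the frame length is chosen so each window is roughly 20–40 ms of audio, which is the scale at which speech spectra are approximately stationary.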

Name UC-5: Viewership Analytics
Summary: Analysis of the weekly ratings corresponding to the content
Rationale: To get an idea of customer preferences and use the insights for data-driven content
creation, proper analysis needs to be done of what the customer likes to watch and what they do
not, at a very granular level.
Users: Viewership Analytics Team
Preconditions: Weekly ratings (TRP) data must be available from BARC so that it can be mapped
against the ML model output to understand which clusters (on parameters like character, emotion,
place, etc.) the customer is demanding and which they are not.
Basic course of events:
a) Breaking down and analyzing the weekly TRP ratings on a per-minute basis and segregating the
highly rated and low-rated sections of the content.
b) Looking up the extracted frames/sections of the content, according to their ratings, in the
model output to get parameters like the characters, their emotions, place, conversation, etc.
c) Summarizing the mapped data to form clusters that define highly rated and low-rated values of
the parameters.
d) Using this summary to gain insight into the type of content that is preferred, and focusing
further content creation along similar lines.
Alternative paths: If the clusters change as customer preferences change, the model automatically
surfaces the new insights and the content team is made aware of them.
Postconditions: The redundant weekly job of manually assessing the TRP data and forming clusters
is eliminated, and the process is fully automated.
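Step a) of UC-5 can be sketched as a split of per-minute ratings around the episode mean. The ratings below are illustrative numbers, not real BARC data, and the mean is an assumed cut-off; the real analysis might use quantiles or a tuned threshold instead.

```python
# Sketch of UC-5 step a): segregate per-minute TRP ratings into
# highly rated and low-rated minutes relative to the episode's mean.

def segregate(ratings):
    """Return (high_minutes, low_minutes) as lists of minute indices."""
    mean = sum(ratings) / len(ratings)
    high = [m for m, r in enumerate(ratings) if r >= mean]
    low = [m for m, r in enumerate(ratings) if r < mean]
    return high, low

per_minute_trp = [2.1, 3.5, 3.8, 1.9, 2.0]   # hypothetical ratings
high, low = segregate(per_minute_trp)
print(high)  # prints [1, 2]
print(low)   # prints [0, 3, 4]
```

The minute indices in `high` are then joined with the per-minute character/emotion clusters from the model output to find what the highly rated sections have in common.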

Name UC-6: Scheduler/Planner
Summary: Scheduling content to create maximum impact on ratings and ad conversion
Rationale: Proper scheduling of content based on customer demand, and placement of relevant ads
when audience engagement is highest, will boost the customer base as well as revenue from ad sales.
Users: Marketing and Advertisement Team
Preconditions: Keywords and entities must be extracted beforehand from the content that is going
to be aired. Past weeks' ratings should also be at hand.
Basic course of events:
a) Based on the viewership analytics, customer sentiment is measured and a schedule is planned
that tries to air relevant content at peak hours.
b) Keywords and objects are extracted from the frames produced while creating face tags.
c) Advertisements for relevant objects/entities are then placed at proper intervals so that brand
retention, and hence the conversion ratio, is high.
d) Breaks within an episode are also planned so that they do not hamper the viewer's experience,
avoiding abrupt endings.
e) This can be done automatically instead of watching each piece of content manually to identify
such key frames.
Alternative paths: Less relevant ads that do not add much to the revenue can be placed at off-peak
hours with repeated content, which does not require prior planning.
Postconditions: Better ad retention, viewership, and customer engagement, and hence increased
revenue from broadcasting and ad sales.
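A minimal version of the planner in UC-6 can be sketched as a greedy assignment: each slot, taken in order, receives the not-yet-scheduled show with the highest predicted rating for that slot. Slot names, show names, and scores are hypothetical; a production planner would add constraints (break lengths, ad inventory, repeats) a greedy pass cannot capture.

```python
# Greedy sketch of the UC-6 planner: for each time slot, pick the
# unscheduled show with the highest predicted rating in that slot.

def plan_schedule(predicted):
    """predicted: {slot: {show: predicted_rating}} -> {slot: show}"""
    schedule, used = {}, set()
    for slot, scores in predicted.items():
        best = max((s for s in scores if s not in used),
                   key=lambda s: scores[s])
        schedule[slot] = best
        used.add(best)
    return schedule

predicted = {
    "prime_time": {"drama_x": 4.2, "quiz_y": 3.1},
    "afternoon": {"drama_x": 2.0, "quiz_y": 2.5},
}
print(plan_schedule(predicted))
# prints {'prime_time': 'drama_x', 'afternoon': 'quiz_y'}
```

Because slots are processed in insertion order, listing peak slots first lets the greedy pass give them first pick of the highest-rated content.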

Name UC-7: Post-production Analysis
Summary: Increasing customer engagement by performing the visual analysis automatically
Rationale: Performing quality checks and identifying key frames to create trailers and highlights
takes a lot of manual effort and time; automating the process enhances viewer engagement.
Users: Production Team
Preconditions: Raw formatted content is available, which can then be used to generate trailers,
precaps, or even posters to be made available to the viewers.
Basic course of events:
a) The model analyzes the content to select the best possible shots.
b) These shots are identified based on parameters like color balance, focus on characters,
emotions, best shots of characters, etc.
c) It also gives the start and end of each shot to ensure maximum engagement and avoid abrupt
endings, something that is very difficult for a human eye to judge.
d) This gives the production team a robust and quick mechanism that helps in marketing the
content well.
Alternative paths: Highlights of sports events also require selection of the best shots, which
can likewise be produced using this model.
Postconditions: Better customer engagement, with proper content quality and regulatory compliance
ensured.
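The shot selection in UC-7 can be sketched as a weighted score over the parameters listed above. The weights and the per-shot scores are illustrative assumptions; in the real system each parameter would come from a vision model rather than being hand-assigned.

```python
# Sketch of UC-7 steps a)-b): score each candidate shot on the named
# parameters and propose the highest-scoring one for the trailer.

WEIGHTS = {"color_balance": 0.3, "character_focus": 0.4, "emotion": 0.3}

def shot_score(shot):
    """Weighted sum of a shot's per-parameter scores (each in [0, 1])."""
    return sum(WEIGHTS[k] * shot[k] for k in WEIGHTS)

shots = [  # hypothetical per-shot scores from upstream vision models
    {"id": "s1", "color_balance": 0.9, "character_focus": 0.4, "emotion": 0.5},
    {"id": "s2", "color_balance": 0.7, "character_focus": 0.9, "emotion": 0.8},
]
best = max(shots, key=shot_score)
print(best["id"])  # prints s2
```

The weights encode editorial priorities (here, character focus counts most); tuning them against which trailers actually perform is itself a candidate for the analytics loop described in UC-5.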

3.5 Non-functional Requirements

3.5.1 Scalability
The NLP and CNN models should be able to handle ever-growing volumes of data, as usage data will
keep flowing in at a rapid rate. Also, as the number of shows increases over time, the deployment
server should be easy to scale and should not require reinstallation or backup.

3.5.2 Security
BARC review data is very sensitive; hence, the infrastructure used to store it should be highly
secure, without any vulnerabilities.

3.5.3 Capacity
Video and audio generate large volumes of data. The database and cloud storage used must be large
enough to accommodate all the data required for the project.

3.5.4 Maintainability
Once the models are built and deployed, the system needs to be continuously monitored for changes
in accuracy and for new developments. The ML models will have to be retrained periodically to keep
up with newly generated data and declining accuracy.

3.5.5 Reliability
This software will be developed with machine learning, feature engineering, and deep learning
techniques, so no fixed reliability percentage can be stated in advance. User-provided data will
be compared with the results to measure reliability. Maintenance periods should not be a concern,
because the last reliable version always runs on the server and remains accessible to users; when
administrators push an update, the downtime is only as long as it takes to upload and deploy the
executable on the server. Users can therefore reach and use the program at any time, so
maintenance should not be a big issue.

3.5.6 Supportability
Maintaining the system requires knowledge of C, Java, Python, and MATLAB. If any problem occurs
on the server side or in the deep learning methods, solving it requires coding knowledge and a
deep learning background. Client-side problems should be fixed with an update, which also
requires coding and networking knowledge.

3.5.7 Usability
The system should be easy to use. The user should get the clustered output and inferences with a
single button press if possible, because time-saving is one of the software's key features. The
system should also be user-friendly for administrators, since administrators will not always be
programmers. The autoencoder and classifier training routines are run many times, so they should
be made easy to use.

3.5.8 Performance
The capacity of the servers should be as high as possible, and calculation and response times
should be as low as possible, because time-saving is one of the software's key features.
