Pass 3 - Expected Volume and Database

Expected Volume in Terms of User Base

The platform has three main target markets, or user segments. The first is corporations that
subscribe on behalf of their employees for a range of uses; these corporations can come from
different sectors, such as call centers. The second is schools whose teachers use the
platform in class. The third is personal users, individuals such as students who use the
platform for miscellaneous purposes. Below, I make a rough estimate of each user segment to
arrive at the total addressable market for the platform.

To size the corporate segment, we begin with the number of business enterprises in the
Philippines. DTI reported a total of 957,620 business enterprises, of which large
corporations make up 0.49% and MSMEs make up 99.51%. However, the platform is better suited
to small, medium, and large corporations. Micro enterprises have only 1-9 employees and may
not find much value in the platform, since it is mainly used for meetings. Excluding them
leaves 107,493 small to large enterprises as potential customers, with a combined 6,064,164
employees. However, certain sectors, such as manufacturing, may not use the platform, and
not all employees or departments need it. Other corporations may be unable to afford the
platform or may lack the technological adoption to implement it in their organization. User
penetration should therefore be much lower, at about 50%, which leaves about 3,032,082
employees as potential users of the platform.
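
The corporate estimate above can be retraced with a few lines of arithmetic. This is only a sketch using the figures quoted in the paragraph; the variable names are illustrative, and the 50% penetration is the assumption stated in the text, not a measured value.

```python
# Corporate segment sizing, using the figures quoted in the text above.
total_enterprises = 957_620      # all business enterprises reported by DTI
eligible_enterprises = 107_493   # small, medium, and large (micro excluded)
eligible_employees = 6_064_164   # employees in the eligible enterprises
penetration = 0.50               # assumed share of employees who would use the platform

# Share of enterprises that remain after excluding micro enterprises.
eligible_share = eligible_enterprises / total_enterprises

# Potential users once the assumed penetration is applied.
corporate_users = round(eligible_employees * penetration)

print(f"Eligible enterprises: {eligible_share:.1%} of all enterprises")
print(f"Potential corporate users: {corporate_users:,}")
```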

For schools, there are around 60,744 schools across different educational levels in the
Philippines. Assuming that public schools will be unable to afford the platform due to a
lack of budget (unless a partnership is formed with the government or CHED), this leaves
about 13,132 schools (private, LUC/SUC, and PSO). DepEd’s 2020 estimate placed the number of
teachers and personnel in these schools at 300,000. However, as with any platform,
Transcribe.Ai will not necessarily reach 100% user penetration in this segment. These
schools may have a variety of reasons not to use the platform, such as choosing other
alternatives, lacking budget, or lacking technology adoption. As such, an optimistic
estimate is 70% penetration at the platform’s peak. Applying this penetration to the
remaining schools and teachers leaves about 9,192 schools as potential subscribers, which
corresponds to 210,000 personnel and teachers as potential users.

The last target market is users who use the platform for personal purposes. This segment
will mainly be composed of students, since this user profile gets the most value from the
platform’s use cases, such as note-taking and summaries. Considering that the platform may
be too expensive for most public school students, only non-public school students are
included in the estimate. According to DepEd, this amounts to 3,332,054 students. However,
not all students would use the platform, for reasons such as lack of budget or lack of
interest in using the platform for their studies. As such, penetration is assumed to be
around 40%, which leaves 1,332,822 students as potential users.
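
The three segment estimates can be combined into a single addressable-market figure. The sketch below applies each segment's assumed penetration to its base population, using only the numbers quoted in the text; the dictionary keys and variable names are illustrative.

```python
# (base population, assumed penetration) per segment, as quoted in the text.
segments = {
    "corporate employees": (6_064_164, 0.50),
    "school teachers and personnel": (300_000, 0.70),
    "non-public school students": (3_332_054, 0.40),
}

# Potential users per segment, rounded to whole people.
potential_users = {
    name: round(base * rate) for name, (base, rate) in segments.items()
}

# Total addressable market is the sum of the three segments.
total_addressable_market = sum(potential_users.values())

for name, users in potential_users.items():
    print(f"{name}: {users:,}")
print(f"Total addressable market: {total_addressable_market:,}")
```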
Overall, the total addressable market of the platform in the Philippines is estimated at
4,574,904 potential users. However, since the platform is only beginning to launch, it will
be unable to capture the entire market and will likely experience user churn in its early
stages (due to bugs and a user interface that is still being improved). For estimation
purposes, Transcribe.Ai targets 10,000 users by Year 1, because it will still be in its beta
phase, focusing on developing the platform, improving the user interface, and fixing the
bugs encountered. It will then follow a 145% annual user growth rate, in line with the
industry average for SaaS startups, which will gradually decrease as the company reaches
maturity. As such, the annual user growth rate declines in succeeding years (i.e., 145% in
Year 2, 135% in Year 3, 123% in Year 4). In addition, the estimate includes user churn,
following the industry-average annual churn rate of 8% observed in subscription services. If
the company finds success in the Philippines, it may begin expanding into other countries in
Southeast Asia.
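
One way to turn the growth and churn assumptions into a year-by-year forecast is sketched below. The text does not spell out the formula, so two things here are assumptions: a growth rate of 145% is read as an increase of 145% over the previous year, and churn is applied to the grown user base at the end of each year. The function name `forecast_users` is hypothetical, and the figure in the original may have been built differently.

```python
def forecast_users(initial_users, growth_rates, churn_rate):
    """Return net user counts per year, starting with Year 1.

    Assumption: users_next = users * (1 + growth) * (1 - churn).
    """
    users = [float(initial_users)]
    for growth in growth_rates:
        users.append(users[-1] * (1 + growth) * (1 - churn_rate))
    return users


# Year 1 target, growth for Years 2-4, and annual churn, as quoted in the text.
forecast = forecast_users(10_000, [1.45, 1.35, 1.23], 0.08)

for year, users in enumerate(forecast, start=1):
    print(f"Year {year}: {users:,.0f} users")
```

Under the alternative reading, where 145% means the new base is 1.45 times the old one, the `(1 + growth)` factor would simply become `growth`.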
Figure 1. User Forecast for Transcribe.Ai

Estimated Database Size (Characters + Audio File)

The first group of data that requires a size estimate is the Personal Information segment.
On this page, the following are collected when the user registers: email address, password,
phone number, first name, and last name. According to Baynard, the average email address is
25 characters long, while InfoSec reported an average password length of 10 characters. The
phone number length was set at the 10-digit limit in the Philippines. Lastly, according to
Research Gate, the average first name and last name are 7 and 8 characters long,
respectively. To account for any underestimation of the character lengths above, a margin of
error of 15 characters was added. This brings the total for the personal information page to
75 characters per user, which converts to 75 bytes per user.

Table 1. Personal Information Character Estimate

Personal Information

Field               Characters
Email Address       25*
Password            10**
Phone Number        10
First Name          7***
Last Name           8***
Margin for Error    15
Total per User      75

*Baynard
**InfoSec
***Research Gate
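
The per-user total in Table 1 is a straight sum of the field lengths. A minimal sketch follows, assuming one byte per character as the text does (multi-byte encodings would change this); the field keys are illustrative names, not actual database columns.

```python
# Character estimates per field, taken from Table 1.
personal_info_chars = {
    "email_address": 25,
    "password": 10,
    "phone_number": 10,
    "first_name": 7,
    "last_name": 8,
    "margin_for_error": 15,
}

BYTES_PER_CHAR = 1  # assumption carried over from the text (one byte per character)

personal_info_total_chars = sum(personal_info_chars.values())
personal_info_bytes = personal_info_total_chars * BYTES_PER_CHAR

print(f"Personal information: {personal_info_bytes} bytes per user")
```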

The next group of data relates to account and platform features. These are the user-specific
features provided by the platform and its general features for transcribing files. For
Account Plan, the estimate was calculated by summing the character lengths of the plan names
available on the platform (Basic, Pro, Business, and Enterprise) and dividing by 4, which
gives 6.5 characters. This estimate can be adjusted in the future depending on the number of
users who avail of each plan. Next, the custom or personal vocabulary is a platform feature
that lets users define their own words to be transcribed (e.g., “FF” can be a custom word
which, in a gaming context, means surrender). Given that the average English word is around
4.5 characters, this was rounded up to 5 and multiplied by an estimated 10 custom words per
user. This is followed by the account referral link, with which users can refer other people
to the platform for benefits or promotions. The referral link was estimated at 28
characters, following the length of a typical TinyURL link. In addition, file names are
stored for the transcribed files, with lengths of up to 16 characters and an average of 20
files per user. Because transcribed files are categorized by content, categories are also
stored, at an average of 10 characters per category and 5 categories per user.

The largest components of the segment in terms of characters are the transcription file and
the summary/overview file. These files contain many characters, since the transcription file
is the audio transcribed into text and the summary file is a text overview of its contents.
The transcription file was estimated by multiplying the average speaking rate of 140 words
per minute by 4.5 characters, the average length of an English word. This was further
multiplied by 45 minutes, the average transcription length, and by an average of 10
transcribed files per user. Given that the summary file is an overview of the transcription,
it should be around one fifth of its length in characters. A margin of error of 50,000
characters was included in case the fields are underestimated, especially the transcription
files. Altogether, this comes to 390,654.5 characters per user, which converts to 390,654.5
bytes per user.

Table 2. Account Features Character Estimate

Platform/Account Features

Field                         Characters
Account Plan                  6.5*
Custom/Personal Vocabulary    5** x 10 (words) = 50
Account Referral Link         28***
File Names                    16 x 20 = 320
Bookmarks/Category Tagging    10 x 5 (types of categories) = 50
Transcription File            140**** x 4.5**** x 45 x 10 = 283,500
Summary/Overview File         140 x 4.5 x 45 x 10 x ⅕ = 56,700
Margin for Error              50,000
Total per User                390,654.5

*Calculated by summing (Basic, Pro, Business, and Enterprise) then dividing by 4
****Gma Transcription
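
The rows of Table 2 can be recomputed from their stated inputs. The sketch below follows the arithmetic described in the text; constant and key names are illustrative, and every figure is one of the document's own estimates.

```python
# Account plan: average length of the four plan names.
plan_names = ["Basic", "Pro", "Business", "Enterprise"]
account_plan = sum(len(name) for name in plan_names) / len(plan_names)

# Transcription inputs quoted in the text.
WORDS_PER_MINUTE = 140
CHARS_PER_WORD = 4.5
MINUTES_PER_FILE = 45
FILES_PER_USER = 10

transcription = WORDS_PER_MINUTE * CHARS_PER_WORD * MINUTES_PER_FILE * FILES_PER_USER
summary = transcription / 5  # summary assumed to be one fifth of the transcription

account_feature_chars = {
    "account_plan": account_plan,
    "custom_vocabulary": 5 * 10,    # 5 characters x 10 custom words
    "referral_link": 28,
    "file_names": 16 * 20,          # 16 characters x 20 files
    "category_tags": 10 * 5,        # 10 characters x 5 categories
    "transcription_files": transcription,
    "summary_files": summary,
    "margin_for_error": 50_000,
}

total_chars = sum(account_feature_chars.values())
print(f"Account features: {total_chars:,.1f} characters per user")
```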

Audio Files Sizing

To minimize the cost of storing audio files in the platform’s database, a hard limit of 2GB
per account will be set as the maximum capacity for audio files. According to Stanford, 1MB
corresponds to around 1 minute of playtime in an MP3 file, so the 2GB limit per account
converts to around 2,000 minutes of audio. Without a storage restriction, storing these
files in the database could lead to excess cost. However, there are two alternatives for
users who need more storage. First, Transcribe.Ai can offer plans or add-ons that increase
the storage capacity of the account. This keeps Transcribe.Ai from losing profit by carrying
excess storage costs. Second, Transcribe.Ai can integrate with the user’s existing Google
Drive or Dropbox account, through which the audio can be stored. This spares users from
paying more for going over the storage limit, while also reducing costs for Transcribe.Ai;
instead, the cloud storage platform, such as Google Drive or Dropbox, bears the cost of
storing these files. With these alternatives, users are estimated to use 1GB of the 2GB
limit on average. This also considers that 1GB converts to 1,000 minutes, or 16.7 hours, of
playtime, which not all users will be able to fully use. In addition, 10MB will be allocated
for the user’s personal voiceprint, which uses machine learning to transcribe the user’s
voice more accurately. 10MB was allocated because the user will be asked to read a 10-minute
predetermined script that the system can use to better analyze and detect the user’s voice.

Table 3. Audio Files Sizing Estimate

Audio Files

File                   File Size
Personal Voiceprint    10MB*
Audio Files            1GB
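
The storage-to-playtime conversions used in this section can be sketched as follows. The 1MB-per-minute rule of thumb comes from the text; treating 1GB as 1,000MB is an assumption made here to match the 2GB to 2,000 minutes figure, and the function name is illustrative.

```python
MB_PER_GB = 1000     # decimal convention, matching the 2GB -> 2,000 minutes figure
MINUTES_PER_MB = 1   # rule of thumb quoted in the text for MP3 audio


def playtime_minutes(storage_gb):
    """Approximate MP3 playtime, in minutes, that fits in the given storage."""
    return storage_gb * MB_PER_GB * MINUTES_PER_MB


limit_minutes = playtime_minutes(2)       # hard limit per account
average_hours = playtime_minutes(1) / 60  # estimated average usage

print(f"2GB limit: about {limit_minutes:,} minutes of audio")
print(f"1GB average: about {average_hours:.1f} hours of audio")
```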

Forecast and Cost Estimation
The total storage needed per year was estimated from the forecast net users of the platform. Year 0 lists the storage needed per user across all
segments, which amounts to 0.0010014 TB. Each succeeding year was then calculated by multiplying 0.0010014 TB by the number of users in that
year. As the number of users on the platform grows, so does the storage capacity required to carry the load of all users.
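
The yearly storage requirement described above is a single multiplication. The sketch below takes the per-user figure as stated in the text and applies it to the Year 1 target of 10,000 users; the function name is illustrative, and user counts for later years would come from the user forecast.

```python
STORAGE_PER_USER_TB = 0.0010014  # per-user figure stated in the text


def storage_required_tb(users):
    """Total storage, in TB, needed for the given number of users."""
    return users * STORAGE_PER_USER_TB


year_1_storage = storage_required_tb(10_000)  # Year 1 target from the user forecast
print(f"Year 1: {year_1_storage:.3f} TB")
```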

Figure 2. Total Storage Required Forecast

The total cost of the storage capacity needed to serve Transcribe.Ai’s user base is forecast below. For a cloud database such as AWS, the AWS
calculator was used to estimate the total cost of the required storage capacity for each year (with the URL listed in the references below). For
physical hard disks, the estimate is based on a single 8TB hard disk from Alibaba, which costs $259. The 8TB unit was the most cost-effective
option, at 1TB = $32.4, compared with other variants such as the 1TB hard drive, at 1TB = $49. In terms of storage alone, physical hard disks
appear cheaper than a server through AWS. Year 1 alone shows this: the hard disk cost was only P14,913, versus P732,498 for AWS. As such, purely
in terms of cost, hard disks would be the best choice. However, other factors are yet to be considered, such as the benefits of the cloud,
maintenance cost, physical location, and more. These will be explored in Pass 4.
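
The per-terabyte comparison between the two hard disk options can be checked with a short calculation. This sketch covers only the unit prices quoted in the text; it does not attempt to reproduce the peso totals or the AWS estimate, whose inputs are not given here. The function name is illustrative.

```python
def price_per_tb(unit_price_usd, capacity_tb):
    """Price per terabyte, in USD, for a disk of the given capacity."""
    return unit_price_usd / capacity_tb


disk_8tb = price_per_tb(259, 8)  # quoted as $32.4 per TB in the text
disk_1tb = price_per_tb(49, 1)

# Relative saving per TB from buying the 8TB unit instead of 1TB units.
savings = 1 - disk_8tb / disk_1tb

print(f"8TB disk: ${disk_8tb:.2f} per TB")
print(f"1TB disk: ${disk_1tb:.2f} per TB")
print(f"Saving per TB with the 8TB unit: {savings:.0%}")
```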

Figure 3. Total Cost of AWS vs Hard Disk Forecast Comparison


References

Borysko, N. (2021, April 5). Average SaaS growth rate: Brief guide for startups. Eleken. Retrieved May 11, 2022, from

DepEd posts 4% increase in enrollment for basic education in SY 2021-2022. Department of Education. (2021, November 18). Retrieved May 11, 2022, from

DTI. (2020). DTI. Retrieved May 11, 2022, from

Llego, M. A. (2021, August 24). DepEd Basic Education Statistics for school year 2019-2020. TeacherPH. Retrieved May 11, 2022, from

The MSME sector at a glance - Philippine Senate. (2012, March). Retrieved May 11, 2022, from
