Risk Analysis: Report On Internship in Machine Learning


 

Risk Analysis 
Report on Internship in Machine Learning 

 
 

1 Sept’20 - 1 Dec’20 

Submitted By:

Gaurav Gupta
(Third-year undergraduate student, Instrumentation and Control Engineering, N.S.I.T.)

Aditya Aggarwal
(Third-year undergraduate student, Computer Science Engineering, J.I.I.T.)
   

Table of contents 

S.NO  Title

1  Acknowledgment
2  Information about the company
3  Description of the internship experience
4  Conclusion

Acknowledgment

We undertook this internship project and completed the internship report under the
guidance of Anubhav Chugh and Empliance Information Services India. We are grateful
to the Empliance staff for their patience and assistance during our training at their
company. It was a good learning experience for us to work with their IT department, as
the project involved Machine Learning applications.

Information about the company

Empliance is a leading provider of compliance management solutions that helps
its clients comply with stringent statutory and regulatory norms under which
they might otherwise be flagged as non-compliant.

Through a combination of in-depth subject expertise, in-house research
capabilities, and technological tools, Empliance innovates and helps implement
compliance processes, policies, and procedures that are designed to assist
clients in clear decision making and alert them to instances of non-compliance.

Description of the internship experience
At the very start of the internship program, we were tasked with finding a
relationship between four factors (Management Evaluation, Operational
Assessment, Financial Assessment, and Legal/Compliance Screening) and the
expected Risk Band of a company. After applying several clustering algorithms
(such as K-means and K-modes) in different dimensions, we found that the risk
band depended on more independent factors than the given four, and hence the
relationships derived from these methods turned out to be inaccurate.

After the previous endeavor, our next task was to enhance Empliance's database
of companies. After discussing various features, we chose to flag companies
that were involved in fraudulent activities, corruption, and defamation cases.
To accomplish this task, we first used the database of article URLs provided
by the company and tried training machine learning models meant for
categorical features. We discovered that, no matter the technique used, the
accuracy of the model did not exceed 65%. After analyzing all the components
that affected accuracy, we decided the biggest issue was with the database, so
we started building a database from scratch. Using the new database, we
trained various models and, after comparing their accuracy, chose the model
based on Multinomial naive Bayes, as it had the highest accuracy (89.9%).
Using this model, we developed a program that takes as input a CSV file
containing a column of company names, fetches articles from the web, and
labels them accordingly. It outputs one CSV file per company, named after that
company and containing the labels and all the URLs found for the given
categories, along with a result file summarizing the results generated in that
session.
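
To make the classification step more concrete, the following is a minimal
sketch of a Multinomial naive Bayes text classifier written in plain Python.
It is not the code used during the internship: the class name, the tokenizer,
the training snippets, and the label names ("fraud", "corruption",
"defamation", "none") are all assumptions chosen for illustration.

```python
import math
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase and split on whitespace; a deliberately simple tokenizer."""
    return text.lower().split()


class MultinomialNB:
    """Multinomial naive Bayes over word counts with Laplace smoothing."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha  # smoothing strength (assumed default of 1)

    def fit(self, texts, labels):
        self.class_counts = Counter(labels)
        self.word_counts = defaultdict(Counter)
        for text, label in zip(texts, labels):
            self.word_counts[label].update(tokenize(text))
        self.vocab = {w for c in self.word_counts.values() for w in c}
        return self

    def log_scores(self, text):
        """Unnormalized log posterior of each class for one document."""
        total_docs = sum(self.class_counts.values())
        scores = {}
        for label, n_docs in self.class_counts.items():
            counts = self.word_counts[label]
            denom = sum(counts.values()) + self.alpha * len(self.vocab)
            score = math.log(n_docs / total_docs)  # log prior
            for word in tokenize(text):
                if word in self.vocab:  # skip words never seen in training
                    score += math.log((counts[word] + self.alpha) / denom)
            scores[label] = score
        return scores

    def predict(self, text):
        scores = self.log_scores(text)
        return max(scores, key=scores.get)


# Hypothetical training snippets; the real dataset is not in the report.
train_texts = [
    "company accused of accounting fraud",
    "executives charged in bribery and corruption probe",
    "firm files defamation suit against newspaper",
    "company opens new office and hires staff",
]
train_labels = ["fraud", "corruption", "defamation", "none"]

model = MultinomialNB().fit(train_texts, train_labels)
prediction = model.predict("regulator investigates fraud at company")
```

In practice a library implementation and a much larger labelled set of
articles would be used; the sketch only shows how word counts per class
turn into a predicted label.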

Task-1
Objective:
To find a way/order to improve the existing risk band analysis.

Findings:
We were tasked with finding a relationship between four factors (Management
Evaluation, Operational Assessment, Financial Assessment, and Legal/Compliance
Screening) and the expected Risk Band of a company. We applied several
clustering algorithms (such as K-means and K-modes) in different dimensions.
The relationship thus obtained was found to depend on more independent factors
than the given four.
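
As a rough illustration of the clustering approach mentioned above, here is a
minimal K-modes sketch in plain Python. The factor values ("low", "medium",
"high"), the sample records, the choice of initial modes, and the function
names are assumptions for illustration; the report does not give the actual
data or implementation.

```python
from collections import Counter


def hamming(a, b):
    """Number of positions where two categorical records differ."""
    return sum(x != y for x, y in zip(a, b))


def column_modes(rows):
    """Most frequent value in each column of a non-empty list of records."""
    return tuple(Counter(col).most_common(1)[0][0] for col in zip(*rows))


def k_modes(records, initial_modes, max_iter=20):
    """Cluster categorical records around k modes (a sketch, not tuned)."""
    modes = list(initial_modes)
    labels = []
    for _ in range(max_iter):
        # Assign each record to the closest mode by Hamming distance.
        labels = [
            min(range(len(modes)), key=lambda j: hamming(r, modes[j]))
            for r in records
        ]
        new_modes = []
        for j, old in enumerate(modes):
            members = [r for r, lab in zip(records, labels) if lab == j]
            # Keep the old mode if a cluster ends up empty.
            new_modes.append(column_modes(members) if members else old)
        if new_modes == modes:  # converged
            break
        modes = new_modes
    return labels, modes


# Hypothetical scores for (management, operational, financial, legal).
records = [
    ("high", "high", "high", "high"),
    ("high", "high", "medium", "high"),
    ("low", "low", "low", "medium"),
    ("low", "medium", "low", "low"),
]
labels, modes = k_modes(records, initial_modes=[records[0], records[2]])
```

Comparing the resulting cluster labels against known risk bands is one way to
check whether the four factors alone separate companies well.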

Conclusion:
It was found that the initial objective was conceptually flawed.
The risk band was found to depend on various factors beyond the given four,
and the method of classification was ambiguous, so the four factors alone
could not be used to reliably predict the risk of a company.

Task-2

Objective:
To make a program that takes the names of companies as input and
searches the internet for their involvement in fraudulent activities, corruption, and
defamation cases and returns a label and the respective URL.

Result:
We first used the database of article URLs provided by the company and tried
training machine learning models meant for categorical features. We discovered
that, no matter the technique used, the accuracy of the model did not exceed
65%. After analyzing all the components that affected accuracy, we decided the
biggest issue was with the database, so we started building a database from
scratch. Using the new database, we trained various models and, after
comparing their accuracy, chose the model based on Multinomial naive Bayes, as
it had the highest accuracy (89.9%). Using this model, we developed a program
that takes as input a CSV file containing a column of company names, fetches
articles from the web, and labels them accordingly. It outputs one CSV file
per company, named after that company and containing the labels and all the
URLs found for the given categories, along with a result file summarizing the
results generated in that session.
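
The input/output flow of such a program could look roughly like the sketch
below. It is only an outline under stated assumptions: the column name
"company", the output file names, and the helpers `fake_fetch` and
`keyword_classify` are hypothetical stand-ins for the real web search and the
trained model, neither of which is described in detail in this report.

```python
import csv
import tempfile
from pathlib import Path


def safe_name(company):
    """Turn a company name into a file-system-friendly file stem."""
    return "".join(c if c.isalnum() else "_" for c in company)


def run_session(input_csv, output_dir, fetch_articles, classify,
                name_column="company"):
    """Write one CSV of (label, url) rows per company plus a summary file."""
    output_dir = Path(output_dir)
    output_dir.mkdir(parents=True, exist_ok=True)
    with open(input_csv, newline="", encoding="utf-8") as f:
        companies = [row[name_column] for row in csv.DictReader(f)]

    summary = []
    for company in companies:
        # fetch_articles returns (url, text) pairs; classify maps text to a label.
        rows = [(classify(text), url) for url, text in fetch_articles(company)]
        flagged = [(label, url) for label, url in rows if label != "none"]
        out_path = output_dir / f"{safe_name(company)}.csv"
        with open(out_path, "w", newline="", encoding="utf-8") as f:
            writer = csv.writer(f)
            writer.writerow(["label", "url"])
            writer.writerows(flagged)
        summary.append((company, len(flagged)))

    # Result file summarizing the whole session.
    with open(output_dir / "results.csv", "w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        writer.writerow(["company", "flagged_articles"])
        writer.writerows(summary)
    return summary


def fake_fetch(company):
    """Hypothetical stand-in for a web search; returns canned (url, text) pairs."""
    return [
        (f"https://example.com/{company}/1", f"{company} accused of fraud"),
        (f"https://example.com/{company}/2", f"{company} opens new office"),
    ]


def keyword_classify(text):
    """Hypothetical stand-in for the trained model: label by keyword match."""
    for label in ("fraud", "corruption", "defamation"):
        if label in text.lower():
            return label
    return "none"


# Example run in a temporary directory so nothing is left on disk.
with tempfile.TemporaryDirectory() as tmp:
    input_csv = Path(tmp) / "companies.csv"
    input_csv.write_text("company\nAcme\nGlobex\n", encoding="utf-8")
    summary = run_session(input_csv, Path(tmp) / "out",
                          fake_fetch, keyword_classify)
```

Passing the fetcher and classifier in as functions keeps the CSV handling
separate from the model, so either part can be swapped without touching the
other.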

Conclusion

From our internship at Empliance, we were able to get a better understanding of
how machine learning works in industry and how effective it is. I enjoyed
working with the Empliance team to devise and implement various machine
learning algorithms. However, I still have a long way to go in understanding the
real-life aspects of Machine Learning, and I need to build up my public speaking
skills as well.

Overall, I found the internship experience to be positive, and I'm sure I will be
able to use the skills I learned in my career later.
