Blackbook (Sahil)
PROJECT REPORT
ON
VOICE CHANGING SYSTEM USING MACHINE LEARNING
SAHIL BONDE (BE A 11)
SUPERVISOR
PROF. SNEHAL SHINDE
Certificate
This is to certify that the project entitled “Voice Changing System using Machine Learning” is a bonafide work done by Sahil Bonde and is submitted in partial fulfillment of the requirements for the degree of Bachelor of Engineering in Computer Engineering to the University of Mumbai.
Examiners:
Date:
Place:
Declaration
We declare that this written submission represents our ideas in our own words and, where others’ ideas or words have been included, we have adequately cited and referenced the original sources. We also declare that we have adhered to all principles of academic honesty and integrity and have not misrepresented or fabricated or falsified any idea/data/fact/source in our submission. We understand that any violation of the above will be cause for disciplinary action by the Institute and can also evoke penal action from the sources which have thus not been properly cited or from whom proper permission has not been taken when needed.
Sahil Bonde
Abstract
Abbreviations
List of Figures
List of Tables
Contents
Abstract
Abbreviations
List of Figures
List of Tables
1 Introduction
  1.1 Background
  1.2 Relevance
  1.3 Organization of Project Report
2 Literature Survey
  2.1 Related Work
  2.2 Existing System
  2.3 Problem Statement
3 Requirement Gathering
  3.1 Software and Hardware Requirements
4 Plan of Project
  4.1 Methodology
  4.2 Project Plan (Gantt Chart)
  4.3 Implemented System
5 Project Analysis
  5.1 Use Case Diagram
  5.2 Use Case Document
    5.2.1 Use Case Analysis
  5.3 Class Diagram
  5.4 Activity Diagram
  5.5 Sequence Diagram
6 Project Design
  6.1 Data Flow Diagram
  6.2 Flow Chart
7 Implemented System
  7.1 System Architecture
  7.2 Sample Code
8 Result Analysis
  8.1 Result Analysis
References
Acknowledgment
Voice Changing system using machine learning
Chapter 1
Introduction
1.1 Background
1.2 Relevance
online interactions. Conversely, they also raise concerns about voice impersonation and the potential for misuse in fraud or social engineering attacks, highlighting the need for robust authentication and detection mechanisms.
Accessibility and Assistive Technology: Voice conversion technology can
benefit individuals with speech disabilities by enabling them to communicate using
synthesized voices that better match their identities or preferences. Moreover, it
can aid language learners by providing opportunities to practice speaking in
different accents or dialects.
Research and Development: Voice changing systems serve as a valuable research tool for studying speech synthesis, voice perception, and human-computer interaction. Advancements in machine learning techniques for voice conversion contribute to our understanding of speech processing and pave the way for future innovations in artificial intelligence and natural language processing.
Chapter 2
Literature survey
In recent years, the field of voice conversion has seen increasing attention due to advances in deep learning and the growing demand for natural-sounding speech synthesis. In this section, we review related work in the area of voice conversion, with a focus on deep-learning-based and retrieval-based approaches.
1. Voice Conversion using Deep Learning
In this work, Albert Aparicio Isarn and Antonio Bonafonte present a first attempt at a voice conversion system based on deep learning in which the alignment between the training data is intrinsic to the model. Their system is structured in three main blocks: the first performs vocoding of the speech (using Ahocoder for this task) and normalization of the data.
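The normalization block mentioned above can be sketched in a few lines. This is a minimal NumPy illustration, not the authors' code; it assumes the vocoder emits a frames-by-dimensions feature matrix, and the function names are illustrative placeholders:

```python
import numpy as np

def normalize_features(features: np.ndarray):
    """Z-score normalize vocoder features of shape (frames, dims).

    Returns the normalized features along with the per-dimension
    statistics needed to undo the normalization after conversion.
    """
    mean = features.mean(axis=0)
    std = features.std(axis=0) + 1e-8  # avoid division by zero
    return (features - mean) / std, mean, std

def denormalize_features(normalized: np.ndarray, mean: np.ndarray,
                         std: np.ndarray) -> np.ndarray:
    """Invert the z-score normalization before vocoding back to audio."""
    return normalized * std + mean
```

After the model maps source features to target-speaker features, the inverse transform restores the original scale before waveform synthesis.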
2. An overview of voice conversion and its challenges
Sisman et al. survey voice conversion techniques, tracing the field's evolution from statistical modeling to deep learning and outlining the open challenges that remain.
Chapter 3
Requirement Gathering
2. Hardware Requirements:
Chapter 4
Plan Of Project
4.1 Methodology
Chapter 5
Project Analysis
Chapter 6
Project Design
• DFD (Level 0):
In this Level 0 DFD of the voice conversion system, the user sends a voice sample to the application, the application forwards it to the admin, and the admin processes the audio, converts the voice, and sends the result back to the user.
• DFD (Level 1):
This Level 1 DFD illustrates the key interactions and components of voice conversion using ML: the user, the admin, the processing models, and the target voice database.
– Voice Converter: This likely refers to a script or function used by the administrator to run the voice conversion pipeline.
– User: Represents the user who performs voice conversion using ML algorithms. Pre-trained RVC voice models are uploaded to the EchoFetch voice conversion tool, which extracts the pitch of the input audio and produces output that sounds like the chosen RVC model.
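The pitch-fetching step described above can be illustrated with a minimal fundamental-frequency estimator. This autocorrelation sketch is only an illustration of the idea; the actual system relies on dedicated extractors such as CREPE and RMVPE (the pretrained models loaded in Chapter 7):

```python
import numpy as np

def estimate_f0(signal: np.ndarray, sample_rate: int,
                fmin: float = 50.0, fmax: float = 500.0) -> float:
    """Estimate the fundamental frequency (Hz) of a voiced frame
    using the autocorrelation method."""
    sig = signal - signal.mean()
    # Autocorrelation for non-negative lags.
    corr = np.correlate(sig, sig, mode="full")[len(sig) - 1:]
    # Restrict the search to lags inside the plausible pitch range.
    min_lag = int(sample_rate / fmax)
    max_lag = int(sample_rate / fmin)
    best_lag = min_lag + int(np.argmax(corr[min_lag:max_lag]))
    return sample_rate / best_lag
```

Shifting the extracted f0 contour up or down before re-synthesis is what lets the output track the pitch range of the target RVC model.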
Chapter 7
Implemented System
%cd /content/
!pip install colorama --quiet
from colorama import Fore, Style
import os
print(f"{Fore.CYAN}> Cloning the repository...{Style.RESET_ALL}")
!git clone https://github.com/w-okada/voice-changer.git --quiet
print(f"{Fore.GREEN}> Successfully cloned the repository!{Style.RESET_ALL}")
%cd voice-changer/server/
print(f"{Fore.CYAN}> Installing libportaudio2...{Style.RESET_ALL}")
!apt-get -y install libportaudio2 -qq
print(f"{Fore.CYAN}> Installing pre-dependencies...{Style.RESET_ALL}")
# Install dependencies that are missing from requirements.txt
!pip install faiss-gpu fairseq pyngrok --quiet
!pip install pyworld --no-build-isolation --quiet
print(f"{Fore.CYAN}> Installing dependencies from requirements.txt...{Style.RESET_ALL}")
!pip install -r requirements.txt --quiet
print(f"{Fore.GREEN}> Successfully installed all packages!{Style.RESET_ALL}")
# ngrok auth token (value truncated in the original report)
Token = "2WNe6ETalPYMTrD6NdHK1QwB4cx_4V6rJmFvWmvgjQ2g7Xuxw"
Region = "us - United States (Ohio)"  # @param ["ap - Asia/Pacific (Singapore)", "au - Australia (Sydney)", "eu - Europe (Frankfurt)", "in - India (Mumbai)", "jp - Japan (Tokyo)", "sa - South America (Sao Paulo)", "us - United States (Ohio)"]
#@markdown **5** - *(optional)* Other options:
ClearConsole = True  # @param {type:"boolean"}
%cd /content/voice-changer/server
from pyngrok import conf, ngrok
MyConfig = conf.PyngrokConfig()
MyConfig.auth_token = Token
MyConfig.region = Region[0:2]  # first two characters select the region code
#conf.get_default().authtoken = Token
#conf.get_default().region = Region
conf.set_default(MyConfig)
import subprocess, threading, time, socket, urllib.request
PORT = 8000
from pyngrok import ngrok
ngrokConnection = ngrok.connect(PORT)
public_url = ngrokConnection.public_url
from IPython.display import clear_output

def wait_for_server():
    # Poll until the server starts accepting connections on PORT.
    while True:
        time.sleep(0.5)
        sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
        result = sock.connect_ex(('127.0.0.1', PORT))
        sock.close()
        if result == 0:
            break
    if ClearConsole:
        clear_output()
    print("--------- SERVER READY! -----------------")
    print("Your server is available at:")
    print(public_url)
    print("-----------------------------------------------------------")

threading.Thread(target=wait_for_server, daemon=True).start()
!python3 MMVCServerSIO.py \
-p {PORT} \
--https False \
--content_vec_500 pretrain/checkpoint_best_legacy_500.pt \
--content_vec_500_onnx pretrain/content_vec_500.onnx \
--content_vec_500_onnx_on true \
--hubert_base pretrain/hubert_base.pt \
--hubert_base_jp pretrain/rinna_hubert_base_jp.pt \
--hubert_soft pretrain/hubert/hubert-soft-0d54a1f4.pt \
--nsf_hifigan pretrain/nsf_hifigan/model \
--crepe_onnx_full pretrain/crepe_onnx_full.onnx \
--crepe_onnx_tiny pretrain/crepe_onnx_tiny.onnx \
--rmvpe pretrain/rmvpe.pt \
--model_dir model_dir \
--samples samples.json
ngrok.disconnect(ngrokConnection.public_url)
Chapter 8
Result Analysis
User Interface:
This section likely represents the main screen of the EchoFetch voice conversion tool. It might allow users to configure and modify various settings for the conversion.
Configure Settings:
Users can easily upload trained RVC models to EchoFetch. The results demonstrate the system's ability to achieve high-quality voice conversions with significantly reduced system requirements compared to existing setups. Users can also modify the tuning, adjust chunk sizes, and select a GPU.
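The settings described above can be pictured as a small configuration object. The field names below are illustrative placeholders sketching the options mentioned in this section, not EchoFetch's actual API:

```python
from dataclasses import dataclass

@dataclass
class ConversionSettings:
    """Hypothetical settings object mirroring the options described above."""
    model_path: str        # path to the uploaded RVC model
    pitch_shift: int = 0   # semitones to transpose the source pitch
    chunk_size: int = 192  # audio chunk length processed per step
    gpu_id: int = -1       # GPU index; -1 falls back to CPU
```

Grouping the tunables this way keeps the UI controls (tune, chunk, GPU) in one place when they are passed to the conversion backend.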
Chapter 9
Conclusion
9.1 Conclusion
The future work for the Retrieval-Based Voice Conversion System encompasses a multifaceted approach. It involves integrating advanced algorithms, exploring state-of-the-art techniques in machine learning, expanding the voice database with diverse voices and challenging scenarios, and optimizing for real-time conversion without compromising quality. The focus extends to enhancing hyperparameter tuning through advanced techniques and developing a user-friendly interface for broader accessibility. Cross-linguistic capabilities, robustness improvements, and interactive voice conversion methods are key objectives, along with incorporating user feedback, emotional variability, and ethical considerations. The roadmap also includes evaluating the system's generalization on voices not in the training dataset, exploring cross-modal conversions, addressing privacy concerns, and ensuring ethical use.
References
(a) Sisman, B., Yamagishi, J., King, S., Li, H. (2020). "An Overview of Voice Conversion and Its Challenges: From Statistical Modeling to Deep Learning". IEEE/ACM Transactions on Audio, Speech, and Language Processing, 29, 132-157. doi: 10.1109/TASLP.2020.3038524
(b) Yamamoto, R., Song, E., Kim, J.-M. (2020). "Parallel WaveGAN: A Fast Waveform Generation Model Based on Generative Adversarial Networks with Multi-Resolution Spectrogram". In ICASSP 2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). IEEE, pp. 6199-6203.
(c) Zhang, A., Lipton, Z. C., Li, M., Smola, A. J. (2021). "Dive into Deep Learning". arXiv preprint arXiv:2106.11342.
(d) Huang, C.-Y., Lin, Y. Y., Lee, H.-Y., Lee, L.-S. (2020). "Defending Your Voice: Adversarial Attack on Voice Conversion". arXiv, abs/2005.08781.
(e) Park, S.-W., Kim, D.-Y., Joe, M.-C. (2020). "Cotatron: Transcription-Guided Speech Encoder for Any-to-Many Voice Conversion without Parallel Data". arXiv, abs/2005.03295.
Acknowledgment
It is a privilege for us to have been associated with Prof. Snehal Shinde, our guide, during this project work. We have greatly benefited from her valuable suggestions and ideas. It is with great pleasure that we express our deep sense of gratitude to her for her valuable guidance, constant encouragement, and patience throughout this work. We are also indebted to our guide for extending help with the academic literature. We express our gratitude to Dr. Rajashree Gadhave (Project Coordinator), Prof. Rohini Bhosale (Head of Department of Computer Engineering), and Dr. J. W. Bakal (Principal) for their constant encouragement, cooperation, and support. We take this opportunity to thank all our classmates for their company during the course work and for the useful discussions we had with them. We would be failing in our duties if we did not mention our family members, including our parents, for providing moral support, without which this work would not have been completed.
Thanking You,
Sahil Bonde
List of Publications
Journal
(a) Sahil Bonde, Kiran Jagdale, Sayana Maity, Prof. Snehal Shinde, “EchoFetch: Retrieval Based Voice Conversion”, ISSN: 2456-4184, April 2024, Volume 9.
[Status: Submitted]
Plagiarism of Report