data-analytics-internship-report-santhosh
During
III Year II Semester Summer
Submitted to
The Department of Computer Science and Engineering
Bachelor of Technology in
Computer Science and Engineering
By
SANTHOSH LOLAM (20311A0599)
Affiliated to
Jawaharlal Nehru Technological University
Hyderabad - 500085
Department of Computer Science and Engineering
CERTIFICATE
This is to certify that this Summer Industry Internship-II report on “Visualization and
Analysis of India’s GDP using AWS services”, submitted by Santhosh Lolam
(20311A0599) in the year 2023 in partial fulfillment of the academic requirements of
Jawaharlal Nehru Technological University for the award of the degree of Bachelor of
Technology in Computer Science and Engineering, is a bona fide record of summer industry
internship work carried out during the III B.Tech CSE II Semester, and will be
evaluated in the IV B.Tech CSE I Semester under your guidance.
This report has not been submitted to any other institute or university for the award of
any degree.
External Examiner
Date:-
DECLARATION
It is declared, to the best of my knowledge, that the work reported does not form part of any
dissertation submitted to any other university or institute for the award of any degree.
ACKNOWLEDGEMENT
I would like to express my gratitude to all the people behind the screen who helped me
transform an idea into a real application.
I would like to thank my project coordinator, Mrs. B. Vasundhara Devi, for her technical
guidance, constant encouragement, and support in carrying out my project at college.
I profoundly thank Dr. Aruna Varanasi, Head of the Department of Computer Science
& Engineering, who has been an excellent guide and a great source of inspiration for my
work.
I would like to express my heartfelt gratitude to my parents, without whom I would
not have been privileged to achieve and fulfill my dreams. I am grateful to our principal,
Dr. T. Ch. Siva Reddy, who ably runs the institution and has played a major role in
enabling me to do my project.
The satisfaction and euphoria that accompany the successful completion of a task would
be incomplete without mentioning the people who made it possible, whose constant
guidance and encouragement crowned all the efforts with success. In this context, I
would like to thank all the other staff members, both teaching and non-teaching, who
extended their timely help and eased my task.
ABSTRACT
Storing data in Amazon S3, alongside Amazon Redshift, is of paramount importance in the
field of data analytics: Amazon S3 serves as a foundational solution for secure, scalable, and
reliable data storage. Its ability to handle diverse datasets, from raw to processed, makes
it an ideal choice for analytics workflows and lets storage scale seamlessly as data volumes
grow. The durability and availability of Amazon S3 contribute to the reliability of analytics
processes, while security features such as access controls and encryption safeguard
sensitive data. Integration with various analytics tools streamlines workflows, allowing
analysts to access and analyze data efficiently. Overall, Amazon S3 plays a central role in
helping organizations derive meaningful insights from their data while maintaining the
integrity, security, and scalability required for effective data analytics.
LIST OF FIGURES
S.NO Figure No. Title of Figure Page No.
INDEX
Abstract
List of Figures
1. INTRODUCTION
1.1 Scope
2. SYSTEM ANALYSIS
3. SYSTEM DESIGN
3.2 Modules
4. SYSTEM IMPLEMENTATION
5. OUTPUT SCREENS
BIBLIOGRAPHY
Appendix A: Abstract
Appendix B: Correlation between the Summer Industry Internship-I and the Program Outcomes (POs), Program Specific Outcomes (PSOs)
Appendix C: Domain of Internship and Nature of internship
1. INTRODUCTION
Storing and managing data efficiently is a critical aspect of modern digital ecosystems, and
Amazon Simple Storage Service (Amazon S3) has emerged as a cornerstone in this endeavor.
As a highly scalable, durable, and secure object storage service, Amazon S3 offers
organizations a robust platform to store, retrieve, and manage vast amounts of data in the
cloud. This introduction gives an overview of the key features and benefits of using
Amazon S3 for data storage and of its role in meeting the evolving needs of businesses.
From its seamless scalability to its advanced security measures and integration
capabilities, Amazon S3 has become a go-to solution for diverse applications, ranging from
data analytics and content distribution to backup and archiving. Understanding Amazon S3
lays the foundation for harnessing the full potential of cloud-based storage in the pursuit
of effective and streamlined data management.
1.1 Scope
Before adopting Amazon Simple Storage Service (Amazon S3), organizations relied on
on-premises solutions or other cloud storage systems for data storage. On-premises setups
involved physical servers and local infrastructure, which made scaling and adapting
difficult. Organizations that used other cloud storage platforms faced limitations in
scalability and integration. Data storage in the pre-Amazon S3 era therefore lacked
seamless scalability and comprehensive features.
DEMERITS:
• Limited Scalability
• Higher Upfront Costs
• Complex Maintenance
• Reduced Flexibility
• Limited Accessibility and Collaboration
The proposed system for data storage in Amazon Simple Storage Service (Amazon S3)
revolves around creating a streamlined and secure infrastructure. Utilizing S3 buckets as the
organizational framework, the system ensures the systematic categorization and storage of
diverse datasets. Access controls are meticulously configured to fortify security measures,
providing granular control over data access. Data transfer methods, including direct uploads
and seamless integration with AWS services, facilitate the efficient and secure flow of a
variety of data types into Amazon S3, ensuring adaptability to dynamic data requirements.
Strategic decisions regarding storage classes, such as Standard, Intelligent-Tiering, Glacier,
and Glacier Deep Archive, are made based on the specific characteristics of the data. This
approach optimizes storage by balancing considerations of durability, accessibility, and
cost-effectiveness. Enabling versioning enhances data integrity, offering protection against
accidental deletions or modifications. Automated backup strategies and lifecycle policies
are implemented, efficiently managing data retention periods and transitions between
storage classes. Moreover, the seamless integration of Amazon S3 with analytics tools
within the AWS ecosystem streamlines data analysis workflows. This integration empowers
organizations to extract valuable insights from stored data, fostering informed and
data-driven decision-making.
MERITS:
• Scalability and Flexibility
• Enhanced Security Measures
• Optimized Storage Cost
• Data Integrity and Disaster Recovery
• Seamless Integration for Analytics
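The versioning and lifecycle decisions described above can be written down as configuration. The sketch below is a minimal illustration, assuming the dictionary shapes that boto3's `put_bucket_versioning` (`VersioningConfiguration`) and `put_bucket_lifecycle_configuration` (`LifecycleConfiguration`) accept. The rule ID, the `raw/` prefix, and the day counts are hypothetical, and no AWS call is made.

```python
from typing import List, Optional, Tuple


def versioning_config(enabled: bool = True) -> dict:
    """Body for VersioningConfiguration; S3 accepts "Enabled" or "Suspended"."""
    return {"Status": "Enabled" if enabled else "Suspended"}


def lifecycle_rule(rule_id: str, prefix: str,
                   transitions: List[Tuple[int, str]],
                   expire_after_days: Optional[int] = None) -> dict:
    """Build one lifecycle rule that moves objects under `prefix` to other
    storage classes once they reach the given ages (in days)."""
    days = [d for d, _ in transitions]
    # Each transition must happen strictly later than the previous one.
    if any(later <= earlier for earlier, later in zip(days, days[1:])):
        raise ValueError("transition days must be strictly increasing")
    rule = {
        "ID": rule_id,
        "Filter": {"Prefix": prefix},
        "Status": "Enabled",
        "Transitions": [{"Days": d, "StorageClass": sc} for d, sc in transitions],
    }
    if expire_after_days is not None:
        # Expiring objects before their last transition would make it pointless.
        if days and expire_after_days <= days[-1]:
            raise ValueError("expiration must come after the last transition")
        rule["Expiration"] = {"Days": expire_after_days}
    return rule


if __name__ == "__main__":
    # Hypothetical policy: raw data moves to Glacier after 90 days and to
    # Deep Archive after 180 days, then is deleted after three years.
    config = {"Rules": [lifecycle_rule("archive-raw-data", "raw/",
                                       [(90, "GLACIER"), (180, "DEEP_ARCHIVE")],
                                       expire_after_days=1095)]}
    print(config)
    print(versioning_config())
```

Amazon S3 enforces further constraints that this sketch does not check. For example, objects must be at least 30 days old before they can transition to STANDARD_IA.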
2. SYSTEM ANALYSIS
System analysis for storing data in Amazon Simple Storage Service (Amazon S3) involves a
comprehensive examination of the requirements, functionalities, and constraints associated
with utilizing this cloud storage solution. The analysis encompasses several key aspects:
Bucket Management:
• Creation: Users should be able to create new S3 buckets to logically organize and store data.
• Deletion: Authorized users should have the ability to delete buckets that are no longer
needed.
• Configuration: Users must be able to configure bucket properties, including access controls
and logging settings.
Access Controls:
• ACLs and Bucket Policies: Implement access control lists (ACLs) and bucket
policies to control who can access and perform operations on S3 buckets and objects.
Security Measures:
• Encryption: Implement encryption mechanisms for data in transit and at rest, ensuring
the security and confidentiality of stored information.
Usability:
• User Access Management: Implement user access management to control who can perform
which operations on Amazon S3 resources.
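One concrete instance of the access-control and encryption-in-transit requirements is a bucket policy that rejects any request not sent over TLS. The sketch below follows a commonly documented AWS pattern based on the `aws:SecureTransport` condition key. The bucket name is hypothetical. The sketch does not apply the policy (for example with boto3's `put_bucket_policy`, which takes the policy as a JSON string), so it runs offline.

```python
import json


def tls_only_bucket_policy(bucket: str) -> str:
    """Return a bucket policy (as a JSON string) that denies every S3 request
    to `bucket` and its objects unless the request uses HTTPS/TLS."""
    statement = {
        "Sid": "DenyInsecureTransport",
        "Effect": "Deny",
        "Principal": "*",
        "Action": "s3:*",
        # The bucket ARN covers bucket-level actions; "/*" covers its objects.
        "Resource": [f"arn:aws:s3:::{bucket}", f"arn:aws:s3:::{bucket}/*"],
        # aws:SecureTransport evaluates to "false" when TLS was not used.
        "Condition": {"Bool": {"aws:SecureTransport": "false"}},
    }
    return json.dumps({"Version": "2012-10-17", "Statement": [statement]}, indent=2)


if __name__ == "__main__":
    print(tls_only_bucket_policy("example-analytics-bucket"))
```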
The performance requirements for storing data in Amazon Simple Storage Service (Amazon
S3) center on optimizing data transfer, retrieval, and system responsiveness. The system
must ensure high-speed data transfer between clients and S3 buckets, with clearly defined
minimum acceptable rates for uploads and downloads, accounting for network latency.
Minimizing latency in data access and retrieval operations is paramount, and the system
should support a specified number of concurrent requests without compromising
performance. Different storage classes, such as Standard and Glacier, should exhibit defined
performance characteristics, and the system must scale horizontally to handle increasing
data volumes while maintaining high availability and reliability. Data redundancy measures
should be in place to ensure availability in the event of hardware failures, and the system
should optimize data retrieval speed, especially for frequently accessed data. Throughput
requirements must be specified for data transfer operations, and seamless integration with
analytics tools and other AWS services should be ensured. Monitoring and reporting
mechanisms for performance metrics, including caching to optimize retrieval, should be
implemented to evaluate and maintain the system's efficiency, responsiveness, and
scalability over time.
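Multipart upload is one common way to meet the upload-throughput requirement for large files. The object is sent as separate parts, which can be uploaded in parallel. The sketch below only plans the part size within the documented S3 limits: every part except the last must be between 5 MiB and 5 GiB, an upload may have at most 10,000 parts, and objects can be at most 5 TiB. The 8 MiB preferred part size and the function name are assumptions. In boto3, comparable settings are exposed through `TransferConfig` (`multipart_threshold`, `multipart_chunksize`).

```python
from typing import Tuple

MIB = 1024 * 1024
MIN_PART = 5 * MIB                   # every part except the last must be >= 5 MiB
MAX_PART = 5 * 1024 * MIB            # 5 GiB upper bound per part
MAX_PARTS = 10_000                   # at most 10,000 parts per upload
MAX_OBJECT = 5 * 1024 * 1024 * MIB   # objects can be at most 5 TiB


def _ceil_div(a: int, b: int) -> int:
    """Integer ceiling division, avoiding float rounding on large sizes."""
    return -(-a // b)


def plan_multipart(object_size: int, preferred_part: int = 8 * MIB) -> Tuple[int, int]:
    """Return (part_size, part_count) for a multipart upload of object_size bytes."""
    if not 0 < object_size <= MAX_OBJECT:
        raise ValueError("object size must be between 1 byte and 5 TiB")
    # The part size must be large enough for the object to fit in MAX_PARTS parts.
    needed = _ceil_div(object_size, MAX_PARTS)
    part = max(preferred_part, needed, MIN_PART)
    part = _ceil_div(part, MIB) * MIB  # round up to a whole MiB
    if part > MAX_PART:
        raise ValueError("part size would exceed the 5 GiB limit")
    return part, _ceil_div(object_size, part)


if __name__ == "__main__":
    print(plan_multipart(100 * MIB))
```

For example, `plan_multipart(100 * MIB)` plans 13 parts of 8 MiB, with the last part smaller than the others.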
➢ Technology: Amazon S3
3. SYSTEM DESIGN
All big data solutions begin with storing data. This is the first step in the big data pipeline.
You can store data with several different services from Amazon Web Services (AWS).
Amazon Simple Storage Service (Amazon S3) is one of the most commonly used services
for storing data. You will use the AWS Management Console to create an S3 bucket. You will then add
an AWS Identity and Access Management (IAM) user to a group that has full access to
Amazon S3. You will also upload files to Amazon S3, and run simple queries on the
data in Amazon S3. You must have permissions to access Amazon S3. IAM is a web
service for securely controlling access to AWS services. One best practice for managing
IAM permissions is to create groups of users with a set of permissions. These permissions
then apply to every user who is added to the group.
Section 5: Usability
Amazon S3's usability is reflected in its intuitive web interface, which makes navigation,
bucket management, and access control configuration easy. User access management keeps
the experience secure and efficient, so that users can manage and retrieve data
seamlessly.
In UML, use-case diagrams model the behavior of a system and help to capture the
requirements of the system. Use-case diagrams describe the high-level functions and
scope of a system. These diagrams also identify the interactions between the system and
its actors.
4. SYSTEM IMPLEMENTATION
4.1 Procedure
In this task, you will review the permissions for the awsusers IAM group and add the awsuser to that
group.
The policy document is in JavaScript Object Notation (JSON) format. This policy states that
users in that group are allowed to take all actions for Amazon S3 on all resources.
• Choose Cancel.
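To make the policy semantics concrete, the sketch below pairs a policy of the shape described (all S3 actions on all resources) with a deliberately simplified evaluator. The evaluator models only two real IAM rules: requests are denied by default, and an explicit Deny overrides any Allow. It ignores conditions, principals, NotAction, and NotResource. Python's `fnmatch` also treats `[...]` as a character class, which IAM does not. This is not AWS's policy evaluation engine, and the policy is not the verbatim text of any AWS managed policy.

```python
from fnmatch import fnmatchcase
from typing import Dict, List

# Illustrative policy of the shape described above: allow every Amazon S3
# action on every resource.
S3_FULL_ACCESS = {
    "Version": "2012-10-17",
    "Statement": [{"Effect": "Allow", "Action": "s3:*", "Resource": "*"}],
}


def _as_list(value):
    """IAM allows a single value or a list for Statement, Action, and Resource."""
    return value if isinstance(value, list) else [value]


def is_allowed(policies: List[Dict], action: str, resource: str) -> bool:
    """Simplified evaluation: deny by default, explicit Deny wins over Allow,
    and '*' / '?' wildcards are matched with fnmatch."""
    allowed = False
    for policy in policies:
        for stmt in _as_list(policy["Statement"]):
            actions = _as_list(stmt.get("Action", []))
            resources = _as_list(stmt.get("Resource", []))
            # Action names are case-insensitive; resource ARNs are case-sensitive.
            if not any(fnmatchcase(action.lower(), a.lower()) for a in actions):
                continue
            if not any(fnmatchcase(resource, r) for r in resources):
                continue
            if stmt["Effect"] == "Deny":
                return False  # an explicit deny overrides everything else
            allowed = True
    return allowed


if __name__ == "__main__":
    print(is_allowed([S3_FULL_ACCESS], "s3:GetObject", "arn:aws:s3:::example-bucket/lab1.csv"))
```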
The policy document is in JSON format. This policy states that users in the group are not
• Choose Cancel.
In this task, you will add the awsuser to the awsusers group. You will also log out of the console
and log back in to the console with the awsuser account and password.
• From the navigation header, open the list of account actions and copy the account ID.
• To sign back in with the awsuser credentials, choose Sign in to the Console.
• Select IAM user and then use the following information to sign in:
Note: Remove the dashes from the account number before you enter it.
o Password: myP@ssW0rd
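The sign-in step above requires removing the dashes from the account number, which can be sketched as follows. AWS account IDs are 12-digit numbers, and the console sign-in page for IAM users has the form `https://<account-id>.signin.aws.amazon.com/console`. The function names and the sample ID are hypothetical.

```python
import re


def normalize_account_id(raw: str) -> str:
    """Remove the dashes (and any spaces) the console shows in an account ID,
    then check that exactly 12 digits remain."""
    digits = re.sub(r"[\s-]", "", raw)
    if not re.fullmatch(r"\d{12}", digits):
        raise ValueError(f"not a 12-digit AWS account ID: {raw!r}")
    return digits


def iam_signin_url(raw_account_id: str) -> str:
    """Console sign-in URL for IAM users of the given account."""
    return f"https://{normalize_account_id(raw_account_id)}.signin.aws.amazon.com/console"


if __name__ == "__main__":
    # Hypothetical account ID in the dashed form shown in the navigation header.
    print(iam_signin_url("1234-5678-9012"))
```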
• Enter a bucket name with three or more characters. Uppercase characters are not allowed.
Note: S3 bucket names must be unique across all buckets in Amazon S3. If you get a conflict with
Note: Write down the bucket name because it will be used in future steps.
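The naming rules mentioned above can be checked locally before you submit a name. The sketch below covers only a subset of the S3 general-purpose bucket naming rules:

* 3 to 63 characters.
* Only lowercase letters, digits, dots, and hyphens.
* The name must begin and end with a letter or digit.
* No adjacent periods.
* Not formatted as an IP address.
* No `xn--` prefix or `-s3alias` suffix.

Global uniqueness cannot be checked offline. The function name and sample names are hypothetical.

```python
import ipaddress
import re

# 3 to 63 characters: lowercase letters, digits, dots, and hyphens,
# beginning and ending with a letter or digit.
_BUCKET_NAME = re.compile(r"[a-z0-9][a-z0-9.-]{1,61}[a-z0-9]")


def is_valid_bucket_name(name: str) -> bool:
    """Check a subset of the S3 general-purpose bucket naming rules."""
    if not _BUCKET_NAME.fullmatch(name):
        return False
    if ".." in name:                      # no two adjacent periods
        return False
    if name.startswith("xn--") or name.endswith("-s3alias"):
        return False                      # reserved prefix and suffix
    try:
        ipaddress.IPv4Address(name)       # names like "192.168.5.4" are rejected
    except ValueError:
        return True                       # not an IP address, so the name passes
    return False


if __name__ == "__main__":
    for candidate in ["awsuser-lab-bucket-01", "Lab1", "ab", "my..bucket", "192.168.5.4"]:
        print(candidate, is_valid_bucket_name(candidate))
```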
In this task, you will upload an object to the S3 bucket that you created. First, you must get the file.
• Choose Upload.
In this task, you will query the object that you uploaded to verify that it was uploaded successfully.
• Review the file properties for the file that you uploaded.
Note: You should get a message stating that versioning is not enabled for the bucket. This
behavior is expected.
• You should see the first few records from the file.
• Delete the previous query and paste the query that you copied.
• In the Result pane, you should get the total number of records, which is 5.
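The report does not reproduce the queries used in this task. Suppose they were S3 Select queries similar to `SELECT * FROM s3object s LIMIT 5` (first few records) and `SELECT COUNT(*) FROM s3object s` (record count). Under that assumption, the sketch below computes the same two results locally with Python's `csv` module, so the console output can be cross-checked against a local copy of the file. The sample rows are placeholders, not the contents of lab1.csv. Treating the first line as a header is also an assumption.

```python
import csv
import io
from itertools import islice
from typing import Dict, List


def first_records(csv_text: str, limit: int = 5) -> List[Dict[str, str]]:
    """Local analogue of SELECT * ... LIMIT n, using the first line as a header."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return list(islice(reader, limit))


def count_records(csv_text: str) -> int:
    """Local analogue of SELECT COUNT(*): counts data rows, not the header."""
    return sum(1 for _ in csv.DictReader(io.StringIO(csv_text)))


if __name__ == "__main__":
    sample = "id,value\n1,a\n2,b\n3,c\n"  # placeholder rows, not real lab data
    print(first_records(sample, 2))
    print(count_records(sample))
```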
In this task, you will change the encryption setting and storage class for the lab1.csv file.
• In the Amazon S3 breadcrumbs, choose the bucket name for your bucket.
You receive a confirmation that you successfully edited the storage class.
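Outside the console, this kind of change is usually made by copying the object onto itself with a new storage class and encryption setting, which Amazon S3's CopyObject operation supports. The sketch below only assembles keyword arguments in the shape of boto3's `copy_object` (`CopySource`, `StorageClass`, `ServerSideEncryption`, `SSEKMSKeyId`). It also validates those arguments against a subset of the accepted values. The bucket and key names are hypothetical, and no request is sent.

```python
from typing import Optional

# A subset of the storage-class names accepted by the S3 API.
STORAGE_CLASSES = {"STANDARD", "STANDARD_IA", "ONEZONE_IA", "INTELLIGENT_TIERING",
                   "GLACIER_IR", "GLACIER", "DEEP_ARCHIVE"}
# "AES256" = keys managed by Amazon S3; "aws:kms" = keys held in AWS KMS.
ENCRYPTION_MODES = {"AES256", "aws:kms"}


def in_place_copy_request(bucket: str, key: str, storage_class: str,
                          encryption: str = "AES256",
                          kms_key_id: Optional[str] = None) -> dict:
    """Arguments for copying an object onto itself with new settings."""
    if storage_class not in STORAGE_CLASSES:
        raise ValueError(f"unknown storage class: {storage_class}")
    if encryption not in ENCRYPTION_MODES:
        raise ValueError(f"unknown encryption mode: {encryption}")
    if kms_key_id is not None and encryption != "aws:kms":
        raise ValueError("a KMS key ID only makes sense with aws:kms")
    request = {
        "Bucket": bucket,
        "Key": key,
        "CopySource": {"Bucket": bucket, "Key": key},  # same bucket and key
        "StorageClass": storage_class,
        "ServerSideEncryption": encryption,
    }
    if kms_key_id is not None:
        request["SSEKMSKeyId"] = kms_key_id
    return request


if __name__ == "__main__":
    print(in_place_copy_request("example-lab-bucket", "lab1.csv", "ONEZONE_IA"))
```

A copy onto the same key must change at least one setting. Objects already in GLACIER or DEEP_ARCHIVE must be restored before they can be copied.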
In this task, you will upload a file that is compressed as a .gzip file. First, you must get the file and
• In the Amazon S3 console, choose your bucket from the breadcrumbs again.
• Choose Upload.
• Choose Add files, and choose the lab1.csv.gz file that you downloaded previously.
• Choose Upload.
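Compressing a file before upload reduces the bytes that are stored and transferred. The sketch below compresses CSV text with Python's standard `gzip` module and checks the round trip. It also builds S3 Select request arguments in the shape of boto3's `select_object_content`, where `CompressionType` is `GZIP` for compressed input. The function names, the sample rows, and the rule that infers compression from a `.gz` key suffix are assumptions for illustration.

```python
import csv
import gzip
import io


def gzip_csv(csv_text: str) -> bytes:
    """Compress CSV text into gzip format."""
    return gzip.compress(csv_text.encode("utf-8"))


def count_gzip_records(blob: bytes) -> int:
    """Decompress gzip bytes and count data rows (header excluded)."""
    text = gzip.decompress(blob).decode("utf-8")
    return sum(1 for _ in csv.DictReader(io.StringIO(text)))


def select_request(bucket: str, key: str, expression: str) -> dict:
    """Keyword arguments for an S3 Select query on a CSV object; the
    compression type is inferred from a ".gz" key suffix (an assumption)."""
    compression = "GZIP" if key.endswith(".gz") else "NONE"
    return {
        "Bucket": bucket,
        "Key": key,
        "ExpressionType": "SQL",
        "Expression": expression,
        "InputSerialization": {"CSV": {"FileHeaderInfo": "USE"},
                               "CompressionType": compression},
        "OutputSerialization": {"CSV": {}},
    }


if __name__ == "__main__":
    blob = gzip_csv("id,value\n1,a\n2,b\n")  # placeholder rows
    print(len(blob), count_gzip_records(blob))
    print(select_request("example-lab-bucket", "lab1.csv.gz", "SELECT COUNT(*) FROM s3object s"))
```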
5. OUTPUT SCREENS
This chapter shows the output screens of the various functionalities, along with their
descriptions.
Fig 5.1
Buckets and objects are the basic building blocks of Amazon S3. You create buckets and
add objects to them. Objects in Amazon S3 can be up to 5 TB in size. You can set individual
object properties, such as encryption at rest and storage class, in the Amazon S3
console. The console offers two main options for server-side encryption: keys managed by
Amazon S3, using the Advanced Encryption Standard (AES-256), and keys managed by AWS Key
Management Service (AWS KMS).
If you choose Amazon S3 managed keys, each object is encrypted with a unique key. These keys
are themselves encrypted with a master key that AWS rotates regularly. If you choose AWS KMS,
your objects are also encrypted with unique keys, but you can manage those keys
yourself through AWS KMS.
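Either encryption choice can also be set as a bucket default, so that new objects are encrypted without per-object settings. The sketch below builds a configuration in the shape of the `ServerSideEncryptionConfiguration` argument of boto3's `put_bucket_encryption`. The function name and the KMS key ID are hypothetical.

```python
from typing import Optional


def default_encryption(kms_key_id: Optional[str] = None) -> dict:
    """SSE-S3 (AES-256, keys managed by Amazon S3) when no KMS key is given;
    otherwise SSE-KMS using the given key ID or key ARN."""
    if kms_key_id is None:
        by_default = {"SSEAlgorithm": "AES256"}
    else:
        by_default = {"SSEAlgorithm": "aws:kms", "KMSMasterKeyID": kms_key_id}
    return {"Rules": [{"ApplyServerSideEncryptionByDefault": by_default}]}


if __name__ == "__main__":
    print(default_encryption())
    print(default_encryption("example-key-id"))
```

Since January 2023, Amazon S3 encrypts new objects with SSE-S3 by default. An explicit default therefore matters mainly when you want SSE-KMS instead.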
When you uploaded the lab1.csv file, you accepted the default storage class, which is
Standard. Amazon S3 provides several storage classes, each with different properties
and cost structures.
Fig 5.2
Fig 5.3
Fig 5.4
Fig 5.5
Fig 5.6
Fig 5.7
Fig 5.8
Fig 5.9
6. INTERNSHIP FEEDBACK
It was a good experience performing all the lab activities and referring to the PowerPoint
presentations provided. It was also a new experience for us to enhance our skills by
using all the applications provided in the internship. We gained hands-on experience with
each tool on the AWS platform by performing various lab activities. The guided labs
were the building blocks we had to learn before attempting the challenge labs, which
were demanding and compact.
CONCLUSION
In conclusion, employing AWS data analytics with data stored in Amazon S3, coupled with
Identity and Access Management (IAM), establishes a robust and secure foundation for
scalable and efficient data processing. Amazon S3 serves as a highly durable and scalable
storage solution, accommodating diverse data types and volumes. IAM ensures secure
access controls, allowing fine-grained permissions to regulate who can interact with the
data. This integrated approach facilitates seamless data analytics workflows, from ingestion
to transformation and analysis. The combination of these AWS services enables
organizations to harness the power of their data, ensuring reliability, scalability, and
stringent security measures throughout the entire data lifecycle.
FUTURE SCOPE
The future scope of AWS data analytics in storing data using Amazon S3 and IAM (Identity
and Access Management) is poised for continued growth and innovation. As organizations
increasingly prioritize data-driven decision-making, the demand for scalable and secure data
storage solutions coupled with robust analytics capabilities is set to surge. AWS, with its
comprehensive suite of services, including Amazon S3 for durable and scalable storage, and
IAM for fine-grained access control, positions itself at the forefront of this evolution. Future
developments may see enhanced integration with machine learning and AI services,
enabling more sophisticated analytics. Additionally, advancements in real-time analytics,
data governance, and compliance features within the AWS ecosystem are likely, offering
organizations powerful tools to derive actionable insights from their data while ensuring
security and compliance standards are met. The collaborative nature of AWS services is
expected to foster an ecosystem where seamless interactions between storage, access
control, and analytics components drive continuous innovation in data analytics solutions.
APPENDIX A: ABSTRACT
Batch No: B18
Title: Visualization and Analysis of India’s GDP using Amazon Redshift
Roll No: 20311A0599    Name: Santhosh Lolam
ABSTRACT
Storing data in Amazon S3, alongside Amazon Redshift, is of paramount importance in the field of data analytics, serving as a
foundational solution for secure, scalable, and reliable data storage. Amazon S3's ability to handle diverse
datasets, from raw to processed, makes it an ideal choice for analytics workflows, ensuring seamless
scalability as data volumes grow. The durability and availability of Amazon S3 contribute to the
reliability of analytics processes, while robust security features such as access controls and encryption
safeguard sensitive data. The integration capabilities with various analytics tools streamline workflows,
allowing analysts to efficiently access and analyze data. Overall, Amazon S3 plays a central role in
empowering organizations to derive meaningful insights from their data while maintaining the integrity,
security, and scalability required for effective data analytics.
APPENDIX B: CORRELATION BETWEEN THE SUMMER INDUSTRY INTERNSHIP-I AND THE PROGRAM OUTCOMES (POs), PROGRAM SPECIFIC OUTCOMES (PSOs)
PO1  PO2  PO3  PO4  PO5  PO6  PO7  PO8  PO9  PO10 PO11 PO12 PSO1 PSO2 PSO3
M    H    H    H    H    L    M    H    M    H    H    H    H    H    M
APPENDIX C: DOMAIN OF INTERNSHIP AND NATURE OF INTERNSHIP
Table 2: Nature of the Project/Internship Work (please tick √ as appropriate for your
project)
Nature of project
Others
Batch No. Title Product Application Research (Please
specify)
VISUALIZATION
B18 AND ANALYSIS OF
INDIA’S GDP USING √
AMAZON REDSHIFT