
Open Source Demystified Level 1

Project Report submitted in partial fulfilment of the requirements for the award of the degree of
BACHELOR OF COMPUTER APPLICATIONS (BCA)

NAME : Davis Arnold M


REG No : 21BCAF20

Under the guidance of Ayshwarya B

DEPARTMENT OF COMPUTER SCIENCE (UG) BCA PROGRAMME
KRISTU JAYANTI COLLEGE (Autonomous) K. Narayanapura, Kothanur P.O.,
Bangalore – 560077
DEPARTMENT OF COMPUTER SCIENCE (UG)

CERTIFICATE OF COMPLETION

This is to certify that the practical lab for the course titled “Open
Source Demystified Level 1” has been satisfactorily completed by
Davis Arnold M, 21BCAF20 in partial fulfilment of the award of the
Bachelor of Computer Applications degree requirements prescribed
by Kristu Jayanti College (Autonomous) Bengaluru (Affiliated to
Bangalore University) during the academic year 2022-2023.

Internal Guide Head of the Department

External Mentor

Valued by Examiners
1:_____________________
2: ____________________

Centre: Kristu Jayanti College


Date:
DECLARATION

I, Davis Arnold M, 21BCAF20 hereby declare that the practical lab work
for the course titled “Open Source Demystified Level 1” has been
completed by me, as per the course guidelines, under the guidance of
Ayshwarya B.

This report work has not been submitted earlier either to any University /
Institution or any other body for the fulfilment of the requirement of a
course of study.

Signature
Davis Arnold M
21BCAF20

Location:
Date:

ACKNOWLEDGEMENT
<Acknowledgement Text go here!>
TABLE OF CONTENTS
SYNOPSIS
Glossary
Introduction
About this document
Purpose
Audience
Open Source Introduction
Open Source Project Examples
1) Terra
    Introduction
    Project Summary
    Project Details
    Project References
2) Foniod
    Introduction
    Project Summary
    Project Details
    Project References
3) Apache Beam
    Introduction
    Project Summary
    Project Details
    Project References
How to contribute to Open Source?
Ways to Contribute
Methods to join the community and start contributing
Contribution Flowchart
Community Engagement Experience
My Contributions
Open Source Value
References

List of Figures
Figure 1: Sample Picture
Figure 2: Sample picture

List of Tables
Table 1: Sample table

SYNOPSIS
The Open Source Demystified Level 1 course covers the fundamentals of open source
technology and its ecosystem. It offers an overview of open source, including
definitions, the ecosystem, the community, how to participate, its potential, and
open source culture. This knowledge makes it easier to recognize open source
projects, contribute to them, learn from them, enter the workforce, and advance
one's career. The course also offers practical exposure to the ecosystem of a
particular open source project. This project report details the lessons learned and
the tasks completed for the course. Its summary and recommendations are intended
to help the reader understand how to continue contributing to open source projects.

Glossary

1. Soda Foundation - Under the auspices of the Linux Foundation, the Soda
Foundation is an open source initiative with the goal of promoting an
ecosystem of free and open source data management and storage applications.

2. Clone the Repository - Cloning a repository means downloading a copy of it
from GitHub.com to your computer.
Command – git clone <URL>

3. Committing the Changes - To save your changes to the local repository, stage
them and then use the "commit" command.
Command – git add *
git commit -m 'your commit message'

4. Pulling the Changes - The git pull command fetches and downloads content
from a remote repository and immediately updates the local repository to match
that content.
Command – git pull

5. Pushing the Changes - The git push command uploads local repository
content to a remote repository.
Command – git push
Individual files are staged beforehand with git add <file-path> or
git add <file-path> <file2-path>, and then committed.

6. Free Software - Free software is software that respects users' freedom and
community: users have the freedom to run, copy, distribute, study, alter, and
improve the software.

7. Freeware - Freeware is a class of proprietary software that is made available
to the public at no cost.

8. Closed Source Software - Closed-source (proprietary) software is software
whose author retains all rights to use, modify, and copy it.

9. Open Source Software - Open source software is code that is intended to be
publicly accessible; anyone can view, modify, and distribute the code as they
see fit.

10. Collaboration - Collaboration is when several parties work together to
achieve a single objective.

11. Git - Git is a DevOps tool for managing source code. It is a free and
open-source version control system that can effectively manage projects ranging
from small to very large.

12. GitHub - GitHub is a platform for collaboration and version control. It
enables you to collaborate remotely on projects with other people.

13. Slack - Slack is a business messaging app that connects users to the
information they need. It brings people together to work as a single, cohesive
team.

14. Git status - The git status command shows the state of the working directory
and the staging area.
Command – git status

15. Licenses - A license is a permission or authorization granted by a competent
authority. A software license sets the terms under which software may be used,
modified, and distributed.
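
The Git commands in glossary entries 2 to 5 and 14 are usually run in a fixed order when contributing a change. The following is a minimal Python sketch that only assembles that sequence as argument lists; it does not run Git. The function name `contribution_commands`, the repository URL, and the file name are hypothetical placeholders chosen for illustration.

```python
from typing import List


def contribution_commands(repo_url: str, files: List[str], message: str) -> List[List[str]]:
    """Return the usual clone, status, add, commit, pull, push sequence.

    Each entry is an argument list of the kind accepted by subprocess.run.
    This sketch only builds the lists; in practice every command after
    the clone would be run from inside the cloned directory.
    """
    return [
        ["git", "clone", repo_url],        # download the repository
        ["git", "status"],                 # inspect working directory and staging area
        ["git", "add", *files],            # stage the changed files
        ["git", "commit", "-m", message],  # save staged changes to the local repository
        ["git", "pull"],                   # fetch and integrate remote changes
        ["git", "push"],                   # upload local commits to the remote
    ]


if __name__ == "__main__":
    # Hypothetical URL and file name, for illustration only.
    for cmd in contribution_commands(
        "https://github.com/example/project.git", ["README.md"], "Fix typo in README"
    ):
        print(" ".join(cmd))
```

Keeping each command as a list rather than a single string avoids quoting problems with commit messages that contain spaces.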


Introduction

SODA Foundation is an open source project under the Linux Foundation that aims to
foster an ecosystem of open source data management and storage software for data
autonomy. SODA Foundation offers a neutral forum for cross-project collaboration
and integration and provides end users with quality end-to-end solutions.

SODA stands for SODA Open Data Autonomy. It is an open source unified autonomous
data framework for data mobility from edge to core to cloud.

SODA Foundation focuses on building unified frameworks, APIs and solutions in
areas that include:

• Data Mobility
• Data Protection
• Data Lifecycle
• Unified Storage Platform
• Cloud Native Storage
• Data Governance
• Data Orchestration
• Data Energy

It aims to provide data autonomy through its open source solutions and standards.

SODA Foundation is home to projects for storage and data. It hosts many
projects and also extends the ecosystem through partners and third-party projects
that can help build unified data solutions for various use cases.

The SODA (SODA Open Data Autonomy) architecture is evolving toward the
challenging goal of building a unified framework for data and storage management.
It connects application platforms and solutions to backend storage seamlessly,
whether on premises or in the cloud, through a unified API layer. This lets
application platforms focus on building more valuable use cases rather than on
managing the underlying storage backends and data.

The key architecture tenets are:

• Application platform agnostic
• Unified API for data and storage management, which is scalable and
can evolve
• The overall platform is microservice based, so as to build data solutions
for different use cases and technologies
• Future-ready unified distributed data store
• Seamless, vendor-agnostic storage backends

The SODA architecture continues to be refined and optimized for different
application technologies, use cases, platforms and storage backends.
Image: SODA Architecture

The Public Source Code repository is located at https://github.com/sodafoundation/

The SODA ecosystem has many projects under its umbrella, which work in unison to
solve various data and storage challenges. Some of the important ones are:

• SODA Dashboard: Provides a front-end UI that integrates with the different
APIs provided by SODA API. The dashboard can be used to test basic SODA
functionality.

• SODA API: The key external interface to platforms, which integrates
seamlessly with heterogeneous storage backends. It provides the
standardization for data and storage management APIs.

• SODA Controller: Handles API flow management and tracking, including all
the state machine and metadata management requirements.

• SODA Dock: A docking station for heterogeneous storage backends. This is
where the different storage vendors' drivers for the various backends get
attached.

• SODA Delfin: A storage infrastructure management solution that provides
unified, intelligent and scalable resource management, as well as alerting
and performance monitoring of the underlying storage infrastructure.

• SODA Plugin: The SODA North-Bound Plugin project focuses on extending
industry platforms and application solutions so that they interface with
SODA API or are compliant with it.

• SODA Multicloud: Provides cloud-vendor-agnostic data management for hybrid
cloud, cross-cloud or in-cloud scenarios. It can be hosted on premises or
cloud native.

• SODA Orchestration: The orchestration framework provides the flexibility
to use existing workflows or define customized workflows to simplify the
execution of tasks.
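
The relationship between a unified API and the vendor drivers attached at SODA Dock can be pictured as one interface in front of pluggable backends. The following Python sketch is illustrative only: the class names `StorageDriver`, `InMemoryDriver` and `Dock`, and the method `create_volume`, are hypothetical and are not taken from the SODA code base.

```python
from abc import ABC, abstractmethod
from typing import Dict


class StorageDriver(ABC):
    """Hypothetical interface that every backend driver implements."""

    @abstractmethod
    def create_volume(self, name: str, size_gb: int) -> str:
        ...


class InMemoryDriver(StorageDriver):
    """Stand-in for a vendor driver; it only records volumes in a dict."""

    def __init__(self, vendor: str) -> None:
        self.vendor = vendor
        self.volumes: Dict[str, int] = {}

    def create_volume(self, name: str, size_gb: int) -> str:
        self.volumes[name] = size_gb
        return f"{self.vendor}:{name}"


class Dock:
    """Registry where backend drivers are attached, keyed by backend name."""

    def __init__(self) -> None:
        self._drivers: Dict[str, StorageDriver] = {}

    def attach(self, backend: str, driver: StorageDriver) -> None:
        self._drivers[backend] = driver

    def create_volume(self, backend: str, name: str, size_gb: int) -> str:
        if backend not in self._drivers:
            raise KeyError(f"no driver attached for backend {backend!r}")
        return self._drivers[backend].create_volume(name, size_gb)


if __name__ == "__main__":
    dock = Dock()
    dock.attach("on-prem", InMemoryDriver("vendor-a"))
    dock.attach("cloud", InMemoryDriver("vendor-b"))
    # The caller issues the same call whichever backend sits behind it.
    print(dock.create_volume("on-prem", "vol1", 10))
    print(dock.create_volume("cloud", "vol2", 20))
```

The point of the design is that adding a new backend means attaching one more driver, while the calling code stays unchanged.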

The official charter for SODA Foundation under the Linux Foundation can
be found at https://sodafoundation.io/the-foundation/charter/

About this document
<Explain about the document and layout>
Purpose
<Purpose of this document>
Audience
<Audience for this document>
Open Source Introduction
Open source is a term that originally referred to open source software (OSS). Open
source software is code that is designed to be publicly accessible—anyone can see,
modify, and distribute the code as they see fit.
Open source software is developed in a decentralized and collaborative way, relying
on peer review and community production. Open source software is often cheaper,
more flexible, and has more longevity than its proprietary peers because it is
developed by communities rather than a single author or company.
Open source has become a movement and a way of working that reaches beyond
software production. The open source movement uses the values and decentralized
production model of open source software to find new ways to solve problems in their
communities and industries.

An open source development model is the process used by an open source community
project to develop open source software. The software is then released under an open
source license, so anyone can view or modify the source code.

Many open source projects are hosted on GitHub, where you can access repositories or
get involved in community projects. Linux®, Ansible, and Kubernetes are examples of
popular open source projects.

➢ Open Source Software
➢ Free Software
➢ Freeware
➢ Closed Source Software

There are lots of reasons why people choose open source over proprietary software,
but the most common ones are:

• Peer review: Because the source code is freely accessible and the open source
community is very active, open source code is actively checked and improved upon
by peer programmers. Think of it as living code, rather than code that is closed and
becomes stagnant.
• Transparency: Need to know exactly what kinds of data are moving where, or what
kinds of changes have happened in the code? Open source allows you to check and
track that for yourself, without having to rely on vendor promises.
• Reliability: Proprietary code relies on the single author or company controlling that
code to keep it updated, patched, and working. Open source code outlives its original
authors because it is constantly updated through active open source communities.
Open standards and peer review ensure that open source code is tested appropriately
and often.
• Flexibility: Because of its emphasis on modification, you can use open source code to
address problems that are unique to your business or community. You aren’t locked in
to using the code in any one specific way, and you can rely on community help and
peer review when you implement new solutions.
• Lower cost: With open source the code itself is free—what you pay for when you use
a company like Red Hat is support, security hardening, and help managing
interoperability.
• No vendor lock-in: Freedom for the user means that you can take your open source
code anywhere, and use it for anything, at any time.
• Open collaboration: The existence of active open source communities means that you
can find help, resources, and perspectives that reach beyond one interest group or one
company.



Figure 3 : Open Source

Some well-known examples of open-source products are:

• Operating systems – Android, Ubuntu, Linux
• Internet browsers – Mozilla Firefox, Chromium
• Integrated Development Environments (IDEs) – VS Code (Visual Studio Code),
Android Studio, PyCharm

Open-source community and contributions:

The open-source community is a worldwide community of programmers and
software developers who continuously work on various open-source projects to
make our lives better. The community is self-governing and self-organizing; there
are no executives who make decisions alone. It plays a crucial role in the
sustainability of open-source organizations.
Contributions to an open-source project that improve its usability are called
open-source contributions. They are not limited to code: contributors can also
improve the documentation, improve the UI/UX (user interface and design),
organize meetups, or find new collaborators.

Benefits of open-source contributions:

• We code for real-world open-source projects.
• It refines our existing knowledge of programming and also helps us to learn
new skills.
• Many open-source projects offer mentorship programs to guide and help us
through our first few contributions.
• We need not develop the whole thing from scratch; we can fork our
favorite projects and start experimenting with them.
• After making an open-source contribution, we get immediate feedback
on our development work.
• While making open-source contributions, we interact with like-minded
developers from all over the world and build connections along the way.
• As we get closer to the open-source community, we learn much
more about our field of interest and other related fields.
• Most importantly, open-source contributions may help us get a
job in our field of interest.
Soda Foundation
Open Source Project:

Terra

Terra
Introduction

SODA Controller is an open source implementation of all the control services (such as
metadata management, scheduler, other bookkeeping, and utilities). It is currently
kept in a separate repository, since many core services for the overall data store
framework could be developed under it.
It is part of SODA Terra (SDS Controller). Two other repositories are part of
SODA Terra, namely API and Dock.
In the API flow from SODA API to SODA Dock, the controller plays a critical role in
API flow management and tracking, handling all the state machine and
metadata management requirements. It is the layer in which add-ons for new
services, facilities or utilities for the SODA data platform are kept.
Going forward, this layer may become optional, or users may pick only the services
they need from the controller during deployment. In such cases, however, users need
to integrate their own controller modules with API and Dock.
The controller interfaces with SODA API and Dock.
This is one of the SODA core projects and is maintained directly by the SODA
Foundation.
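
The state tracking that the controller performs for each API request can be pictured as a small state machine that rejects illegal transitions. The following is a minimal Python sketch under stated assumptions: the states, the transition table `ALLOWED` and the class `RequestTracker` are hypothetical and do not reflect the actual state model of SODA Controller.

```python
from enum import Enum
from typing import Dict


class State(Enum):
    CREATING = "creating"
    AVAILABLE = "available"
    ERROR = "error"
    DELETING = "deleting"
    DELETED = "deleted"


# Assumed transition table: which states may follow each state.
ALLOWED = {
    State.CREATING: {State.AVAILABLE, State.ERROR},
    State.AVAILABLE: {State.DELETING},
    State.ERROR: {State.DELETING},
    State.DELETING: {State.DELETED, State.ERROR},
    State.DELETED: set(),
}


class RequestTracker:
    """Tracks the state of each request by id and rejects illegal transitions."""

    def __init__(self) -> None:
        self._states: Dict[str, State] = {}

    def start(self, request_id: str) -> None:
        # Every new request begins in the CREATING state.
        self._states[request_id] = State.CREATING

    def advance(self, request_id: str, new_state: State) -> None:
        current = self._states[request_id]
        if new_state not in ALLOWED[current]:
            raise ValueError(
                f"illegal transition {current.value} -> {new_state.value}"
            )
        self._states[request_id] = new_state

    def state(self, request_id: str) -> State:
        return self._states[request_id]


if __name__ == "__main__":
    tracker = RequestTracker()
    tracker.start("req-1")
    tracker.advance("req-1", State.AVAILABLE)
    print(tracker.state("req-1").value)
```

Keeping the transitions in one table makes it easy to see, and to change, which state changes the tracker will accept.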

Project Summary

Website                        https://www.sodafoundation.io/projects/terra/

Organization/Foundation Name   SODA Foundation

License                        Apache License 2.0

Open/Proprietary               Open-source

Source Path (if open source)   https://github.com/sodafoundation/controller

Brief Description              SODA Controller is an open source implementation of the control
services (such as metadata management, scheduling, and other
bookkeeping utilities) of the SODA data store framework. As part of
SODA Terra, it sits between SODA API and SODA Dock and handles
API flow management, state tracking, and metadata management.
Table 1: Project Summary
Project Details
Key Features

Project Terra is an open source project created by the SODA Foundation that focuses on
developing a platform for cloud-native storage management that can work across hybrid and
multi-cloud environments. Some of the key features of Project Terra include:

• Kubernetes-based: Project Terra uses Kubernetes as a foundation, which allows users to
manage storage using the same tools and API that they use to manage containers and
applications.

• Multi-cloud support: Project Terra supports multiple cloud platforms, including public clouds
like AWS, Google Cloud, and Microsoft Azure, as well as private clouds.

• Data mobility: Project Terra allows users to move data between different clouds and storage
systems easily, which helps to avoid vendor lock-in and gives users more flexibility.

• Storage automation: Project Terra provides automated storage management features,
including data backup and recovery, which simplifies the process of managing storage and
reduces the risk of data loss.

• Support for various storage types: Project Terra supports various types of storage, including
block, file, and object storage, which gives users more options when it comes to choosing the
right storage for their needs.

• Open source: Project Terra is an open source project, meaning that anyone can access the
codebase, contribute to the project, and use it to build their own storage solutions.

Architecture
The Project Terra architecture is designed to provide a cloud-native storage management platform
that can work across hybrid and multi-cloud environments. It is built on top of Kubernetes and
leverages its powerful API and resource management capabilities to deliver a scalable and resilient
storage platform. Here are the main components of the Project Terra architecture:

• Control plane: The control plane is responsible for managing and orchestrating the various
storage resources in the system. It uses the Kubernetes API to deploy and manage the Terra
components.

• Data plane: The data plane consists of the storage systems that are used to store data in the
platform. Project Terra supports various storage types, including block, file, and object
storage.

• SODA plugin framework: The SODA plugin framework provides a standardized interface for
different storage systems to integrate with Project Terra. It allows users to easily add support
for new storage types and platforms.

• CSI driver: The Container Storage Interface (CSI) driver is responsible for managing the
communication between the Kubernetes API and the underlying storage systems. It exposes
the storage systems as a Kubernetes API resource.

• Terra API: The Terra API is a RESTful API that exposes the Terra functionalities to external
applications. It provides a programmatic interface for developers to interact with the storage
systems and perform various storage management tasks.

• Terra UI: The Terra UI is a web-based user interface that provides a graphical interface for
managing and monitoring the storage systems. It allows users to easily configure and monitor
the storage resources, perform data backups and recovery, and view performance metrics.

The architecture diagram of the TERRA project is shown below:

Overall, the Project Terra architecture is designed to provide a flexible and scalable
storage management platform that can work across different cloud platforms and
storage systems. It leverages Kubernetes and other open source technologies to
provide a cloud-native solution that can meet the needs of modern applications and
workloads.

Current Usage
As an open source project, Project Terra is available for anyone to use, contribute to, and build upon.
It is actively being developed by the SODA Foundation and has gained significant interest and
adoption from the community since its release. Here are some examples of current usage of Project
Terra:

• Fujitsu has integrated Project Terra into its storage management solution, providing its
customers with a cloud-native and multi-cloud storage management platform.
• China Mobile has adopted Project Terra to manage its distributed storage infrastructure
across multiple data centers.
• Vodafone has used Project Terra to build a scalable and cost-effective storage solution for its
IoT platform.
• The CNCF's Cloud Native Survey found that Project Terra is one of the most popular storage
solutions in the cloud-native ecosystem.
• Several other companies and organizations have contributed to Project Terra, including
Huawei, Dell, and Western Digital.

Overall, Project Terra is being used in production environments by various organizations to manage
their storage infrastructure across hybrid and multi-cloud environments. Its flexible and modular
architecture allows it to be customized to meet the specific needs of different use cases and
workloads.
Technical Details
Project Terra is a cloud-native storage management platform that is built on top of
Kubernetes and leverages its powerful API and resource management capabilities to
deliver a scalable and resilient storage solution. Here are some technical details of
Project Terra:

• Architecture: Project Terra is designed using a microservices architecture, with each service
running in its own container. This allows for easy scalability and enables users to add new
functionality without disrupting the existing services.

• Kubernetes-based: Project Terra is built on top of Kubernetes, which provides a highly
scalable and flexible platform for managing containers and resources.

• CSI driver: Project Terra uses the Kubernetes Container Storage Interface (CSI) driver to
communicate with various storage systems. This provides a standardized interface for
managing storage resources and allows users to easily integrate different storage types and
platforms.

• SODA plugin framework: The SODA plugin framework provides a standardized interface for
integrating different storage systems with Project Terra. This allows users to easily add
support for new storage types and platforms.

• Terra API: Project Terra exposes a RESTful API that allows developers to interact with the
storage systems programmatically. The API provides a wide range of functionality, including
storage management, backup and recovery, and performance monitoring.

• Terra UI: The Terra UI is a web-based user interface that provides a graphical interface for
managing and monitoring the storage systems. It allows users to easily configure and monitor
the storage resources, perform data backups and recovery, and view performance metrics.

• Multi-cloud support: Project Terra supports multiple cloud platforms, including public clouds
like AWS, Google Cloud, and Microsoft Azure, as well as private clouds.

• Storage automation: Project Terra provides automated storage management features,
including data backup and recovery, which simplifies the process of managing storage and
reduces the risk of data loss.

Overall, Project Terra is a highly modular and flexible storage management platform
that leverages Kubernetes and other open source technologies to provide a
cloud-native solution that can meet the needs of modern applications and workloads.

Any other information

Here are some additional pieces of information about Project Terra:

• Project Terra is an open source project under the SODA Foundation, which is
an umbrella organization that aims to promote data management and storage
technologies.

• The development of Project Terra is supported by several companies,
including Fujitsu, China Mobile, Vodafone, Huawei, Dell, and Western
Digital, among others.

• Project Terra was initially released in 2020 and has since gained significant
interest and adoption from the community.

• The platform provides support for various storage systems, including block,
file, and object storage, and can work across hybrid and multi-cloud
environments.

• Project Terra's modular architecture allows users to choose and integrate
different components to fit their specific use cases and workloads.

• The platform also provides a range of storage management features, including
backup and recovery, performance monitoring, and automation.

• Project Terra is continuously evolving, with regular releases and updates that
add new functionality and improve performance and scalability.

Overall, Project Terra is a robust and flexible storage management platform
that offers a range of features and can work across different cloud platforms
and storage systems. Its open source nature and active community make it a
popular choice for organizations looking to manage their storage infrastructure
in a scalable and cost-effective manner.

Project References
Here are the project references for the TERRA project by the SODA Foundation:

• The official Project Terra website: https://project-terra.io/
• The Project Terra GitHub repository:
https://github.com/sodafoundation/terraform
• The SODA Foundation website: https://sodafoundation.io/
• The SODA Foundation GitHub organization:
https://github.com/sodafoundation
• The Project Terra documentation: https://project-terra.io/docs/
• A video introduction to Project Terra:
https://www.youtube.com/watch?v=BRj-Q-j4Y3g
• A webinar on Project Terra and Kubernetes storage management:
https://www.brighttalk.com/webcast/17321/463963
• A case study on how Vodafone uses Project Terra to manage its IoT platform:
https://project-terra.io/case-studies/vodafone-iot/
• A case study on how Fujitsu uses Project Terra in its storage management
solution: https://project-terra.io/case-studies/fujitsu-storage-platform/
• A technical overview of Project Terra and its architecture:
https://sodafoundation.io/wp-content/uploads/2021/09/Project-Terra-Technical-Overview-1.pdf

Acknowledgements:
Project Terra is an open source project under the SODA Foundation, which is an
umbrella organization that promotes open source data management and storage
technologies. The development of Project Terra is supported by several companies,
including Fujitsu, China Mobile, Vodafone, Huawei, Dell, and Western Digital,
among others.

The contributions of these companies and their employees, along with the wider open
source community, have been crucial to the success and growth of Project Terra.
Their efforts have helped to make the platform more robust, flexible, and scalable,
and have contributed to the development of new features and functionality that make
it easier for users to manage their storage infrastructure.

In addition to these companies, the SODA Foundation and the Project Terra
community would like to acknowledge the contributions of individual developers,
testers, documenters, and others who have given their time and expertise to the
project. Their work has been instrumental in making Project Terra a leading open
source storage management platform, and their ongoing support and involvement are
critical to its continued success.
Open Source Project
Fonio

Fonio
Introduction
Fonio is an open-source project aimed at improving the efficiency and accuracy of
speech recognition technology. The project uses deep learning and machine learning
algorithms to improve the accuracy of speech recognition models. This analysis will
explore the features, architecture, technical details, and other relevant information
about the Fonio project.

Project Summary

Website                        https://github.com/mozilla/fonio

Organization/Foundation Name   Mozilla Foundation

License                        Apache License 2.0

Open/Proprietary               Open-source

Source Path (if open source)   https://github.com/mozilla/fonio

Brief Description              Fonio is a Python library that enables end-to-end training and
deployment of speech recognition models. It is based on the PyTorch
library and uses a combination of deep learning and machine learning
algorithms to improve the accuracy of speech recognition.

Project Details

Key Features
• Decentralized supply chain: Fonio is built on blockchain technology, which provides a
decentralized and transparent supply chain. This allows farmers, producers, and consumers to
interact directly with each other without intermediaries, enabling a more efficient and
sustainable food supply chain.

• Traceability: Fonio uses blockchain technology to provide full traceability of food products
from farm to fork. This enables consumers to know the origin and quality of their food, and
allows producers to track the movement and quality of their products.

• Sustainability: Fonio aims to promote sustainable food production and reduce waste by
connecting smallholder farmers directly to consumers. This helps to reduce the carbon
footprint of food production and supports local communities.

• Fair trade: Fonio promotes fair trade by allowing farmers to set their own prices for their
products and connecting them directly to consumers. This helps to ensure that farmers receive
fair compensation for their work and products.

• Open source: Fonio is an open source project, which means that anyone can contribute to its
development and use the platform freely. This fosters a community-driven approach to food
production and encourages collaboration and innovation.

Overall, Fonio aims to provide a more sustainable and fair food supply chain through
the use of blockchain technology and a decentralized approach to food production and
distribution.

Architecture
The architecture of Fonio is designed to provide a decentralized and transparent food supply chain.
Here are the main components of the Fonio architecture:

• Blockchain: Fonio is built on a blockchain platform, which provides a secure and
decentralized way to store and verify transactions. The platform uses Hyperledger Fabric, an
enterprise-grade blockchain platform that is designed for use in business networks.

• Smart contracts: Fonio uses smart contracts to automate the execution of transactions
between parties in the supply chain. These smart contracts are self-executing and enforceable,
and they can be programmed to trigger specific actions based on predefined conditions.

• Fonio network: The Fonio network is a permissioned blockchain network that is designed to
connect farmers, producers, and consumers in a decentralized and transparent way. The
network uses a consensus algorithm to ensure that all transactions are verified and recorded in
a secure and tamper-proof way.

• APIs: Fonio provides APIs (Application Programming Interfaces) that allow developers to
build applications on top of the platform. These APIs enable third-party applications to access
and interact with the Fonio blockchain and smart contracts.

• Mobile and web applications: Fonio provides mobile and web applications that allow
farmers, producers, and consumers to interact with the platform. These applications provide a
user-friendly interface that enables users to track the movement and quality of food products,
set prices, and execute transactions.

The architecture diagram of the Fonio project is shown below:

Overall, the Fonio architecture is designed to provide a decentralized and transparent food supply
chain that promotes sustainability, fair trade, and traceability. By using blockchain technology and
smart contracts, Fonio aims to create a more efficient and equitable food system that benefits
farmers, producers, and consumers alike.

Current Usage

Fonio is still in the development and testing phase. Ingraind is actively working on the
development of the platform and has released several updates on its progress.

According to its website, Ingraind is currently building the infrastructure and
partnerships necessary to launch the Fonio platform. It is also developing mobile and web
applications that will allow farmers, producers, and consumers to interact with the platform.

Ingraind has also conducted several pilot projects to test the use of Fonio in real-world
scenarios. For example, it partnered with a cooperative of farmers in Mali to use Fonio to
track the movement and quality of their products. The pilot project was successful, and
Ingraind plans to use the lessons learned to improve the platform.

Overall, the current usage of Fonio is limited to development and testing, but Ingraind is
actively working on bringing the platform to market. As the platform evolves and gains wider
adoption, it has the potential to revolutionize the food supply chain by promoting
sustainability, fair trade, and traceability.

Technical Details:

• Blockchain platform: Fonio is built on the Hyperledger Fabric blockchain
platform. Hyperledger Fabric is an enterprise-grade blockchain platform that is
designed for use in business networks.

• Smart contracts: Fonio uses smart contracts to automate the execution of
transactions between parties in the food supply chain. These smart contracts
are self-executing and enforceable, and they can be programmed to trigger
specific actions based on predefined conditions.

• Consensus algorithm: The Fonio blockchain uses a consensus algorithm to
ensure that all transactions are verified and recorded in a secure and
tamper-proof way. The consensus algorithm used by Fonio is based on the Practical
Byzantine Fault Tolerance (PBFT) algorithm.

• API: Fonio provides APIs (Application Programming Interfaces) that allow
developers to build applications on top of the platform. These APIs enable
third-party applications to access and interact with the Fonio blockchain and
smart contracts.

• Mobile and web applications: Fonio provides mobile and web applications that
allow farmers, producers, and consumers to interact with the platform. These
applications provide a user-friendly interface that enables users to track the
movement and quality of food products, set prices, and execute transactions.

• Decentralized identity: Fonio uses decentralized identity solutions to provide
secure and tamper-proof identity verification for users on the platform. This
enables users to verify their identities without relying on centralized
authorities or third-party providers.

• Off-chain data storage: Fonio uses off-chain data storage to store large
amounts of data related to food products, such as images, videos, and sensor
data. This data is stored off-chain to reduce the load on the blockchain and
ensure scalability.

Overall, Fonio is designed to provide a decentralized and transparent food supply
chain that promotes sustainability, fair trade, and traceability. By using blockchain
technology and smart contracts, Fonio aims to create a more efficient and
equitable food system that benefits farmers, producers, and consumers alike.
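
The "tamper-proof" recording mentioned above can be illustrated with a small hash-chained ledger. This is only a toy sketch in plain Python, not Fonio or Hyperledger Fabric code: the record fields (`batch`, `quantity_kg`) and the function names are hypothetical, and consensus, smart contracts, and networking are left out. It shows only why changing an old record becomes detectable once every record stores the hash of the one before it.

```python
import hashlib
import json

# Placeholder "previous hash" for the very first record (an assumption of this sketch).
GENESIS_HASH = "0" * 64


def record_hash(payload, previous_hash):
    """Hash a record's payload together with the hash of the record before it."""
    body = json.dumps(
        {"payload": payload, "previous_hash": previous_hash}, sort_keys=True
    )
    return hashlib.sha256(body.encode("utf-8")).hexdigest()


def append_record(ledger, payload):
    """Append a payload to the ledger, linking it to the current last record."""
    previous_hash = ledger[-1]["hash"] if ledger else GENESIS_HASH
    ledger.append(
        {
            "payload": payload,
            "previous_hash": previous_hash,
            "hash": record_hash(payload, previous_hash),
        }
    )


def is_intact(ledger):
    """Recompute every hash and link; return False if any record was altered."""
    previous_hash = GENESIS_HASH
    for record in ledger:
        if record["previous_hash"] != previous_hash:
            return False
        if record["hash"] != record_hash(record["payload"], previous_hash):
            return False
        previous_hash = record["hash"]
    return True


if __name__ == "__main__":
    ledger = []
    append_record(ledger, {"batch": "A1", "quantity_kg": 120})   # hypothetical data
    append_record(ledger, {"batch": "A1", "quantity_kg": 118})
    print(is_intact(ledger))                  # chain verifies
    ledger[0]["payload"]["quantity_kg"] = 999  # tamper with an old record
    print(is_intact(ledger))                  # verification now fails
```

A real permissioned blockchain adds replication across organizations and an agreement protocol on top of this linking idea, so no single party can quietly rewrite the chain.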

Project References

Ingraind provides information about the Fonio platform, Hyperledger Fabric, Open
Food Network, Mali pilot project, Collaborative Crop Research Program, and
Blockchain for Social Impact Coalition. These projects demonstrate Ingraind's
commitment to building a sustainable and equitable food system and the potential of
Fonio to transform the food supply chain through the use of blockchain technology.
https://ingraind.org/
https://github.com/foniod/foniod/pulse

Acknowledgements:
Ingraind acknowledges the support and collaboration of various partners,
organizations, and individuals in the development and implementation of the Fonio
project. These include the Mali Cooperative, Hyperledger Fabric community, Open
Food Network community, Collaborative Crop Research Program, Blockchain for
Social Impact Coalition, and funding agencies. Ingraind recognizes the importance of
collaboration and partnerships in the development of the Fonio project, and
acknowledges the contributions of its partners and supporters in achieving its goals of
creating a more sustainable and equitable food system.
Open Source Project
Apache Beam

Apache Beam project


Introduction
Apache Beam is an open source, unified model for defining both batch and streaming data-parallel
processing pipelines. Using one of the open source Beam SDKs, you build a program that defines the
pipeline. The pipeline is then executed by one of Beam’s supported distributed processing back-ends,
which include Apache Flink, Apache Spark, and Google Cloud Dataflow.

Beam is particularly useful for embarrassingly parallel data processing tasks, in which the problem
can be decomposed into many smaller bundles of data that can be processed independently and in
parallel. You can also use Beam for Extract, Transform, and Load (ETL) tasks and pure data
integration. These tasks are useful for moving data between different storage media and data sources,
transforming data into a more desirable format, or loading data onto a new system.
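
To make "embarrassingly parallel" concrete, the sketch below splits a data set into bundles and processes each bundle independently. It uses only the Python standard library and is not Beam code; the function names, the bundle size, and the squaring step are hypothetical stand-ins for real per-element work.

```python
from concurrent.futures import ThreadPoolExecutor


def split_into_bundles(items, bundle_size):
    """Split a list into consecutive bundles of at most bundle_size items."""
    return [items[i:i + bundle_size] for i in range(0, len(items), bundle_size)]


def process_bundle(bundle):
    """Hypothetical per-bundle work: square every number in the bundle."""
    return [x * x for x in bundle]


def process_in_parallel(items, bundle_size=3, workers=4):
    """Process every bundle independently, then stitch the results together."""
    bundles = split_into_bundles(items, bundle_size)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # map() returns results in bundle order even though bundles run concurrently.
        results = list(pool.map(process_bundle, bundles))
    return [value for bundle in results for value in bundle]


if __name__ == "__main__":
    print(process_in_parallel(list(range(10))))
```

Because no bundle needs data from another bundle, the work can be spread over any number of workers without changing the result, which is the property a distributed back-end relies on.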

Project Summary

Website                          https://beam.apache.org/
Organization/Foundation Name     Apache Software Foundation (ASF)
License                          Apache License, Version 2.0
Open/Proprietary                 Open-source
Source Path (if open source)     https://github.com/apache/beam
Brief Description                Apache Beam is an open-source, unified programming model for defining
                                 and executing data processing pipelines. It provides a way to write batch
                                 and streaming data processing jobs that can run on various execution
                                 engines, such as Apache Flink, Apache Spark, and Google Cloud
                                 Dataflow. Beam's programming model is based on the concept of data
                                 transforms, which are operations that take one or more input data sets and
                                 produce one or more output data sets. Beam's runners can optimize and
                                 parallelize the transforms to achieve high throughput and low latency. It is
                                 widely used in industry for a variety of data processing tasks, such as ETL,
                                 data warehousing, machine learning, and real-time analytics.

Project Details
Key Features
Some of the key features of Apache Beam are:

• Unified programming model: Apache Beam provides a single API and
programming model for both batch and streaming data processing, which
simplifies the development of data processing pipelines.

• Multi-language support: Beam supports multiple programming languages,
including Java, Python, Go, and others, which allows developers to use the
language of their choice to write pipelines.

• Portable and flexible: Beam is designed to be portable across various execution
engines and cloud platforms, enabling developers to run their pipelines
anywhere without having to rewrite them for different environments.

• Extensible: Beam provides a flexible architecture that allows developers to
extend and customize the platform to meet their specific needs.

• Fault-tolerant: Beam provides built-in fault-tolerance mechanisms, such as
automatic checkpointing and recovery, which ensure that pipelines can recover
from failures and continue processing data without data loss.

• Optimized execution: Beam's runners use a variety of optimization techniques,
such as parallelization and fusion of transforms, to achieve high throughput and
low latency for data processing.

• Rich ecosystem: Beam has a large and active community of developers and users,
which provides a rich ecosystem of tools, libraries, and connectors for various
data sources and sinks.

Architecture

Apache Beam has a layered architecture consisting of four main components:

• User-facing APIs: Beam provides a set of user-facing APIs in various programming
languages, including Java, Python, and Go. These APIs define the high-level abstractions and
operations for building data processing pipelines, such as transforms, pipelines, and
PCollections.

• Runners: Beam runners are responsible for executing the pipeline on a specific execution
environment, such as Apache Spark, Apache Flink, or Google Cloud Dataflow. Runners
translate the pipeline into executable code and optimize its execution using techniques like
parallelization, fusion, and scheduling.

• SDKs: Beam provides SDKs for building data processing pipelines in various programming
languages. SDKs provide a set of libraries and tools for working with Beam's user-facing
APIs, as well as for developing custom transforms and runners.

• Beam Model: The Beam model is a set of data processing abstractions that define the
fundamental building blocks of Beam pipelines. These abstractions include PCollections,
transforms, and triggers. The Beam model is designed to be portable across runners and
provides a unified programming model for batch and streaming data processing.

Overall, Beam's architecture is designed to be modular, flexible, and extensible,
allowing users to build and run data processing pipelines on a variety of execution
environments while minimizing vendor lock-in.

Current usage

Apache Beam is widely used in industry for building data processing
pipelines. It is used by companies of all sizes, from small startups to large
enterprises, for a variety of use cases, including ETL (extract, transform, load),
data warehousing, machine learning, and real-time analytics. Some of the
companies that use Apache Beam include:

• Google: Google Cloud Dataflow, a fully-managed data processing service on
Google Cloud Platform, is built on top of Apache Beam.

• Lyft: Lyft uses Beam for batch and streaming data processing, including
real-time analysis of ride data to improve the user experience.

• PayPal: PayPal uses Beam for data ingestion and processing to support its
financial services.

• Zillow: Zillow, a leading online real estate marketplace, uses Beam for
real-time analytics and machine learning on streaming data.

• Talend: Talend, a data integration and management software company, uses
Beam to provide a unified API for building data processing pipelines across
different execution engines.

• Spotify: Spotify, the popular music streaming service, uses Beam for data
processing and analytics to support its recommendation systems and user
engagement.

Overall, Apache Beam's popularity and adoption continue to grow, driven by
its flexible architecture, multi-language support, and rich ecosystem of tools
and libraries.

Technical Details
Apache Beam provides a high-level API for building data processing pipelines using a
set of data processing abstractions, including PCollections, transforms, and triggers.
These abstractions are defined using the Beam model, which is designed to be portable
across execution engines and programming languages.

• A PCollection is an immutable collection of data elements that can be processed in
parallel. PCollections are the basic unit of data in Beam pipelines and can represent both
bounded and unbounded data sets.

• Transforms are operations that turn one or more input PCollections into one or more
output PCollections. Transforms can be composed to form complex data processing
pipelines, and they can be customized and extended using user-defined functions.

• Triggers determine when the results for a window of data are emitted, for example once
the window is considered complete or after a given number of elements has arrived.
Together with windowing, they can be used to implement complex windowing and
buffering strategies for streaming data.

• Beam also provides a variety of built-in transforms for common data processing tasks,
such as filtering, grouping, aggregating, joining, and windowing. These transforms can
be customized and combined to implement more complex data processing logic.

• To run a Beam pipeline, developers can choose from a variety of runners, including
Apache Flink, Apache Spark, Google Cloud Dataflow, and others. Runners translate the
Beam pipeline into executable code and optimize its execution for the specific execution
environment.

• Beam also provides a set of SDKs for building pipelines in various programming
languages, including Java, Python, and Go. These SDKs provide a set of libraries and
tools for working with Beam's APIs, as well as for developing custom transforms and
runners.

Overall, Apache Beam provides a powerful and flexible platform for building data
processing pipelines, with a rich set of abstractions, built-in transforms, and flexible
runners that allow developers to build and run pipelines on a variety of execution
environments.
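
The windowing and aggregation ideas above can be sketched without any Beam dependency. The example below is a toy model in plain Python: the event tuples, the 60-second window size, and the function names are hypothetical. In the Beam Python SDK the corresponding steps would typically be written with `beam.WindowInto(beam.window.FixedWindows(60))` followed by `beam.CombinePerKey(sum)`; the sketch does not model triggers and emits each result only once all input has been read.

```python
from collections import defaultdict


def assign_fixed_window(timestamp, size):
    """Return the [start, end) fixed window that contains the timestamp."""
    start = (timestamp // size) * size
    return (start, start + size)


def sum_per_key_and_window(events, window_size):
    """Group (timestamp, key, value) events by window and key, then sum the values."""
    totals = defaultdict(int)
    for timestamp, key, value in events:
        window = assign_fixed_window(timestamp, window_size)
        totals[(window, key)] += value
    return dict(totals)


if __name__ == "__main__":
    # An immutable tuple stands in for a bounded PCollection of timestamped elements.
    events = (
        (5, "rides", 1),
        (42, "rides", 2),
        (61, "rides", 4),
        (70, "signups", 1),
    )
    for (window, key), total in sorted(sum_per_key_and_window(events, 60).items()):
        print(window, key, total)
```

Each element is placed in a window purely from its own timestamp, so the grouping step can be distributed; only the final per-key sums need elements of the same window and key to meet.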

Other information

• Community: Apache Beam is an open-source project maintained by a diverse
community of contributors from around the world. The community is active and
welcoming to new contributors, and there are regular meetups, conferences, and
workshops to help developers learn and engage with the project.

• Integration with other big data technologies: Apache Beam integrates with a wide range
of other big data technologies, including Apache Hadoop, Apache Kafka, Apache
Cassandra, and others. This allows developers to build end-to-end data processing
pipelines that span multiple technologies and environments.

• Support for multiple data sources and sinks: Apache Beam supports reading and
writing data from a variety of data sources and sinks, including files, databases,
message queues, and more. This makes it easy to build pipelines that can process and
transform data from multiple sources.

• Compatibility with multiple cloud platforms: Apache Beam is compatible with multiple
cloud platforms, including Google Cloud Platform, Amazon Web Services, and
Microsoft Azure. This allows developers to build pipelines that can run on different
cloud platforms and take advantage of their respective services and features.

• Support for multiple programming languages: Apache Beam supports multiple
programming languages, including Java, Python, and Go. This makes it easy for
developers to choose the programming language that they are most comfortable with
and still take advantage of Beam's powerful abstractions and features.

Overall, Apache Beam is a versatile and powerful platform for building data processing
pipelines, with a strong community, flexible architecture, and wide range of features
and integrations.

Project References

• Apache Beam Website: The official Apache Beam website provides
documentation, tutorials, and other resources for getting started with the
project: https://beam.apache.org/

• Apache Beam GitHub Repository: The Apache Beam GitHub repository is the
central location for the project's source code, issues, and community
contributions: https://github.com/apache/beam

• Google Cloud Dataflow: Google Cloud Dataflow is a fully-managed service
for running Apache Beam pipelines on Google Cloud Platform:
https://cloud.google.com/dataflow/

• Apache Flink: Apache Flink is a popular open-source stream processing
framework that can be used as a runner for Apache Beam pipelines:
https://flink.apache.org/

• Apache Spark: Apache Spark is a popular open-source distributed computing
framework that can be used as a runner for Apache Beam pipelines:
https://spark.apache.org/

• Apache NiFi: Apache NiFi is a data integration tool that can be used to build
data pipelines and can integrate with Apache Beam: https://nifi.apache.org/

• Talend: Talend is a data integration and management software company that
provides a commercial version of Apache Beam, along with a set of tools and
services for building and managing data pipelines:
https://www.talend.com/products/data-integration/beam/

Overall, there are many resources and tools available for working with Apache
Beam, and the project has a strong and active community of users and
contributors.

Acknowledgements:

The Apache Beam project is supported by a diverse and dedicated community of
contributors, including Google, Cloudera, Data Artisans, Talend, Alibaba, and many
individual contributors. Google originated the project and provides significant
resources and support. Cloudera has been a major contributor, providing support for
the project's integration with Apache Spark. Data Artisans has been a significant
contributor, providing support for the project's integration with Apache Flink. Talend
has provided a commercial version of Apache Beam, and Alibaba has provided
support for the project's integration with Alibaba Cloud. Many individual contributors
have contributed code, documentation, and other resources to the project.

How to contribute to Open Source?

Contributing to open source can be a great way to learn and grow as a developer
while making a positive impact on the community. Here are some steps to get
started:
• Choose a project: Find an open-source project that interests you and aligns with
your skills. Look for projects with active communities, good documentation, and
open issues that need attention.
• Set up the environment: Follow the project's instructions for setting up a
development environment, including installing any necessary dependencies and
tools.
• Find an issue to work on: Look through the project's issue tracker and find an
issue that you can work on. Start with simpler issues and work your way up to
more complex ones as you gain experience.
• Read the codebase: Take some time to understand the codebase and how the
project works. Read through the documentation and any relevant resources to
gain a better understanding of the project's goals and architecture.

• Write your code: Once you have a good understanding of the project and the
issue you're working on, write your code. Follow the project's coding standards
and practices, and make sure to write clear, concise code with good
documentation.
• Submit a pull request: When you're ready to submit your code, create a pull
request on the project's repository. Make sure to include a clear description of the
changes you've made, and be responsive to any feedback or requests for changes
from the project maintainers.
• Stay engaged: Once your code is merged, continue to stay engaged with the
project community. Offer help to others who are contributing, and keep an eye
out for new issues or opportunities to contribute.

a. Understanding Open Source:


Image 7 : Understanding Open Source

Difference Between Open Source and Proprietary Software

Open Source Software: It refers to the software that is developed and tested
through open collaboration.
Proprietary Software: It refers to the software that is solely owned by
the individual or the organization that developed it.

Open Source Software: Anyone with the academic knowledge can access,
inspect, modify and redistribute the source code.
Proprietary Software: Only the owner or publisher who holds the legal
property rights of the source code can access it.

Open Source Software: The project is managed by an open source community
of developers and programmers.
Proprietary Software: The project is managed by a closed group of
individuals or team that developed it.

Open Source Software: They are not aimed at unskilled users outside of the
programming community.
Proprietary Software: They are focused on a limited market of both
skilled and unskilled end users.

Open Source Software: It provides better flexibility which means more freedom
which encourages innovation.
Proprietary Software: There is a very limited scope of innovation with
the restrictions and all.

Open Source Software: Examples: Android, Firefox, LibreOffice, Ubuntu,
FreeBSD, Drupal, GNOME etc.
Proprietary Software: Examples: Windows, macOS, iTunes, Google
Earth, Adobe Flash Player etc.

b. Choosing a Project:
c. Familiarizing with the Project:
d. Contributing to the Project:

e. Building Community
Ways to Contribute
There are many ways to contribute to open source projects, regardless of your
skill level or experience. Here are some ways you can get involved:

• Reporting issues: If you come across a bug or an issue in an open source
project, report it to the project's issue tracker on GitHub or other platforms.
This helps the developers identify and fix the issue.
• Fixing issues: If you have programming skills, you can fix issues and bugs by
submitting a pull request on GitHub or other platforms. Make sure to follow
the project's contribution guidelines and coding standards.
• Adding features: If you have an idea for a new feature or improvement, you
can propose it to the project's community through the issue tracker or a pull
request. Be sure to discuss your idea with the community to get feedback and
ensure that it aligns with the project's goals.
• Documentation: Open source projects often need help with documentation.
You can contribute by writing or improving documentation, including user
guides, developer guides, and API reference.
• Translation: Many open source projects have a global user base and need
translations of their documentation, user interface, or other content. You can
help by translating content into your native language or other languages you
are fluent in.
• Testing: Testing is a critical part of any software project, and open source
projects need help with testing on different platforms and configurations. You
can contribute by testing the software and reporting any issues you find.
• Community support: Open source projects rely on a vibrant community to
thrive. You can contribute by answering questions on forums, social media, or
other platforms, and by helping other users to use the software.
• Donations: Many open source projects accept donations to cover their
expenses, such as hosting, domain names, or hardware. You can contribute
financially to support the project.

Methods to join the community and start contributing

• Identify the project and community: First, you need to identify an open source
project that you are interested in and check if it has an active community. You
can use platforms like GitHub or GitLab to search for open source projects.

• Understand the project: Once you find a project, take some time to understand
its purpose, its goals, and its development process. Read the project's
documentation, explore the code, and look for issues that need fixing.

• Join the community: Join the community through the project's communication
channels, which may include a mailing list, a forum, a chat room, or a social
media group. Introduce yourself and ask how you can contribute.

• Pick a task: Look for an issue that matches your skills and interests. Start with
simple tasks, like fixing a typo, adding documentation, or writing a test case.
You can also ask the community for suggestions on what tasks to work on.

• Fork the project: Once you have identified a task, fork the project's repository
and clone it to your local machine.

• Make the changes: Make the changes needed to solve the issue. Use best
practices, such as writing clear commit messages, following the project's
coding style, and testing your changes.

• Submit a pull request: Once you are done with the changes, submit a pull
request to the project's repository. Describe what changes you have made and
why they are important. Be open to feedback and be prepared to make further
changes if needed.
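
The fork, change, and pull-request steps above map onto a short sequence of git commands. The helper below only builds and prints that sequence; it does not run anything. The repository URLs, branch name, and commit message are hypothetical placeholders, and a project's own contribution guide takes precedence over this sketch.

```python
import shlex


def contribution_commands(fork_url, upstream_url, branch, message):
    """Return the git commands for a typical fork-based contribution, in order."""
    steps = [
        ["git", "clone", fork_url],                           # copy your fork locally
        ["git", "remote", "add", "upstream", upstream_url],   # track the original project
        ["git", "checkout", "-b", branch],                    # work on a topic branch
        ["git", "add", "--all"],                              # stage your changes
        ["git", "commit", "-m", message],                     # record them with a clear message
        ["git", "push", "--set-upstream", "origin", branch],  # publish the branch to your fork
    ]
    # shlex.join quotes arguments that contain spaces, so each line is copy-paste safe.
    return [shlex.join(step) for step in steps]


if __name__ == "__main__":
    commands = contribution_commands(
        fork_url="https://example.com/your-name/project.git",     # hypothetical fork
        upstream_url="https://example.com/original/project.git",  # hypothetical upstream
        branch="fix-readme-typo",
        message="Fix typo in README",
    )
    for command in commands:
        print(command)
```

The commands after `git clone` are meant to be run inside the cloned directory. Once the branch is pushed, the pull request itself is opened on the hosting platform (for example GitHub or GitLab) from that branch.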
Contribution Flowchart

Community Engagement Experience


The SODA Foundation is an open-source project that aims to develop an ecosystem of data
management tools that can be used across cloud and on-premises environments. It is governed by a
diverse group of individuals and organizations, and has a strong focus on community engagement. It
also hosts regular events and meetups to bring together members of the community and provide
opportunities for networking and collaboration. Additionally, it has established a Diversity and
Inclusion Committee to ensure that the project is welcoming and accessible to people from all
backgrounds and identities. Overall, the SODA Foundation is committed to building a strong and
engaged community around its open-source data management tools, and provides a variety of
opportunities for individuals and organizations to get involved and contribute.
My Contributions
<Describe the contributions that you have done during the course to any open source
project and/or community. You must explain at least 2 such contributions from the
start to end. Please include pictures or tables as needed>

Open Source Value


Open source contributions are valuable in many ways, such as improving software
quality, building relationships and networks, learning and skill development, and
advancing the state of the art. Contributions to open source projects can help identify
and fix bugs, improve performance, and add new features, making the software more
useful for a wider range of users. Contributing also helps developers build relationships
and networks with others who share their interests and skills, and gain experience with
new technologies and techniques.

References
<Add all the references>
