
What is big data?

Big data is a combination of structured, semi-structured and unstructured data collected by organizations that can be mined for information and used in machine learning projects, predictive modelling and other advanced analytics applications. Systems that process and store big data have become a common component of data management architectures in organizations, combined with tools that support big data analytics uses.

Big data is often characterized by the three V's:

● the large volume of data in many environments;
● the wide variety of data types frequently stored in big data systems; and
● the velocity at which much of the data is generated, collected and processed.

Why is big data important?

● Companies use big data in their systems to improve operations, provide better customer service, create personalized marketing campaigns and take other actions that, ultimately, can increase revenue and profits.
● Businesses that use it effectively hold a potential competitive advantage over those that don't, because they're able to make faster and more informed business decisions.
● For example, big data provides valuable insights into customers that companies can use to refine their marketing, advertising and promotions in order to increase customer engagement and conversion rates.
● Both historical and real-time data can be analysed to assess the evolving preferences of consumers or corporate buyers, enabling businesses to become more responsive to customer wants and needs.

Here are some more examples of how big data is used by organizations:

● In the energy industry, big data helps oil and gas companies identify potential drilling locations and monitor pipeline operations; likewise, utilities use it to track electrical grids.
● Financial services firms use big data systems for risk management and real-time analysis of market data.
● Manufacturers and transportation companies rely on big data to manage their supply chains and optimize delivery routes.
● Other government uses include emergency response, crime prevention and smart city initiatives.

What are examples of big data?

Big data comes from many sources -- some examples are transaction processing systems, customer databases, documents, emails, medical records, internet clickstream logs, mobile apps and social networks. It also includes machine-generated data, such as network and server log files and data from sensors on manufacturing machines, industrial equipment and internet of things devices.

Where is big data stored?

Big data is often stored in a data lake. While data warehouses are commonly built on relational databases and contain structured data only, data lakes can support various data types and typically are based on Hadoop clusters, cloud object storage services, NoSQL databases or other big data platforms.

Data Scalability

Data scalability is a topic that is extremely crucial in the world of data but doesn't get enough limelight! It is the backbone that helps businesses, large and small, stand tall and manage their data effectively, no matter how much it grows.

What is Scalability in Data Analytics?

Scalability, in simple terms, is the ability of a system, network, or process to handle a growing amount of work, or its potential to accommodate growth. In data analytics, it means the capability of a system to increase its capacity to process, analyse, and interpret more and more data smoothly and efficiently. It ensures that as your business grows, your data solutions grow along with it without crashing or slowing down.

Importance of Handling Large Volumes of Data

Imagine you have to manage a city's entire water supply without any leaks or disruptions. It is exactly the same case with handling large volumes of data effectively. Efficient data management ensures that businesses extract valuable insights from their data, which leads to better strategies and outcomes. It makes the most out of every bit of data that flows through the business pipelines, and it ensures that you can trust the data.

● Better Insights: Efficiently handled big data leads to more accurate and deeper insights.
● Informed Decisions: With better insights come better data-driven decisions.
● Enhanced Performance: Proper data management contributes to improved overall business performance.

Definition and Importance

So, we have established that scalability is pivotal, but let's probe a bit more. Scalability in data analytics is the capability of your analytics system to manage increased demands efficiently, ensuring the system remains effective even as it grows larger and more complex.

Why is Scalability Relevant to Modern Businesses?

Data Overload: Today, businesses encounter a sea of data every day. With so much data coming in every day, it is really easy to get data overload.

● Speed & Efficiency: The quicker and more efficiently a business can process its data, the faster it can make informed decisions.
● Cost Efficiency: Scalable solutions mean that businesses can handle more data without a proportional increase in cost.
● Customer Satisfaction: With scalable solutions, businesses can ensure uninterrupted services even when the user base grows, keeping customers happy.

Why is Scalability Crucial in Data Analytics?

● Volume Management: It enables the handling of increasing data volumes smoothly.
● Efficiency and Performance: It assures that growing data doesn't compromise the system's efficiency and performance.
● Business Growth: Scalable analytics systems are pivotal for businesses aiming for growth and expansion.

Types of Scalability

Scalability isn't a one-size-fits-all concept! There are mainly two types:

Horizontal Scalability: This is like adding more tables to your restaurant to serve more customers. In tech terms, it means adding more machines or nodes to your system to manage increased load. For instance, if a server is overloaded with requests, adding more servers will balance the load (a minimal sketch of this idea follows the examples below).

Vertical Scalability: Think of this as replacing your small car with a bus to accommodate more people. In the data world, it means increasing the capacity of an existing machine, like adding more RAM or a faster processor to handle more data -- basically, replacing your iPhone 14 with an iPhone 15.

Both are crucial!

Examples:

● Horizontal: Adding more servers to a database system to manage increased traffic.
● Vertical: Upgrading a server's hardware specifications to enhance its data processing capacity.
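As a rough illustration of the horizontal idea, the sketch below spreads incoming requests round-robin across a pool of nodes, so adding a machine to the pool immediately shares the load. The node names and request IDs are invented for the example and are not part of the original text.

```python
# Hypothetical pool of analytics nodes; "scaling out" simply grows this list.
servers = ["node-1", "node-2", "node-3"]
counter = 0

def dispatch(request_id: int) -> str:
    """Round-robin: send each incoming request to the next server in the pool."""
    global counter
    target = servers[counter % len(servers)]
    counter += 1
    return f"request {request_id} -> {target}"

for i in range(4):
    print(dispatch(i))

servers.append("node-4")   # horizontal scaling: add a machine to absorb extra load
for i in range(4, 8):
    print(dispatch(i))
```

Vertical scaling, by contrast, would leave this dispatch logic untouched and simply make each existing node faster or larger.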
The Relationship between Scalability and Performance

Here's where the rubber meets the road. Scalability doesn't just mean growing; it means how efficiently you grow. It significantly impacts the performance of data analytics solutions.

Efficient Resource Utilization: Effective scalability ensures that resources are used judiciously, preventing waste and ensuring optimum performance. It's like using just the right amount of water to grow a plant: not too little, not too much!

Enhancing Data Processing: Scalable solutions allow businesses to crunch numbers and analyse data at lightning speed, ensuring that decision-makers get the insights they need when they need them.

How Does Scalability Impact Real-Time Data Analysis?

● Speed: It allows for swift data processing, making real-time analysis possible.
● Efficiency: It enables systems to manage increased data loads without compromising on performance.

Applications of Big Data

By analysing big data, useful decisions can be made in various cases, as discussed below:

1. Tracking Customer Spending Habits and Shopping Behaviour: In big retail stores (like Amazon, Walmart, Big Bazaar, etc.) the management team has to keep data on customers' spending habits (which products a customer spends on, which brands they prefer, how frequently they spend), their shopping behaviour, and their most-liked products (so that those products can be kept in the store).

2. Recommendation: By tracking customer spending habits and shopping behaviour, big data retail stores provide recommendations to the customer. E-commerce sites like Amazon, Walmart and Flipkart do product recommendation: they track what products a customer searches for and, based on that data, recommend similar products to that customer.

3. Smart Traffic System: Data about the traffic conditions of different roads is collected through cameras kept beside the road and at the entry and exit points of the city, and from GPS devices placed in vehicles (Ola, Uber cabs, etc.).

4. Secure Air Traffic System: Sensors are present at various places in an aircraft (such as the propellers). These sensors capture data like flight speed, moisture, temperature and other environmental conditions. Based on analysis of such data, the environmental parameters within the flight are set up and varied.

5. Auto-Driving Car: Big data analysis helps drive a car without human intervention. At various spots on the car, cameras and sensors are placed that gather data like the size of surrounding cars, obstacles, the distance from them, etc.

6. Virtual Personal Assistant Tool: Big data analysis helps virtual personal assistant tools (like Siri on Apple devices, Cortana on Windows, Google Assistant on Android) provide answers to the various questions asked by users.

7. IoT: Manufacturing companies install IoT sensors into machines to collect operational data. By analysing such data, it can be predicted how long a machine will work without any problem and when it will require repair, so that the company can take action before the machine faces a lot of issues or goes down completely. Thus, the cost of replacing the whole machine can be saved.

8. Education Sector: Organizations conducting online educational courses utilize big data to search for candidates interested in those courses. If someone searches for a YouTube tutorial video on a subject, then online or offline course providers for that subject send that person online ads about their courses.

9. Energy Sector: Smart electric meters read consumed power every 15 minutes and send this reading to the server, where the data is analysed to estimate at what time of day the power load is lowest throughout the city (a minimal sketch of this kind of analysis follows this list).

10. Media and Entertainment Sector: Media and entertainment service providers like Netflix, Amazon Prime and Spotify analyse data collected from their users.
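To make the smart-meter scenario in item 9 concrete, here is a minimal sketch assuming a hypothetical CSV of 15-minute readings; the file name and the column names (timestamp, kwh) are illustrative assumptions, not part of the original text.

```python
import pandas as pd

# Hypothetical 15-minute smart-meter readings; file and column names are assumptions.
readings = pd.read_csv("meter_readings.csv", parse_dates=["timestamp"])

# Average consumption for each hour of the day, across all meters and days.
hourly_load = (
    readings
    .assign(hour=readings["timestamp"].dt.hour)
    .groupby("hour")["kwh"]
    .mean()
)

# The hour with the lightest load is a candidate window for off-peak pricing.
print("Lowest average load at hour:", hourly_load.idxmin())
print(hourly_load.sort_values().head())
```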

Big Data Characteristics

Big Data contains a large amount of data that cannot be processed by traditional data storage or processing units. It is used by many multinational companies to process data and run the business of many organizations. The data flow can exceed 150 exabytes per day before replication.

There are five V's of Big Data that explain its characteristics:

● Volume
● Veracity
● Variety
● Value
● Velocity

The name Big Data itself is related to an enormous size. Big Data is a vast volume of data generated from many sources daily, such as business processes, machines, social media platforms, networks, human interactions, and many more.

1. Variety: Big Data can be structured, unstructured or semi-structured, collected from different sources. In the past, data was only collected from databases and spreadsheets, but these days data arrives in many forms: PDFs, emails, audio, social media posts, photos, videos, etc. The data is categorized as below (the sketch after this list illustrates the first three forms):

a. Structured data: Data with a defined schema, along with all the required columns, in tabular form. Structured data is stored in relational database management systems.

b. Semi-structured data: Data whose schema is not appropriately defined, e.g., JSON, XML, CSV, TSV and email. OLTP (Online Transaction Processing) systems are built to work with semi-structured data. It is stored in relations, i.e., tables.

c. Unstructured data: All unstructured files, such as log files, audio files and image files, are included in unstructured data. Some organizations have a lot of data available but do not know how to derive value from it, since the data is raw.

d. Quasi-structured data: Textual data with inconsistent formats that can be formatted with effort, time and some tools. Example: web server logs, i.e., log files created and maintained by a server that contain a list of activities.

2. Veracity: Veracity means how reliable the data is. There are many ways to filter or translate the data. Veracity is the process of being able to handle and manage data efficiently, and it is essential in business development.

3. Value: Value is an essential characteristic of big data. It is not just any data that we process or store that matters; it is valuable and reliable data that we store, process, and also analyse.

4. Velocity: Velocity plays an important role compared to the others. Velocity is the speed at which data is created in real time. It covers the speed of incoming data sets, the rate of change, and activity bursts. A primary aspect of Big Data is to provide demanded data rapidly. Big data velocity deals with the speed at which data flows in from sources like application logs, business processes, networks, social media sites, sensors, mobile devices, etc.

5. Volume: This refers to the sheer amount of data being generated, stored, and processed. It encompasses the scale of data, which can range from terabytes to zettabytes.
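To make the Variety categories above concrete, here is a small sketch showing the same kind of customer record in structured, semi-structured and unstructured form; the record contents are invented for illustration.

```python
import json

# Structured: fixed schema, fits a relational table row.
structured_row = {"order_id": 1001, "customer_id": 42, "amount": 259.99}

# Semi-structured: self-describing JSON; fields may vary between records.
semi_structured = json.loads(
    '{"order_id": 1001, "customer": {"id": 42, "segment": "retail"}, "tags": ["promo", "mobile"]}'
)

# Unstructured: free text (a review) with no predefined rows or columns.
unstructured = "Loved the delivery speed, but the packaging was damaged."

print(structured_row["amount"])            # direct column-style access
print(semi_structured["customer"]["id"])   # navigate a nested, flexible structure
print(len(unstructured.split()))           # needs text processing to extract meaning
```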
6 V's of Big Data

In recent years, Big Data was defined by the "3 V's", but now there are "6 V's" of Big Data, which are also termed the characteristics of Big Data, as follows:

1. Volume:
● The name 'Big Data' itself is related to a size which is enormous.
● Volume is a huge amount of data.
● To determine the value of data, the size of the data plays a very crucial role. If the volume of data is very large, it is actually considered 'Big Data'. This means that whether particular data can actually be considered Big Data or not depends upon the volume of data.
● Hence, while dealing with Big Data it is necessary to consider the characteristic 'Volume'.
● Example: In the year 2016, the estimated global mobile traffic was 6.2 exabytes (6.2 billion GB) per month, and by the year 2020 we were expected to have almost 40,000 exabytes of data.

2. Velocity:
● Velocity refers to the high speed of accumulation of data.
● In Big Data, data flows in at high velocity from sources like machines, networks, social media, mobile phones, etc.
● There is a massive and continuous flow of data. This determines the potential of the data: how fast it is generated and processed to meet demands.
● Sampling data can help in dealing with issues like velocity.
● Example: More than 3.5 billion searches per day are made on Google, and Facebook users are increasing by roughly 22% year over year.

3. Variety:
● It refers to the nature of data: structured, semi-structured and unstructured.
● It also refers to heterogeneous sources.
● Variety is basically the arrival of data from new sources, both inside and outside an enterprise. It can be structured, semi-structured or unstructured.
● Structured data: Basically organized data; it generally refers to data with a defined length and format.
● Semi-structured data: Basically semi-organized data; it is generally data that does not conform to a formal structure. Log files are examples of this type of data.
● Unstructured data: Basically unorganized data; it generally refers to data that doesn't fit neatly into the traditional row-and-column structure of a relational database. Texts, pictures, videos, etc. are examples of unstructured data, which can't be stored in the form of rows and columns.

4. Veracity:
● It refers to inconsistencies and uncertainty in data: available data can sometimes get messy, and quality and accuracy are difficult to control.
● Big Data is also variable because of the multitude of data dimensions resulting from multiple disparate data types and sources.
● Example: Data in bulk could create confusion, whereas a smaller amount of data could convey half or incomplete information.

5. Value:
● After taking the other V's into account, there comes one more V, which stands for Value! A bulk of data with no value is of no good to the company unless you turn it into something useful.
● Data in itself is of no use or importance; it needs to be converted into something valuable to extract information. Hence, you can state that Value is the most important V of all the 6 V's.

6. Variability:
● How fast, and to what extent, is the structure of your data changing?
● How often does the meaning or shape of your data change?
● Example: it is as if you were eating the same ice cream daily and the taste just kept changing.

Challenges of Big Databases

The challenges in Big Data are the real implementation hurdles. They require immediate attention and need to be handled, because if they are not, the technology may fail, which can also lead to unpleasant results. Big Data challenges include storing and analysing extremely large and fast-growing data. Some of the Big Data challenges are:

1. Sharing and Accessing Data:
● Perhaps the most frequent challenge in big data efforts is the inaccessibility of data sets from external sources.
● Sharing data can cause substantial challenges.
● It includes the need for inter- and intra-institutional legal documents.
● Accessing data from public repositories leads to multiple difficulties.

2. Privacy and Security:
● This is another very important challenge with Big Data. It has sensitive, conceptual, technical as well as legal significance.
● Most organizations are unable to maintain regular checks due to the large amounts of data generated. However, it is necessary to perform security checks and observation in real time, because that is most beneficial.

3. Analytical Challenges:
● There are some huge analytical challenges in big data which raise some main questions, such as how to deal with a problem if the data volume gets too large?
● Or how to find out the important data points?

4. Technical challenges:
● Quality of data:
1. When a large amount of data is collected and stored, it comes at a cost. Big companies, business leaders and IT leaders always want large data storage.
2. For better results and conclusions, Big Data focuses on quality data storage rather than storing irrelevant data.
3. This further raises the question of how it can be ensured that the data is relevant, how much data would be enough for decision making, and whether the stored data is accurate or not.
● Fault tolerance (see the checkpointing sketch after this list):
1. Fault tolerance is another technical challenge, and fault-tolerant computing is extremely hard, involving intricate algorithms.
2. New technologies like cloud computing and big data intend that, whenever a failure occurs, the damage done should stay within an acceptable threshold; that is, the whole task should not have to begin from scratch.
● Scalability:
1. Big data projects can grow and evolve rapidly. The scalability issue of Big Data has led towards cloud computing.
2. It leads to various challenges, like how to run and execute various jobs so that the goal of each workload can be achieved cost-effectively.
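As a rough illustration of the fault-tolerance point above (resume from a checkpoint instead of restarting the whole job), the sketch below records progress after every batch; the checkpoint file name and the batch logic are assumptions made for the example.

```python
import json
import os

CHECKPOINT = "progress.json"   # hypothetical checkpoint file

def load_checkpoint() -> int:
    """Return the index of the last completed batch, or -1 if starting fresh."""
    if os.path.exists(CHECKPOINT):
        with open(CHECKPOINT) as f:
            return json.load(f)["last_batch"]
    return -1

def handle(batch):
    print("processed", batch)           # stand-in for the real per-batch work

def process(batches):
    start = load_checkpoint() + 1        # resume where the previous run stopped
    for i in range(start, len(batches)):
        handle(batches[i])
        with open(CHECKPOINT, "w") as f: # record progress after each batch
            json.dump({"last_batch": i}, f)

process([f"batch-{n}" for n in range(5)])
```

If the job crashes midway, rerunning it picks up at the first unfinished batch rather than beginning from scratch.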
Data Science: Getting Value out of Big Data

The primary reason why Big Data has developed rapidly over the last years is that it provides long-term enterprise value. Value is captured both in terms of immediate social or monetary gain and in the form of a strategic competitive advantage.

Due to its wide range of applications, Big Data is embraced by all types of industries, ranging from healthcare, finance and insurance to the academic and non-profit sectors. There are various ways in which value can be captured through Big Data and in which enterprises can leverage it to facilitate growth or become more efficient. Enterprises can capture value from Big Data in one of the following five ways:

1) Creating transparency

Using an organization's data to determine future decisions makes the organization increasingly transparent and breaks down the silos between different departments. Big Data is analysed across different boundaries and can identify a variety of inefficiencies. In manufacturing organizations, for example, Big Data can help identify improvement opportunities across R&D, engineering and production departments in order to bring new products to market faster.

2) Data-driven discovery

As enterprises create and store more and more transactional data in digital form, more performance data becomes available. Big Data can provide tremendous new insights that might not have been identified previously, by finding patterns or trends in data sets. In the insurance industry, for example, Big Data can help to determine profitable products and provide improved ways to calculate insurance premiums.

3) Segmentation and customization

The analysis of Big Data provides an improved opportunity to customize product-market offerings to specified segments of customers in order to increase revenues. Data about user or customer behaviour makes it possible to build different customer profiles that can be targeted accordingly. Online retailers, for example, can tailor the product offering on their websites to match the current customer and increase their conversion rates.

4) The power of automation

The underlying algorithms that analyse Big Data sets can be used to replace manual decisions and labour-intensive calculations with automated decisions. Automation can optimize enterprise processes and improve accuracy or response times. Retailers, for example, can leverage Big Data algorithms to make purchasing decisions or determine how much stock will provide an optimal rate of return.

5) Innovation and new products

Big data can unearth patterns that identify the need for new products or improve the design of current products or services. By analysing purchasing data or search volumes, organizations can identify demand for products that the organization might be unaware of.

Steps in the Data Science Process

What is a Data Science Process?

Data Science is all about a systematic process used by data scientists to analyse, visualize and model large amounts of data. A data science process helps data scientists use the tools to find unseen patterns, extract data, and convert information into actionable insights that can be meaningful to the company. This aids companies and businesses in making decisions that can help with customer retention and profits. Further, a data science process helps in discovering hidden patterns in structured and unstructured raw data. The process helps in turning a problem into a solution by treating the business problem as a project. So, let us learn what the data science process is in detail and what steps are involved in it.

The six steps of the data science process are as follows:
1. Frame the problem
2. Collect the raw data needed for your problem
3. Process the data for analysis
4. Explore the data
5. Perform in-depth analysis
6. Communicate the results of the analysis

As the stages of the data science process help in converting raw data into monetary gains and overall profits, any data scientist should be well aware of the process and its significance. Now, let us discuss these steps in detail. A data science process can be understood more thoroughly through online courses and certifications on data science, but here is a step-by-step guide to help you get familiar with it.

Step 1: Framing the Problem

Before solving a problem, the pragmatic thing to do is to know what exactly the problem is. Data questions must first be translated into actionable business questions. People will more often than not give ambiguous inputs on their issues. A great way to go through this step is to ask questions like:
● Who are the customers?
● How do we identify them?
● What is the sales process right now?
● Why are they interested in your products?
● Which products are they interested in?

Step 2: Collecting the Raw Data for the Problem

After defining the problem, you will need to collect the requisite data to derive insights and turn the business problem into a probable solution. The process involves thinking through your data and finding ways to collect and get the data you need. It can include scanning your internal databases or purchasing databases from external sources.

Step 3: Processing the Data to Analyse

After the first and second steps, when you have all the data you need, you will have to process it before going further and analysing it. Data can be messy if it has not been appropriately maintained, leading to errors that easily corrupt the analysis. These issues can be values set to null when they should be zero or the exact opposite, missing values, duplicate values, and many more. The most common errors that you can encounter and should look out for are (a minimal clean-up sketch follows this list):
1. Missing values
2. Corrupted values, like invalid entries
3. Time zone differences
4. Date range errors, like a sale recorded before sales even started
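Here is a minimal sketch of the kind of clean-up Step 3 describes, assuming a hypothetical raw sales CSV; the file name, column names and launch date are illustrative assumptions, not taken from the original text.

```python
import pandas as pd

# Hypothetical raw sales extract; file and column names are assumptions.
sales = pd.read_csv("raw_sales.csv", parse_dates=["sale_date"])

sales = sales.drop_duplicates()                # duplicate values
sales = sales.dropna(subset=["amount"])        # missing values
sales = sales[sales["amount"] > 0]             # corrupted / invalid entries

# Date-range errors: drop sales recorded before the (assumed) launch date.
launch = pd.Timestamp("2023-01-01")
sales = sales[sales["sale_date"] >= launch]

# Time-zone differences: normalise all timestamps to UTC.
sales["sale_date"] = sales["sale_date"].dt.tz_localize("UTC")

print(sales.describe())
```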
Step 4: Exploring the Data

In this step, you will have to develop ideas that can help identify hidden patterns and insights. You will have to find more interesting patterns in the data, such as why sales of a particular product or service have gone up or down.

Step 5: Performing In-depth Analysis

This step will test your mathematical, statistical, and technological knowledge. You must use all the data science tools to crunch the data successfully and discover every insight you can. You might have to prepare a predictive model that can compare your average customer with those who are underperforming (a minimal sketch of such a model follows).
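As a rough illustration of the kind of model Step 5 mentions, the sketch below uses scikit-learn; the customer table, its column names, and the idea of a binary "underperforming" flag are all assumptions made for the example.

```python
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Hypothetical customer table; column names are illustrative assumptions.
customers = pd.read_csv("customers.csv")
X = customers[["orders_per_month", "avg_basket_value", "months_active"]]
y = customers["underperforming"]   # 1 = below-average spender, 0 = otherwise

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)

model = LogisticRegression(max_iter=1000).fit(X_train, y_train)
print("Held-out accuracy:", model.score(X_test, y_test))

# Compare the profile of the average customer with the underperforming group.
print(customers.groupby("underperforming")[X.columns.tolist()].mean())
```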
Step 6: Communicating the Results of the Analysis

After all these steps, it is vital to convey your insights and findings to the sales head and make them understand their importance. It will help if you communicate appropriately to solve the problem you have been given. Proper communication will lead to action; in contrast, improper communication may lead to inaction.

Distributed File Systems

A Distributed File System (DFS), as the name suggests, is a file system that is distributed across multiple file servers or multiple locations. It allows programs to access or store isolated files as they do with local ones, allowing programmers to access files from any network or computer.

The main purpose of the Distributed File System (DFS) is to allow users of physically distributed systems to share their data and resources by using a common file system. A collection of workstations and mainframes connected by a Local Area Network (LAN) is a typical configuration for a Distributed File System. A DFS is executed as a part of the operating system. In DFS, a namespace is created, and this process is transparent to the clients.

DFS has two components:

● Location Transparency: achieved through the namespace component.
● Redundancy: achieved through a file replication component.

In the case of failure and heavy load, these components together improve data availability by allowing data in different locations to be logically grouped under one folder, known as the "DFS root".
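To illustrate the namespace idea behind location transparency (and the replication behind redundancy), here is a small simulation; the logical paths, server names and fail-over logic are invented for the example and do not correspond to any real DFS product API.

```python
# Toy DFS namespace: one logical path maps to copies held on several file servers.
namespace = {
    "/dfsroot/reports/q1.csv": ["server-a:/vol1/q1.csv", "server-b:/vol7/q1.csv"],
    "/dfsroot/reports/q2.csv": ["server-b:/vol7/q2.csv"],
}

def resolve(logical_path, down=frozenset()):
    """Return a reachable physical copy; the client never needs to know which server holds it."""
    for replica in namespace[logical_path]:
        server = replica.split(":")[0]
        if server not in down:
            return replica
    raise IOError(f"no available replica for {logical_path}")

print(resolve("/dfsroot/reports/q1.csv"))                     # served from server-a
print(resolve("/dfsroot/reports/q1.csv", down={"server-a"}))  # redundancy: falls back to server-b
```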
Features of DFS:

● Transparency:
Structure transparency: There is no need for the client to know about the number or locations of file servers and storage devices. Multiple file servers should be provided for performance, adaptability, and dependability.
Access transparency: Both local and remote files should be accessible in the same manner. The file system should automatically locate the accessed file and send it to the client's side.
Naming transparency: There should not be any hint of a file's location in its name. Once a name is given to a file, it should not change when the file is transferred from one node to another.
Replication transparency: If a file is copied onto multiple nodes, the copies of the file and their locations should be hidden from one node to another.
● User mobility: The system automatically brings the user's home directory to the node where the user logs in.
● Performance: Performance is based on the average amount of time needed to satisfy client requests. This time covers CPU time + time taken to access secondary storage + network access time. It is advisable that the performance of a Distributed File System be similar to that of a centralized file system.
● Simplicity and ease of use: The user interface of the file system should be simple and the number of commands should be small.
● High availability: A Distributed File System should be able to continue operating in case of partial failures such as a link failure, a node failure, or a storage drive crash. A highly available and adaptable distributed file system should have different and independent file servers controlling different and independent storage devices.
● Scalability: Since growing the network by adding new machines or joining two networks together is routine, the distributed system will inevitably grow over time. As a result, a good distributed file system should be built to scale quickly as the number of nodes and users in the system grows.
● High reliability: The likelihood of data loss should be minimized as much as feasible in a suitable distributed file system. That is, users should not feel forced to make backup copies of their files because of the system's unreliability. Rather, the file system should create backup copies of key files that can be used if the originals are lost.
● Data integrity: A file system is frequently shared by multiple users. The integrity of data saved in a shared file must be guaranteed by the file system. That is, concurrent access requests from many users competing for access to the same file must be correctly synchronized using a concurrency control method.
● Security: A distributed file system should be secure, so that its users can trust that their data will be kept private. To safeguard the information contained in the file system from unwanted and unauthorized access, security mechanisms must be implemented.
● Heterogeneity: Heterogeneity in distributed systems is unavoidable as a result of their huge scale. Users of heterogeneous distributed systems have the option of using multiple computer platforms for different purposes.

Properties:

● File transparency: users can access files without knowing where they are physically stored on the network.
● Load balancing: the file system can distribute file access requests across multiple computers to improve performance and reliability.
● Data replication: the file system can store copies of files on multiple computers to ensure that the files are available even if one of the computers fails.
● Security: the file system can enforce access control policies to ensure that only authorized users can access files.
● Scalability: the file system can support a large number of users and a large number of files.
● Concurrent access: multiple users can access and modify the same file at the same time.
● Fault tolerance: the file system can continue to operate even if one or more of its components fail.
● Data integrity: the file system can ensure that the data stored in files is accurate and has not been corrupted.
● File migration: the file system can move files from one location to another without interrupting access to the files.
● Data consistency: changes made to a file by one user are immediately visible to all other users.

Applications:

● NFS: NFS stands for Network File System. It is a client-server architecture that allows a computer user to view, store, and update files remotely. The NFS protocol is one of several distributed file system standards for Network-Attached Storage (NAS).
● CIFS: CIFS stands for Common Internet File System. CIFS is a dialect of SMB; that is, CIFS is an implementation of the SMB protocol, designed by Microsoft.
● SMB: SMB stands for Server Message Block. It is a file-sharing protocol invented by IBM. The SMB protocol was created to allow computers to perform read and write operations on files on a remote host over a Local Area Network (LAN). The directories on the remote host that can be accessed via SMB are called "shares".
● Hadoop: Hadoop is a collection of open-source software services. It provides a software framework for distributed storage and processing of big data using the MapReduce programming model. The core of Hadoop consists of a storage part, known as the Hadoop Distributed File System (HDFS), and a processing part, which is the MapReduce programming model.
● NetWare: NetWare is a discontinued computer network operating system developed by Novell, Inc. It primarily used cooperative multitasking to run different services on a personal computer, using the IPX network protocol.

Working of DFS:

There are two ways in which DFS can be implemented:

● Standalone DFS namespace: It allows only those DFS roots that exist on the local computer and does not use Active Directory. A standalone DFS can only be accessed on the computer on which it is created.
● Domain-based DFS namespace: It stores the configuration of DFS in Active Directory, making the DFS namespace root accessible at \\<domainname>\<dfsroot> or \\<FQDN>\<dfsroot>.

Advantages:

● DFS allows multiple users to access or store the data.
● It allows data to be shared remotely.
● It improves file availability, access time, and network efficiency.
● It improves the capacity to change the size of the data and the ability to exchange data.
● A Distributed File System provides transparency of data even if a server or disk fails.

Disadvantages:

● In a Distributed File System, nodes and connections need to be secured, so we can say that security is at stake.
● There is a possibility of losing messages and data in the network while they move from one node to another.
● Database connections in a Distributed File System are complicated.
● Handling a database in a Distributed File System is also harder compared to a single-user system.
● There is a chance of overloading if all nodes try to send data at once.
