MSc ISNS

Definition
• Database replication is the process of creating copies of a database and storing them across various on-premises or cloud destinations.
• It improves data availability and accessibility: every user connected to the system can access copies of the same (up-to-date) data.

Benefits
• Higher data availability. Your overall system can still perform adequately if one of your replicated databases becomes unavailable, because another copy of the database remains reachable.
• Reduced server load. A replicated, distributed database requires less processing on each server, which means higher performance for queries.
• More reliable data. As part of the replication process, data in target systems is processed and updated to match the source system, which helps ensure data integrity.
• Less data movement. A distributed database allows versions of the data to sit closer to the point of transaction or data entry.
• Better protection. Redundancy safeguards the read performance and availability of mission-critical databases and supports business continuity.
• Lower latency. Having copies of your data in multiple locations means more localized data access, which can improve your network performance. This is especially helpful to employees in satellite offices.
• Better application performance. Replication improves the scalability and availability of database-dependent applications.

Challenges for Database Replication
• Inconsistent data. Some of your data may not sync correctly with the rest of your distributed system when you're copying data between multiple sites at different intervals.
• Lost data. Some of your data may be lost if database objects are incorrectly configured in the source database, or if the primary key you use to verify data integrity in the replica is incorrect.

Types of Database Replication
a. Full-table replication
b. Key-based incremental replication
c. Log-based replication

Full-table replication
• Full-table replication copies every piece of data within a table from the database to the cloud destination; this includes new, existing and updated data.
• Advantages: Because this replicates the entire table, you will always have the correct data set after each sync and can be sure that all inserts, updates and deletes are captured.
• Disadvantages: This is the least efficient type of database replication and is rather resource-intensive, because you copy every piece of data within a table whether it has changed or not.
• This can also lead to a burst load on the source, depending on the size and volume of data within the tables.

Key-based incremental replication
• Key-based incremental replication is a database replication method that uses a replication key, such as a timestamp or integer key, to identify new and updated records.
• Advantages: Key-based incremental replication is an efficient type of database replication because it replicates only updated and inserted rows, and so uses fewer resources.
• Disadvantages: Hard deletes are not captured. Data that is hard-deleted from the source database stays in your destination unless you put a lot of time and effort into processes that identify deletes.

Log-based replication
• Log-based replication copies changes based on a database's binary log files, which record the activities and operations performed within a database.
• Advantages: This type of database replication is the most efficient, as it reads directly from the binary log files and doesn't compete with other database queries.
• Disadvantages: Log-based replication is only available for certain databases, and you may not have access to your database's logs if it is hosted by a third party. Also, setting up log-based replication can be very time-intensive, difficult and bug-prone if you build it yourself.

Database Replication Method
• There are multiple methods for replicating data from your database. The extensive list of database replication methods allows you to choose a method that suits your infrastructure.

Methods
• Log-Based Incremental Replication
• Key-Based Incremental Replication
• Full Table Replication
• Snapshot Replication
• Transactional Replication
• Merge Replication
• Bidirectional Replication

Replication process
• Identify your data source
The first step is to identify your primary data source, where data from your organization originates. This could be any kind of database, on-premises or in the cloud. Next, determine the destination you'll replicate the data to. Potential destinations are major cloud data warehouses, data lakes or even another database.
• Determine the scope of your database replication
The next step is to consider the data you need to replicate from your database. If you need to replicate an entire database, you should opt for a full-table database replication scheme. This ensures that all of your data is available in your destination. However, if you only need certain aspects of a database replicated (e.g., analytical data), you would select the source tables and columns so that only part of your database is replicated.
• Decide on a database replication frequency
How often do you need the data replicated?
Synchronous replication allows for simultaneous updates in real time. This is typically used for transactional applications that require near real-time data updates. It uses more bandwidth, but it keeps data across the network synchronized.
Asynchronous replication means that data is written to the primary database first. Then the data is replicated to the destination in batches, anywhere from every few minutes to daily. It is more cost-effective to have the data in your database sync on a schedule, but there's also the risk of data loss if recent changes aren't properly replicated.
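The difference between the two frequencies can be illustrated with a minimal sketch. It is a toy in-memory model rather than how any particular database implements replication: the `Primary` class, its `pending` queue and the `flush` method are all hypothetical names introduced here for illustration.

```python
from collections import deque


class Primary:
    """Toy primary store with a single replica (hypothetical model)."""

    def __init__(self, synchronous):
        self.synchronous = synchronous
        self.data = {}          # primary copy
        self.replica = {}       # replica copy
        self.pending = deque()  # changes not yet applied to the replica

    def write(self, key, value):
        self.data[key] = value
        if self.synchronous:
            # Synchronous: the replica is updated before the write returns.
            self.replica[key] = value
        else:
            # Asynchronous: the change is queued for a later batch.
            self.pending.append((key, value))

    def flush(self):
        """Apply all queued changes to the replica; return how many were applied."""
        applied = 0
        while self.pending:
            key, value = self.pending.popleft()
            self.replica[key] = value
            applied += 1
        return applied


sync_db = Primary(synchronous=True)
sync_db.write("order:1", "paid")

async_db = Primary(synchronous=False)
async_db.write("order:1", "paid")
in_sync_before_flush = async_db.replica == async_db.data  # replica lags behind
async_db.flush()                                          # the scheduled batch runs
in_sync_after_flush = async_db.replica == async_db.data
```

In the asynchronous case, anything still sitting in `pending` when the primary fails is the "risk of data loss" mentioned above; the synchronous case avoids that window at the cost of doing replica work on every write.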
• Choose a database replication type and method
Decide on your database replication type: full-table, key-based or log-based. The right choice will depend on factors like your source and destination pairing, the amount of data you need to replicate and the resources available for your database replication.
• Use a database replication tool
Database replication improves the availability of your data by storing it in multiple locations and potentially reducing the load on your source database. To ensure your data is properly replicated, you'll need to select the right database replication tool for your use case. This will keep your systems running smoothly and ensure you can get the greatest value out of your data.
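
To make the choice between the three replication types more concrete, the sketch below runs each of them against two in-memory SQLite databases. It is an illustration under stated assumptions, not a production design: the `customers` table, the `updated_at` replication key and the `change_log` list (a stand-in for a real binary log) are all hypothetical.

```python
import sqlite3


def make_db():
    db = sqlite3.connect(":memory:")
    db.execute(
        "CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT, updated_at INTEGER)"
    )
    return db


def full_table_replicate(source, dest):
    """Copy every row, changed or not; return the number of rows copied."""
    rows = source.execute("SELECT id, name, updated_at FROM customers").fetchall()
    dest.execute("DELETE FROM customers")
    dest.executemany("INSERT INTO customers VALUES (?, ?, ?)", rows)
    dest.commit()
    return len(rows)


def key_based_replicate(source, dest, last_key):
    """Copy only rows whose replication key exceeds last_key.

    Returns (rows_copied, new_last_key). Hard deletes are not detected.
    """
    rows = source.execute(
        "SELECT id, name, updated_at FROM customers "
        "WHERE updated_at > ? ORDER BY updated_at",
        (last_key,),
    ).fetchall()
    dest.executemany("INSERT OR REPLACE INTO customers VALUES (?, ?, ?)", rows)
    dest.commit()
    new_last_key = rows[-1][2] if rows else last_key
    return len(rows), new_last_key


def log_based_replicate(change_log, dest, from_position):
    """Replay a toy change log (assumed format: (op, row) tuples) from a position."""
    for op, row in change_log[from_position:]:
        if op == "delete":
            dest.execute("DELETE FROM customers WHERE id = ?", (row[0],))
        else:  # "upsert"
            dest.execute("INSERT OR REPLACE INTO customers VALUES (?, ?, ?)", row)
    dest.commit()
    return len(change_log)  # new log position


def ids(db):
    return [r[0] for r in db.execute("SELECT id FROM customers ORDER BY id")]


source, dest = make_db(), make_db()
source.executemany(
    "INSERT INTO customers VALUES (?, ?, ?)",
    [(1, "Ada", 100), (2, "Grace", 101), (3, "Edsger", 102)],
)
source.commit()
copied_full = full_table_replicate(source, dest)  # initial load of all rows

# One update and one hard delete happen on the source.
source.execute("UPDATE customers SET name = 'Ada L.', updated_at = 103 WHERE id = 1")
source.execute("DELETE FROM customers WHERE id = 2")
source.commit()
change_log = [("upsert", (1, "Ada L.", 103)), ("delete", (2, None, None))]

copied_incr, last_key = key_based_replicate(source, dest, last_key=102)
ids_after_key_based = ids(dest)  # the deleted row is still in the destination

position = log_based_replicate(change_log, dest, from_position=0)
ids_after_log = ids(dest)        # the delete has now been replayed
```

The key-based run transfers only the updated row but leaves the hard-deleted row in the destination, while replaying the change log removes it; a second full-table run would also remove it, at the cost of copying every row again.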