All You Need to Know about Isolation Levels and Read Inconsistencies
Just like real life, the world of computer science is replete with trade-offs. Relational databases are no exception. When interacting with relational databases, we face the dilemma between data consistency and transaction concurrency. The former guarantees the data is trustworthy, while the latter ensures relational databases can conduct transactions swiftly. Both are desirable qualities of relational databases, but we cannot simultaneously achieve them to the fullest extent....CAP Theorem But Better? Introduce the PACELC Theorem
In the previous blog, I introduced the famous CAP Theorem (please give it a read if you haven’t already before you start reading this one). It involves a trilemma of needing to give up one of the following three qualities: consistency, availability, and partition tolerance. Since all three are desirable features of modern-day distributed systems, determining which one to relinquish has become one of the most important and delicate decisions for designers of complicated distributed systems....What You Need to Know about the CAP Theorem
The world that we live in is far from perfect. We constantly find ourselves in dilemmas, sometimes even trilemmas, that require us to make trade-offs. When shopping, we can only choose two out of “cheap,” “fast,” and “good.” In economics, a government cannot enjoy “sovereign monetary policy,” “fixed exchange rate,” and “free capital flow” at the same time....A Brief Introduction to Kafka
There’s no denying that we have already ushered in the era of big data. An enormous amount of information is generated every second. While decision-makers can gain invaluable insights from this ever-growing data, its sheer volume also poses considerable challenges to data engineers–greater demand for storage spaces, the need to handle increasingly complex data formats, and highly unpredictable network traffic....A Brief Introduction to the Inner Working of MapReduce
As a data engineer, you probably have heard about Hadoop. It is one of the most popular frameworks for distributed processing of large data sets. It is less costly and more secure than other frameworks. At its center is a programming model called MapReduce. Today we will take a closer look at MapReduce to understand the inner working of Hadoop....ETL vs. ELT: Pick the Most Suitable Data Integration Method for Your Project
As a data engineer, you probably have heard of the data integration methodology called ETL (Extract-Transform-Load). It has been around for a while, and many data engineers have used this methodology to build data pipelines. However, ETL is not the only option up our sleeves. Recently, ELT has also been gaining a lot of popularity....