



I consider this book a mini-encyclopedia of modern data engineering. Like a specialized encyclopedia, it covers a broad field in considerable detail. But it is not a practical guide or a cookbook for a particular Big Data, NoSQL, or NewSQL product. What the author does is lay down the principles of current distributed big data systems, and he does a very fine job of it. If you are after the obscure details of a particular product, or tutorials and "how-to"s, go elsewhere. But if you want to understand the main principles, issues, and challenges of data-intensive and distributed systems, you've come to the right place.

Martin Kleppmann starts out by giving the reader a solid conceptual framework in the first chapter: What does reliability mean? How is it defined? What is the difference between a "fault" and a "failure"? How do you describe load on a data-intensive system? How do you talk about performance and scalability in a meaningful way? What does it mean to have a "maintainable" system? The second chapter gives a brief overview of different data models and shows their suitability for different use cases, using modern challenges that companies such as Twitter have faced. This chapter is a solid foundation for understanding the differences between the relational, document, and graph data models, as well as the languages used for processing data stored using these models. The third chapter goes into a lot of detail regarding the building blocks of different types of database systems: it describes the data structures and algorithms used by the systems shown in the previous chapter, and you get to know hash indexes, SSTables (Sorted String Tables), Log-Structured Merge trees (LSM-trees), B-trees, and other data structures.
