List: Data Engineering | Curated by Shailesh Kumar

Feb 6, 2025

18 stories

Hugo Lu
Databricks $10bn Series J: Meta on the Cap Table spells the final endgame
Databricks is going after probably one of the most ambitious exits in the history of tech
Jan 28

In Dev Genius, by Sutanu Dutta
Why Kafka ditched Zookeeper
For many years, Apache Kafka relied on Apache ZooKeeper to manage metadata, cluster configurations and maintain a distributed state across…
Nov 3, 2024

In Data Engineer Things, by Dani
Apache Iceberg: The Hadoop of the Modern Data Stack?
The bigger they are the harder they fall.
Dec 12, 2024

In Pinterest Engineering Blog, by Pinterest Engineering
Change Data Capture at Pinterest
Liang Mou; Staff Software Engineer, Logging Platform | Elizabeth (Vi) Nguyen; Software Engineer I, Logging Platform |
Nov 18, 2024

Chunting Wu
Is there an Alternative to Debezium + Kafka?
Evaluating open-source options to improve performance and scalability in CDC pipelines
Nov 4, 2024

In Data Engineer Things, by Yingjun Wu
Kafka Has Reached a Turning Point
Is Kafka still relevant in today’s evolving tech landscape? And where is Kafka headed in the future?
Sep 23, 2024

In Level Up Coding, by Dr. Ashish Bamania
Google’s New Algorithms Just Made Searching Vector Databases Faster Than Ever
A Deep Dive into how Google’s ScaNN and SOAR Search algorithms supercharge the performance of Vector Databases
Jun 18, 2024

Kai Waehner
When NOT to Use Apache Kafka? (Lightboard Video)
When NOT to use Apache Kafka: DOs and DON'Ts, no matter if you use open source, Confluent, Amazon MSK, Event Hubs, Redpanda, Warpstream, etc.
Jun 21, 2024

In TDS Archive, by Bernd Wessely
Deliver Your Data as a Product, But Not as an Application
Data as a product is an intriguing concept, but beware of the application trap
Jul 12, 2024

In TDS Archive, by Bernd Wessely
Avoid Building a Data Platform in 2024
Why articles about ‘Building a Data Platform’ are mostly misleading
Aug 13, 2024

In TDS Archive, by Dario Radečić
DuckDB and AWS — How to Aggregate 100 Million Rows in 1 Minute
Process huge volumes of data with Python and DuckDB — An AWS S3 example.
Apr 25, 2024

In Data Engineer Things, by Leo Godin
No, Data Engineers Don’t NEED dbt.
But It Sure Does Solve a Lot of Problems
Jul 19, 2024

In FAUN — Developer Community 🐾, by Aymen El Amri
The Hottest Open Source Projects Of 2023
This article was originally posted on faun.dev.
Dec 28, 2023

Shawn Gordon
What the Heck is Apache Paimon?
Introduction
Dec 5, 2023

Shawn Gordon
What the heck is GlareDB?
Introduction
Sep 20, 2023

Florian Tieben
Fluvio: A Kafka + Flink Built Using Rust + WASM
Fluvio is a new open-source streaming platform that is built using Rust and WebAssembly (WASM). It is a combination of Apache Kafka and…
Oct 5, 2023

Kieran Healey
Cached Takes: 80% of Companies do not need Snowflake or Databricks
The cost for something that can be replicated free and open source is absurd. The Fortune 100 have a use case for these companies, the rest…
Jul 14, 2023

Rafael "Auyer" Passos
How I Decreased ETL Cost by Leveraging the Apache Arrow Ecosystem
In the field of Data Engineering, the Apache Spark framework is one of the most known and powerful ways to extract and process data. It is…
Feb 15, 2023
