Bibliography

DevOps

Blog Posts

5 Lessons Learned From Writing Over 300,000 Lines of Infrastructure Code: A concise masterclass on how to write infrastructure code (Yevgeniy Brikman)

Docker

Note: Many of these resources were shared with us by the SpaceCraft REPL team. A hat tip to them!

Blog Posts

Import SQL dump in postgres docker container in 10 min (Deeksha Sharma)

Courses

Docker and Kubernetes: The Complete Guide (Stephen Grider)

Videos

Virtual Machines vs Docker Containers - Dive Into Docker (Nick Janetakis)

Succinct and clear explanation; analogy at the end is quite useful

What is a Container? (Ben Corrie)

Deeper dive into what a container is, which is useful given that it is an overloaded term. Visuals are very helpful

Event Data / Metadata / Analytics

Blog Posts

Analytics For Hackers: How To Think About Event Data (Michelle Wetzler)

Great overview of event data and how it differs from entity data

Event Data vs Entity Data — How to store user properties in Keen IO (Michelle Wetzler)

Talks about how to store entity data in an event storage database (specifically Keen.io). Also has a brief description of event data vs entity data

An introduction to event data modeling (Yali Sassoon)

Good for its distinction between atomic data and modeled data. Lots of info on how we work with aggregated modeled data

Videos

(Event) Data is Everywhere (Taylor Barnett)

Brief introduction to event data. Largely useful for her clear definition of event data and how it differs from entity data

Whitepapers

Build vs. Buy Gets Easier with APIs: A CTO's Guide to Getting Data Strategy Right (Keen IO)

The Death of Web Analytics (Heap Analytics)

Mixpanel System Architecture (Vijay Jayaram)

MOOCdb: Developing Standards and Systems to Support MOOC Data Science (Kalyan Veeramachaneni et al.)

An account of a "solution to centralizing and generalizing MOOC data organization". Essentially, the group develops a set of schemas to handle event data, i.e. online students' engagement with their web-based courses

Time Series Databases (Dmitry Namiot)

Event-series/Time-series Databases

Blog Posts

Row Store and Column Store Databases (Rick Golba)

Clear discussion of the relative advantages and common use cases of columnar and row-based databases. Emphasis is on transaction and query type rather than on, e.g., scalability or consistency

SlicingDice.com Blog (SlicingDice)

SlicingDice is a competitor of Keen.io. The company's blog has a series of quite dense posts about their infrastructure. They also have a post with a lot of links to concepts (usually wiki pages), books, and white papers that helped them build their time-based database

SQL or NoSQL, That Is the Question (Jordan Baker)

Survey results that showed that most developers are still using SQL, but that NoSQL is on the rise

Time-Series Database Requirements (Baron Schwartz)

Baron Schwartz talks about requirements for time series databases

TSDBs at Scale - Part One (Fred Moyer)

TSDBs at Scale - Part Two (Fred Moyer)

Why Not to Build a Time Series Database (David Gildeh)

Slideshows

Behavior Databases - Next Generation NoSQL Analytics (Ben Johnson)

Slides describing Ben Johnson's behavior database. The main distinction between entity and event data that Keen uses was first made here

Store JSON in Cassandra the Hard Way (Josh Dzielak)

Explains Keen.io's method for getting JSON data into Cassandra. Even though it's just the slides, you can get the overall gist from them

Videos

Intro to Time Series Databases & Data | Getting Started 1 of 7 (Michael DeSa)

Introduction to time series data and a demonstration of one time series database, InfluxDB

Whitepapers

The Design and Implementation of Modern Column-Oriented Database Systems (Daniel Abadi, Peter Boncz, Stavros Harizopoulos, Stratos Idreos, Samuel Madden)

Why All Column Stores Are Not the Same: Twelve Low-Level Features That Offer High Value to Analysts (Vertica)

Event Streaming Architecture

Books

Making Sense of Stream Processing: The Philosophy Behind Apache Kafka and Scalable Stream Data Platforms (Martin Kleppmann)

Higher level than Streaming Systems, but don't let that fool you: Kleppmann focuses more on drawing out the big-picture, paradigm-shifting implications of stream data platforms

Streaming Architecture: New Designs Using Apache Kafka and MapR Streams (Ted Dunning & Ellen Friedman)

Streaming Systems: The What, Where, When, and How of Large-Scale Data Processing (Tyler Akidau, Slava Chernyak, & Reuven Lax)

This is the book to read if you are interested in this topic

Blog Posts

Stream processing, Event sourcing, Reactive, CEP… and making sense of it all (Martin Kleppmann)

Very informative post about stream processing and event sourcing. Also talks about use cases and trade-offs of storing raw event data vs aggregated data

Top 10 Time Series Databases (Outlyer)

Spreadsheet included in this blog post is very useful for a comparison of features

The world beyond batch: Streaming 101 (Tyler Akidau)

The world beyond batch: Streaming 102 (Tyler Akidau)

Trends in Event Stream Processing Products (Roy Schulte & David Luckham)

Covers 6 major trends in CEP services and gives a large listing of them

Questioning the Lambda Architecture (Jay Kreps)

Briefly describes the Lambda Architecture and then compares it with the "Kappa" architecture, which uses only a stream processor and handles reprocessing by running an additional job

A Note About Rate Limits (Keen IO)

Brief post that describes some of the (older) rate limits at Keen and what they're for. May be useful when we have to implement our own rate limiting

Uncategorized

Architecture of Giants: Data Stacks at Facebook, Netflix, Airbnb, and Pinterest (Michelle Wetzler)

Gives simplified layouts of the data stacks at a few major internet giants. What's relevant is that there is a visual diagram of Keen IO's architecture as well as a brief description. Not as thorough as some of the other links, but it's nice to have a visual

Post-REST (Tim Bray)

Notes some of the problems with RESTful APIs and then details some of the post-REST APIs coming in the future

Videos

Handling trillions of events daily and conquering scaling issues with Keen CTO (Christophe Limpalair)

Interview with Keen.io's CTO Dan Kador. Lots of good information on Keen.io's architecture here

Podcasts

Kafka, Storm, and Cassandra: Keen IO's Analytic Architecture with Dan Kador (Software Engineering Daily)

Another interview with Dan Kador; I think the video is more useful, but this one is still really good. A bit more technical detail about data transformation here than in the video

Whitepapers

Complex Event Processing (Alejandro Buchmann, TU Darmstadt, Boris Koldehofe)

Complex Event Processing in Distributed Systems (David C. Luckham & Brian Frasca)

Complex-Event Processing Poised for Growth (Neal Leavitt)

Continuous Queries over Data Streams (Shivnath Babu & Jennifer Widom)

Distributed and Heterogeneous Event-based Monitoring in Smart Cyber-Physical Systems (László Balogh, István Dávid, István Ráth, Dániel Varró, Andras Vörös)

Event based classification of Web 2.0 text streams (Andreas Bauer & Christian Wolff)

Kafka

Books

Designing Event-Driven Systems: Concepts and Patterns for Streaming Services with Apache Kafka (Ben Stopford)

Kafka: The Definitive Guide (Neha Narkhede, Gwen Shapira, & Todd Palino)

Blog Posts

Apache Kafka vs Amazon Kinesis to Build a High Performance Distributed System (Kyle Wild)

Comparison between Kafka and Kinesis for building something akin to Keen.io

Kafka in a Nutshell (Kevin Sookocheff)

Explains how Kafka works (Kafka topic, replication, Producers and Consumers, Partitions and Brokers, etc)

Should you put several event types in the same Kafka topic? (Martin Kleppmann)

Great article where Kleppmann describes the different situations when you should or shouldn't combine events into the same topic. Make sure to read the Grokbase post he links to in point #4

Understanding When to use RabbitMQ or Apache Kafka (Pieter Humphrey)

This post offers an assessment of the most popular messaging choices today, RabbitMQ and Apache Kafka, and their use cases

Introducing Kafka Streams: Stream Processing Made Simple (Jay Kreps)

Introduction to the Kafka Streams API with discussions on Table/Stream theory, windowing, etc.

Courses

Apache Kafka Series - Learn Apache Kafka for Beginners v2 (Stéphane Maarek)

Phenomenal course; if it's on sale, make sure to grab it!

Apache Kafka Series - Kafka Connect Hands-on Learning (Stéphane Maarek)

Videos

Is Kafka a Database? (Martin Kleppmann)

Kafka and Event-Oriented Architecture (Jay Kreps)

Kafka at Scale in the Cloud (Allen Wang)

2016 presentation explaining some of the challenges Netflix had with scaling Kafka in the cloud and their solutions. The slides are also available online

What's New in Kafka 2.1? (Stéphane Maarek)

New features in the recent Kafka 2.1 release

Testing and Benchmarking

Blog Posts

NoSQL Performance Benchmark 2018 – MongoDB, PostgreSQL, OrientDB, Neo4j and ArangoDB (ArangoDB)

Post describing how ArangoDB benchmarked its product, with scripts and instructions for using AWS

Projects

Stanford Network Analysis Project (Jure Leskovec)

Big data sets that may be useful in automated testing. Used by the benchmarking process for ArangoDB