Paper Shelf
Each week, I read one white paper on a topic that I either find interesting or want to dive deeper into. These are the papers I've read so far:
Scalability! But at what COST?
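COST stands for 'Configuration that Outperforms a Single Thread'. The paper examines whether scalable parallel systems always justify their complexity by comparing them with single-threaded implementations: the COST of a system is the hardware configuration it needs before it outperforms a competent single-threaded implementation. It questions whether high-COST systems are truly necessary for every problem.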
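To make the metric concrete, here is a minimal sketch of how a COST value could be read off a set of benchmark timings. The function name `cost` and all the numbers are made up for illustration; they are not measurements from the paper.

```python
from typing import Dict, Optional


def cost(single_thread_seconds: float, scaled_runtimes: Dict[int, float]) -> Optional[int]:
    """Return the smallest core count whose runtime beats the single-threaded
    baseline, or None if no measured configuration does (unbounded COST)."""
    for cores in sorted(scaled_runtimes):
        if scaled_runtimes[cores] < single_thread_seconds:
            return cores
    return None


# Hypothetical timings (seconds) for a scalable system at different core counts.
measurements = {1: 900.0, 16: 420.0, 64: 280.0, 128: 150.0}
baseline = 300.0  # hypothetical single-threaded implementation

print(cost(baseline, measurements))
```

With these invented numbers the system scales well (900 s down to 150 s), yet it needs 64 cores just to match one well-written thread, which is the kind of gap the paper draws attention to.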
WaveNet: A Generative Model for Raw Audio
It presents a deep autoregressive neural network, built from dilated causal convolutions rather than recurrent layers, that generates an audio waveform one sample at a time, conditioning each sample on the samples before it. Context: at Oracle, my team works on anomaly detection over the usage and cost datasets of OCI customers. While exploring the algorithms used by competitors, I found that Azure uses WaveNet for anomaly detection, so I started reading this complex yet insightful paper.
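The core building block, a dilated causal convolution, can be sketched in a few lines of plain Python. This is only an illustration of the idea under simplifying assumptions (one channel, no gating, no residual connections); the function names are my own and not from the paper.

```python
def causal_dilated_conv(x, weights, dilation):
    """y[t] = sum_k weights[k] * x[t - k * dilation].

    Samples before the start of the signal are treated as zero, so the
    output at time t never depends on any input after time t (causality).
    """
    out = []
    for t in range(len(x)):
        acc = 0.0
        for k, w in enumerate(weights):
            idx = t - k * dilation
            if idx >= 0:
                acc += w * x[idx]
        out.append(acc)
    return out


def receptive_field(kernel_size, dilations):
    """Number of past samples (including the current one) that a stack of
    dilated causal layers can see."""
    return 1 + (kernel_size - 1) * sum(dilations)


signal = [1, 2, 3, 4]
print(causal_dilated_conv(signal, weights=[1, 1], dilation=2))
print(receptive_field(kernel_size=2, dilations=[1, 2, 4, 8]))
```

Doubling the dilation at each layer is what lets the receptive field grow exponentially with depth while the number of weights per layer stays constant.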
MapReduce: Simplified Data Processing on Large Clusters
It presents a programming model for processing large datasets on a big cluster of machines: the user writes a map function and a reduce function, and the runtime handles parallelization and distribution. In this 2004 paper, Google describes the model's fault tolerance and scalability, and how widely it had been adopted for parallel processing inside Google.
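The paper's running example is word count. Below is a single-process sketch of the map, shuffle, and reduce phases; `run_mapreduce` is a hypothetical helper written for this note, and a real implementation would spread these phases across many machines and handle worker failures.

```python
from collections import defaultdict


def map_fn(doc_name, text):
    """Map phase: emit (word, 1) for every word in a document."""
    for word in text.split():
        yield word.lower(), 1


def reduce_fn(word, counts):
    """Reduce phase: sum all the counts emitted for one word."""
    return word, sum(counts)


def run_mapreduce(inputs, map_fn, reduce_fn):
    """Toy single-process driver: map every input, group values by key
    (the shuffle step), then reduce each group."""
    groups = defaultdict(list)
    for key, value in inputs.items():
        for out_key, out_value in map_fn(key, value):
            groups[out_key].append(out_value)
    return dict(reduce_fn(k, vs) for k, vs in sorted(groups.items()))


docs = {"a.txt": "the cat", "b.txt": "The dog"}
print(run_mapreduce(docs, map_fn, reduce_fn))
```

Because each map call and each reduce call is independent, the runtime is free to run them on different machines and to re-run any that fail.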
Spark: Cluster Computing with Working Sets
Spark is essentially an extension of the MapReduce model that focuses on reusing a working dataset across multiple parallel operations while remaining fault tolerant.
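The reuse idea can be illustrated with a toy, single-machine stand-in for a cached dataset. `ToyDataset` is entirely hypothetical and is not Spark's API; it only shows a dataset that remembers how it was derived, keeps its result in memory when asked, and can be rebuilt from that recipe if the cached copy is lost.

```python
class ToyDataset:
    """Hypothetical stand-in for a cached, recomputable dataset."""

    def __init__(self, compute):
        self._compute = compute   # zero-argument function that rebuilds the data
        self._use_cache = False
        self._cached = None
        self.compute_count = 0    # how many times the data was actually rebuilt

    def map(self, fn):
        # The new dataset only records how to derive itself from its parent.
        return ToyDataset(lambda: (fn(x) for x in self.collect()))

    def cache(self):
        self._use_cache = True
        return self

    def evict(self):
        # Simulate losing the in-memory copy (for example, a failed node).
        self._cached = None

    def collect(self):
        if self._use_cache and self._cached is not None:
            return self._cached
        self.compute_count += 1
        data = list(self._compute())
        if self._use_cache:
            self._cached = data
        return data


squares = ToyDataset(lambda: range(5)).map(lambda x: x * x).cache()
for _ in range(3):                 # e.g. three passes of an iterative algorithm
    total = sum(squares.collect())
print(total, squares.compute_count)

squares.evict()                    # the cached copy is lost...
squares.collect()                  # ...and rebuilt from the recorded derivation
print(squares.compute_count)
```

Iterative jobs benefit because only the first pass pays the cost of building the working set; later passes read it from memory.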
The Google File System
It describes a scalable distributed file system: its design overview, its system interactions, and how it achieves high availability, fault tolerance, data integrity, and efficient garbage collection.
Kafka: A Distributed Messaging System for Log Processing
Kafka is a distributed pub/sub messaging system created by LinkedIn engineers for online and offline analytics on the huge volume of data generated by their frontend and services. It uses a 'pull' model in which each consumer tracks how much it has consumed, which keeps the brokers stateless, and it makes deliberate design trade-offs to increase overall throughput. Kafka internally uses Zookeeper to coordinate brokers (servers) and consumers in a distributed environment.
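The pull model is easy to picture with a toy in-memory log. `ToyBroker` and `ToyConsumer` are hypothetical classes written for this note, not Kafka's client API, and they index messages by position, which is a simplification of how the paper addresses messages in the log.

```python
class ToyBroker:
    """Append-only log. It keeps no record of who has read what."""

    def __init__(self):
        self._log = []

    def append(self, message):
        self._log.append(message)
        return len(self._log) - 1   # position of the new message

    def fetch(self, offset, max_messages):
        return self._log[offset:offset + max_messages]


class ToyConsumer:
    """Pulls at its own pace and remembers its own position in the log."""

    def __init__(self, broker, offset=0):
        self.broker = broker
        self.offset = offset

    def poll(self, max_messages=2):
        batch = self.broker.fetch(self.offset, max_messages)
        self.offset += len(batch)
        return batch


broker = ToyBroker()
for event in ["login", "click", "logout"]:
    broker.append(event)

consumer = ToyConsumer(broker)
print(consumer.poll())   # first two messages
print(consumer.poll())   # remaining message

consumer.offset = 0      # rewinding is just resetting the consumer's own offset
print(consumer.poll())
```

Since the position lives with the consumer, a consumer can deliberately rewind and re-read old messages without the broker doing any bookkeeping.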