# Backend framework comparison

An overview of data processing technologies and ecosystems that might be interesting for us.

## [Spark](http://spark.apache.org/)

* Seems to be the current favorite. Most sources recommend it over Hadoop.
* Has models for both streaming and batch (map-reduce) processing
* Supports exploratory queries through Spark SQL. Designed to support ML algorithms.
* Supported on Amazon straight out of the box (Elastic MapReduce)
* Very strong community
* No Ruby. The APIs are Scala or Java, and there doesn't seem to be a plan for JRuby
* Has [beautiful support](https://www.elastic.co/guide/en/elasticsearch/hadoop/master/spark.html) for elasticsearch
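The batch side of Spark's model is classic map-reduce. A minimal sketch of that shape using plain Scala collections (no Spark dependency; the object and sample data are hypothetical) — Spark's RDD API exposes the same `flatMap`/`map`/`reduceByKey` operations, just over distributed data:

```scala
// Word count as map-reduce over plain Scala collections.
// Spark's RDD API has the same shape on a cluster; names here are made up.
object WordCount {
  def wordCount(lines: Seq[String]): Map[String, Int] =
    lines
      .flatMap(_.split("\\s+"))   // map phase: line -> words
      .map(word => (word, 1))     // emit (word, 1) pairs
      .groupBy(_._1)              // shuffle: group pairs by key
      .map { case (word, pairs) => (word, pairs.map(_._2).sum) } // reduce phase

  def main(args: Array[String]): Unit =
    wordCount(Seq("spark does batch", "and spark does streaming"))
      .toSeq.sorted.foreach(println)
}
```

The same pipeline written against a `SparkContext` would distribute the map and reduce phases across the cluster instead of running them in one JVM.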

## [Fluentd](http://www.fluentd.org/)

* Data collection framework made for logs.
* Can split data into several endpoints, one of them being HDFS
* In-memory aggregations?
* Complex event processing via Norikra (JRuby), including SQL queries over streams: <http://docs.fluentd.org/articles/cep-norikra>
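Splitting one stream into several endpoints is done with Fluentd's `copy` output. A hypothetical config sketch, assuming the `fluent-plugin-webhdfs` plugin is installed (tag, host, port, and path are made-up values):

```
# Duplicate every event tagged app.logs to two outputs.
<match app.logs>
  @type copy
  <store>
    @type webhdfs              # fluent-plugin-webhdfs: write to HDFS
    host namenode.example.com
    port 50070
    path /logs/app.%Y%m%d.log
  </store>
  <store>
    @type stdout               # second endpoint, e.g. for debugging
  </store>
</match>
```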

## [Hadoop](https://hadoop.apache.org/)

* No concept of streams; batch-oriented (MapReduce) only
* Old. Familiar. Mature.

## Tutorials

* <https://www.youtube.com/watch?v=Txjp37mR7xw>
* A tutorial on big data processing on Google Cloud with Fluentd and Norikra; the relevant part starts around 1:30:00.
