# Interface to EMR Hadoop jobs

## Goal

Create an "interface" (written in Ruby) between OMA's processor boxes and Amazon's Elastic MapReduce (EMR) service.

The interface should make it possible to:

* specify a job flow (collection of related jobs)
* provide parameters to the job flow
* specify callbacks (on success and on failure)

Additional options (should be taken into account, but not implemented immediately):

* ability to monitor active jobs (flows)
* ability to shut down active jobs (flows)

## Implementation

### ActiveRecord-based implementation (rejected)

Create an ActiveRecord model representing a single job flow instance. Create a model for each kind of flow using ActiveRecord single-table inheritance (STI).

An hourly cron task (in oma-processors) checks the active and finished job flow records and invokes the callbacks for the finished ones.
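The hourly check could be sketched roughly as follows. This is a minimal, self-contained illustration that uses plain Ruby objects in place of ActiveRecord records; `FlowRecord`, its `status` values, the `callbacks_fired` flag, and `poll_finished_flows` are hypothetical names, not part of oma-models or oma-processors.

```ruby
# Hypothetical stand-in for a job flow record; the design above would use an
# ActiveRecord model backed by Postgres instead.
FlowRecord = Struct.new(:name, :status, :callbacks_fired) do
  def finished?
    %w[completed failed].include?(status)
  end

  def on_success
    "#{name}: success callback"
  end

  def on_failure
    "#{name}: failure callback"
  end
end

# One pass of the hourly task: fire the matching callback for every finished
# flow that has not been handled yet, and skip flows that are still running.
def poll_finished_flows(flows)
  flows.select { |f| f.finished? && !f.callbacks_fired }.map do |flow|
    result = flow.status == "completed" ? flow.on_success : flow.on_failure
    flow.callbacks_fired = true
    result
  end
end

flows = [
  FlowRecord.new("pages_es_index_updater", "completed", false),
  FlowRecord.new("other_flow", "running", false),
  FlowRecord.new("broken_flow", "failed", false)
]

puts poll_finished_flows(flows)
```

Marking a flow as handled keeps a later cron run from firing the same callback twice.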

Possible usage:

```
# oma-models/lib/models/postgres/job_flows/emr_base.rb

module JobFlows
  class EmrBase < ActiveRecord::Base
    # ...
  end
end

# oma-models/lib/models/postgres/job_flows/pages_es_index_updater.rb

module JobFlows
  # STI subclass: inherits from EmrBase rather than ActiveRecord::Base
  class PagesEsIndexUpdater < EmrBase
    # ...
  end
end

# oma-processors/...

job_flow = ::JobFlows::PagesEsIndexUpdater.create!(domain_id: domain.id)
job_flow.run

active_flow = ::JobFlows::PagesEsIndexUpdater.active.first
```

`JobFlows::EmrBase` (and its subclasses) use the rslifka/elasticity gem under the hood.

### Pros

* History. Finished jobs remain stored in Postgres, which preserves their initial arguments, final statuses, and created artifacts (URLs of created files, etc.).

### Cons

* The new ActiveRecord classes pollute oma-models with processor implementation details. In particular, oma-models would depend on the rslifka/elasticity gem.
* Callbacks (on job flow success or failure) are implemented as methods of an AR class, so they cannot take advantage of closures.
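
To illustrate the last point: a callback passed as a block can capture local state from the place where the job flow is started, whereas a method on a model class only sees what is stored on the record. A minimal sketch in plain Ruby, where `SketchFlow` and its methods are hypothetical and do not reflect any existing OMA class:

```ruby
# Hypothetical in-memory job flow that accepts its callbacks as closures.
class SketchFlow
  def initialize(params)
    @params = params
    @on_success = nil
    @on_failure = nil
  end

  def on_success(&block)
    @on_success = block
    self
  end

  def on_failure(&block)
    @on_failure = block
    self
  end

  # Simulates the flow finishing; calls the matching callback if one is set.
  def finish(succeeded)
    callback = succeeded ? @on_success : @on_failure
    callback&.call(@params)
  end
end

notifications = []                # local state at the call site
requested_by = "nightly-reindex"  # local variable, never stored on the flow

flow = SketchFlow.new({ domain_id: 42 })
flow.on_success { |params| notifications << "#{requested_by}: domain #{params[:domain_id]} done" }
flow.on_failure { |params| notifications << "#{requested_by}: domain #{params[:domain_id]} failed" }

flow.finish(true)
puts notifications
```

Both blocks read `requested_by` and append to `notifications` without either value being part of the flow itself, which is what a method defined on an AR class cannot do.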

### S3-based implementation (rejected)

Create a Ruby class (module?) representing a single job flow instance. Use Amazon S3 as the persistence layer: save the list of current (unfinished) job flows as a file on S3 (CSV?). Create a Ruby class for each particular kind of job flow.
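
As a rough sketch of the persistence idea, the list of unfinished job flows could be serialized to CSV text and parsed back. The example below only builds and parses the CSV in memory with Ruby's `csv` library; uploading to and downloading from S3 is deliberately left out, and the column names (`job_flow_id`, `kind`, `started_at`) and the sample row are assumptions made for illustration.

```ruby
require "csv"

# Assumed column layout for the file that would be stored on S3.
HEADERS = %w[job_flow_id kind started_at].freeze

# Serializes a list of unfinished job flows (hashes) to CSV text.
def dump_active_flows(flows)
  CSV.generate do |csv|
    csv << HEADERS
    flows.each { |flow| csv << HEADERS.map { |header| flow[header] } }
  end
end

# Parses CSV text back into an array of hashes keyed by column name.
def load_active_flows(text)
  CSV.parse(text, headers: true).map(&:to_h)
end

# Sample row with made-up values.
active = [
  {
    "job_flow_id" => "j-EXAMPLE1",
    "kind" => "pages_es_index_updater",
    "started_at" => "2014-01-01T00:00:00Z"
  }
]

csv_text = dump_active_flows(active)
puts csv_text

restored = load_active_flows(csv_text)
```

Every value comes back as a string, so anything typed (timestamps, numeric IDs) would need explicit conversion after loading.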
