Automation Pipeline Overview
The automation pipeline is based heavily on Huginn, a data pipeline tool for building "automated agents" that operate on your behalf. The best way to understand what the pipeline is trying to accomplish is to read and watch the tutorial provided by Huginn.
With that baseline understanding, here are the BIG differences between how our system works and how Huginn works.
Events are stored in Elasticsearch rather than PostgreSQL. The implications of this are significant. The way Huginn sorted and queried for events to process relied on event IDs being integers, so it would only retrieve events past a given sequence number. Since Elasticsearch IDs are strings, they cannot be used this way, so we had to change the query/retrieve system to be timestamp based. To ensure an event isn't processed twice, we keep a "processed ids" field on the event. An agent can process a given event only once.
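The retrieval change above can be sketched in a few lines of Python. This is a minimal in-memory illustration, not our actual implementation: the `Event` class, the `pending_events` and `process` functions, and the assumption that "processed ids" holds the IDs of agents that have already handled the event are all hypothetical, and no real Elasticsearch query is issued.

```python
from dataclasses import dataclass, field
from datetime import datetime


@dataclass
class Event:
    id: str                 # string ID, like the ones Elasticsearch assigns
    created_at: datetime    # used for ordering instead of an integer sequence
    payload: dict
    # Assumption: "processed ids" stores the IDs of agents that handled this event.
    processed_ids: set = field(default_factory=set)


def pending_events(events, agent_id, since):
    """Events created at or after `since` that `agent_id` has not processed, oldest first."""
    matches = [
        e for e in events
        if e.created_at >= since and agent_id not in e.processed_ids
    ]
    return sorted(matches, key=lambda e: e.created_at)


def process(events, agent_id, since, handler):
    """Run `handler` on each pending event and mark it so it is never handled twice."""
    handled = []
    for event in pending_events(events, agent_id, since):
        handler(event)
        event.processed_ids.add(agent_id)
        handled.append(event.id)
    return handled
```

In this sketch the timestamp comparison is inclusive, so an event that shares a timestamp with the cutoff is still picked up, and the processed-ids check is what stops an agent from handling the same event again on the next run.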
Our system can "split" the stream to do split tests; Huginn could not. In Huginn, every event has to be passed to every downstream agent, which doesn't work for use cases like a traditional marketing split test. So, we added the concept of "event groups" to Huginn. Event groups essentially let you split the stream: a downstream agent can take "all" of the events from its upstream agents, or it can listen to a subset by subscribing to a specific event group. This also means we have had to implement things like "merge" to bring the split streams back together again.
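As a rough illustration of the event group idea, the sketch below tags events with a group, lets a downstream agent accept either everything or one group, and merges branches back together. The function names, the `event_group` key, the `ALL` marker, and the random assignment strategy are assumptions made for this example; they are not taken from our codebase or from Huginn.

```python
import random

ALL = "all"  # hypothetical marker: the agent accepts every event group


def split(events, groups, rng=None):
    """Tag each event with one group chosen at random (a simple split test)."""
    rng = rng or random.Random()
    return [{**event, "event_group": rng.choice(groups)} for event in events]


def receive(events, listens_to=ALL):
    """Return the events a downstream agent accepts for its subscription."""
    if listens_to == ALL:
        return list(events)
    return [e for e in events if e.get("event_group") == listens_to]


def merge(*streams):
    """Recombine several branches into one stream, dropping the group tag."""
    merged = []
    for stream in streams:
        for event in stream:
            merged.append({k: v for k, v in event.items() if k != "event_group"})
    return merged
```

For example, `split(events, ["A", "B"])` produces a tagged stream, `receive(tagged, "A")` and `receive(tagged, "B")` feed the two branches of the test, and `merge(branch_a, branch_b)` hands a single untagged stream to whatever comes next.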
Workspaces. In Huginn, everything lives in one "space." There is no concept of walled-off events or sections. In our automation system, you have a "Workspace," and each workspace contains an automation. This keeps automations separate and lets you measure metrics/performance for each workflow you're running.
GUI. Huginn does not have a graphical workflow builder. Instead, it relies on you to edit JSON configuration hashes in its UI. We have a drag-and-drop graphical workflow builder.
THE PLAN
The plan is to remain as close to Huginn as possible, so that when Huginn adds a new agent, we can port that agent to our system with minimal effort.