Top level Overview of the application



Introduction

This document gives a top-level overview of the upcoming Audienti app. The application has two main parts: a back end and a front end.

The back end is mainly a collection of jobs and tables.

Back end

The back end has the following important sections:

  • Pipelines

  • Maintenance

  • Routing

  • Front end

  • Automation

  • API

The mention pipeline is basically the entry point for data (most of the data comes in through these mention pipelines).

Typically, a user first creates a project. After the project is created, she adds keywords to it. A job is then launched to create a word master.

  • Keyword - A keyword lives within a project.

  • Word Master - Each keyword is associated with a word master. With word masters, the same data doesn't need to be retrieved again across different projects when the same keyword is used.

When a word master already exists for a keyword, the keyword gets associated with that word master; if no matching word is present, a new word master is created. Similarly, when a keyword is deleted, the associated word master is kept if other projects are still using it, but if no project uses it anymore, the word master is deleted as well. Word masters are fully dynamic: they are created and deleted on demand.

Say a user adds the keyword wine or beer: it gets associated with an existing word master if one is already available for that keyword, and a new one is created otherwise. Thanks to the word master concept, the data retrieval process doesn't need to be repeated for the same keyword again and again, which makes the whole process more efficient. The sketch below makes this lifecycle concrete.
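Here is a minimal ActiveRecord-style sketch of the find-or-create and cleanup behaviour, assuming Rails conventions. The model and column names (Keyword, WordMaster, word) are assumptions, not the actual Audienti schema.

```ruby
class WordMaster < ApplicationRecord
  has_many :keywords
end

class Keyword < ApplicationRecord
  belongs_to :project
  belongs_to :word_master

  before_validation :attach_word_master, on: :create
  after_destroy :cleanup_word_master

  private

  # Reuse an existing word master for this word, or create a new one.
  def attach_word_master
    self.word_master ||= WordMaster.find_or_create_by(word: word.downcase)
  end

  # Delete the word master only when no project's keyword uses it anymore.
  def cleanup_word_master
    word_master.destroy if word_master.keywords.reload.none?
  end
end
```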

When a word master is created, it triggers a background job called afterwordmastercreate, which in turn triggers 30 retrieve jobs for that keyword. This is handled through QueueJobs.
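A hedged sketch of that fan-out, assuming ActiveJob; the class name and the service list are illustrative only (the real pipeline launches around 30 retrieve jobs):

```ruby
class AfterWordMasterCreateJob < ApplicationJob
  # Illustrative subset; the real pipeline covers ~30 services/streams.
  SERVICES = %w[twitter facebook forums blogs].freeze

  def perform(word_master_id)
    # One retrieve job per stream for this word master.
    SERVICES.each { |s| RetrieveJob.perform_later(word_master_id, s) }
  end
end
```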

The following streams of information are scanned for mentions:

  • Facebook, Twitter, LinkedIn, Tumblr, Pinterest, Instagram & other social media

  • Forums, blogs

  • Broadcast - Radio, TV

  • e-Commerce reviews and other reviews

Retrieve jobs usually run daily and always look for new items. A retrieve job fetches the 100 most recent relevant items, and it eventually triggers 100 convert jobs. The convert jobs turn those items into mentions.
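An illustrative sketch of the retrieve-to-convert fan-out; the fetch_recent and normalize helpers are hypothetical stand-ins for the real retriever and converter code:

```ruby
class RetrieveJob < ApplicationJob
  def perform(word_master_id, service)
    # Fetch the 100 most recent relevant items from the given service...
    fetch_recent(service, word_master_id, limit: 100).each do |item|
      ConvertJob.perform_later(item) # ...and convert each one separately.
    end
  end
end

class ConvertJob < ApplicationJob
  def perform(raw_item)
    Mention.create!(normalize(raw_item)) # the raw item becomes a mention
  end
end
```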

SegmentJob - applies a filter to raw mentions, and we can define as many segments as we like. For example: all food mentions where su-se is mentioned.
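As a sketch, a segment can be pictured as a saved filter that each raw mention is tested against; the Segment model and its matches? predicate here are assumptions, not the real implementation:

```ruby
class SegmentJob < ApplicationJob
  def perform(segment_id, mention_id)
    segment = Segment.find(segment_id)
    mention = Mention.find(mention_id)
    # Attach the mention to the segment only if it passes the filter.
    segment.mentions << mention if segment.matches?(mention)
  end
end
```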

CountJob - counts the number of hashtags or the number of people in the mentions. We provide analytics through these CountJobs.
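A minimal sketch of hashtag counting over a project's mentions; the model and column names (Mention, body) are assumptions:

```ruby
class CountJob < ApplicationJob
  def perform(project_id)
    counts = Hash.new(0)
    Mention.where(project_id: project_id).find_each do |mention|
      # Tally every #hashtag occurrence in the mention text.
      mention.body.scan(/#\w+/) { |tag| counts[tag.downcase] += 1 }
    end
    counts # the real job would persist these tallies for analytics
  end
end
```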

Profile & Enrich Job

One of the tasks during a mention job is to look up the mention's profile. For example, a Twitter account is associated with a profile. If the profile doesn't exist, it is created with the data available, and enrichment is then called with that profile data. Enrichment looks for additional data about the profile (such as how many other profiles the person has, their email address, and other additional information).

If the profile is already enriched, it won't be enriched again, so we don't keep enriching the same profile over and over. A profile is re-enriched ONLY after a certain period has passed, for example after 30 days.
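The guard can be pictured as a simple freshness check. The 30-day window comes from the description above; the enriched_at column name is an assumption:

```ruby
class Profile < ApplicationRecord
  RE_ENRICH_AFTER = 30.days

  # Enrich if never enriched, or if the last enrichment is older than
  # the re-enrichment window.
  def needs_enrichment?
    enriched_at.nil? || enriched_at < RE_ENRICH_AFTER.ago
  end
end
```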

Data Services (for additional information about a word master/keyword)

After enrichment, we try to find additional data about the mention. For a keyword, we also launch additional data-service jobs such as Majestic and SEMRush. For the keyword wine, for instance, these jobs find out how many times the word is mentioned in Google searches, its search volume, and so on, so that people can decide which keywords to use.
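A sketch of the data-service fan-out for a word master; the job class names mirror the services mentioned above but are illustrative, not the actual job names:

```ruby
class WordMasterDataServicesJob < ApplicationJob
  def perform(word_master_id)
    MajesticJob.perform_later(word_master_id) # backlink/authority metrics
    SemrushJob.perform_later(word_master_id)  # search volume and the like
  end
end
```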

Route Jobs

When a mention is created, the route mention job kicks in and figures out what to do with it and where it needs to go. The same applies to the profile enrich job. The RouteActivity job handles mention activity; based on an activity, certain actions can be taken. RouteEventJobs are for automation work.

RouteProfileJob decides when enrichment needs to be done; in other words, it decides whether enrichment runs or not. In general, it is the route jobs that decide what happens next.
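Putting the routing idea together, here is a hedged sketch of how route jobs act as the decision point. All class names are illustrative, and it reuses the needs_enrichment? guard sketched earlier:

```ruby
class RouteMentionJob < ApplicationJob
  def perform(mention_id)
    mention = Mention.find(mention_id)
    # Test the new mention against every saved segment filter.
    Segment.where(project_id: mention.project_id).pluck(:id).each do |sid|
      SegmentJob.perform_later(sid, mention.id)
    end
    CountJob.perform_later(mention.project_id) # refresh analytics
    RouteProfileJob.perform_later(mention.profile_id) if mention.profile_id
  end
end

class RouteProfileJob < ApplicationJob
  def perform(profile_id)
    profile = Profile.find(profile_id)
    # Route jobs decide whether enrichment actually runs.
    EnrichJob.perform_later(profile.id) if profile.needs_enrichment?
  end
end
```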