Top level Overview of the application
Last updated
This document gives a top-level overview of the upcoming Audienti app. The application has two parts: a back end and a front end.
The back end consists mainly of a number of jobs and tables.
The back end has the following important sections:
Pipelines
Maintenance
Routing
Front end
Automation
API
The pipeline (or mention pipeline) is the entry point for data; most of the data comes in through these mention pipelines.
Typically, a user first creates a project and then adds keywords to it. A job is then launched to create the word master.
Keyword - A keyword belongs to a project.
Word Master - Each keyword is associated with a word master. With the word master, the same data need not be retrieved separately for different projects that use the same keyword.
When a keyword is added and a matching word master already exists, the keyword is associated with that word master. If no matching word master exists, a new one is created. Similarly, when a keyword is deleted, the associated word master is kept if other projects still use it, and deleted if no project uses it any more. Word masters are therefore created and deleted dynamically.
For example, if a user adds the keyword wine or beer, it is associated with an existing word master if a matching keyword is already available; if not, a new word master is created. Because of the word master, data retrieval does not need to be repeated for the same keyword again and again, which makes the whole process more efficient.
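The create-or-reuse and delete-when-unused rules above can be sketched as a small in-memory registry. This is only an illustration, not the application's code: the names WordMaster, WordMasterRegistry, add_keyword and remove_keyword are hypothetical, and case-insensitive matching is an assumption.

```python
from dataclasses import dataclass, field


@dataclass
class WordMaster:
    word: str
    project_ids: set = field(default_factory=set)


class WordMasterRegistry:
    """In-memory stand-in for the word master table (hypothetical)."""

    def __init__(self):
        self._by_word = {}

    @staticmethod
    def _key(keyword):
        # Assumption: keywords match case-insensitively, ignoring outer spaces.
        return keyword.strip().lower()

    def add_keyword(self, project_id, keyword):
        """Attach a keyword to its word master, creating one if needed.

        Returns (word_master, created).
        """
        key = self._key(keyword)
        created = key not in self._by_word
        if created:
            self._by_word[key] = WordMaster(word=key)
        master = self._by_word[key]
        master.project_ids.add(project_id)
        return master, created

    def remove_keyword(self, project_id, keyword):
        """Detach a keyword; return True if the word master was deleted."""
        key = self._key(keyword)
        master = self._by_word.get(key)
        if master is None:
            return False
        master.project_ids.discard(project_id)
        if master.project_ids:
            return False  # still used by another project
        del self._by_word[key]
        return True

    def __contains__(self, keyword):
        return self._key(keyword) in self._by_word
```

With this sketch, two projects that both add wine share one word master, and it disappears only when the second project removes the keyword.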
When a word master is created, it triggers a job (a background task) called afterwordmastercreate. This job triggers 30 retrieve jobs for that particular keyword. This is handled through QueueJobs.
The following streams of information are scanned for mentions:
Facebook, Twitter, LinkedIn, Tumblr, Pinterest, Instagram & other social media
Forums, blogs
Broadcast - Radio, TV
E-commerce reviews and other reviews
Retrieve jobs usually run daily and always look for new items. Each retrieve job fetches the 100 most recent relevant items and then triggers 100 convert jobs, which convert those items into mentions.
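The fan-out described above (one word master, 30 retrieve jobs, up to 100 convert jobs each) can be sketched with a plain FIFO queue. The function names, the tuple layout of a job and the fetch_recent callback are all hypothetical; the real QueueJobs mechanism is not shown in this document.

```python
from collections import deque

RETRIEVE_JOBS_PER_WORD_MASTER = 30  # figure taken from the text
ITEMS_PER_RETRIEVE = 100            # figure taken from the text


def after_word_master_create(queue, word):
    """Enqueue the retrieve jobs for a newly created word master."""
    for slot in range(RETRIEVE_JOBS_PER_WORD_MASTER):
        queue.append(("retrieve", word, slot))


def run_retrieve(queue, word, slot, fetch_recent):
    """Fetch recent items and enqueue one convert job per item."""
    items = fetch_recent(word, slot, ITEMS_PER_RETRIEVE)
    for item in items[:ITEMS_PER_RETRIEVE]:
        queue.append(("convert", word, item))


def run_convert(word, item):
    """Turn a raw item into a mention record (shape is hypothetical)."""
    return {"word": word, "text": item}


def drain(queue, fetch_recent):
    """Process jobs in FIFO order until the queue is empty."""
    mentions = []
    while queue:
        kind, word, payload = queue.popleft()
        if kind == "retrieve":
            run_retrieve(queue, word, payload, fetch_recent)
        else:
            mentions.append(run_convert(word, payload))
    return mentions
```

If every retrieve job finds a full 100 items, a single new word master leads to 30 x 100 = 3000 convert jobs, which is why sharing word masters across projects matters.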
SegmentJob - It has a filter associated with raw mentions. We can have as many mentions as possible. For example: I want all food mentions where su-se is mentioned.
CountJob - It counts the number of hashtags or the number of people in the mentions. We provide analytics through these CountJobs.
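As a rough illustration of these two jobs, the sketch below filters mentions with a segment predicate and then counts hashtags and distinct people. The mention shape (a dict with "author" and "text" keys) and the function names segment and count are assumptions made for the example.

```python
import re
from collections import Counter

HASHTAG = re.compile(r"#(\w+)")


def segment(mentions, predicate):
    """Keep only the mentions that pass the segment's filter."""
    return [m for m in mentions if predicate(m)]


def count(mentions):
    """Return (hashtag counts, number of distinct authors)."""
    hashtags = Counter()
    authors = set()
    for m in mentions:
        # Hashtags are lower-cased so #Wine and #wine count together.
        hashtags.update(tag.lower() for tag in HASHTAG.findall(m["text"]))
        authors.add(m["author"])
    return hashtags, len(authors)
```

A segment here is just a predicate, so a filter such as "text contains wine" can be passed as a lambda and the counts computed on the filtered list.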
Profile & Enrich Job
One of the tasks during a mention job is to look up the profile behind the mention. For example, a Twitter account is associated with a profile. If the profile doesn't exist, it is created with the data available. With that profile data, enrichment is then called to look for additional data (such as how many profiles the person has, email address and other additional information).
If the profile is already enriched, it is not enriched again, so that we don't keep enriching the same profile again and again. Re-enrichment happens ONLY after a given period, for example after 30 days.
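The re-enrichment rule can be expressed as a simple date check. This is a minimal sketch: the function name needs_enrichment is hypothetical, and the 30-day interval is only the example figure given above, not a confirmed setting.

```python
from datetime import datetime, timedelta

ENRICH_INTERVAL = timedelta(days=30)  # example figure from the text


def needs_enrichment(last_enriched_at, now, interval=ENRICH_INTERVAL):
    """Return True if a profile should be sent for enrichment.

    last_enriched_at is None for a profile that was never enriched.
    """
    if last_enriched_at is None:
        return True
    return now - last_enriched_at >= interval
```

Passing now in explicitly, rather than reading the clock inside the function, keeps the check easy to test.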
Data Services (For additional information against a word master/keyword)
After enrichment, we try to find additional data about the mention. For a keyword, we also launch some additional jobs such as Majestic, SEMRush etc. For example, for the keyword wine, we look up how many times the word is mentioned in Google search, its search volume and so on, so that people can decide which keywords to use.
Route Jobs
When a mention is created, the route mention job kicks in and figures out what to do with it and where it needs to go. The same applies to the profile enrich job. RouteActivity is the job for mention activity; based on the activity, certain actions can be taken. RouteEventJobs are for automation work.
RouteProfileJob decides whether and when data enrichment needs to be done. In general, it is the route jobs that decide what to do.
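The idea that route jobs decide what happens next can be sketched as a dispatch table that maps a kind of record to a routing function returning the follow-up jobs. Everything here is illustrative: the routing rules, the "needs_enrichment" flag and the job name EnrichJob are assumptions, not the application's actual behaviour.

```python
def route_mention(payload):
    """Hypothetical rule: every new mention goes on to segmenting and counting."""
    return ["SegmentJob", "CountJob"]


def route_profile(payload):
    """Send a profile to enrichment only when it is flagged as needing it."""
    return ["EnrichJob"] if payload.get("needs_enrichment") else []


# Maps the kind of record to the function that decides its next jobs.
ROUTES = {
    "mention": route_mention,
    "profile": route_profile,
}


def route(kind, payload):
    """Return the names of the jobs to launch next for this record."""
    handler = ROUTES.get(kind)
    if handler is None:
        raise ValueError(f"no route for {kind!r}")
    return handler(payload)
```

Keeping the decisions in one table means a new kind of record only needs a new routing function and one extra entry.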