
Backlink pipeline logic

{
  title_words: [
    ""
  ],
  data_source: "SeoMoz",
  source_url: "http://www.pcworld.com/downloads/file/fid,157007-order,4-page,1/download.html?page=14493",
  host_match_links_count: 1,
  data: {
    backlink_id: "4fdc252a93546d02ca001ea1",
    mirrored_at: "2012-06-16T06:18:18.000Z"
  },
  link_status: "live",
  market_rank: 1,
  destination_url: "http://www.novastor.com",
  kind: "momentum",
  source_host: "www.pcworld.com",
  anchor_words: [
    "novastor"
  ],
  title: "",
  domain_pr: 0,
  backlinks_count: 30,
  _id: "4fdc252a93546d02ca001ea1",
  locations: { },
  page_title_array: [
    "novatuneup",
    "download,",
    "downloads",
    "browse",
    "page",
    "14493",
    "downloads",
    "list",
    "by",
    "30",
    "day",
    "change",
    "|",
    "pcworld",
    "|",
    "pcworld"
  ],
  alt: "",
  juice: 0.7317073170731707,
  created_at: "2012-06-16T06:18:18.000Z",
  links_count: 41,
  page_digest: "854a6060df71bf6932b0233ca13901e1",
  follow: true,
  tags: [
    "momentum"
  ],
  match_type: "path",
  status: "Active",
  path_match_links_count: 1,
  domain_id: "682",
  destination_digest: "6df7146b4ac3966839f0c8207fcaa26f",
  source_digest: "e9b28f80ae43fd8714bf75e714af3ad6",
  destination_host: "www.novastor.com",
  code: 200,
  link_value: 0.04878048780487805,
  image_link: false,
  anchor_text: "NovaStor",
  ip_address: "70.42.185.10",
  updated_at: "2012-06-16T06:18:18.000Z",
  destination_page_code: 200,
  page_pr: 2,
  page_title: "NovaTuneUp download, Downloads Browse Page 14493 Downloads List By 30 Day Change | PCWorld | PCWorld"
}
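
One observation about the sample document above: its juice value equals backlinks_count / links_count (30 / 41), and its link_value equals page_pr / links_count (2 / 41). The sketch below only checks that arithmetic against this one record. Whether the pipeline actually computes the fields this way is an assumption, and per_link_ratio is a hypothetical helper, not a method from the codebase.

```ruby
# Values copied from the sample backlink document above.
sample = {
  backlinks_count: 30,
  links_count: 41,
  page_pr: 2,
  juice: 0.7317073170731707,
  link_value: 0.04878048780487805
}

# Hypothetical helper (not from the codebase): divide a value by the number
# of links on the source page, guarding against a page with no links.
def per_link_ratio(numerator, links_count)
  return 0.0 if links_count.zero?

  numerator.to_f / links_count
end

# ASSUMPTION: these two ratios are a guess derived from a single document.
juice_guess = per_link_ratio(sample[:backlinks_count], sample[:links_count])
link_value_guess = per_link_ratio(sample[:page_pr], sample[:links_count])

juice_matches = (juice_guess - sample[:juice]).abs < 1e-12
link_value_matches = (link_value_guess - sample[:link_value]).abs < 1e-12

puts "juice matches backlinks_count / links_count: #{juice_matches}"
puts "link_value matches page_pr / links_count: #{link_value_matches}"
```

If the ratios hold across more documents, that would suggest the two fields are not redundant: one scales with the backlink count of the source page, the other with its pagerank.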

Attributes

URL specific attributes

  • title (String) title attribute

  • title_words (Array) tokenized title attribute

  • anchor_text (String) anchor text

  • anchor_words (Array) tokenized anchor text

  • alt (String) alt text

  • follow (boolean) the link is a follow-link yes/no

  • image_link (boolean) the link is an image link yes/no

  • locations (Hash) The location where the link is found (in body, in navigation, ...)
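
The tokenized fields in the sample document look like the result of lowercasing the text and splitting on whitespace: punctuation stays attached ("download,") and standalone separators ("|") are kept as tokens. A minimal sketch under that assumption follows; tokenize is a hypothetical name, and the real pipeline may do more.

```ruby
# Hypothetical tokenizer (assumption): lowercase the text, then split on
# whitespace. Punctuation is left attached to its word.
def tokenize(text)
  text.to_s.downcase.split
end

anchor_words = tokenize("NovaStor")
# => ["novastor"]

page_title = "NovaTuneUp download, Downloads Browse Page 14493 " \
             "Downloads List By 30 Day Change | PCWorld | PCWorld"
page_title_array = tokenize(page_title)

puts anchor_words.inspect
puts page_title_array.inspect
```

For the sample's page_title this reproduces the stored page_title_array token for token. One difference: an empty title yields an empty array here, while the sample document stores title_words as an array holding one empty string.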

Connection attributes

  • source_url (String)

  • source_host (String)

  • destination_url (String)

  • destination_host (String)

  • code (integer)

  • destination_page_code (integer)

  • ip_address (String)
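
In the sample document, source_host and destination_host are the host parts of source_url and destination_url. A minimal sketch of that derivation with Ruby's standard URI library follows; host_of is a hypothetical helper, and how the pipeline really fills these fields is not documented here.

```ruby
require "uri"

# Hypothetical helper: return the host part of a URL, or nil when the
# string cannot be parsed as a URI.
def host_of(url)
  URI.parse(url).host
rescue URI::InvalidURIError
  nil
end

source_url = "http://www.pcworld.com/downloads/file/fid,157007-order,4-page,1/download.html?page=14493"
destination_url = "http://www.novastor.com"

source_host = host_of(source_url)
destination_host = host_of(destination_url)

puts source_host
puts destination_host
```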

SEO analysis of the backlink

  • match_type (String) one of no_match, host_match, path_match (the sample document above stores the value as "path")

  • host_match_links_count (integer) Number of links that point to the domain we want but not to the configured path (relevant when your money domain includes a path, e.g. pbs.org/kids/)

  • path_match_links_count (integer) Number of links that point to the full path of the configured domain (pbs.org/kids)

  • backlinks_count (integer) Number of backlinks to the source_url, retrieved from an API. This is needed for our own market_rank calculation and for the link juice/value calculations.

  • links_count (integer) Number of links on the source page

  • juice (float) How much SEO-value this link represents

  • link_value (float) It is unclear how this differs from juice; it might be redundant

  • market_rank (integer) OMA custom calculation of PageRank

  • page_pr (integer) PageRank of source_url
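
The difference between a host match and a path match can be sketched as follows. This is an illustration of the definitions above, not the pipeline's code: match_type_for and normalize_host are hypothetical names, and ignoring a leading "www." and treating a configured domain without a path as a path match are both assumptions.

```ruby
require "uri"

# ASSUMPTION: hosts are compared case-insensitively, ignoring a leading "www.".
def normalize_host(host)
  host.to_s.downcase.sub(/\Awww\./, "")
end

# Hypothetical classifier returning one of the documented match_type values.
# configured_domain is a host with an optional path, e.g. "pbs.org/kids".
def match_type_for(destination_url, configured_domain)
  configured = URI.parse("http://#{configured_domain}")
  destination = URI.parse(destination_url)

  unless normalize_host(destination.host) == normalize_host(configured.host)
    return "no_match"
  end

  configured_path = configured.path.chomp("/")
  # ASSUMPTION: with no configured path, every host match is a path match.
  return "path_match" if configured_path.empty?

  destination_path = destination.path.chomp("/")
  if destination_path == configured_path ||
     destination_path.start_with?("#{configured_path}/")
    "path_match"
  else
    "host_match"
  end
end

puts match_type_for("http://pbs.org/kids/games", "pbs.org/kids/")
puts match_type_for("http://www.pbs.org/news", "pbs.org/kids")
puts match_type_for("http://example.com/kids", "pbs.org/kids")
```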

Attributes about the source page

  • content_type (String) what the content_type of the page is

  • page_title (String) page title of the page

  • page_title_array (Array) tokenized page title

Metadata

  • domain_id (String)

  • created_at (date)

  • updated_at (date)

  • _id (MongoDB id)

  • tags (Array) An array that contains the kind and match_type

  • data_source (String) Where the backlink came from, SeoMoz or Ahrefs; not used in the frontend

  • kind (String) This contains the classification (missing, momentum, relevance, authority, strategic)

  • link_status (String) missing or live
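
The allowed values listed for kind and link_status can be captured as constants and checked before a document is stored. The sketch below is only an illustration: the constant names and valid_metadata? are hypothetical, and nothing here confirms that the pipeline validates these fields.

```ruby
# Allowed values as listed in the Metadata section above.
KINDS = %w[missing momentum relevance authority strategic].freeze
LINK_STATUSES = %w[missing live].freeze

# Hypothetical check: true when both classification fields hold a known value.
def valid_metadata?(doc)
  KINDS.include?(doc[:kind]) && LINK_STATUSES.include?(doc[:link_status])
end

# Values copied from the sample document.
sample = { kind: "momentum", link_status: "live" }

puts valid_metadata?(sample)
puts valid_metadata?(kind: "unknown", link_status: "live")
```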

Useless data

  • status (String) Status is always "Active"

  • page_digest (String) Who cares about the page digest really?

  • data (Hash)

  • destination_digest (String) Hash of destination, no value afaik

  • source_digest (String) Hash of source, no value afaik

  • history (Array) Some documents seem to carry a history record; this needs to be solved differently imo.

  • domain_pr (integer) This is a bogus calculation that isn't grounded in SEO. In fact, the method calculating it contained a comment saying so:

def domain_pr
  # this is wrong.. we should be pulling it from some other service, need to think
  # about how to fix this
  MarketFu::Calculator.forecasted_pagerank(host_matching_links.count)
end
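
If the fields listed under "Useless data" are to be dropped before a document reaches a consumer such as the frontend, a filter along these lines would do it. This is a sketch: IGNORED_FIELDS and strip_ignored_fields are hypothetical names, and the document is assumed to be a Hash with symbol keys.

```ruby
# Field names taken from the "Useless data" list above.
IGNORED_FIELDS = %i[
  status page_digest data destination_digest source_digest history domain_pr
].freeze

# Hypothetical helper: return a copy of the document without the ignored fields.
# The original hash is left untouched.
def strip_ignored_fields(doc)
  doc.reject { |key, _value| IGNORED_FIELDS.include?(key) }
end

# A few fields copied from the sample document.
doc = {
  anchor_text: "NovaStor",
  status: "Active",
  page_digest: "854a6060df71bf6932b0233ca13901e1",
  domain_pr: 0,
  page_pr: 2
}

cleaned = strip_ignored_fields(doc)
puts cleaned.keys.inspect
```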

Last updated 7 years ago