Backlink pipeline logic

{
title_words: [
""
],
data_source: "SeoMoz",
source_url: "http://www.pcworld.com/downloads/file/fid,157007-order,4-page,1/download.html?page=14493",
host_match_links_count: 1,
data: {
backlink_id: "4fdc252a93546d02ca001ea1",
mirrored_at: "2012-06-16T06:18:18.000Z"
},
link_status: "live",
market_rank: 1,
destination_url: "http://www.novastor.com",
kind: "momentum",
source_host: "www.pcworld.com",
anchor_words: [
"novastor"
],
title: "",
domain_pr: 0,
backlinks_count: 30,
_id: "4fdc252a93546d02ca001ea1",
locations: { },
page_title_array: [
"novatuneup",
"download,",
"downloads",
"browse",
"page",
"14493",
"downloads",
"list",
"by",
"30",
"day",
"change",
"|",
"pcworld",
"|",
"pcworld"
],
alt: "",
juice: 0.7317073170731707,
created_at: "2012-06-16T06:18:18.000Z",
links_count: 41,
page_digest: "854a6060df71bf6932b0233ca13901e1",
follow: true,
tags: [
"momentum"
],
match_type: "path",
status: "Active",
path_match_links_count: 1,
domain_id: "682",
destination_digest: "6df7146b4ac3966839f0c8207fcaa26f",
source_digest: "e9b28f80ae43fd8714bf75e714af3ad6",
destination_host: "www.novastor.com",
code: 200,
link_value: 0.04878048780487805,
image_link: false,
anchor_text: "NovaStor",
ip_address: "70.42.185.10",
updated_at: "2012-06-16T06:18:18.000Z",
destination_page_code: 200,
page_pr: 2,
page_title: "NovaTuneUp download, Downloads Browse Page 14493 Downloads List By 30 Day Change | PCWorld | PCWorld"
}

Attributes

URL specific attributes

  • title (String) title attribute

  • title_words (Array) tokenized title attribute

  • anchor_text (String) anchor text

  • anchor_words (Array) tokenized anchor text

  • alt (String) alt text

  • follow (boolean) whether the link is a follow link (true) or a nofollow link (false)

  • image_link (boolean) whether the link is an image link

  • locations (Hash) The location where the link is found (in body, in navigation, ...)

Connection attributes

  • source_url (String)

  • source_host (String)

  • destination_url (String)

  • destination_host (String)

  • code (integer)

  • destination_page_code (integer)

  • ip_address (String)

  • match_type (String) (no_match, host_match, path_match; the sample record above stores the short form "path")

  • host_match_links_count (integer) Count of links whose destination matches the domain we want, but not the path (this matters when your money domain includes a path, e.g. pbs.org/kids/)

  • path_match_links_count (integer) Count of links whose destination matches the full path of the configured domain (pbs.org/kids)

  • backlinks_count (integer) Number of backlinks to the source_url, retrieved from an API. Needed for our own market_rank calculation and for the juice and link_value calculations.

  • links_count (integer) Number of links on the source page

  • juice (float) How much SEO value this link represents

  • link_value (float) Unclear how this differs from juice; might be redundant

  • market_rank (integer) OMA's custom PageRank calculation

  • page_pr (integer) PageRank of source_url
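The difference between a host match and a path match can be illustrated with a small sketch. This is not the pipeline's actual code: the helper name classify_match, the "www." stripping, and the prefix comparison are all assumptions made for illustration.

```ruby
require "uri"

# Hypothetical helper: classify a destination URL against a configured
# "money" domain such as "pbs.org/kids". The return values mirror the
# match_type values listed above; the comparison rules are assumptions.
def classify_match(destination_url, configured_domain)
  target_host, target_path = configured_domain.split("/", 2)
  # "kids/" becomes "/kids"; a domain without a path becomes "".
  target_path = "/#{target_path}".chomp("/")

  uri = URI.parse(destination_url)
  host = uri.host.to_s.downcase.sub(/\Awww\./, "")
  return "no_match" unless host == target_host.downcase

  path = uri.path.to_s.chomp("/")
  # A path match requires the destination to sit at or below the configured path.
  if path == target_path || path.start_with?("#{target_path}/")
    "path_match"
  else
    "host_match"
  end
end

puts classify_match("http://www.pbs.org/kids/games", "pbs.org/kids")
puts classify_match("http://www.pbs.org/news", "pbs.org/kids")
puts classify_match("http://www.novastor.com", "novastor.com")
```

Under these assumed rules, a configured domain without a path (as in the novastor.com sample record) treats every link to the host as a path match.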

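The sample record gives a hint about how juice and link_value might relate. In that one record, juice equals backlinks_count / links_count (30 / 41) and link_value equals 2 / 41. The 2 could be page_pr, or the sum of the two match counts (1 + 1); a single record cannot tell these apart. The sketch below only checks the arithmetic and should be read as a guess, not as the pipeline's formulas.

```ruby
# Values copied from the sample record at the top of this page.
record = {
  "backlinks_count" => 30,
  "links_count"     => 41,
  "page_pr"         => 2,
  "juice"           => 0.7317073170731707,
  "link_value"      => 0.04878048780487805
}

# Compare floats with a small tolerance instead of exact equality.
def close?(a, b)
  (a - b).abs < 1e-12
end

# Assumption: juice is backlinks_count divided by the links on the source page.
juice_guess = record["backlinks_count"].to_f / record["links_count"]

# Assumption: link_value is page_pr divided by the same links_count.
link_value_guess = record["page_pr"].to_f / record["links_count"]

puts close?(juice_guess, record["juice"])
puts close?(link_value_guess, record["link_value"])
```

Both checks hold for this record. If they also hold across other records, the two fields are not redundant: they share a denominator but have different numerators.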
Attributes about the source page

  • content_type (String) what the content_type of the page is

  • page_title (String) page title of the page

  • page_title_array (Array) tokenized page title
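The tokenized fields (title_words, anchor_words, page_title_array) in the sample record look like the result of lowercasing the text and splitting on whitespace, with punctuation such as "download," and "|" left in place. A minimal sketch of that behaviour follows; the tokenize helper is hypothetical and not taken from the pipeline.

```ruby
# Hypothetical tokenizer: lowercase the text, then split on whitespace.
# Punctuation is kept, which matches tokens like "download," and "|"
# in the sample record.
def tokenize(text)
  text.to_s.downcase.split
end

page_title = "NovaTuneUp download, Downloads Browse Page 14493 " \
             "Downloads List By 30 Day Change | PCWorld | PCWorld"

tokens = tokenize(page_title)
puts tokens.inspect
puts tokenize("NovaStor").inspect
```

One difference from the stored data: this sketch returns an empty array for an empty string, while the sample record stores title_words as an array containing one empty string.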

Metadata

  • domain_id (String)

  • created_at (date)

  • updated_at (date)

  • _id (MongoDB id)

  • tags (Array) contains the kind and match_type

  • data_source (String) SEOmoz or Ahrefs source, not used in the frontend

  • kind (String) The classification of the link (missing, momentum, relevance, authority, strategic)

  • link_status (String) missing or live

Useless data

  • status (String) Status is always "Active"

  • page_digest (String) Who cares about the page digest really?

  • data (Hash)

  • destination_digest (String) Hash of destination, no value afaik

  • source_digest (String) Hash of source, no value afaik

  • history (Array) Seems like there's a history record on some, this needs to be solved differently imo.

  • domain_pr (integer) This is some bogus calculation that isn't grounded in SEO. In fact, the method that calculates it contains a comment saying so:

def domain_pr
  # this is wrong.. we should be pulling it from some other service, need to think
  # about how to fix this
  MarketFu::Calculator.forecasted_pagerank(host_matching_links.count)
end
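If the fields listed under "Useless data" are to be dropped before a record is passed on, a filter along these lines could work. This is a sketch only: USELESS_FIELDS and strip_useless are hypothetical names, and the field list is copied from the section above.

```ruby
# Field names taken from the "Useless data" list above.
USELESS_FIELDS = %w[
  status page_digest data destination_digest
  source_digest history domain_pr
].freeze

# Hypothetical helper: return a copy of the record without the useless fields.
# Keys may be strings or symbols.
def strip_useless(record)
  record.reject { |key, _value| USELESS_FIELDS.include?(key.to_s) }
end

record = {
  "status"      => "Active",
  "page_digest" => "854a6060df71bf6932b0233ca13901e1",
  "domain_pr"   => 0,
  "kind"        => "momentum",
  "link_status" => "live"
}

puts strip_useless(record).inspect
```

The original hash is left untouched, so callers that still read the old fields keep working.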
