
Graph Page

(by Nicholas, last updated June 2013)

headers

Headers returned by the server.

userinfo

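Presumably the userinfo component of the URL (the user:password part before the host), if present; not confirmed.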

issue_category

?

scheme

URL scheme (http, https, ftp, ...).

duplication_types

?

issue_ids

IDs of the issues found on this page.

description

Meta description of this page.

links_count

Total number of links on the page.

link_urls

URLs of all links on the page.

redirect_to_string

Redirect location, if the response was a 3xx redirect.
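For reference, a minimal sketch of deriving a value like redirect_to_string from a fetch result, assuming the crawler keeps the status code and the response headers as a dict. The helper name `redirect_target` is hypothetical and not taken from our code.

```python
from typing import Mapping, Optional


def redirect_target(code: int, headers: Mapping[str, str]) -> Optional[str]:
    """Return the Location header for a 3xx response, else None (hypothetical helper)."""
    if 300 <= code < 400:
        # HTTP header names are case-insensitive, so compare them lowercased.
        for name, value in headers.items():
            if name.lower() == "location":
                return value
    return None


print(redirect_target(301, {"Location": "https://example.com/new"}))
print(redirect_target(200, {"Location": "https://example.com/new"}))
```

A relative Location value would still need resolving against the page URL before it could be stored as an absolute redirect target.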

last_crawled

Date of the last crawl.

count_of_words

Word count of the page.

duplicate_page_ids

References (IDs) to duplicate pages.

page_inlinks_count

Number of inbound internal links: the total number of links pointing to this page from within the same domain.
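A minimal sketch of the page_inlinks_count idea, assuming the crawl yields (source_url, target_url) link pairs. Both that input shape and the `internal_inlinks_count` helper are hypothetical, and "same domain" is simplified here to "same host".

```python
from urllib.parse import urlsplit


def internal_inlinks_count(page_url, links):
    """Count links pointing at page_url from pages on the same host.

    `links` is assumed to be an iterable of (source_url, target_url) pairs
    collected during the crawl (a hypothetical structure).
    """
    host = urlsplit(page_url).hostname
    return sum(
        1
        for source, target in links
        if target == page_url and urlsplit(source).hostname == host
    )


links = [
    ("https://example.com/a", "https://example.com/b"),
    ("https://example.com/c", "https://example.com/b"),
    ("https://other.org/x", "https://example.com/b"),  # external, not counted
    ("https://example.com/b", "https://example.com/a"),
]
print(internal_inlinks_count("https://example.com/b", links))  # 2
```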

host

Host on which this page was found.

keywords

Unknown; seems to be always empty.

status

Active or inactive; what it stands for is unclear.

s3_histories

References to the page histories stored on S3.

title_simhash

Similarity hash (simhash) of the title.
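The *_simhash fields are similarity hashes: texts that differ slightly get fingerprints that differ in only a few bits. Below is a minimal sketch of Charikar-style simhash over lowercase word tokens, using the first 8 bytes of MD5 as the per-token hash. The tokenization, the 64-bit width, and the choice of MD5 are assumptions; this page does not record what our crawler actually uses.

```python
import hashlib

BITS = 64  # fingerprint width (assumed)


def simhash(text: str) -> int:
    """Charikar-style simhash of `text` over lowercase whitespace-separated tokens."""
    weights = [0] * BITS
    for token in text.lower().split():
        # Stable 64-bit token hash: the first 8 bytes of the token's MD5 digest.
        h = int.from_bytes(hashlib.md5(token.encode("utf-8")).digest()[:8], "big")
        for i in range(BITS):
            # Each bit votes +1 if it is set in the token hash, -1 otherwise.
            weights[i] += 1 if (h >> i) & 1 else -1
    # Bit i of the fingerprint is set when its total vote is positive.
    return sum(1 << i for i in range(BITS) if weights[i] > 0)


print(hex(simhash("Graph Page")))
```

The same function would apply to description_simhash, duplicate_simhash, and simhash, just fed with different text.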

description_simhash

Similarity hash (simhash) of the meta description.

code

HTTP status code returned by the server.

url

URL of the page.

content

Unknown; always empty.

updated_at

Timestamp of the last update.

response_time

Response time of the page.

links_search

Unknown.

images

List of images on the page.

duplicate_simhash

Simhash of the page body, used to detect duplicate pages.
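Near-duplicate pages are usually found by comparing simhash fingerprints by Hamming distance (the number of differing bits). A minimal sketch, assuming 64-bit integer fingerprints keyed by page ID; the `duplicate_pairs` helper is hypothetical and the threshold of 3 bits is an assumed value, not one taken from our configuration.

```python
from itertools import combinations


def hamming_distance(a: int, b: int) -> int:
    """Number of bits that differ between two simhash fingerprints."""
    return bin(a ^ b).count("1")


def duplicate_pairs(fingerprints: dict, max_distance: int = 3) -> list:
    """Return (id, id) pairs whose fingerprints are within max_distance bits.

    `fingerprints` maps page ID -> simhash; max_distance=3 is an assumption.
    Every pair is compared, which is O(n^2): fine for a sketch, not for a full crawl.
    """
    return [
        (id_a, id_b)
        for (id_a, fp_a), (id_b, fp_b) in combinations(sorted(fingerprints.items()), 2)
        if hamming_distance(fp_a, fp_b) <= max_distance
    ]


print(duplicate_pairs({1: 0b0000, 2: 0b0001, 3: 0b11110000}))  # [(1, 2)]
```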

redirect_to

Redirect location, if the response was a 3xx redirect (duplicates redirect_to_string).

port

Port used to fetch the page.

opengraph_site_name

Open Graph site name (og:site_name), if present.

query

Query string of the URL, if present.

data

??

potential_keywords

List of potential keywords.

digest

Hash of the cleaned page body.
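A minimal sketch of a "hash of the cleaned body", assuming "cleaned" means the visible text with tags, scripts, and styles removed and whitespace collapsed, hashed with SHA-1. Both the cleaning steps and the hash function are assumptions, and the `cleaned_body_digest` helper is hypothetical; the real crawler may differ.

```python
import hashlib
import re
from html.parser import HTMLParser


class _TextExtractor(HTMLParser):
    """Collects text nodes, skipping the contents of <script> and <style>."""

    def __init__(self):
        super().__init__()
        self.parts = []
        self._skip = 0

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip:
            self._skip -= 1

    def handle_data(self, data):
        if not self._skip:
            self.parts.append(data)


def cleaned_body_digest(html: str) -> str:
    parser = _TextExtractor()
    parser.feed(html)
    parser.close()
    # Collapse all runs of whitespace so formatting-only changes hash the same.
    text = re.sub(r"\s+", " ", " ".join(parser.parts)).strip()
    return hashlib.sha1(text.encode("utf-8")).hexdigest()


print(cleaned_body_digest("<p>Hello   world</p>"))
```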

kind

Always "Page". TODO: ask William.

authority

Authority classification of the page; one of:

  • siphon

  • sinkhole

  • conversion

  • bridge

title

Page title.

title_duplicate_pages

List of pages with a duplicate title.

opengraph_type

Open Graph type (og:type), if present on the page.
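Both opengraph_* fields come from Open Graph `<meta property="og:...">` tags. A minimal sketch of collecting them with Python's standard html.parser; the `open_graph` helper and its return shape are hypothetical.

```python
from html.parser import HTMLParser


class OpenGraphParser(HTMLParser):
    """Collects <meta property="og:*" content="..."> values from a page."""

    def __init__(self):
        super().__init__()
        self.properties = {}

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attributes = dict(attrs)
        prop = attributes.get("property") or ""
        if prop.startswith("og:") and "content" in attributes:
            # Keep the first occurrence of each property.
            self.properties.setdefault(prop, attributes["content"])


def open_graph(html: str) -> dict:
    parser = OpenGraphParser()
    parser.feed(html)
    parser.close()
    return parser.properties


og = open_graph('<meta property="og:site_name" content="Example"><meta property="og:type" content="article" />')
print(og.get("og:site_name"), og.get("og:type"))
```

opengraph_site_name would then correspond to `og["og:site_name"]` and opengraph_type to `og["og:type"]`, when present.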

path

URL path.

created_at

Creation timestamp.

absolute_url

Absolute URL.

duplicate_pages

IDs of duplicate pages (possibly a duplicate of duplicate_page_ids).

fragment

Fragment of the URL.
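Several fields on this page (scheme, userinfo, host, port, path, query, fragment) correspond to the standard components of a URL. A minimal sketch of splitting an absolute URL into those fields with urllib.parse; the `url_components` helper, the field mapping, and the default-port fallback are assumptions, not confirmed crawler behaviour.

```python
from urllib.parse import urlsplit

# Assumed fallback ports when the URL does not state one explicitly.
DEFAULT_PORTS = {"http": 80, "https": 443, "ftp": 21}


def url_components(absolute_url: str) -> dict:
    parts = urlsplit(absolute_url)
    # userinfo is everything before the last "@" in the network location.
    userinfo = parts.netloc.rpartition("@")[0] if "@" in parts.netloc else None
    return {
        "scheme": parts.scheme,
        "userinfo": userinfo,
        "host": parts.hostname,  # lowercased by urllib
        "port": parts.port or DEFAULT_PORTS.get(parts.scheme),
        "path": parts.path,
        "query": parts.query or None,
        "fragment": parts.fragment or None,
    }


print(url_components("https://user:pw@Example.com:8443/a/b?x=1#top"))
```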

full_text

Full text of the page.

canonical

Unknown. TODO: ask William.

tags

List of tags (e.g. duplicate of body, conversion, ...).

backlinks

Backlink count.

inlink_list

List of inbound links.

last_rechecked

Timestamp of the last recheck.

domain_id

ID of the domain this page belongs to.

links

Unknown; always empty.

body_digest

Hash of the page body.

links_info

List of links on the page, with detailed information about each.

robot_tag

Robots tags attached to this page (e.g. all, index, nofollow). How these are calculated is unclear.
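Since it is unclear how robot_tag is calculated, here is one plausible approach, not confirmed as ours: read the comma-separated directives from `<meta name="robots">` tags and fall back to "all" when none are present. The `robot_tags` helper and the "all" fallback are assumptions, and the X-Robots-Tag response header is ignored here.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collects directives from <meta name="robots" content="..."> tags."""

    def __init__(self):
        super().__init__()
        self.directives = []

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        if tag != "meta" or (attributes.get("name") or "").lower() != "robots":
            return
        for directive in (attributes.get("content") or "").split(","):
            directive = directive.strip().lower()
            if directive and directive not in self.directives:
                self.directives.append(directive)


def robot_tags(html: str) -> list:
    parser = RobotsMetaParser()
    parser.feed(html)
    parser.close()
    # Assumed fallback: no robots meta tag means no restrictions ("all").
    return parser.directives or ["all"]


print(robot_tags('<meta name="robots" content="noindex, NoFollow">'))  # ['noindex', 'nofollow']
```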

description_duplicate_pages

List of pages with a duplicate meta description.
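A minimal sketch of building a duplicate-description list by exact match on normalized descriptions, assuming pages are available as a dict of page ID to description. Whether our crawler matches exactly or via description_simhash is not documented, and the `duplicate_description_pages` helper is hypothetical.

```python
from collections import defaultdict


def duplicate_description_pages(pages: dict) -> dict:
    """Map each page ID to the other page IDs sharing its meta description.

    `pages` is assumed to be page_id -> description. Descriptions are compared
    after collapsing whitespace and lowercasing; empty ones are ignored.
    """
    groups = defaultdict(list)
    for page_id, description in pages.items():
        key = " ".join((description or "").split()).lower()
        if key:
            groups[key].append(page_id)
    result = {}
    for ids in groups.values():
        for page_id in ids:
            result[page_id] = [other for other in ids if other != page_id]
    return result


print(duplicate_description_pages({1: "Buy shoes", 2: "buy  shoes ", 3: "Other", 4: ""}))
```

The same grouping, keyed on the title instead of the description, would give a title_duplicate_pages-style list.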

simhash

Simhash of the whole page.

anchor_texts

List of anchor texts on the page.
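links_count, link_urls, and anchor_texts can all be derived from the page's `<a href>` tags. A minimal sketch using html.parser, resolving relative hrefs against the page URL; the `link_fields` helper and its output shape are hypothetical, and it ignores details such as rel="nofollow".

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkExtractor(HTMLParser):
    """Collects (absolute URL, anchor text) pairs for every <a href> on a page."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []
        self._href = None
        self._text = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self._href = urljoin(self.base_url, href)
                self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            # Normalize whitespace inside the anchor text.
            self.links.append((self._href, " ".join("".join(self._text).split())))
            self._href = None


def link_fields(html: str, base_url: str) -> dict:
    parser = LinkExtractor(base_url)
    parser.feed(html)
    parser.close()
    return {
        "links_count": len(parser.links),
        "link_urls": [url for url, _ in parser.links],
        "anchor_texts": [text for _, text in parser.links],
    }


html = '<p><a href="/about">About <b>us</b></a> <a href="https://other.org/x">Other</a><a name="top">no href</a></p>'
print(link_fields(html, "https://example.com/page"))
```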


Last updated 7 years ago