Graph Page
(by nicholas, last updated: jun 2013)
headers
headers returned by server
userinfo
?
issue_category
?
scheme
url scheme (http/https/ftp/...)
duplication_types
?
issue_ids
issues this page has by id
description
meta-description from this page
links_count
total count of links on the page
link_urls
all urls for all the links on the page
redirect_to_string
redirect location if 30x
last_crawled
last crawl date
count_of_words
word count
duplicate_page_ids
reference to duplicate pages
page_inlinks_count
inbound internal links count: total number of links to this page from within this domain
host
host this page is found on
keywords
no idea, seems always empty
status
Active/inactive; unclear what it stands for
s3_histories
reference to the histories stored on S3
title_simhash
similarity hash for the title
description_simhash
similarity hash for the description
code
returned status code
url
url of the page
content
No idea, is always empty
updated_at
last updated timestamp
response_time
response time of the page
links_search
no idea
images
list of images on the page
duplicate_simhash
simhash to detect duplicate pages (body)
redirect_to
redirect location if 30x (duplicate of redirect_to_string)
port
port used to fetch page
opengraph_site_name
open graph site name if present
query
query of the url if present
data
??
potential_keywords
list of potential keywords
digest
hash of cleaned body
kind
This is always "Page" TODO: Ask william
authority
classification of authority: siphon, sinkhole, conversion, bridge
title
page title
title_duplicate_pages
pages with a duplicate title (presumably, by analogy with description_duplicate_pages)
opengraph_type
opengraph type if present on page
path
url path
created_at
creation timestamp
absolute_url
absolute url
duplicate_pages
duplicate page ids (?? possibly a duplicate of duplicate_page_ids)
fragment
fragment of url
full_text
full text of the page
canonical
no idea TODO: Ask william
tags
list of tags (duplicate of body, conversion, ...)
backlinks
backlink count
inlink_list
list of inlinks
last_rechecked
timestamp of last recheck
domain_id
domain it belongs to
links
(??? always empty)
body_digest
hash of body
links_info
list of links on the page with detail information
robot_tag
robots tags attached to this page (all, index, nofollow, ...) (??? unclear how this is calculated)
description_duplicate_pages
list of duplicate description pages
simhash
total simhash
anchor_texts
list of anchor texts on the page
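
The simhash fields above (title_simhash, description_simhash, duplicate_simhash, simhash) are described only as similarity hashes. As a rough illustration of the general technique, and not necessarily how these values are actually computed here, a minimal simhash sketch in Python could look like the following. The whitespace tokenization, the 64-bit width, the use of MD5 as the per-token hash, and the function names are all assumptions made for the example.

```python
import hashlib


def simhash(text, bits=64):
    """Return a simhash fingerprint of `text` as an integer (illustrative only)."""
    counts = [0] * bits
    # Assumption: tokens are lowercase, whitespace-separated words.
    for token in text.lower().split():
        digest = hashlib.md5(token.encode("utf-8")).digest()
        token_hash = int.from_bytes(digest[: bits // 8], "big")
        # Each token votes +1 or -1 on every bit position.
        for i in range(bits):
            counts[i] += 1 if (token_hash >> i) & 1 else -1
    # A bit is set in the fingerprint when the votes for it are positive.
    return sum(1 << i for i in range(bits) if counts[i] > 0)


def hamming_distance(a, b):
    """Number of differing bits between two fingerprints."""
    return bin(a ^ b).count("1")
```

Two titles would then be treated as near-duplicates when hamming_distance(simhash(a), simhash(b)) falls below some threshold; the threshold is something to tune and is not given on this page.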
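
Several fields above (scheme, userinfo, host, port, path, query, fragment) look like the standard components of a URL. Assuming that is what they hold, which this page does not confirm (userinfo in particular is marked "?"), the following Python sketch shows how such values can be derived from a page url with the standard library. The function name url_fields and the default-port table are hypothetical and only serve the example.

```python
from urllib.parse import urlsplit

# Assumption: when the url has no explicit port, the scheme's default is used.
DEFAULT_PORTS = {"http": 80, "https": 443, "ftp": 21}


def url_fields(url):
    """Split a url into the component fields listed on this page (illustrative)."""
    parts = urlsplit(url)
    # Rebuild "user:password" from the parsed pieces, if any are present.
    userinfo = None
    if parts.username is not None:
        userinfo = parts.username
        if parts.password is not None:
            userinfo += ":" + parts.password
    return {
        "scheme": parts.scheme,
        "userinfo": userinfo,
        "host": parts.hostname,
        "port": parts.port or DEFAULT_PORTS.get(parts.scheme),
        "path": parts.path,
        "query": parts.query,
        "fragment": parts.fragment,
    }
```

For example, url_fields("https://user:pw@example.com:8443/a/b?x=1#top") yields scheme "https", userinfo "user:pw", host "example.com", port 8443, path "/a/b", query "x=1" and fragment "top".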