Main.WebSpamTaxonomy r1.2 - 16 Nov 2007 - 03:17 - HenryFeild

Date: 2005
Place: AIRWeb
Author: Gyongyi & Garcia-Molina
Keyword(s): web spam, spam farm

Summary

The paper formalizes and categorizes techniques designed to circumvent the proper operation of ranking algorithms.

Boosting

Boosting describes the process of improving the relevance or importance of a page (or set of pages) without actually improving the quality of the content.

Term Spamming

Location-based

  • Body Spam
  • Title Spam
  • Meta Tag Spam
  • Anchor Text Spam
  • URL Spam

Content-based

  • Repetition
  • Dumping
  • Weaving
  • Phrase Stitching
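
As a rough illustration of why repetition can pay off, the sketch below compares a plain term frequency before and after one term is repeated. The whitespace tokenizer, the relative-frequency score, and the sample text are assumptions made for this example, not something taken from the paper; real ranking functions are more involved.

```python
from collections import Counter


def term_frequency(text: str, term: str) -> float:
    """Fraction of tokens in `text` equal to `term` (naive whitespace tokenizer)."""
    tokens = text.lower().split()
    if not tokens:
        return 0.0
    return Counter(tokens)[term.lower()] / len(tokens)


# Hypothetical page text, invented for illustration.
honest = "we sell used books and rare maps"
# Repetition: the same term is appended many times without adding content.
spammed = honest + " books" * 7

tf_honest = term_frequency(honest, "books")    # 1 of 7 tokens
tf_spammed = term_frequency(spammed, "books")  # 8 of 14 tokens
```

Under this toy score the repeated page looks several times more "about" the term, even though it says nothing new.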

Link Spamming

Link spamming is the manipulation of incoming or outgoing links to alter a page's relevance or importance. From the spammer's point of view, pages fall into three groups, plus a target:
  • Inaccessible pages are pages over which the spammer has no control.
  • Accessible pages are pages over which the spammer can exert some control. There are m such pages.
  • Own pages are owned by the spammer. A group of own pages is a spam farm. There are n own pages.
  • t is the target page that the spammer would like to boost.
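
The page categories above can be sketched as a small data model. This is only one possible encoding: the names `Control`, `Page`, `spam_farm`, and `count_resources`, and the example.com-style URLs, are hypothetical. The summary does not say whether t counts among the n own pages, so the sketch keeps t separate.

```python
from dataclasses import dataclass
from enum import Enum


class Control(Enum):
    INACCESSIBLE = "inaccessible"  # spammer has no control
    ACCESSIBLE = "accessible"      # spammer can exert some control
    OWN = "own"                    # owned by the spammer


@dataclass(frozen=True)
class Page:
    url: str
    control: Control


def spam_farm(pages):
    """Return the own pages, which together form the spam farm."""
    return [p for p in pages if p.control is Control.OWN]


def count_resources(pages):
    """Return (m, n): the number of accessible pages and of own pages."""
    m = sum(1 for p in pages if p.control is Control.ACCESSIBLE)
    n = sum(1 for p in pages if p.control is Control.OWN)
    return m, n


# Hypothetical pages, invented for illustration.
pages = [
    Page("http://news.example.org/", Control.INACCESSIBLE),
    Page("http://blog.example.net/comments", Control.ACCESSIBLE),
    Page("http://farm.example.com/a", Control.OWN),
    Page("http://farm.example.com/b", Control.OWN),
]
# Target page t, kept outside the list (an assumption of this sketch).
t = Page("http://farm.example.com/target", Control.OWN)

m, n = count_resources(pages)
farm = spam_farm(pages)
```

With these sample pages, m is 1 and n is 2, and `farm` holds the two own pages.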

Outgoing Links

Example: directory cloning

Incoming Links

  • Honey pots
  • Infiltration
  • Social network spamming
  • Link exchanges
  • Recovering expired domains
  • Creating your own spam farm

Hiding

  • Content Hiding
  • Cloaking
  • Redirection

Contribution

  • Provides a structured framework and lexicon to facilitate the discussion on web spamming and countermeasure techniques.

Comment

  • Nice, easy-to-read paper that gives a high-level overview of web spam.

Reference

-- MarcCartright - 15 Nov 2007

Copyright © 1999-2008 by the contributing authors. All material on this collaboration platform is the property of the contributing authors.