com.armatiek.infofuze.stream.filesystem.webcrawl
Class CrawlState

java.lang.Object
  extended by com.armatiek.infofuze.stream.filesystem.webcrawl.CrawlState

public class CrawlState
extends java.lang.Object

Class containing state information about a crawling process.

Author:
Maarten Kroon

Constructor Summary
CrawlState(java.lang.String seedURI, int maxDepth, int wait, boolean followImages, boolean followScripts, boolean followLinks)
           
 
Method Summary
 boolean getFollowImages()
          Returns whether or not to follow links defined in the src attribute of img tags.
 boolean getFollowLinks()
          Returns whether or not to follow links defined in the href attribute of link tags.
 boolean getFollowScripts()
          Returns whether or not to follow links defined in the src attribute of script tags.
 int getMaxDepth()
          Returns the maximum depth to crawl.
 java.lang.String getSeedURI()
          Returns the seed URI of the current crawl.
 java.util.Set<java.lang.String> getVisitedURLSet()
          Returns the set of visited URLs.
 int getWait()
          Returns the number of milliseconds to wait after each request.
 
Methods inherited from class java.lang.Object
equals, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Constructor Detail

CrawlState

public CrawlState(java.lang.String seedURI,
                  int maxDepth,
                  int wait,
                  boolean followImages,
                  boolean followScripts,
                  boolean followLinks)
           throws org.apache.commons.httpclient.URIException,
                  java.net.URISyntaxException
Throws:
org.apache.commons.httpclient.URIException
java.net.URISyntaxException
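
As a rough illustration of how the six constructor arguments line up, the sketch below uses a hypothetical stand-in class, CrawlStateSketch, that mirrors only the documented signature and getters. It is not the library's implementation: the stand-in checks the seed with java.net.URI and so declares only URISyntaxException, whereas the real constructor also declares org.apache.commons.httpclient.URIException. The seed URL and the argument values are made up for the example.

```java
import java.net.URI;
import java.net.URISyntaxException;

// Hypothetical stand-in that mirrors the documented constructor and getters.
// The real CrawlState may validate and store its arguments differently.
final class CrawlStateSketch {
    private final String seedURI;
    private final int maxDepth;
    private final int wait;
    private final boolean followImages;
    private final boolean followScripts;
    private final boolean followLinks;

    CrawlStateSketch(String seedURI, int maxDepth, int wait,
                     boolean followImages, boolean followScripts, boolean followLinks)
            throws URISyntaxException {
        new URI(seedURI); // assumption: a syntactically invalid seed is rejected up front
        this.seedURI = seedURI;
        this.maxDepth = maxDepth;
        this.wait = wait;
        this.followImages = followImages;
        this.followScripts = followScripts;
        this.followLinks = followLinks;
    }

    String getSeedURI() { return seedURI; }
    int getMaxDepth() { return maxDepth; }
    int getWait() { return wait; }
    boolean getFollowImages() { return followImages; }
    boolean getFollowScripts() { return followScripts; }
    boolean getFollowLinks() { return followLinks; }

    public static void main(String[] args) throws URISyntaxException {
        // Argument order: seedURI, maxDepth, wait (ms), followImages, followScripts, followLinks
        CrawlStateSketch state =
                new CrawlStateSketch("http://www.example.com/", 3, 500, false, false, true);
        System.out.println(state.getSeedURI()
                + " depth=" + state.getMaxDepth()
                + " wait=" + state.getWait() + "ms");
    }
}
```

Because the three boolean parameters are adjacent and share a type, it is easy to pass them in the wrong order; naming the values in local variables before the call is one way to keep the call site readable.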
Method Detail

getFollowImages

public boolean getFollowImages()
Returns whether or not to follow links defined in the src attribute of img tags.


getFollowScripts

public boolean getFollowScripts()
Returns whether or not to follow links defined in the src attribute of script tags.


getFollowLinks

public boolean getFollowLinks()
Returns whether or not to follow links defined in the href attribute of link tags.
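
The three follow flags each pair a tag with an attribute: img with src, script with src, and link with href. A minimal sketch of that mapping follows. The class FollowFlagsSketch and the method attributeToFollow are hypothetical names, and how the actual crawler treats tags outside these three is not documented here, so the sketch simply leaves them out.

```java
import java.util.Locale;
import java.util.Optional;

// Hypothetical helper; it only encodes the tag/attribute pairs named in the
// getter descriptions and is not taken from the library.
final class FollowFlagsSketch {

    // Returns the attribute whose value should be followed for the given tag,
    // or an empty Optional when the matching flag is off or the tag is not
    // one of the three covered by the flags.
    static Optional<String> attributeToFollow(String tagName,
                                              boolean followImages,
                                              boolean followScripts,
                                              boolean followLinks) {
        switch (tagName.toLowerCase(Locale.ROOT)) {
            case "img":
                return followImages ? Optional.of("src") : Optional.empty();
            case "script":
                return followScripts ? Optional.of("src") : Optional.empty();
            case "link":
                return followLinks ? Optional.of("href") : Optional.empty();
            default:
                return Optional.empty(); // outside the scope of this sketch
        }
    }

    public static void main(String[] args) {
        // Images off, scripts on, links on
        System.out.println(attributeToFollow("img", false, true, true));
        System.out.println(attributeToFollow("script", false, true, true));
        System.out.println(attributeToFollow("link", false, true, true));
    }
}
```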


getMaxDepth

public int getMaxDepth()
Returns the maximum depth to crawl.
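
To make the effect of a depth limit concrete, here is a small breadth-first sketch over an in-memory link graph, combined with a visited set of URL strings like the one getVisitedURLSet() returns. It rests on two assumptions that this page does not confirm: the seed counts as depth 0, and maxDepth is inclusive. DepthLimitSketch, crawl, and the sample graph are hypothetical.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical illustration of a depth-limited crawl; not the library's algorithm.
final class DepthLimitSketch {

    // Visits pages level by level, starting at the seed (assumed depth 0),
    // and stops once pages at maxDepth (assumed inclusive) have been visited.
    static Set<String> crawl(String seed, int maxDepth, Map<String, List<String>> links) {
        Set<String> visited = new LinkedHashSet<>();
        List<String> frontier = List.of(seed);
        for (int depth = 0; depth <= maxDepth && !frontier.isEmpty(); depth++) {
            List<String> next = new ArrayList<>();
            for (String url : frontier) {
                // Set.add returns false for a URL seen before, so each page
                // is expanded at most once.
                if (visited.add(url)) {
                    next.addAll(links.getOrDefault(url, List.of()));
                }
            }
            frontier = next;
        }
        return visited;
    }

    public static void main(String[] args) {
        Map<String, List<String>> links = Map.of(
                "a", List.of("b", "c"),
                "b", List.of("d", "a"),
                "d", List.of("e"));
        System.out.println(crawl("a", 1, links)); // prints [a, b, c]
    }
}
```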


getSeedURI

public java.lang.String getSeedURI()
Returns the seed URI of the current crawl.


getVisitedURLSet

public java.util.Set<java.lang.String> getVisitedURLSet()
Returns the set of visited URLs.


getWait

public int getWait()
Returns the number of milliseconds to wait after each request.
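
Since the value is in milliseconds, it can be handed directly to Thread.sleep. The sketch below shows one way a crawl loop might pause after each request. PoliteDelaySketch and fetchAll are hypothetical names, the print statement stands in for a real HTTP request, and treating a wait of 0 or less as "no pause" is an assumption.

```java
import java.util.List;

// Hypothetical illustration of a per-request delay; not the library's crawl loop.
final class PoliteDelaySketch {

    // Handles each URL in turn and pauses waitMillis after each one.
    // Returns the total elapsed time in milliseconds.
    static long fetchAll(List<String> urls, int waitMillis) throws InterruptedException {
        long start = System.nanoTime();
        for (String url : urls) {
            System.out.println("fetching " + url); // placeholder for a real request
            if (waitMillis > 0) {
                Thread.sleep(waitMillis); // assumption: non-positive values mean no pause
            }
        }
        return (System.nanoTime() - start) / 1_000_000;
    }

    public static void main(String[] args) throws InterruptedException {
        long elapsed = fetchAll(List.of("http://www.example.com/a", "http://www.example.com/b"), 250);
        System.out.println("elapsed ms: " + elapsed);
    }
}
```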