com.armatiek.infofuze.stream.filesystem.webcrawl
Class WebCrawlReader

java.lang.Object
  extended by java.io.Reader
      extended by com.armatiek.infofuze.stream.SourceReader
          extended by com.armatiek.infofuze.stream.filesystem.FileSystemReader
              extended by com.armatiek.infofuze.stream.filesystem.webcrawl.WebCrawlReader
All Implemented Interfaces:
java.io.Closeable, java.lang.Readable

public class WebCrawlReader
extends FileSystemReader

Reader implementation that provides an XML representation of the (filtered) contents of one or more websites, given one or more seed URLs. The XML contains as much structured data as possible about the properties, metadata, and binary and textual contents of the files.
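
Because WebCrawlReader is a java.io.Reader, its XML output can be consumed with the standard Reader API. The following is a minimal sketch of that consumption pattern, not the library's own usage example. The class name ReaderDrainSketch, the drain helper, and the sample XML are hypothetical, and a StringReader stands in for a WebCrawlReader because this page does not document how the constructor arguments (FileIf, FileExtractor, Definitions.TransformMode) are built.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;

public class ReaderDrainSketch {

    // Reads every character from the given Reader and returns it as one String.
    // Works for any java.io.Reader, so it would also accept a WebCrawlReader.
    public static String drain(Reader reader) throws IOException {
        StringBuilder xml = new StringBuilder();
        char[] buffer = new char[4096];
        int count;
        // read(char[], int, int) returns -1 once the end of the stream is reached.
        while ((count = reader.read(buffer, 0, buffer.length)) != -1) {
            xml.append(buffer, 0, count);
        }
        return xml.toString();
    }

    public static void main(String[] args) throws IOException {
        // Assumption: the sample XML below is made up for illustration and does
        // not reflect the actual element names produced by WebCrawlReader.
        // try-with-resources closes the reader, which matters for a crawler
        // that may hold network or file handles.
        try (Reader reader = new StringReader("<files><file name=\"index.html\"/></files>")) {
            System.out.println(drain(reader));
        }
    }
}
```

For large crawls, buffering the whole document in memory as above may be impractical; passing the Reader directly to a streaming XML parser or transformer is likely the better fit.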

Author:
Maarten Kroon

Constructor Summary
WebCrawlReader(FileIf[] files, java.util.List<FileExtractor> fileExtractors, Definitions.TransformMode transformMode, long lastIndexed, java.lang.String systemId, java.lang.String publicId)
           
 
Method Summary
 
Methods inherited from class com.armatiek.infofuze.stream.filesystem.FileSystemReader
close, read
 
Methods inherited from class java.io.Reader
mark, markSupported, read, read, read, ready, reset, skip
 
Methods inherited from class java.lang.Object
equals, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Constructor Detail

WebCrawlReader

public WebCrawlReader(FileIf[] files,
                      java.util.List<FileExtractor> fileExtractors,
                      Definitions.TransformMode transformMode,
                      long lastIndexed,
                      java.lang.String systemId,
                      java.lang.String publicId)