com.armatiek.infofuze.stream.filesystem.webcrawl
Class WebCrawlReader
java.lang.Object
java.io.Reader
com.armatiek.infofuze.stream.SourceReader
com.armatiek.infofuze.stream.filesystem.FileSystemReader
com.armatiek.infofuze.stream.filesystem.webcrawl.WebCrawlReader
- All Implemented Interfaces:
- java.io.Closeable, java.lang.Readable
public class WebCrawlReader
extends FileSystemReader
Reader implementation that provides an XML representation of the (filtered)
contents of one or more websites, given one or more seed URLs. The XML will
contain as much structured data as possible regarding the properties,
metadata, and binary and textual contents of the files.
- Author:
- Maarten Kroon
Methods inherited from class java.io.Reader
mark, markSupported, read, read, read, ready, reset, skip
Methods inherited from class java.lang.Object
equals, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
WebCrawlReader
public WebCrawlReader(FileIf[] files,
java.util.List<FileExtractor> fileExtractors,
Definitions.TransformMode transformMode,
long lastIndexed,
java.lang.String systemId,
java.lang.String publicId)
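Because WebCrawlReader ultimately extends java.io.Reader, the XML it produces can be consumed with the inherited read methods like any other character stream. The following is a minimal sketch of that pattern, not code taken from the library: the class name ReaderDrainSketch and the helper drain are hypothetical, and a java.io.StringReader with made-up XML stands in for a WebCrawlReader, since building the constructor arguments (FileIf[], FileExtractor list, Definitions.TransformMode) depends on library types not documented on this page.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;

public class ReaderDrainSketch {

    // Hypothetical helper: reads all characters from any java.io.Reader
    // into a String and closes the reader when done.
    static String drain(Reader reader) {
        StringBuilder out = new StringBuilder();
        char[] buffer = new char[4096];
        // try-with-resources works because Reader implements Closeable
        try (Reader in = reader) {
            int n;
            // read(char[], int, int) returns -1 at end of stream
            while ((n = in.read(buffer, 0, buffer.length)) != -1) {
                out.append(buffer, 0, n);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // Assumption: a StringReader stands in for a WebCrawlReader here.
        // The XML below is invented for illustration and is not the
        // library's actual output format.
        Reader reader = new StringReader("<files><file name=\"index.html\"/></files>");
        System.out.println(drain(reader));
    }
}
```

For large crawls, buffering the whole stream in memory as above is unlikely to be practical; passing the reader directly to a streaming XML parser would be the more natural choice.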