WebCrawlReader

com.armatiek.infofuze.stream.filesystem.webcrawl
Class WebCrawlReader

java.lang.Object
  java.io.Reader
      com.armatiek.infofuze.stream.SourceReader
          com.armatiek.infofuze.stream.filesystem.FileSystemReader
              com.armatiek.infofuze.stream.filesystem.webcrawl.WebCrawlReader

All Implemented Interfaces: java.io.Closeable, java.lang.Readable

public class WebCrawlReader
extends FileSystemReader

Reader implementation that provides an XML representation of the (filtered) contents of one or more websites, given one or more seed URLs. The XML contains as much structured data as possible about the properties, metadata, and binary and textual contents of the files.

Author: Maarten Kroon
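
Because WebCrawlReader is a java.io.Reader that delivers XML, it can presumably be handed to any standard XML parser that accepts a Reader. The sketch below illustrates that idea with the JDK's StAX API. It is only an illustration under stated assumptions: a StringReader stands in for a real WebCrawlReader (constructing one requires FileIf and FileExtractor instances from the library), and the element names `files` and `file` are hypothetical, since this page does not document the actual XML vocabulary. The class name `ReaderXmlSketch` and the helper `countElements` are likewise made up for the example.

```java
import java.io.Reader;
import java.io.StringReader;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

public class ReaderXmlSketch {

    /**
     * Counts the start tags with the given local name in the XML
     * delivered by the supplied reader. Hypothetical helper, not part
     * of the library.
     */
    static int countElements(Reader reader, String localName) throws XMLStreamException {
        XMLInputFactory factory = XMLInputFactory.newInstance();
        XMLStreamReader xml = factory.createXMLStreamReader(reader);
        int count = 0;
        try {
            // Pull events one at a time so the whole document never has to fit in memory.
            while (xml.hasNext()) {
                if (xml.next() == XMLStreamConstants.START_ELEMENT
                        && localName.equals(xml.getLocalName())) {
                    count++;
                }
            }
        } finally {
            xml.close();
        }
        return count;
    }

    public static void main(String[] args) throws Exception {
        // Stand-in for a WebCrawlReader; the element names are assumptions.
        String sample = "<files><file name=\"a.html\"/><file name=\"b.pdf\"/></files>";
        try (Reader reader = new StringReader(sample)) {
            System.out.println(countElements(reader, "file"));
        }
    }
}
```

A streaming parser is used here on the assumption that the XML for a crawled site may be large; for small outputs a DOM parser fed through an org.xml.sax.InputSource wrapping the reader would work as well.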

Constructor Summary
`WebCrawlReader(FileIf[] files, java.util.List<FileExtractor> fileExtractors, Definitions.TransformMode transformMode, long lastIndexed, java.lang.String systemId, java.lang.String publicId)`

Method Summary

Methods inherited from class com.armatiek.infofuze.stream.filesystem.FileSystemReader
`close, read`

Methods inherited from class java.io.Reader
`mark, markSupported, read, read, read, ready, reset, skip`

Methods inherited from class java.lang.Object
`equals, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait`

Constructor Detail

WebCrawlReader

public WebCrawlReader(FileIf[] files,
                      java.util.List<FileExtractor> fileExtractors,
                      Definitions.TransformMode transformMode,
                      long lastIndexed,
                      java.lang.String systemId,
                      java.lang.String publicId)
