corelib.services.web.searchengine.scanners
Interface DocumentScanner

All Known Implementing Classes:
WebPageScanner

public interface DocumentScanner

This interface represents a document scanner for a given file type.

CAUTION: an implementation of the DocumentScanner interface must let exceptions of type InterruptedException propagate to the caller rather than catching and discarding them. Otherwise, the Indexing Service cannot be stopped when the web application is redeployed on the server.
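
The following is a minimal sketch of an implementation that honors this rule, not the framework's own code. WordDictionary, DocumentScannerException and the DocumentScanner interface are reduced to hypothetical stand-ins here (their real members are not described on this page), and InterruptibleScanner is an invented class that scans an in-memory word list instead of a file.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-ins for the framework types (assumptions for illustration).
class WordDictionary {
    final Map<String, Integer> scores = new HashMap<>();
}

class DocumentScannerException extends Exception {
    DocumentScannerException(String message) { super(message); }
}

interface DocumentScanner {
    String getDocumentTitle();
    WordDictionary scanWebPage(String documentFilename)
            throws DocumentScannerException, InterruptedException;
}

// Invented example scanner: counts words from a list given at construction.
class InterruptibleScanner implements DocumentScanner {
    private final String[] words;

    InterruptibleScanner(String... words) { this.words = words; }

    @Override
    public String getDocumentTitle() { return ""; }

    @Override
    public WordDictionary scanWebPage(String documentFilename)
            throws DocumentScannerException, InterruptedException {
        WordDictionary dictionary = new WordDictionary();
        for (String word : words) {
            // Check the interrupt flag on every iteration and propagate the
            // interruption instead of catching and discarding it.
            if (Thread.interrupted()) {
                throw new InterruptedException("Scan of " + documentFilename + " interrupted");
            }
            dictionary.scores.merge(word, 1, Integer::sum);
        }
        return dictionary;
    }
}
```

The point of the sketch is that no catch block inside scanWebPage swallows InterruptedException, so a thread pool shutdown can stop a scan that is still running.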

Author:
Dominique Liard

Method Summary
 java.lang.String getDocumentTitle()
          Returns the title of the scanned document if it exists; otherwise returns an empty string.
 WordDictionary scanWebPage(java.lang.String documentFilename)
          Starts the scanning process for the given document.
 

Method Detail

getDocumentTitle

java.lang.String getDocumentTitle()
Returns the title of the scanned document if it exists; otherwise returns an empty string.

Returns:
The document title.

scanWebPage

WordDictionary scanWebPage(java.lang.String documentFilename)
                           throws DocumentScannerException,
                                  java.lang.InterruptedException
Starts the scanning process for the given document.

Parameters:
documentFilename - the document filename
Returns:
A set of words and associated scores.
Throws:
DocumentScannerException - Thrown if the scanner cannot handle the document. In this case, the file extension probably does not match the actual nature of the document.
java.lang.InterruptedException - Thrown if the thread pool of the current web application is shut down for redeployment.
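
As an illustration of how a caller might treat these two exceptions differently, here is a hedged sketch of an indexing loop. IndexingLoop and indexAll are hypothetical names, not part of the framework, and the three types are again reduced to stand-ins. A DocumentScannerException causes the file to be skipped, while an InterruptedException is left to propagate so that the whole loop stops.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-ins for the framework types (assumptions for illustration).
class WordDictionary { }

class DocumentScannerException extends Exception {
    DocumentScannerException(String message) { super(message); }
}

interface DocumentScanner {
    String getDocumentTitle();
    WordDictionary scanWebPage(String documentFilename)
            throws DocumentScannerException, InterruptedException;
}

// Invented caller: scans a list of files with one scanner.
class IndexingLoop {
    // Returns the filenames that were scanned successfully.
    static List<String> indexAll(DocumentScanner scanner, List<String> filenames)
            throws InterruptedException {
        List<String> indexed = new ArrayList<>();
        for (String filename : filenames) {
            try {
                scanner.scanWebPage(filename);
                indexed.add(filename);
            } catch (DocumentScannerException e) {
                // The scanner cannot handle this document (for example, the
                // extension does not match the content): skip it and continue.
            }
            // InterruptedException is deliberately not caught here, so a
            // shutdown request ends the loop immediately.
        }
        return indexed;
    }
}
```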


CAUTION: Ellipse is provided as a BETA version so that the framework can be evaluated. Infini Software accepts no responsibility for the use of the Ellipse Framework.

Copyright 2012 Infini Software - All Rights Reserved.