Lucene.Net This interface describes a character stream that maintains line and column number positions of the characters. It also has the capability to back up the stream to some extent. An implementation of this interface is used in the TokenManager implementation generated by JavaCCParser. All the methods except backup can be implemented in any fashion. backup needs to be implemented correctly for the correct operation of the lexer. The rest of the methods are used to get information like line number, column number and the String that constitutes a token and are not used by the lexer. Hence their implementation won't affect the generated lexer's operation. Returns the next character from the selected input. The method of selecting the input is the responsibility of the class implementing this interface. Can throw any java.io.IOException. Returns the column number of the last character for the current token (being matched after the last call to BeginToken). Returns the line number of the last character for the current token (being matched after the last call to BeginToken). Returns the column number of the first character for the current token (being matched after the last call to BeginToken). Returns the line number of the first character for the current token (being matched after the last call to BeginToken). Backs up the input stream by amount steps. The lexer calls this method if it had already read some characters, but could not use them to match a (longer) token. So, they will be used again as the prefix of the next token and it is the implementation's responsibility to do this right. Returns the next character that marks the beginning of the next token. All characters must remain in the buffer between two successive calls to this method to implement backup correctly. Returns a string made up of characters from the marked token beginning to the current buffer position. Implementations have the choice of returning anything that they want to. For example, for efficiency, one might decide to just return null, which is a valid implementation. Returns an array of characters that make up the suffix of length 'len' for the currently matched token. This is used to build up the matched string for use in actions in the case of MORE. A simple and inefficient implementation of this is as follows: { String t = GetImage(); return t.substring(t.length() - len, t.length()).toCharArray(); } The lexer calls this function to indicate that it is done with the stream and hence implementations can free any resources held by this class. Again, the body of this function can be just empty and it will not affect the lexer's operation. Constructs from a Reader. This exception is thrown when parse errors are encountered. You can explicitly create objects of this exception type by calling the method generateParseException in the generated parser. You can modify this class to customize your error reporting mechanisms so long as you retain the public fields. This constructor is used by the method "generateParseException" in the generated parser. Calling this constructor generates a new object of this type with the fields "currentToken", "expectedTokenSequences", and "tokenImage" set. The boolean flag "specialConstructor" is also set to true to indicate that this constructor was used to create this object.
This constructor calls its super class with the empty string to force the "toString" method of parent class "Throwable" to print the error message in the form: ParseException: <result of getMessage> The following constructors are for use by you for whatever purpose you can think of. Constructing the exception in this manner makes the exception behave in the normal way - i.e., as documented in the class "Throwable". The fields "errorToken", "expectedTokenSequences", and "tokenImage" do not contain relevant information. The JavaCC generated code does not use these constructors. This variable determines which constructor was used to create this object and thereby affects the semantics of the "getMessage" method (see below). This is the last token that has been consumed successfully. If this object has been created due to a parse error, the token following this token will (therefore) be the first error token. Each entry in this array is an array of integers. Each array of integers represents a sequence of tokens (by their ordinal values) that is expected at this point of the parse. This is a reference to the "tokenImage" array of the generated parser within which the parse error occurred. This array is defined in the generated ...Constants interface. The end of line string for this machine. Used to convert raw characters to their escaped version when the raw version cannot be used as part of an ASCII string literal. This method has the standard behavior when this object has been created using the standard constructors. Otherwise, it uses "currentToken" and "expectedTokenSequences" to generate a parse error message and returns it. If this object has been created due to a parse error, and you do not catch it (it gets thrown from the parser), then this method is called during the printing of the final stack trace, and hence the correct error message gets displayed. Filters {@link StandardTokenizer} with {@link StandardFilter}, {@link LowerCaseFilter} and {@link StopFilter}, using a list of English stop words. $Id: StandardAnalyzer.java 219090 2005-07-14 20:36:28Z dnaber $ Creates a TokenStream which tokenizes all the text in the provided Reader. Default implementation forwards to tokenStream(Reader) for compatibility with older versions. Override to allow Analyzer to choose strategy based on document and/or field. Must be able to handle null field name for backward compatibility. Invoked before indexing a Field instance if terms have already been added to that field. This allows custom analyzers to place an automatic position increment gap between Field instances using the same field name. The default position increment gap is 0. With a 0 position increment gap and the typical default token position increment of 1, all terms in a field, including across Field instances, are in successive positions, allowing exact PhraseQuery matches, for instance, across Field instance boundaries. Field name being indexed. position increment gap, added to the next token emitted from {@link #TokenStream(String,Reader)} An array containing some common English words that are usually not useful for searching. Builds an analyzer with the default stop words ({@link #STOP_WORDS}). Builds an analyzer with the given stop words. Builds an analyzer with the given stop words. Builds an analyzer with the stop words from the given file. Builds an analyzer with the stop words from the given reader.
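As a rough illustration of the analyzer contract described above, the following sketch runs StandardAnalyzer over a short string and walks the resulting token stream. It uses the Java class names these comments were generated from (org.apache.lucene.analysis.*); the .NET port mirrors them with PascalCase naming.
            Analyzer analyzer = new StandardAnalyzer();           // default English stop words
            TokenStream stream = analyzer.tokenStream("body",
                new java.io.StringReader("The Quick Brown Fox"));
            Token token;
            while ((token = stream.next()) != null) {             // null signals end of stream
                System.out.println(token.termText() + " ["
                    + token.startOffset() + "," + token.endOffset()
                    + "] type=" + token.type());
            }
            stream.close();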
Constructs a {@link StandardTokenizer} filtered by a {@link StandardFilter}, a {@link LowerCaseFilter} and a {@link StopFilter}. Normalizes tokens extracted with {@link StandardTokenizer}. Returns the next token in the stream, or null at EOS. Releases resources associated with this stream. The source of tokens for this filter. Construct a token stream filtering the given input. Close the input TokenStream. Construct filtering in. The text source for this Tokenizer. Construct a tokenizer with null input. Construct a token stream processing the given input. By default, closes the input Reader. Constructs a tokenizer for this Reader. By default, closes the input Reader. By default, closes the input Reader. Describes the input token stream. An integer that describes the kind of this token. This numbering system is determined by JavaCCParser, and a table of these numbers is stored in the file ...Constants.java. beginLine and beginColumn describe the position of the first character of this token; endLine and endColumn describe the position of the last character of this token. beginLine and beginColumn describe the position of the first character of this token; endLine and endColumn describe the position of the last character of this token. beginLine and beginColumn describe the position of the first character of this token; endLine and endColumn describe the position of the last character of this token. beginLine and beginColumn describe the position of the first character of this token; endLine and endColumn describe the position of the last character of this token. The string image of the token. A reference to the next regular (non-special) token from the input stream. If this is the last token from the input stream, or if the token manager has not read tokens beyond this one, this field is set to null. This is true only if this token is also a regular token. Otherwise, see below for a description of the contents of this field. This field is used to access special tokens that occur prior to this token, but after the immediately preceding regular (non-special) token. If there are no such special tokens, this field is set to null. When there is more than one such special token, this field refers to the last of these special tokens, which in turn refers to the next previous special token through its specialToken field, and so on until the first special token (whose specialToken field is null). The next fields of special tokens refer to other special tokens that immediately follow it (without an intervening regular token). If there is no such token, this field is null. Returns the image. Returns a new Token object, by default. However, if you want, you can create and return subclass objects based on the value of ofKind. Simply add the cases to the switch for all those special cases. For example, if you have a subclass of Token called IDToken that you want to create if ofKind is ID, simply add something like: case MyParserConstants.ID : return new IDToken(); to the following switch statement. Then you can cast the matchedToken variable to the appropriate type and use it in your lexical actions. Lexical error occurred. An attempt was made to create a second instance of a static token manager. Tried to change to an invalid lexical state. Detected (and bailed out of) an infinite loop in the token manager. Indicates the reason why the exception is thrown. It will have one of the above 4 values.
Replaces unprintable characters by their escaped (or unicode escaped) equivalents in the given string. Returns a detailed message for the Error when it is thrown by the token manager to indicate a lexical error. Parameters: EOFSeen: indicates if EOF caused the lexical error curLexState: lexical state in which this error occurred errorLine: line number when the error occurred errorColumn: column number when the error occurred errorAfter: prefix that was seen before this error occurred curchar: the offending character Note: You can customize the lexical error message by modifying this method. You can also modify the body of this method to customize your error messages. For example, cases like LOOP_DETECTED and INVALID_LEXICAL_STATE are not of end-users' concern, so you can return something like: "Internal Error : Please file a bug report .... " from this method for such cases in the release version of your parser. An abstract base class for simple, character-oriented tokenizers. Returns true iff a character should be included in a token. This tokenizer generates as tokens adjacent sequences of characters which satisfy this predicate. Characters for which this is false are used to define token boundaries and are not included in tokens. Called on each token character to normalize it before it is added to the token. The default implementation does nothing. Subclasses may use this to, e.g., lowercase tokens. Returns the next token in the stream, or null at EOS. To replace accented characters in a String by unaccented equivalents. "Tokenizes" the entire stream as a single token. This is useful for data like zip codes, ids, and some product names. Emits the entire input as a single token. Removes words that are too long or too short from the stream. David Spencer $Id: LengthFilter.java 347992 2005-11-21 21:41:43Z dnaber $ Build a filter that removes words that are too long or too short from the text. Returns the next input Token whose termText() is the right length. A LetterTokenizer is a tokenizer that divides text at non-letters. That's to say, it defines tokens as maximal strings of adjacent letters, as defined by the java.lang.Character.isLetter() predicate. Note: this does a decent job for most European languages, but does a terrible job for some Asian languages, where words are not separated by spaces. Construct a new LetterTokenizer. Collects only characters which satisfy {@link Character#isLetter(char)}. Normalizes token text to lower case. $Id: LowerCaseFilter.java 150259 2004-03-29 22:48:07Z cutting $ Construct a new LowerCaseTokenizer. Collects only characters which satisfy {@link Character#isLetter(char)}. Constructs with default analyzer. Any fields not specifically defined to use a different analyzer will use the one provided here. Defines an analyzer to use for the specified field. field name requiring a non-default analyzer non-default analyzer to use for field Returns the next input Token, after being stemmed. Stemmer, implementing the Porter Stemming Algorithm. The Stemmer class transforms a word into its root form. The input word can be provided a character at a time (by calling add()), or at once by calling one of the various stem(something) methods. reset() resets the stemmer so it can stem another word. If you invoke the stemmer by calling add(char) and then Stem(), you must call reset() before starting another word. Add a character to the word being stemmed. When you are finished adding characters, you can call Stem(void) to process the word.
After a word has been stemmed, it can be retrieved by toString(), or a reference to the internal buffer can be retrieved by getResultBuffer and getResultLength (which is generally more efficient). Returns the length of the word resulting from the stemming process. Returns a reference to a character buffer containing the results of the stemming process. You also need to consult getResultLength() to determine the length of the result. Stem a word provided as a String. Returns the result as a String. Stem a word contained in a char[]. Returns true if the stemming process resulted in a word different from the input. You can retrieve the result with getResultLength()/getResultBuffer() or toString(). Stem a word contained in a portion of a char[] array. Returns true if the stemming process resulted in a word different from the input. You can retrieve the result with getResultLength()/getResultBuffer() or toString(). Stem a word contained in a leading portion of a char[] array. Returns true if the stemming process resulted in a word different from the input. You can retrieve the result with getResultLength()/getResultBuffer() or toString(). Stem the word placed into the Stemmer buffer through calls to add(). Returns true if the stemming process resulted in a word different from the input. You can retrieve the result with getResultLength()/getResultBuffer() or toString(). Test program for demonstrating the Stemmer. It reads a file and stems each word, writing the result to standard out. Usage: Stemmer file-name. An Analyzer that filters LetterTokenizer with LowerCaseFilter. Filters LetterTokenizer with LowerCaseFilter and StopFilter. An array containing some common English words that are not usually useful for searching. Builds an analyzer which removes words in ENGLISH_STOP_WORDS. Builds an analyzer with the stop words from the given set. Builds an analyzer which removes words in the provided array. Builds an analyzer with the stop words from the given file. Builds an analyzer with the stop words from the given reader. Filters LowerCaseTokenizer with StopFilter. Removes stop words from a token stream. Construct a token stream filtering the given input. Constructs a filter which removes words from the input TokenStream that are named in the array of words. Construct a token stream filtering the given input. The set of stop words, as Strings. If ignoreCase is true, all strings should be lower cased. Ignore case when stopping; the stopWords set must be set up to contain only lower case words. Constructs a filter which removes words from the input TokenStream that are named in the Set. It is crucial that an efficient Set implementation is used for maximum performance. Builds a Set from an array of stop words, appropriate for passing into the StopFilter constructor. This permits the stopWords construction to be cached once when an Analyzer is constructed. If true, all words are lower cased first. a Set containing the words Returns the next input Token whose termText() is not a stop word. A Token is an occurrence of a term from the text of a field. It consists of a term's text, the start and end offset of the term in the text of the field, and a type string. The start and end offsets permit applications to re-associate a token with its source text, e.g., to display highlighted query terms in a document browser, or to show matching text fragments in a KWIC (KeyWord In Context) display, etc. The type is an interned string, assigned by a lexical analyzer (a.k.a.
tokenizer), naming the lexical or syntactic class that the token belongs to. For example, an end-of-sentence marker token might be implemented with type "eos". The default token type is "word". Returns the position increment of this Token. Returns the Token's term text. Returns this Token's starting offset, the position of the first character corresponding to this token in the source text. Note that the difference between endOffset() and startOffset() may not be equal to termText.length(), as the term text may have been altered by a stemmer or some other filter. Returns this Token's ending offset, one greater than the position of the last character corresponding to this token in the source text. Returns this Token's lexical type. Defaults to "word". An Analyzer that uses WhitespaceTokenizer. A WhitespaceTokenizer is a tokenizer that divides text at whitespace. Adjacent sequences of non-whitespace characters form tokens. Construct a new WhitespaceTokenizer. Collects only characters which do not satisfy {@link Character#isWhitespace(char)}. Loader for text files that represent a list of stopwords. Gerhard Schwarz $Id: WordlistLoader.java 192989 2005-06-22 19:59:03Z dnaber $ Loads a text file and adds every line as an entry to a HashSet (omitting leading and trailing whitespace). Every line of the file should contain only one word. The words need to be in lowercase if you make use of an Analyzer which uses LowerCaseFilter (like StandardAnalyzer). File containing the wordlist A HashSet with the file's words Reads lines from a Reader and adds every line as an entry to a HashSet (omitting leading and trailing whitespace). Every line of the Reader should contain only one word. The words need to be in lowercase if you make use of an Analyzer which uses LowerCaseFilter (like StandardAnalyzer). Reader containing the wordlist A HashSet with the reader's words Builds a wordlist table, using words as both keys and values for backward compatibility. stopword set Converts a Date to a string suitable for indexing. RuntimeException if the date specified in the method argument is before 1970 Converts a millisecond time to a string suitable for indexing. RuntimeException if the time specified in the method argument is negative, that is, before 1970 Converts a string-encoded date into a millisecond time. Converts a string-encoded date into a Date object. Converts a Date to a string suitable for indexing. the date to be converted the desired resolution, see {@link #Round(Date, DateTools.Resolution)} a string in the format yyyyMMddHHmmssSSS or shorter, depending on resolution Converts a millisecond time to a string suitable for indexing. the date expressed as milliseconds since January 1, 1970, 00:00:00 GMT the desired resolution, see {@link #Round(long, DateTools.Resolution)} a string in the format yyyyMMddHHmmssSSS or shorter, depending on resolution; using UTC as the timezone Converts a string produced by timeToString or DateToString back to a time, represented as the number of milliseconds since January 1, 1970, 00:00:00 GMT. the date string to be converted the number of milliseconds since January 1, 1970, 00:00:00 GMT ParseException if dateString is not in the expected format Converts a string produced by timeToString or DateToString back to a time, represented as a Date object. the date string to be converted the parsed time as a Date object ParseException if dateString is not in the expected format Limit a date's resolution.
For example, the date 2004-09-21 13:50:11 will be changed to 2004-09-01 00:00:00 when using Resolution.MONTH. The desired resolution of the date to be returned; the date with all values more precise than resolution set to 0 or 1 Limit a date's resolution. For example, the date 1095767411000 (which represents 2004-09-21 13:50:11) will be changed to 1093989600000 (2004-09-01 00:00:00) when using Resolution.MONTH. The desired resolution of the date to be returned; the date with all values more precise than resolution set to 0 or 1, expressed as milliseconds since January 1, 1970, 00:00:00 GMT Specifies the time granularity. Constructs a new document with no fields. Returns the number of fields in this document. Added as a helper for Lucene.Net.
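For illustration, a small sketch of the DateTools round trip described above (Java-style names; the resolution value is only an example):
            java.util.Date now = new java.util.Date();
            // index with day granularity; yields a "yyyyMMdd" string that sorts lexicographically
            String indexed = DateTools.dateToString(now, DateTools.Resolution.DAY);
            // later, convert the stored string back; fields finer than DAY come back zeroed
            java.util.Date roundTripped = DateTools.stringToDate(indexed);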

Adds a field to a document. Several fields may be added with the same name. In this case, if the fields are indexed, their text is treated as though appended for the purposes of search.
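A sketch of the behavior described above, using Java-style names (the field values are placeholders): two fields share the name "author", and both remain searchable.
            Document doc = new Document();
            doc.add(new Field("title", "an example document",
                Field.Store.YES, Field.Index.TOKENIZED));
            doc.add(new Field("author", "first author",
                Field.Store.YES, Field.Index.UN_TOKENIZED));
            doc.add(new Field("author", "second author",
                Field.Store.YES, Field.Index.UN_TOKENIZED));
            // doc.getFields("author") now returns both values, in the order they were added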

Note that the add method, like the removeField(s) methods, only makes sense prior to adding a document to an index. These methods cannot be used to change the content of an existing index! In order to achieve this, a document has to be deleted from an index and a new changed version of that document has to be added.

Removes field with the specified name from the document. If multiple fields exist with this name, this method removes the first field that has been added. If there is no field with the specified name, the document remains unchanged.

Note that the removeField(s) methods, like the add method, only make sense prior to adding a document to an index. These methods cannot be used to change the content of an existing index! In order to achieve this, a document has to be deleted from an index and a new changed version of that document has to be added.

Removes all fields with the given name from the document. If there is no field with the specified name, the document remains unchanged.

Note that the removeField(s) methods, like the add method, only make sense prior to adding a document to an index. These methods cannot be used to change the content of an existing index! In order to achieve this, a document has to be deleted from an index and a new changed version of that document has to be added.
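Since these methods cannot change an index that has already been written, the usual update pattern is the one sketched below: delete the old document by a unique ID term, then add the changed version. The directory, analyzer, and updatedDoc variables are assumed to exist; names follow the Java API.
            // remove the old version, identified by a unique "id" keyword field
            IndexReader reader = IndexReader.open(directory);
            reader.deleteDocuments(new Term("id", "doc-42"));
            reader.close();
            // add the new version
            IndexWriter writer = new IndexWriter(directory, analyzer, false);
            writer.addDocument(updatedDoc);
            writer.close();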

Returns a field with the given name if any exist in this document, or null. If multiple fields exist with this name, this method returns the first value added. Returns the string value of the field with the given name if any exist in this document, or null. If multiple fields exist with this name, this method returns the first value added. If only binary fields with this name exist, returns null. Returns an Enumeration of all the fields in a document. Returns an array of {@link Field}s with the given name. This method can return null. the name of the field a Field[] array Returns an array of values of the field specified as the method parameter. This method can return null. the name of the field a String[] of field values Returns an array of byte arrays for all of the fields that have the name specified as the method parameter. This method will return null if no binary fields with the specified name are available. the name of the field a byte[][] of binary field values. Returns an array of bytes for the first (or only) field that has the name specified as the method parameter. This method will return null if no binary fields with the specified name are available. There may be non-binary fields with the same name. the name of the field. a byte[] containing the binary field value. Prints the fields of a document for human consumption. A field is a section of a Document. Each field has two parts, a name and a value. Values may be free text, provided as a String or as a Reader, or they may be atomic keywords, which are not further processed. Such keywords may be used to represent dates, URLs, etc. Fields are optionally stored in the index, so that they may be returned with hits on the document. Returns the name of the field as an interned string. For example "date", "title", "body", ... The value of the field as a String, or null. If null, the Reader value or binary value is used. Exactly one of stringValue(), readerValue(), and binaryValue() must be set. The value of the field as a Reader, or null. If null, the String value or binary value is used. Exactly one of stringValue(), readerValue(), and binaryValue() must be set. The value of the field in binary, or null. If null, the Reader or String value is used. Exactly one of stringValue(), readerValue() and binaryValue() must be set. Create a field by specifying its name, value and how it will be saved in the index. Term vectors will not be stored in the index. The name of the field The string to process Whether value should be stored in the index Whether the field should be indexed, and if so, if it should be tokenized before indexing NullPointerException if name or value is null IllegalArgumentException if the field is neither stored nor indexed Create a field by specifying its name, value and how it will be saved in the index. The name of the field The string to process Whether value should be stored in the index Whether the field should be indexed, and if so, if it should be tokenized before indexing Whether term vector should be stored NullPointerException if name or value is null IllegalArgumentException in any of the following situations:
  • the field is neither stored nor indexed
  • the field is not indexed but termVector is TermVector.YES
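A sketch of the constructor variants listed above (Java-style names; the values are placeholders):
            // stored and tokenized text, no term vectors
            Field body = new Field("body", "some free text",
                Field.Store.YES, Field.Index.TOKENIZED);
            // stored keyword indexed as a single term, with a term vector
            Field id = new Field("id", "A-1234",
                Field.Store.YES, Field.Index.UN_TOKENIZED, Field.TermVector.YES);
            // tokenized and indexed from a Reader, never stored
            Field contents = new Field("contents", new java.io.StringReader("streamed text"));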
Create a tokenized and indexed field that is not stored. Term vectors will not be stored. The name of the field The reader with the content NullPointerException if name or reader is null Create a tokenized and indexed field that is not stored, optionally with storing term vectors. The name of the field The reader with the content Whether term vector should be stored NullPointerException if name or reader is null Create a stored field with binary value. Optionally the value may be compressed. The name of the field The binary value How value should be stored (compressed or not). IllegalArgumentException if store is Store.NO True iff the value of the field is to be stored in the index for return with search hits. It is an error for this to be true if a field is Reader-valued. True iff the value of the field is to be indexed, so that it may be searched on. True iff the value of the field should be tokenized as text prior to indexing. Un-tokenized fields are indexed as a single word and may not be Reader-valued. True if the value of the field is stored and compressed within the index True iff the term or terms used to index this field are stored as a term vector, available from {@link IndexReader#GetTermFreqVector(int,String)}. These methods do not provide access to the original content of the field, only to terms used to index it. If the original content must be preserved, use the stored attribute instead. True iff terms are stored as term vector together with their offsets (start and end position in source text). True iff terms are stored as term vector together with their token positions. True iff the value of the field is stored as binary. True if norms are omitted for this indexed field. Expert: If set, omit normalization factors associated with this indexed field. This effectively disables indexing boosts and length normalization for this field. Prints a Field for human consumption. Specifies whether and how a field should be stored. A serializable Enum class. Resolves the deserialized instance to the local reference for accurate equals() and == comparisons. a reference to Parameter as resolved in the local VM ObjectStreamException Store the original field value in the index in a compressed form. This is useful for long documents and for binary valued fields. Store the original field value in the index. This is useful for short texts like a document's title which should be displayed with the results. The value is stored in its original form, i.e. no analyzer is used before it is stored. Do not store the field value in the index. Specifies whether and how a field should be indexed. Do not index the field value. This field can thus not be searched, but one can still access its contents provided it is {@link Field.Store stored}. Index the field's value so it can be searched. An Analyzer will be used to tokenize and possibly further normalize the text before its terms will be stored in the index. This is useful for common text. Index the field's value without using an Analyzer, so it can be searched. As no analyzer is used the value will be stored as a single term. This is useful for unique IDs like product numbers. Index the field's value without an Analyzer, and disable the storing of norms. No norms means that index-time boosting and field length normalization will be disabled. The benefit is less memory usage as norms take up one byte per indexed field for every document in the index. Specifies whether and how a field should have term vectors. Do not store term vectors.
Store the term vectors of each document. A term vector is a list of the document's terms and their number of occurrences in that document. Store the term vector + token position information Store the term vector + Token offset information Store the term vector + Token position and offset information Equivalent to longToString(Long.MIN_VALUE) Equivalent to longToString(Long.MAX_VALUE) The length of (all) strings returned by {@link #longToString} Converts a long to a String suitable for indexing. Converts a String that was returned by {@link #longToString} back to a long. IllegalArgumentException if the input is null NumberFormatException if the input does not parse (it was not a String returned by longToString()). Class for accessing a compound stream. This class implements a directory, but is limited to only read operations. Directory methods that would normally modify data throw an exception. Dmitry Serebrennikov $Id: CompoundFileReader.java 208905 2005-07-03 10:40:01Z dnaber $ Returns an array of strings, one for each file in the directory. Returns true iff a file with the given name exists. Returns the time the named file was last modified. Set the modified time of an existing file to now. Removes an existing file in the directory. Renames an existing file in the directory. If a file already exists with the new name, then it is replaced. This replacement should be atomic. Returns the length of a file in the directory. Creates a new, empty file in the directory with the given name. Returns a stream writing this file. Returns a stream reading an existing file. Construct a {@link Lock}. the name of the lock file Closes the store. Returns an array of strings, one for each file in the directory. Returns true iff a file with the given name exists. Returns the time the compound file was last modified. Set the modified time of the compound file to now. Not implemented UnsupportedOperationException Not implemented UnsupportedOperationException Returns the length of a file in the directory. IOException if the file does not exist Not implemented UnsupportedOperationException Not implemented UnsupportedOperationException Implementation of an IndexInput that reads from a portion of the compound file. The visibility is left as "package" *only* because this helps with testing since JUnit test cases in a different class can then access package fields of this class. Base implementation class for buffered {@link IndexInput}. Abstract base class for input from a file in a {@link Directory}. A random-access input stream. Used for all Lucene index input operations. Reads and returns a single byte. Reads a specified number of bytes into an array at the specified offset. the array to read bytes into the offset in the array to start storing bytes the number of bytes to read Reads four bytes and returns an int. Reads an int stored in variable-length format. Reads between one and five bytes. Smaller values take fewer bytes. Negative numbers are not supported. Reads eight bytes and returns a long. Reads a long stored in variable-length format. Reads between one and nine bytes. Smaller values take fewer bytes. Negative numbers are not supported. Reads a string. Reads UTF-8 encoded characters into an array. the array to read characters into the offset in the array to start storing characters the number of characters to read Closes the stream to further operations. Returns the current position in this file, where the next read will occur. Sets the current position in this file, where the next read will occur.
The number of bytes in the file. Expert: implements buffer refill. Reads bytes from the current position in the input. the array to read bytes into the offset in the array to start storing bytes the number of bytes to read Expert: implements seek. Sets the current position in this file, where the next {@link #ReadInternal(byte[],int,int)} will occur. Expert: implements buffer refill. Reads bytes from the current position in the input. the array to read bytes into the offset in the array to start storing bytes the number of bytes to read Expert: implements seek. Sets the current position in this file, where the next {@link #ReadInternal(byte[],int,int)} will occur. Closes the stream to further operations. Create the compound stream in the specified file. The file name is the entire name (no extensions are added). NullPointerException if dir or name is null Returns the directory of the compound file. Returns the name of the compound file. Add a source stream. file is the string by which the sub-stream will be known in the compound stream. IllegalStateException if this writer is closed NullPointerException if file is null IllegalArgumentException if a file with the same name has been added already Merge files with the extensions added up to now. All files with these extensions are combined sequentially into the compound stream. After successful merge, the source files are deleted. IllegalStateException if close() had been called before or if no file has been added to this object Copy the contents of the file with the specified extension into the provided output stream. Use the provided buffer for moving data to reduce memory allocation. source file temporary holder for the start of directory entry for this file temporary holder for the start of this file's data section This ctor is used by test code only. The directory to write the document information to The analyzer to use for the document The Similarity function The maximum number of tokens a field may have If non-null, a message will be printed to this if maxFieldLength is reached. Access to the Field Info file that describes document fields and whether or not they are indexed. Each segment has a separate Field Info file. Objects of this class are thread-safe for multiple readers, but only one thread can be adding documents at a time, with no other reader or writer threads accessing this object. Construct a FieldInfos object using the directory and the name of the file IndexInput The directory to open the IndexInput from The name of the file to open the IndexInput from in the Directory IOException Adds field info for a Document. Add fields that are indexed. Whether they have termvectors has to be specified. The names of the fields Whether the fields store term vectors or not true if positions should be stored. true if offsets should be stored Assumes the fields are not storing term vectors. The names of the fields Whether the fields are indexed or not Calls 5 parameter add with false for all TermVector parameters. The name of the Field true if the field is indexed Calls 5 parameter add with false for term vector positions and offsets. The name of the field true if the field is indexed true if the term vector should be stored If the field is not yet known, adds it. If it is known, checks to make sure that the isIndexed flag is the same as was given previously for this field. If not, marks it as being indexed. Same goes for the TermVector parameters.
The name of the field true if the field is indexed true if the term vector should be stored true if the term vector with positions should be stored true if the term vector with offsets should be stored If the field is not yet known, adds it. If it is known, checks to make sure that the isIndexed flag is the same as was given previously for this field. If not, marks it as being indexed. Same goes for the TermVector parameters. The name of the field true if the field is indexed true if the term vector should be stored true if the term vector with positions should be stored true if the term vector with offsets should be stored true if the norms for the indexed field should be omitted Return the fieldName identified by its number. the fieldName or an empty string when the field with the given number doesn't exist. Return the fieldinfo object referenced by the fieldNumber. the FieldInfo object or null when the given fieldNumber doesn't exist. Class responsible for access to stored document fields. It uses <segment>.fdt and <segment>.fdx files. $Id: FieldsReader.java 329524 2005-10-30 05:38:46Z yonik $ A FilterIndexReader contains another IndexReader, which it uses as its basic source of data, possibly transforming the data along the way or providing additional functionality. The class FilterIndexReader itself simply implements all abstract methods of IndexReader with versions that pass all requests to the contained index reader. Subclasses of FilterIndexReader may further override some of these methods and may also provide additional methods and fields. Constructor used if IndexReader is not owner of its directory. This is used for IndexReaders that are used within other IndexReaders that take care of locking directories. Directory where IndexReader files reside. Constructor used if IndexReader is owner of its directory. If IndexReader is owner of its directory, it locks its directory in case of write operations. Directory where IndexReader files reside. Used for write-lock. Returns an IndexReader reading the index in an FSDirectory in the named path. Returns an IndexReader reading the index in an FSDirectory in the named path. Returns an IndexReader reading the index in the given Directory. Returns the directory this index resides in. Returns the time the index in the named directory was last modified. Do not use this to check whether the reader is still up-to-date, use {@link #IsCurrent()} instead. Returns the time the index in the named directory was last modified. Do not use this to check whether the reader is still up-to-date, use {@link #IsCurrent()} instead. Returns the time the index in the named directory was last modified. Do not use this to check whether the reader is still up-to-date, use {@link #IsCurrent()} instead. Reads version number from segments files. The version number is initialized with a timestamp and then increased by one for each change of the index. where the index resides. version number. IOException if segments file cannot be read Reads version number from segments files. The version number is initialized with a timestamp and then increased by one for each change of the index. where the index resides. version number. IOException if segments file cannot be read Reads version number from segments files. The version number is initialized with a timestamp and then increased by one for each change of the index. where the index resides. version number. IOException if segments file cannot be read. Version number when this IndexReader was opened.
Check whether this IndexReader still works on a current version of the index. If this is not the case, you will need to re-open the IndexReader to make sure you see the latest changes made to the index. IOException Return an array of term frequency vectors for the specified document. The array contains a vector for each vectorized field in the document. Each vector contains terms and frequencies for all terms in a given vectorized field. If no such fields existed, the method returns null. The term vectors that are returned may either be of type TermFreqVector or of type TermPositionsVector if positions or offsets have been stored. document for which term frequency vectors are returned array of term frequency vectors. May be null if no term vectors have been stored for the specified document. IOException if index cannot be accessed Return a term frequency vector for the specified document and field. The returned vector contains terms and frequencies for the terms in the specified field of this document, if the field had the storeTermVector flag set. If term vectors had been stored with positions or offsets, a TermPositionsVector is returned. document for which the term frequency vector is returned field for which the term frequency vector is returned. term frequency vector. May be null if the field does not exist in the specified document or term vector was not stored. IOException if index cannot be accessed Returns true if an index exists at the specified directory. If the directory does not exist or if there is no index in it, false is returned. the directory to check for an index true if an index exists; false otherwise Returns true if an index exists at the specified directory. If the directory does not exist or if there is no index in it, false is returned. the directory to check for an index true if an index exists; false otherwise Returns true if an index exists at the specified directory. If the directory does not exist or if there is no index in it, false is returned. the directory to check for an index true if an index exists; false otherwise IOException if there is a problem with accessing the index Returns the number of documents in this index. Returns one greater than the largest possible document number. This may be used to, e.g., determine how big to allocate an array which will have an element for every document number in an index. Returns the stored fields of the nth Document in this index. Returns true if document n has been deleted Returns true if any documents have been deleted Returns true if there are norms stored for this field. Returns the byte-encoded normalization factor for the named field of every document. This is used by the search code to score documents. Reads the byte-encoded normalization factor for the named field of every document. This is used by the search code to score documents. Expert: Resets the normalization factor for the named field of the named document. The norm represents the product of the field's {@link Field#SetBoost(float) boost} and its {@link Similarity#LengthNorm(String, int) length normalization}. Thus, to preserve the length normalization values when resetting this, one should base the new value upon the old. Implements setNorm in subclass. Expert: Resets the normalization factor for the named field of the named document. Returns an enumeration of all the terms in the index. The enumeration is ordered by Term.compareTo(). Each term is greater than all that precede it in the enumeration. Returns an enumeration of all terms after a given term.
The enumeration is ordered by Term.compareTo(). Each term is greater than all that precede it in the enumeration. Returns the number of documents containing the term t. Returns an unpositioned {@link TermDocs} enumerator. Returns an unpositioned {@link TermPositions} enumerator. Tries to acquire the WriteLock on this directory. This method is only valid if this IndexReader is the directory owner. IOException If WriteLock cannot be acquired. Deletes the document numbered docNum. Once a document is deleted it will not appear in TermDocs or TermPositions enumerations. Attempts to read its field with the {@link #document} method will result in an error. The presence of this document may still be reflected in the {@link #docFreq} statistic, though this will be corrected eventually as the index is further modified. Implements deletion of the document numbered docNum. Applications should call {@link #DeleteDocument(int)} or {@link #DeleteDocuments(Term)}. Deletes all documents containing term. This is useful if one uses a document field to hold a unique ID string for the document. Then to delete such a document, one merely constructs a term with the appropriate field and the unique ID string as its text and passes it to this method. See {@link #Delete(int)} for information about when this deletion will become effective. the number of documents deleted Undeletes all documents currently marked as deleted in this index. Implements actual undeleteAll() in subclass. Commit changes resulting from delete, undeleteAll, or setNorm operations IOException Implements commit. Closes files associated with this index. Also saves any new deletions to disk. No other methods should be called after this has been called. Implements close. Release the write lock, if needed. Get a list of unique field names that exist in this index and have the specified field option information. specifies which field option should be available for the returned fields Collection of Strings indicating the names of the fields. Returns true iff the index in the named directory is currently locked. the directory to check for a lock IOException if there is a problem with accessing the index Returns true iff the index in the named directory is currently locked. the directory to check for a lock IOException if there is a problem with accessing the index Prints the filename and size of each file within a given compound file. Add the -extract flag to extract files to the current working directory. In order to make the extracted version of the index work, you have to copy the segments file from the compound index into the directory where the extracted files are stored. Usage: Lucene.Net.index.IndexReader [-extract] <cfsfile> Utility class for executing code with exclusive access. Attempts to obtain exclusive access and immediately return upon success or failure. true iff exclusive access is obtained Attempts to obtain an exclusive lock within the amount of time given. Currently polls once per second until lockWaitTimeout is passed. length of time to wait in ms true if lock was obtained IOException if lock wait times out or obtain() throws an IOException Releases exclusive access. Returns true if the resource is currently locked. Note that one must still call {@link #Obtain()} before using the resource. Utility class for executing code with exclusive access. Constructs an executor that will grab the named lock. Code to execute with exclusive access. Calls {@link #doBody} while lock is obtained. Blocks if lock cannot be obtained immediately.
Retries to obtain lock once per second until it is obtained, or until it has tried ten times. Lock is released when {@link #doBody} exits.
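A sketch of the locking utility described above, assuming an open Directory instance named directory (Java-style names):
            Lock lock = directory.makeLock("commit.lock");
            if (lock.obtain(10000L)) {            // wait up to ten seconds for the lock
                try {
                    // ... code that needs exclusive access to the index ...
                } finally {
                    lock.release();               // always release, even on failure
                }
            }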

Construct a FilterIndexReader based on the specified base reader. Directory locking for delete, undeleteAll, and setNorm operations is left to the base reader.

Note that the base reader is closed if this FilterIndexReader is closed.

specified base reader.
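As an illustration of the delegation pattern described above, a hypothetical subclass that overrides a single method and inherits the pass-through behavior for everything else:
            public class LoggingIndexReader extends FilterIndexReader {
                public LoggingIndexReader(IndexReader in) {
                    super(in);                    // the protected field 'in' holds the base reader
                }
                public int numDocs() {
                    System.out.println("numDocs() called");
                    return super.numDocs();       // all other methods are inherited unchanged
                }
            }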
Base class for filtering {@link TermDocs} implementations. Sets this to the data for a term. The enumeration is reset to the start of the data for this term. Sets this to the data for the current term in a {@link TermEnum}. This may be optimized in some implementations. Frees associated resources. Base class for filtering {@link TermPositions} implementations. Base class for filtering {@link TermEnum} implementations. Increments the enumeration to the next element. True if one exists. Returns the current Term in the enumeration. Returns the docFreq of the current Term in the enumeration. Closes the enumeration to further activity, freeing resources. Filename filter that accepts only filenames and extensions created by Lucene. Daniel Naber / Bernhard Messer $rcs = ' $Id: Exp $ ' ; Useful constants representing filenames and extensions used by Lucene Bernhard Messer $rcs = ' $Id: Exp $ ' ; Name of the index segment file Name of the index deletable file This array contains all filename extensions used by Lucene's index files, with one exception, namely the extension made up from .f + a number. Also note that two of Lucene's files (deletable and segments) don't have any filename extension. File extensions of old-style index files File extensions for term vector support Open an index with write access. the index directory the analyzer to use for adding new documents true to create the index or overwrite the existing one; false to append to the existing index Open an index with write access. the index directory the analyzer to use for adding new documents true to create the index or overwrite the existing one; false to append to the existing index Open an index with write access. the index directory the analyzer to use for adding new documents true to create the index or overwrite the existing one; false to append to the existing index Initialize an IndexWriter. IOException Throw an IllegalStateException if the index is closed. IllegalStateException Close the IndexReader and open an IndexWriter. IOException Close the IndexWriter and open an IndexReader. IOException Make sure all changes are written to disk. IOException Adds a document to this index, using the provided analyzer instead of the one specified in the constructor. If the document contains more than {@link #SetMaxFieldLength(int)} terms for a given field, the remainder are discarded. IllegalStateException if the index is closed Adds a document to this index. If the document contains more than {@link #SetMaxFieldLength(int)} terms for a given field, the remainder are discarded. IllegalStateException if the index is closed Deletes all documents containing term. This is useful if one uses a document field to hold a unique ID string for the document. Then to delete such a document, one merely constructs a term with the appropriate field and the unique ID string as its text and passes it to this method. Returns the number of documents deleted. the number of documents deleted IllegalStateException if the index is closed Deletes the document numbered docNum. IllegalStateException if the index is closed Returns the number of documents currently in this index. IllegalStateException if the index is closed Merges all segments together into a single segment, optimizing an index for search. IllegalStateException if the index is closed IOException Setting to turn on usage of a compound file. When on, multiple files for each segment are merged into a single file once the segment creation is finished. This is done regardless of what directory is in use.
IllegalStateException if the index is closed IOException The maximum number of terms that will be indexed for a single field in a document. This limits the amount of memory required for indexing, so that collections with very large files will not crash the indexing process by running out of memory.

Note that this effectively truncates large documents, excluding from the index terms that occur further in the document. If you know your source documents are large, be sure to set this value high enough to accommodate the expected size. If you set it to Integer.MAX_VALUE, then the only limit is your memory, but you should anticipate an OutOfMemoryError.

By default, no more than 10,000 terms will be indexed for a field.

IllegalStateException if the index is closed
IOException The maximum number of terms that will be indexed for a single field in a document. This limits the amount of memory required for indexing, so that collections with very large files will not crash the indexing process by running out of memory.

Note that this effectively truncates large documents, excluding from the index terms that occur further in the document. If you know your source documents are large, be sure to set this value high enough to accommodate the expected size. If you set it to Integer.MAX_VALUE, then the only limit is your memory, but you should anticipate an OutOfMemoryError.

By default, no more than 10,000 terms will be indexed for a field.

IllegalStateException if the index is closed
IOException IOException Close this index, writing all pending changes to disk. IllegalStateException if the index has been closed before already Default value for the write lock timeout (1,000). Default value for the commit lock timeout (10,000). Default value is 10. Change using {@link #SetMergeFactor(int)}. Default value is 10. Change using {@link #SetMaxBufferedDocs(int)}. Default value is 10,000. Change using {@link #SetMaxFieldLength(int)}. Default value is 128. Change using {@link #SetTermIndexInterval(int)}. Default value is {@link Integer#MAX_VALUE}. Change using {@link #SetMaxMergeDocs(int)}. Use compound file setting. Defaults to true, minimizing the number of files used. Setting this to false may improve indexing performance, but may also cause file handle problems. Get the current setting of whether to use the compound file format. Note that this just returns the value you set with setUseCompoundFile(boolean) or the default. You cannot use this to query the status of an existing index. Setting to turn on usage of a compound file. When on, multiple files for each segment are merged into a single file once the segment creation is finished. This is done regardless of what directory is in use. Expert: Set the Similarity implementation used by this IndexWriter. Expert: Set the interval between indexed terms. Large values cause less memory to be used by IndexReader, but slow random-access to terms. Small values cause more memory to be used by an IndexReader, and speed random-access to terms. This parameter determines the amount of computation required per query term, regardless of the number of documents that contain that term. In particular, it is the maximum number of other terms that must be scanned before a term is located and its frequency and position information may be processed. In a large index with user-entered query terms, query processing time is likely to be dominated not by term lookup but rather by the processing of frequency and positional data. In a small index or when many uncommon query terms are generated (e.g., by wildcard queries) term lookup may become a dominant cost. In particular, numUniqueTerms/interval terms are read into memory by an IndexReader, and, on average, interval/2 terms must be scanned for each random term access. Expert: Return the interval between indexed terms. Constructs an IndexWriter for the index in path. Text will be analyzed with a. If create is true, then a new, empty index will be created in path, replacing the index already there, if any. the path to the index directory the analyzer to use true to create the index or overwrite the existing one; false to append to the existing index IOException if the directory cannot be read/written to, or if it does not exist, and create is false Constructs an IndexWriter for the index in path. Text will be analyzed with a. If create is true, then a new, empty index will be created in path, replacing the index already there, if any. the path to the index directory the analyzer to use true to create the index or overwrite the existing one; false to append to the existing index IOException if the directory cannot be read/written to, or if it does not exist, and create is false Constructs an IndexWriter for the index in d. Text will be analyzed with a. If create is true, then a new, empty index will be created in d, replacing the index already there, if any. 
the index directory the analyzer to use true to create the index or overwrite the existing one; false to append to the existing index IOException if the directory cannot be read/written to, or if it does not exist, and create is false The maximum number of terms that will be indexed for a single field in a document. This limits the amount of memory required for indexing, so that collections with very large files will not crash the indexing process by running out of memory.

Note that this effectively truncates large documents, excluding from the index terms that occur further in the document. If you know your source documents are large, be sure to set this value high enough to accommodate the expected size. If you set it to Integer.MAX_VALUE, then the only limit is your memory, but you should anticipate an OutOfMemoryError.

By default, no more than 10,000 terms will be indexed for a field.
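A minimal usage sketch of the writer described above, written against the Java API that this port mirrors (the .NET port exposes the same members with PascalCased names such as AddDocument and Optimize); the index path and field name are illustrative only, and later sketches in this section assume the same org.apache.lucene imports and an already-open reader or searcher where one is referenced:

    import org.apache.lucene.analysis.standard.StandardAnalyzer;
    import org.apache.lucene.document.Document;
    import org.apache.lucene.document.Field;
    import org.apache.lucene.index.IndexWriter;

    // true = create a new index (overwriting any existing one); false = append to it
    IndexWriter writer = new IndexWriter("/tmp/demo-index", new StandardAnalyzer(), true);
    writer.setMaxFieldLength(25000);   // raise the 10,000-terms-per-field default for large documents

    Document doc = new Document();
    doc.add(new Field("contents", "some document text", Field.Store.YES, Field.Index.TOKENIZED));
    writer.addDocument(doc);           // terms beyond maxFieldLength are discarded

    writer.optimize();                 // merge all segments into one for faster searching
    writer.close();                    // flush pending changes and release the write lock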

If non-null, information about merges and a message when maxFieldLength is reached will be printed to this. Sets the maximum time to wait for a commit lock (in milliseconds). Sets the maximum time to wait for a write lock (in milliseconds). Flushes all changes to an index and closes all associated files. Release the write lock, if needed. Returns the Directory used by this index. Returns the analyzer used by this index. Returns the number of documents currently in this index. The maximum number of terms that will be indexed for a single field in a document. This limits the amount of memory required for indexing, so that collections with very large files will not crash the indexing process by running out of memory.

Note that this effectively truncates large documents, excluding from the index terms that occur further in the document. If you know your source documents are large, be sure to set this value high enough to accommodate the expected size. If you set it to Integer.MAX_VALUE, then the only limit is your memory, but you should anticipate an OutOfMemoryError.

By default, no more than 10,000 terms will be indexed for a field.

Adds a document to this index. If the document contains more than {@link #SetMaxFieldLength(int)} terms for a given field, the remainder are discarded. Adds a document to this index, using the provided analyzer instead of the value of {@link #GetAnalyzer()}. If the document contains more than {@link #SetMaxFieldLength(int)} terms for a given field, the remainder are discarded. If non-null, information about merges will be printed to this. Merges all segments together into a single segment, optimizing an index for search. Merges the provided indexes into this index.

After this completes, the index is optimized.

The provided IndexReaders are not closed.
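A sketch of folding existing indexes into an open writer, assuming srcReader1 and srcReader2 are readers over the source indexes and destDir is the target Directory (all names illustrative):

    IndexWriter writer = new IndexWriter(destDir, new StandardAnalyzer(), false);
    writer.addIndexes(new IndexReader[] { srcReader1, srcReader2 });  // index is optimized afterwards
    writer.close();
    srcReader1.close();   // the provided readers are not closed for us
    srcReader2.close();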

Merges all RAM-resident segments. Incremental segment merger. Pops segments off of the segmentInfos stack down to minSegment, merges them, and pushes the merged index onto the top of the segmentInfos stack. Merges the named range of segments, replacing them in the stack with a single segment. Describe class MultipleTermPositions here. Anders Nielsen 1.0 Creates a new MultipleTermPositions instance. Not implemented. UnsupportedOperationException Not implemented. UnsupportedOperationException Not implemented. UnsupportedOperationException A PriorityQueue maintains a partial ordering of its elements such that the least element can always be found in constant time. Put()'s and pop()'s require log(size) time. Determines the ordering of objects in this priority queue. Subclasses must define this one method. Subclass constructors must call this. Adds an Object to a PriorityQueue in log(size) time. If one tries to add more objects than the maxSize passed to initialize, a RuntimeException (ArrayIndexOutOfBoundsException) is thrown. Adds element to the PriorityQueue in log(size) time if either the PriorityQueue is not full, or not lessThan(element, top()). true if element is added, false otherwise. Returns the least element of the PriorityQueue in constant time. Removes and returns the least element of the PriorityQueue in log(size) time. Should be called when the Object at top changes values. Still log(n) worst case, but it's at least twice as fast to
            { pq.top().change(); pq.adjustTop(); }
            
instead of
            { o = pq.pop(); o.change(); pq.put(o); }
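A minimal subclass sketch: the only method a subclass must supply is lessThan, and its constructor must call initialize before elements are added (the class and element type here are illustrative):

    class IntQueue extends org.apache.lucene.util.PriorityQueue {
      IntQueue(int maxSize) { initialize(maxSize); }            // required before put()/insert()
      protected boolean lessThan(Object a, Object b) {          // defines the ordering; top() is the least
        return ((Integer) a).intValue() < ((Integer) b).intValue();
      }
    }

    IntQueue pq = new IntQueue(10);
    pq.put(new Integer(7));              // fails if more than maxSize elements are added
    pq.insert(new Integer(3));           // adds only if there is room, or 3 is not lessThan top()
    Integer least = (Integer) pq.pop();  // removes and returns the least element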
            
Returns the number of elements currently stored in the PriorityQueue. Removes all entries from the PriorityQueue. An IndexReader which reads multiple indexes, appending their content. $Id: MultiReader.java 355181 2005-12-08 19:53:06Z cutting $

Construct a MultiReader aggregating the named set of (sub)readers. Directory locking for delete, undeleteAll, and setNorm operations is left to the subreaders.

Note that all subreaders are closed if this MultiReader is closed.

set of (sub)readers IOException
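A construction sketch, assuming dir1 and dir2 are Directory instances holding existing indexes:

    IndexReader sub1 = IndexReader.open(dir1);
    IndexReader sub2 = IndexReader.open(dir2);
    IndexReader reader = new MultiReader(new IndexReader[] { sub1, sub2 });
    // ... search or enumerate terms over the combined view ...
    reader.close();   // closes both subreaders as well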
Construct reading the named set of readers. Return an array of term frequency vectors for the specified document. The array contains a vector for each vectorized field in the document. Each vector vector contains term numbers and frequencies for all terms in a given vectorized field. If no such fields existed, the method returns null. Optimized implementation. As yet unoptimized implementation. Construct a ParallelReader. Add an IndexReader. Add an IndexReader whose stored fields will not be returned. This can accellerate search when stored fields are only needed from a subset of the IndexReaders. IllegalArgumentException if not all indexes contain the same number of documents IllegalArgumentException if not all indexes have the same value of {@link IndexReader#MaxDoc()} The file format version, a negative number. counts how often the index has been changed by adding or deleting docs. starting with the current time in milliseconds forces to create unique version numbers. version number when this SegmentInfos was generated. Current version number from segments file. This ctor used only by test code. The Directory to merge the other segments into The name of the new segment Add an IndexReader to the collection of readers that are to be merged The index of the reader to return The ith reader to be merged Merges the readers specified by the {@link #add} method into the directory passed to the constructor The number of documents that were merged IOException close all IndexReaders that have been added. Should not be called before merge(). IOException The number of documents in all of the readers IOException Merge the TermVectors from each of the segments into the new one. IOException Merge one term found in one or more segments. The array smis contains segments that are positioned at the same term. N is the number of cells in the array actually occupied. array of segments number of cells in the array actually occupied Process postings from multiple segments all positioned on the same term. Writes out merged entries into freqOutput and the proxOutput streams. array of segments number of cells in the array actually occupied number of documents across all segments where this term was found $Id: SegmentReader.java 329523 2005-10-30 05:37:11Z yonik $ The class which implements SegmentReader. Read norms into a pre-allocated array. Create a clone from the initial TermVectorsReader and store it in the ThreadLocal. TermVectorsReader Return a term frequency vector for the specified document and field. The vector returned contains term numbers and frequencies for all terms in the specified field of this document, if the field had storeTermVector flag set. If the flag was not set, the method returns null. IOException Return an array of term frequency vectors for the specified document. The array contains a vector for each vectorized field in the document. Each vector vector contains term numbers and frequencies for all terms in a given vectorized field. If no such fields existed, the method returns null. IOException Optimized implementation. Overridden by SegmentTermPositions to skip in prox stream. Optimized implementation. Increments the enumeration to the next element. True if one exists. Optimized scan, without allocating new terms. Returns the current Term in the enumeration. Initially invalid, valid after next() called for the first time. Returns the previous Term enumerated. Initially null. Returns the current TermInfo in the enumeration. 
Initially invalid, valid after next() called for the first time. Sets the argument to the current TermInfo in the enumeration. Initially invalid, valid after next() called for the first time. Returns the docFreq from the current TermInfo in the enumeration. Initially invalid, valid after next() called for the first time. Closes the enumeration to further activity, freeing resources. Called by super.skipTo(). Provides access to stored term vector of a document field. The field this vector is associated with. The number of terms in the term vector. An Array of term texts in ascending order. Array of term frequencies. Locations of the array correspond one to one to the terms in the array obtained from getTerms method. Each location in the array contains the number of times this term occurs in the document or the document field. Return an index in the term numbers array returned from getTerms at which the term with the specified term appears. If this term does not appear in the array, return -1. Just like indexOf(int) but searches for a number of terms at the same time. Returns an array that has the same size as the number of terms searched for, each slot containing the result of searching for that term number. array containing terms to look for index in the array where the list of terms starts the number of terms in the list The number of the field this vector is associated with Extends TermFreqVector to provide additional information about positions in which each of the terms is found. A TermPositionVector not necessarily contains both positions and offsets, but at least one of these arrays exists. Returns an array of positions in which the term is found. Terms are identified by the index at which its number appears in the term String array obtained from the indexOf method. May return null if positions have not been stored. Returns an array of TermVectorOffsetInfo in which the term is found. May return null if offsets have not been stored. The position in the array to get the offsets from An array of TermVectorOffsetInfo objects or the empty list Returns an array of TermVectorOffsetInfo in which the term is found. The position in the array to get the offsets from An array of TermVectorOffsetInfo objects or the empty list Returns an array of positions in which the term is found. Terms are identified by the index at which its number appears in the term String array obtained from the indexOf method. A Term represents a word from text. This is the unit of search. It is composed of two elements, the text of the word, as a string, and the name of the field that the text occured in, an interned string. Note that terms may represent more than words from text fields, but also things like dates, email addresses, urls, etc. Constructs a Term with the given field and text. Returns the field of this term, an interned string. The field indicates the part of a document which this term came from. Returns the text of this term. In the case of words, this is simply the text of the word. In the case of dates and other types, this is an encoding of the object as a string. Optimized construction of new Terms by reusing same field as this Term - avoids field.intern() overhead The text of the new term (field is implicitly same as this Term instance) A new Term Compares two terms, returning true iff they have the same field and text. Combines the hashCode() of the field and the text. 
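A retrieval sketch for the stored term vector accessors described above, assuming reader is an open IndexReader and document 5 has a "contents" field that was indexed with term vectors enabled (otherwise null is returned):

    TermFreqVector tfv = reader.getTermFreqVector(5, "contents");
    if (tfv != null) {
      String[] terms = tfv.getTerms();           // term texts, in ascending order
      int[] freqs = tfv.getTermFrequencies();    // parallel array of in-document frequencies
      for (int i = 0; i < terms.length; i++)
        System.out.println(terms[i] + " occurs " + freqs[i] + " time(s)");
    }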
Compares two terms, returning a negative integer if this term belongs before the argument, zero if this term is equal to the argument, and a positive integer if this term belongs after the argument. The ordering of terms is first by field, then by text. Resets the field and text of a Term. A TermInfo is the record of information stored for a term. The number of documents which contain the term. Returns the number of term/value pairs in the set. Returns the offset of the greatest index entry which is less than or equal to term. Returns the TermInfo for a Term in the set, or null. Scans within block for matching term. Returns the nth term in the set. Returns the position of a Term in the set or -1. Returns an enumeration of all the Terms and TermInfos in the set. Returns an enumeration of terms starting at or after the named term. The file format version, a negative number. Expert: The fraction of terms in the "dictionary" which should be stored in RAM. Smaller values use more memory, but make searching slightly faster, while larger values use less memory and make searching slightly slower. Searching is typically not dominated by dictionary lookup, so tweaking this is rarely useful. Expert: The fraction of {@link TermDocs} entries stored in skip tables, used to accellerate {@link TermDocs#SkipTo(int)}. Larger values result in smaller indexes, greater acceleration, but fewer accelerable cases, while smaller values result in bigger indexes, less acceleration and more accelerable cases. More detailed experiments would be useful here. Called to complete TermInfos creation. $Id: TermVectorsReader.java 170226 2005-05-15 15:04:39Z bmesser $ The number of documents in the reader Retrieve the term vector for the given document and field The document number to retrieve the vector for The field within the document to retrieve The TermFreqVector for the document and field or null if there is no termVector for this field. IOException if there is an error reading the term vector files Return all term vectors stored for this document or null if the could not be read in. The document number to retrieve the vector for All term frequency vectors IOException if there is an error reading the term vector files The field to read in The pointer within the tvf file where we should start reading The TermVector located at that position IOException Writer works by opening a document and then opening the fields within the document and then writing out the vectors for each field. Rough usage: for each document { writer.openDocument(); for each field on the document { writer.openField(field); for all of the terms { writer.addTerm(...) } writer.closeField } writer.closeDocument() } $Id: TermVectorsWriter.java 150689 2004-11-29 21:42:02Z bmesser $ Start processing a field. This can be followed by a number of calls to addTerm, and a final call to closeField to indicate the end of processing of this field. If a field was previously open, it is closed automatically. Finished processing current field. This should be followed by a call to openField before future calls to addTerm. Return true if a field is currently open. Add term to the field's term vector. Field must already be open. Terms should be added in increasing order of terms, one call per unique termNum. ProxPointer is a pointer into the TermPosition file (prx). Freq is the number of times this term appears in this field, in this document. IllegalStateException if document or field is not open Add a complete document specified by all its term vectors. 
If document has no term vectors, add value for tvx. IOException Close all streams. This interface describes a character stream that maintains line and column number positions of the characters. It also has the capability to backup the stream to some extent. An implementation of this interface is used in the TokenManager implementation generated by JavaCCParser. All the methods except backup can be implemented in any fashion. backup needs to be implemented correctly for the correct operation of the lexer. Rest of the methods are all used to get information like line number, column number and the String that constitutes a token and are not used by the lexer. Hence their implementation won't affect the generated lexer's operation. Returns the next character from the selected input. The method of selecting the input is the responsibility of the class implementing this interface. Can throw any java.io.IOException. Returns the column number of the last character for current token (being matched after the last call to BeginTOken). Returns the line number of the last character for current token (being matched after the last call to BeginTOken). Returns the column number of the first character for current token (being matched after the last call to BeginTOken). Returns the line number of the first character for current token (being matched after the last call to BeginTOken). Backs up the input stream by amount steps. Lexer calls this method if it had already read some characters, but could not use them to match a (longer) token. So, they will be used again as the prefix of the next token and it is the implemetation's responsibility to do this right. Returns the next character that marks the beginning of the next token. All characters must remain in the buffer between two successive calls to this method to implement backup correctly. Returns a string made up of characters from the marked token beginning to the current buffer position. Implementations have the choice of returning anything that they want to. For example, for efficiency, one might decide to just return null, which is a valid implementation. Returns an array of characters that make up the suffix of length 'len' for the currently matched token. This is used to build up the matched string for use in actions in the case of MORE. A simple and inefficient implementation of this is as follows : { String t = GetImage(); return t.substring(t.length() - len, t.length()).toCharArray(); } The lexer calls this function to indicate that it is done with the stream and hence implementations can free any resources held by this class. Again, the body of this function can be just empty and it will not affect the lexer's operation. Constructs from a Reader. A QueryParser which constructs queries to search multiple fields. Kelvin Tan, Daniel Naber $Revision: 295117 $ Alternative form of QueryParser.Operator.AND Alternative form of QueryParser.Operator.OR The actual operator that parser uses to combine query terms Constructs a query parser. the default field for query terms. used to find terms in the query text. Parses a query string, returning a {@link Lucene.Net.search.Query}. the query string to be parsed. ParseException if the parsing fails Returns the analyzer. Returns the field. Get the minimal similarity for fuzzy queries. Set the minimum similarity for fuzzy queries. Default is 0.5f. Get the prefix length for fuzzy queries. Returns the fuzzyPrefixLength. Set the prefix length for fuzzy queries. Default is 0. The fuzzyPrefixLength to set. 
Sets the default slop for phrases. If zero, then exact phrase matches are required. Default value is zero. Gets the default slop for phrases. Sets the boolean operator of the QueryParser. In default mode (OR_OPERATOR) terms without any modifiers are considered optional: for example capital of Hungary is equal to capital OR of OR Hungary.
In AND_OPERATOR mode terms are considered to be in conjunction: the above-mentioned query is parsed as capital AND of AND Hungary
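A sketch of switching the default operator, using the Java names (setDefaultOperator with the AND_OPERATOR/OR_OPERATOR constants); the field name is illustrative:

    QueryParser parser = new QueryParser("contents", new StandardAnalyzer());
    Query optional = parser.parse("capital of Hungary");    // capital OR of OR Hungary
    parser.setDefaultOperator(QueryParser.AND_OPERATOR);
    Query required = parser.parse("capital of Hungary");    // capital AND of AND Hungary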
Gets implicit operator setting, which will be either AND_OPERATOR or OR_OPERATOR. Whether terms of wildcard, prefix, fuzzy and range queries are to be automatically lower-cased or not. Default is true. Set locale used by date range parsing. Returns current locale, allowing access by subclasses. throw in overridden method to disallow Base implementation delegates to {@link #GetFieldQuery(String,String)}. This method may be overridden, for example, to return a SpanNearQuery instead of a PhraseQuery. throw in overridden method to disallow throw in overridden method to disallow Factory method for generating query, given a set of clauses. By default creates a boolean query composed of clauses passed in. Can be overridden by extending classes, to modify query being returned. Vector that contains {@link BooleanClause} instances to join. Resulting {@link Query} object. throw in overridden method to disallow Factory method for generating query, given a set of clauses. By default creates a boolean query composed of clauses passed in. Can be overridden by extending classes, to modify query being returned. Vector that contains {@link BooleanClause} instances to join. true if coord scoring should be disabled. Resulting {@link Query} object. throw in overridden method to disallow Factory method for generating a query (similar to {@link #getWildcardQuery}). Called when parser parses an input term token that has the fuzzy suffix (~) appended. Name of the field query will use. Term token to use for building term for the query Resulting {@link Query} built for the term throw in overridden method to disallow Returns a String where the escape char has been removed, or kept only once if there was a double escape. Returns a String where those characters that QueryParser expects to be escaped are escaped by a preceding \. The default operator for parsing queries. Use {@link QueryParser#setDefaultOperator} to change it. Creates a MultiFieldQueryParser.

It will, when parse(String query) is called, construct a query like this (assuming the query consists of two terms and you specify the two fields title and body):

(title:term1 body:term1) (title:term2 body:term2)

When setDefaultOperator(AND_OPERATOR) is set, the result will be:

+(title:term1 body:term1) +(title:term2 body:term2)

In other words, all the query's terms must appear, but it doesn't matter in what fields they appear.
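A sketch matching the example above, assuming the two fields are named title and body:

    MultiFieldQueryParser parser =
        new MultiFieldQueryParser(new String[] { "title", "body" }, new StandardAnalyzer());
    Query q = parser.parse("term1 term2");   // (title:term1 body:term1) (title:term2 body:term2)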

This exception is thrown when parse errors are encountered. You can explicitly create objects of this exception type by calling the method generateParseException in the generated parser. You can modify this class to customize your error reporting mechanisms so long as you retain the public fields. The following constructors are for use by you for whatever purpose you can think of. Constructing the exception in this manner makes the exception behave in the normal way - i.e., as documented in the class "Throwable". The fields "errorToken", "expectedTokenSequences", and "tokenImage" do not contain relevant information. The JavaCC generated code does not use these constructors. This variable determines which constructor was used to create this object and thereby affects the semantics of the "getMessage" method (see below). This is the last token that has been consumed successfully. If this object has been created due to a parse error, the token followng this token will (therefore) be the first error token. Each entry in this array is an array of integers. Each array of integers represents a sequence of tokens (by their ordinal values) that is expected at this point of the parse. This is a reference to the "tokenImage" array of the generated parser within which the parse error occurred. This array is defined in the generated ...Constants interface. The end of line string for this machine. Used to convert raw characters to their escaped version when these raw version cannot be used as part of an ASCII string literal. This method has the standard behavior when this object has been created using the standard constructors. Otherwise, it uses "currentToken" and "expectedTokenSequences" to generate a parse error message and returns it. If this object has been created due to a parse error, and you do not catch it (it gets thrown from the parser), then this method is called during the printing of the final stack trace, and hence the correct error message gets displayed. Describes the input token stream. An integer that describes the kind of this token. This numbering system is determined by JavaCCParser, and a table of these numbers is stored in the file ...Constants.java. beginLine and beginColumn describe the position of the first character of this token; endLine and endColumn describe the position of the last character of this token. beginLine and beginColumn describe the position of the first character of this token; endLine and endColumn describe the position of the last character of this token. beginLine and beginColumn describe the position of the first character of this token; endLine and endColumn describe the position of the last character of this token. beginLine and beginColumn describe the position of the first character of this token; endLine and endColumn describe the position of the last character of this token. The string image of the token. A reference to the next regular (non-special) token from the input stream. If this is the last token from the input stream, or if the token manager has not read tokens beyond this one, this field is set to null. This is true only if this token is also a regular token. Otherwise, see below for a description of the contents of this field. This field is used to access special tokens that occur prior to this token, but after the immediately preceding regular (non-special) token. If there are no such special tokens, this field is set to null. 
When there are more than one such special token, this field refers to the last of these special tokens, which in turn refers to the next previous special token through its specialToken field, and so on until the first special token (whose specialToken field is null). The next fields of special tokens refer to other special tokens that immediately follow it (without an intervening regular token). If there is no such token, this field is null. Returns the image. Returns a new Token object, by default. However, if you want, you can create and return subclass objects based on the value of ofKind. Simply add the cases to the switch for all those special cases. For example, if you have a subclass of Token called IDToken that you want to create if ofKind is ID, simlpy add something like : case MyParserConstants.ID : return new IDToken(); to the following switch statement. Then you can cast matchedToken variable to the appropriate type and use it in your lexical actions. Lexical error occured. An attempt wass made to create a second instance of a static token manager. Tried to change to an invalid lexical state. Detected (and bailed out of) an infinite loop in the token manager. Indicates the reason why the exception is thrown. It will have one of the above 4 values. Replaces unprintable characters by their espaced (or unicode escaped) equivalents in the given string Returns a detailed message for the Error when it is thrown by the token manager to indicate a lexical error. Parameters : EOFSeen : indicates if EOF caused the lexicl error curLexState : lexical state in which this error occured errorLine : line number when the error occured errorColumn : column number when the error occured errorAfter : prefix that was seen before this error occured curchar : the offending character Note: You can customize the lexical error message by modifying this method. You can also modify the body of this method to customize your error messages. For example, cases like LOOP_DETECTED and INVALID_LEXICAL_STATE are not of end-users concern, so you can return something like : "Internal Error : Please file a bug report .... " from this method for such cases in the release version of your parser. Expert: an enumeration of span matches. Used to implement span searching. Each span represents a range of term positions within a document. Matches are enumerated in order, by increasing document number, within that by increasing start position and finally by increasing end position. Move to the next match, returning true iff any such exists. Returns the document number of the current match. Initially invalid. Returns the start position of the current match. Initially invalid. Returns the end position of the current match. Initially invalid. Wraps a Spans, and can be used to form a linked list. Matches spans near the beginning of a field. Base class for span-based queries. Sets the boost for this query clause to b. Documents matching this clause will (in addition to the normal weightings) have their score multiplied by b. Gets the boost for this clause. Documents matching this clause will (in addition to the normal weightings) have their score multiplied by b. The boost is 1.0 by default. Prints a query to a string. Expert: Constructs and initializes a Weight for a top-level query. Expert: called to re-write queries into primitive queries. For example, a PrefixQuery will be rewritten into a BooleanQuery that consists of TermQuerys. Expert: called when re-writing queries under MultiSearcher. 
Create a single query suitable for use by all subsearchers (in 1-1 correspondence with queries). This is an optimization of the OR of all queries. We handle the common optimization cases of equal queries and overlapping clauses of boolean OR queries (as generated by MultiTermQuery.rewrite() and RangeQuery.rewrite()). Be careful overriding this method as queries[0] determines which method will be called and is not necessarily of the same type as the other queries. Expert: adds all terms occuring in this query to the terms set. Only works if this query is in its {@link #rewrite rewritten} form. UnsupportedOperationException if this query is not yet rewritten Expert: Returns the Similarity implementation to be used for this query. Subclasses may override this method to specify their own Similarity implementation, perhaps one that delegates through that of the Searcher. By default the Searcher's Similarity implementation is returned. Returns a clone of this query. Expert: Returns the matches for this query in an index. Used internally to search for spans. Returns the name of the field matched by this query. Returns a collection of all terms matched by this query. use extractTerms instead Construct a SpanFirstQuery matching spans in match whose end position is less than or equal to end. Return the SpanQuery whose matches are filtered. Return the maximum end position permitted in a match. Returns a collection of all terms matched by this query. use ExtractTerms instead Matches spans which are near one another. One can specify slop, the maximum number of intervening unmatched positions, as well as whether matches are required to be in-order. Construct a SpanNearQuery. Matches spans matching a span from each clause, with up to slop total unmatched positions between them. * When inOrder is true, the spans from each clause must be * ordered as in clauses. Return the clauses whose spans are matched. Return the maximum number of intervening unmatched positions permitted. Return true if matches are required to be in-order. Returns a collection of all terms matched by this query. use extractTerms instead Returns true iff o is equal to this. Removes matches which overlap with another SpanQuery. Construct a SpanNotQuery matching spans from include which have no overlap with spans from exclude. Return the SpanQuery whose matches are filtered. Return the SpanQuery whose matches must not overlap those returned. Returns a collection of all terms matched by this query. use extractTerms instead Returns true iff o is equal to this. Matches the union of its clauses. Construct a SpanOrQuery merging the provided clauses. Return the clauses whose spans are matched. Returns a collection of all terms matched by this query. use ExtractTerms instead Constructs a Scorer. The Similarity implementation used by this scorer. Returns the Similarity implementation used by this scorer. Expert: Collects matching documents in a range. Hook for optimization. Note that {@link #Next()} must be called once before this method is called for the first time. The collector to which all matching documents are passed through {@link HitCollector#Collect(int, float)}. Do not score documents past this. true if more matching documents may remain. Returns the current document number matching the query. Initially invalid, until {@link #Next()} is called the first time. Returns the score of the current document matching the query. Initially invalid, until {@link #Next()} or {@link #SkipTo(int)} is called the first time. 
Matches spans containing a term. Construct a SpanTermQuery matching the named term's spans. Return the term whose spans are matched. Returns a collection of all terms matched by this query. use extractTerms instead Returns true iff o is equal to this. Returns a hash code value for this object. The query that this concerns. The weight for this query. The sum of squared weights of contained query clauses. Assigns the query normalization factor to this. Constructs a scorer for this. An explanation of the score computation for the named document. A clause in a BooleanQuery. The query whose matching documents are combined by the boolean query. Constructs a BooleanClause. Returns true iff o is equal to this. Returns a hash code value for this object. Specifies how terms may occur in matching documents. Use this operator for terms that must appear in the matching documents. Use this operator for terms that should appear in the matching documents. For a BooleanQuery with two SHOULD subqueries, at least one of the queries must appear in the matching documents. Use this operator for terms that must not appear in the matching documents. Note that it is not possible to search for queries that only consist of a MUST_NOT query. A Query that matches documents matching boolean combinations of other queries, e.g. {@link TermQuery}s, {@link PhraseQuery}s or other BooleanQuerys. Return the maximum number of clauses permitted, 1024 by default. Attempts to add more than the permitted number of clauses cause {@link TooManyClauses} to be thrown. Constructs an empty boolean query. Constructs an empty boolean query. {@link Similarity#Coord(int,int)} may be disabled in scoring, as appropriate. For example, this score factor does not make sense for most automatically generated queries, like {@link WildcardQuery} and {@link FuzzyQuery}. disables {@link Similarity#Coord(int,int)} in scoring. Returns true iff {@link Similarity#Coord(int,int)} is disabled in scoring for this query instance. Specifies a minimum number of the optional BooleanClauses which must be satisifed.

By default no optional clauses are necessary for a match (unless there are no required clauses). If this method is used, then the specified number of clauses is required.

Use of this method is totally independent of specifying that any specific clauses are required (or prohibited). This number will only be compared against the number of matching optional clauses.

EXPERT NOTE: Using this method will force the use of BooleanWeight2, regardless of whether setUseScorer14(true) has been called.

the number of optional clauses that must match
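A sketch of the minimum-match setting described above, assuming the setter carries the same name as in the Java original (setMinimumNumberShouldMatch); the terms are illustrative:

    BooleanQuery bq = new BooleanQuery();
    bq.add(new TermQuery(new Term("body", "lucene")), BooleanClause.Occur.SHOULD);
    bq.add(new TermQuery(new Term("body", "search")), BooleanClause.Occur.SHOULD);
    bq.add(new TermQuery(new Term("body", "index")), BooleanClause.Occur.SHOULD);
    bq.setMinimumNumberShouldMatch(2);   // at least two of the three optional clauses must match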
Gets the minimum number of the optional BooleanClauses which must be satisifed. Adds a clause to a boolean query. TooManyClauses if the new number of clauses exceeds the maximum clause number Adds a clause to a boolean query. TooManyClauses if the new number of clauses exceeds the maximum clause number Returns the set of clauses in this query. Indicates whether to use good old 1.4 BooleanScorer. Prints a user-readable version of this query. Returns true iff o is equal to this. Returns a hash code value for this object. Expert: Delegating scoring implementation. Useful in {@link Query#GetSimilarity(Searcher)} implementations, to override only certain methods of a Searcher's Similiarty implementation.. The Similarity implementation used by default. Set the default Similarity implementation used by indexing and search code. Cache of decoded bytes. Decodes a normalization factor stored in an index. Returns a table for decoding normalization bytes. Construct a {@link Similarity} that delegates all methods to another. the Similarity implementation to delegate to Thrown when an attempt is made to add more than {@link #GetMaxClauseCount()} clauses. This typically happens if a PrefixQuery, FuzzyQuery, WildcardQuery, or RangeQuery is expanded to many terms during search. A good old 1.4 Scorer An alternative Scorer that uses and provides skipTo(), and scores documents in document number order. A simple hash table of document scores within a range. The scorer to which all scoring will be delegated, except for computing and using the coordination factor. The number of optionalScorers that need to match (if there are any) Create a BooleanScorer2. The similarity to be used. The minimum number of optional added scorers that should match during the search. In case no required scorers are added, at least one of the optional scorers will have to match during the search. Create a BooleanScorer2. In no required scorers are added, at least one of the optional scorers will have to match during the search. The similarity to be used. Returns the scorer to be used for match counting and score summing. Uses requiredScorers, optionalScorers and prohibitedScorers. Returns the scorer to be used for match counting and score summing. Uses the given required scorer and the prohibitedScorers. A required scorer already built. Throws an UnsupportedOperationException. TODO: Implement an explanation of the coordination factor. The document number for the explanation. UnsupportedOperationException A Scorer for OR like queries, counterpart of Lucene's ConjunctionScorer. This Scorer implements {@link Scorer#SkipTo(int)} and uses skipTo() on the given Scorers. The number of subscorers. The subscorers. The minimum number of scorers that should match. The document number of the current match. The number of subscorers that provide the current match. Construct a DisjunctionScorer, using one as the minimum number of matching subscorers. Called the first time next() or skipTo() is called to initialize scorerQueue. Returns the score of the current document matching the query. Initially invalid, until {@link #Next()} is called the first time. Returns the number of subscorers matching the current document. Initially invalid, until {@link #Next()} is called the first time. Gives and explanation for the score of a given document. Show the resulting score. See BooleanScorer.explain() on how to do this. A PriorityQueue that orders by {@link Scorer#Doc()}. Scorer for conjunctions, sets of queries, all of which are required. 
Count a scorer as a single match. Wraps another filter's result and caches it. The caching behavior is like {@link QueryFilter}. The purpose is to allow filters to simply filter, and then wrap with this class to add caching, keeping the two concerns decoupled yet composable. Abstract base class providing a mechanism to restrict searches to a subset of an index. Returns a BitSet with true for documents which should be permitted in search results, and false for those that should not. What about serialization in RemoteSearchable? Caching won't work. Should transient be removed? Filter to cache results of A query that wraps a filter and simply returns a constant score equal to the query boost for every document in the filter. yonik $Id$ Prints a user-readable version of this query. Returns true if o is equal to this. Returns a hash code value for this object. Returns the field name for this query Returns the value of the lower endpoint of this range query, null if open ended Returns the value of the upper endpoint of this range query, null if open ended Returns true if the lower endpoint is inclusive Returns true if the upper endpoint is inclusive Prints a user-readable version of this query. Returns true if o is equal to this. Returns a hash code value for this object. Expert: Default scoring implementation. Implemented as 1/sqrt(numTerms). Implemented as 1/sqrt(sumOfSquaredWeights). Implemented as sqrt(freq). Implemented as 1 / (distance + 1). Implemented as log(numDocs/(docFreq+1)) + 1. Implemented as overlap / maxOverlap. A query that generates the union of the documents produced by its subqueries, and that scores each document as the maximum score for that document produced by any subquery plus a tie breaking increment for any additional matching subqueries. This is useful to search for a word in multiple fields with different boost factors (so that the fields cannot be combined equivalently into a single search field). We want the primary score to be the one associated with the highest boost, not the sum of the field scores (as BooleanQuery would give). If the query is "albino elephant" this ensures that "albino" matching one field and "elephant" matching another gets a higher score than "albino" matching both fields. To get this result, use both BooleanQuery and DisjunctionMaxQuery: for each term a DisjunctionMaxQuery searches for it in each field, while the set of these DisjunctionMaxQuery's is combined into a BooleanQuery. The tie breaker capability allows results that include the same term in multiple fields to be judged better than results that include this term in only the best of those multiple fields, without confusing this with the better case of two different terms in the multiple fields. Chuck Williams Creates a new empty DisjunctionMaxQuery. Use add() to add the subqueries. this score of each non-maximum disjunct for a document is multiplied by this weight and added into the final score. If non-zero, the value should be small, on the order of 0.1, which says that 10 occurrences of word in a lower-scored field that is also in a higher scored field is just as good as a unique word in the lower scored field (i.e., one that is not in any higher scored field. 
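A sketch of the "albino elephant" arrangement described above for DisjunctionMaxQuery, with illustrative field names and a tie breaker of 0.1:

    DisjunctionMaxQuery albino = new DisjunctionMaxQuery(0.1f);
    albino.add(new TermQuery(new Term("title", "albino")));
    albino.add(new TermQuery(new Term("body", "albino")));

    DisjunctionMaxQuery elephant = new DisjunctionMaxQuery(0.1f);
    elephant.add(new TermQuery(new Term("title", "elephant")));
    elephant.add(new TermQuery(new Term("body", "elephant")));

    BooleanQuery q = new BooleanQuery();         // each word scored by its best field
    q.add(albino, BooleanClause.Occur.SHOULD);   // use Occur.MUST to require both words
    q.add(elephant, BooleanClause.Occur.SHOULD);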
Add a subquery to this disjunction the disjunct added Optimize our representation and our subqueries representations the IndexReader we query an optimized copy of us (which may not be a copy if there is nothing to optimize) Create a shallow copy of us -- used in rewriting if necessary a copy of us (but reuse, don't copy, our subqueries) Prettyprint us. the field to which we are applied a string that shows what we do, of the form "(disjunct1 | disjunct2 | ... | disjunctn)^boost" Return true iff we represent the same query as o another object true iff o is a DisjunctionMaxQuery with the same boost and the same subqueries, in the same order, as us Compute a hash code for hashing us the hash code The Scorer for DisjunctionMaxQuery's. The union of all documents generated by the the subquery scorers is generated in document number order. The score for each document is the maximum of the scores computed by the subquery scorers that generate that document, plus tieBreakerMultiplier times the sum of the scores for the other subqueries that generate the document. Chuck Williams Creates a new instance of DisjunctionMaxScorer Multiplier applied to non-maximum-scoring subqueries for a document as they are summed into the result. -- not used since our definition involves neither coord nor terms directly Add the scorer for a subquery the scorer of a subquery of our associated DisjunctionMaxQuery Generate the next document matching our associated DisjunctionMaxQuery. true iff there is a next document Determine the current document number. Initially invalid, until {@link #Next()} is called the first time. the document number of the currently generated document Determine the current document score. Initially invalid, until {@link #Next()} is called the first time. the score of the current generated document Advance to the first document beyond the current whose number is greater than or equal to target. the minimum number of the next desired document true iff there is a document to be generated whose number is at least target Explain a score that we computed. UNSUPPORTED -- see explanation capability in DisjunctionMaxQuery. the number of a document we scored the Explanation for our score Expert: Describes the score computation for document and query. The value assigned to this explanation node. Sets the value assigned to this explanation node. A description of this explanation node. Sets the description of this explanation node. The sub-nodes of this explanation node. Adds a sub-node to this explanation node. Render an explanation as text. Render an explanation as HTML. All the term values, in natural order. For each document, an index into the lookup array. Creates one of these objects Indicator for StringIndex values in the cache. Expert: The cache used internally by sorting and range query classes. Checks the internal cache for an appropriate entry, and if none is found, reads the terms in field as integers and returns an array of size reader.maxDoc() of the value each document has in the given field. Used to get field values. Which field contains the integers. The values in the given field for each document. IOException If any error occurs. Checks the internal cache for an appropriate entry, and if none is found, reads the terms in field as integers and returns an array of size reader.maxDoc() of the value each document has in the given field. Used to get field values. Which field contains the integers. Computes integer for string values. The values in the given field for each document. 
IOException If any error occurs. Checks the internal cache for an appropriate entry, and if none is found, reads the terms in field as floats and returns an array of size reader.maxDoc() of the value each document has in the given field. Used to get field values. Which field contains the floats. The values in the given field for each document. IOException If any error occurs. Checks the internal cache for an appropriate entry, and if none is found, reads the terms in field as floats and returns an array of size reader.maxDoc() of the value each document has in the given field. Used to get field values. Which field contains the floats. Computes float for string values. The values in the given field for each document. IOException If any error occurs. Checks the internal cache for an appropriate entry, and if none is found, reads the term values in field and returns an array of size reader.maxDoc() containing the value each document has in the given field. Used to get field values. Which field contains the strings. The values in the given field for each document. IOException If any error occurs. Checks the internal cache for an appropriate entry, and if none is found reads the term values in field and returns an array of them in natural order, along with an array telling which element in the term array each document uses. Used to get field values. Which field contains the strings. Array of terms and index into the array for each document. IOException If any error occurs. Checks the internal cache for an appropriate entry, and if none is found reads field to see if it contains integers, floats or strings, and then calls one of the other methods in this class to get the values. For string values, a StringIndex is returned. After calling this method, there is an entry in the cache for both type AUTO and the actual found type. Used to get field values. Which field contains the values. int[], float[] or StringIndex. IOException If any error occurs. Checks the internal cache for an appropriate entry, and if none is found reads the terms out of field and calls the given SortComparator to get the sort values. A hit in the cache will happen if reader, field, and comparator are the same (using equals()) as a previous call to this method. Used to get field values. Which field contains the values. Used to convert terms into something to sort by. Array of sort objects, one for each document. IOException If any error occurs. Interface to parse ints from document fields. Return an integer representation of this field's value. Interface to parse floats from document fields. Return an float representation of this field's value. The internal cache. Maps Entry to array of interpreted term values. * See if an object is in the cache. See if a custom object is in the cache. Put an object into the cache. Put a custom object into the cache. The pattern used to detect float values in a field removed for java 1.3 compatibility protected static final Object pFloats = Pattern.compile ("[0-9+\\-\\.eEfFdD]+"); Expert: Every key in the internal cache is of this type. Creates one of these objects. Creates one of these objects for a custom comparator. Two of these are equal iff they reference the same field and type. Composes a hashcode based on the field and type. Expert: Returned by low-level search implementations. Expert: The score of this document for the query. Expert: A hit document's number. Expert: Constructs a ScoreDoc. Expert: The values which are used to sort the referenced document. 
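An access sketch for the FieldCache entry points described above, using the Java name FieldCache.DEFAULT; the field names are illustrative and each returned array is indexed by document number:

    int[] sizes    = FieldCache.DEFAULT.getInts(reader, "size");    // parsed once, then served from cache
    float[] prices = FieldCache.DEFAULT.getFloats(reader, "price");
    String[] names = FieldCache.DEFAULT.getStrings(reader, "name");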
The order of these will match the original sort criteria given by a Sort object. Each Object will be either an Integer, Float or String, depending on the type of values in the terms of the original field. Expert: Creates one of these objects with empty sort information. Expert: Creates one of these objects with the given sort information. Creates a hit queue sorted by the given list of fields. Field names, in priority order (highest priority first). The number of hits to retain. Must be greater than zero. Allows redefinition of sort fields if they are null. This is to handle the case using ParallelMultiSearcher where the original list contains AUTO and we don't know the actual sort type until the values come back. The fields can only be set once. This method is thread safe. Returns the fields being used to sort. Returns an array of collators, possibly null. The collators correspond to any SortFields which were given a specific locale. Array of sort fields. Array, possibly null. Returns whether a is less relevant than b. ScoreDoc ScoreDoc true if document a should be sorted after document b. Creates a hit queue sorted by the given list of fields. Index to use. Field names, in priority order (highest priority first). Cannot be null or empty. The number of hits to retain. Must be greater than zero. IOException Stores a comparator corresponding to each field being sorted by Stores the sort criteria being used. Stores the maximum score value encountered, needed for normalizing. returns the maximum score encountered by elements inserted via insert() Returns whether a is less relevant than b. ScoreDoc ScoreDoc true if document a should be sorted after document b. Given a FieldDoc object, stores the values used to sort the given document. These values are not the raw values out of the index, but the internal representation of them. This is so the given search hit can be collated by a MultiSearcher with other search hits. The FieldDoc to store sort values into. The same FieldDoc passed in. Returns the SortFields being used by this hit queue. Internal cache of comparators. Similar to FieldCache, only caches comparators instead of term values. Returns a comparator if it is in the cache. Stores a comparator into the cache. Returns a comparator for sorting hits according to a field containing integers. Index to use. Field containg integer values. Comparator for sorting hits. IOException If an error occurs reading the index. Returns a comparator for sorting hits according to a field containing floats. Index to use. Field containg float values. Comparator for sorting hits. IOException If an error occurs reading the index. Returns a comparator for sorting hits according to a field containing strings. Index to use. Field containg string values. Comparator for sorting hits. IOException If an error occurs reading the index. Returns a comparator for sorting hits according to a field containing strings. Index to use. Field containg string values. Comparator for sorting hits. IOException If an error occurs reading the index. Returns a comparator for sorting hits according to values in the given field. The terms in the field are looked at to determine whether they contain integers, floats or strings. Once the type is determined, one of the other static methods in this class is called to get the comparator. Index to use. Field containg values. Comparator for sorting hits. IOException If an error occurs reading the index. Returns the value used to sort the given document. 
The object returned must implement the java.io.Serializable interface. This is used by multisearchers to determine how to collate results from their searchers. Document Serializable object Returns the type of sort. Should return SortField.SCORE, SortField.DOC, SortField.STRING, SortField.INTEGER, SortField.FLOAT or SortField.CUSTOM. It is not valid to return SortField.AUTO. This is used by multisearchers to determine how to collate results from their searchers. One of the constants in SortField. Constructs a new query which applies a filter to the results of the original query. Filter.bits() will be called every time this query is used in a search. Query to be filtered, cannot be null. Filter to apply to query results, cannot be null. Returns a Weight that applies the filter to the enclosed query's Weight. This is accomplished by overriding the Scorer returned by the Weight. Rewrites the wrapped query. Prints a user-readable version of this query. Returns true iff o is equal to this. Returns a hash code value for this object. Equality compare on the term Equality measure on the term Indicates the end of the enumeration has been reached Returns the docFreq of the current Term in the enumeration. Returns -1 if no Term matches or all terms have been enumerated. Increments the enumeration to the next element. True if one exists. Returns the current Term in the enumeration. Returns null if no Term matches or all terms have been enumerated. Closes the enumeration to further activity, freeing resources. Implements the fuzzy search query. The similiarity measurement is based on the Levenshtein (edit distance) algorithm. Constructs a query for terms matching term. Returns the pattern term. Construct the enumeration to be used, expanding the pattern term. Prints a user-readable version of this query. Create a new FuzzyQuery that will match terms with a similarity of at least minimumSimilarity to term. If a prefixLength > 0 is specified, a common prefix of that length is also required. the term to search for a value between 0 and 1 to set the required similarity between the query term and the matching terms. For example, for a minimumSimilarity of 0.5 a term of the same length as the query term is considered similar to the query term if the edit distance between both terms is less than length(term)*0.5 length of common (non-fuzzy) prefix IllegalArgumentException if minimumSimilarity is >= 1 or < 0 or if prefixLength < 0 Calls {@link #FuzzyQuery(Term, float) FuzzyQuery(term, minimumSimilarity, 0)}. Calls {@link #FuzzyQuery(Term, float) FuzzyQuery(term, 0.5f, 0)}. Returns the minimum similarity that is required for this query to match. float value between 0.0 and 1.0 Returns the non-fuzzy prefix length. This is the number of characters at the start of a term that must be identical (not fuzzy) to the query term if the query is to match that term. The termCompare method in FuzzyTermEnum uses Levenshtein distance to calculate the distance between the given term and the comparing term. Finds and returns the smallest of three integers Grow the second dimension of the array, so that we can calculate the Levenshtein difference. The max Distance is the maximum Levenshtein distance for the text compared to some other value that results in score that is better than the minimum similarity. the length of the "other value" the maximum levenshtein distance that we care about Wrapper used by {@link HitIterator} to provide a lazily loaded hit from {@link Hits}. 
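A construction sketch for the fuzzy query described above (field and term are illustrative): terms within the required edit-distance similarity of the query term match, and here the first two characters must match exactly:

    Query fuzzy = new FuzzyQuery(new Term("contents", "lucene"), 0.7f, 2);
    Query fuzzyDefaults = new FuzzyQuery(new Term("contents", "lucene"));  // minimumSimilarity 0.5, prefixLength 0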
Jeremy Rayner Constructed from {@link HitIterator} Hits returned from a search Hit index in Hits Returns document for this hit. Returns score for this hit. Returns id for this hit. Returns the boost factor for this hit on any field of the underlying document. Returns the string value of the field with the given name if any exist in this document, or null. If multiple fields exist with this name, this method returns the first value added. If only binary fields with this name exist, returns null. Prints the parameters to be used to discover the promised result. An iterator over {@link Hits} that provides lazy fetching of each document. {@link Hits#Iterator()} returns an instance of this class. Calls to {@link #next()} return a {@link Hit} instance. Jeremy Rayner Constructed from {@link Hits#Iterator()}. true if current hit is less than the total number of {@link Hits}. Unsupported operation. UnsupportedOperationException Returns the total number of hits. Returns a {@link Hit} instance representing the next hit in {@link Hits}. Next {@link Hit}. A ranked list of documents, used to hold search results. Tries to add new documents to hitDocs. Ensures that the hit numbered min has been retrieved. Returns the total number of hits available in this set. Returns the score for the nth document in this set. Returns the id for the nth document in this set. Returns a {@link HitIterator} to navigate the Hits. Each item returned from {@link Iterator#next()} is a {@link Hit}.

Caution: Iterate only over the hits needed. Iterating over all hits is generally not desirable and may be the source of performance issues.
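An iteration sketch that heeds the caution above by touching only the first few hits, assuming searcher is an open IndexSearcher, query a parsed Query, and "title" an illustrative stored field:

    Hits hits = searcher.search(query);
    int wanted = Math.min(10, hits.length());
    for (int i = 0; i < wanted; i++) {
      Document d = hits.doc(i);                        // fetched lazily, one document at a time
      System.out.println(hits.score(i) + "  " + d.get("title"));
    }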

Frees resources associated with this Searcher. Be careful not to call this method while you are still using objects like {@link Hits}. Expert: Returns the number of documents containing term. Called by search code to compute term weights. Expert: For each term in the terms array, calculates the number of documents containing term. Returns an array with these document frequencies. Used to minimize number of remote calls. Expert: Returns one greater than the largest possible document number. Called by search code to compute term weights. Expert: Returns the stored fields of document i. Called by {@link HitCollector} implementations. Expert: called to re-write queries into primitive queries. BooleanQuery.TooManyClauses Returns the documents matching query. BooleanQuery.TooManyClauses Returns the documents matching query and filter. BooleanQuery.TooManyClauses Returns documents matching query sorted by sort. BooleanQuery.TooManyClauses Returns documents matching query and filter, sorted by sort. BooleanQuery.TooManyClauses The Similarity implementation used by this searcher. Expert: Set the Similarity implementation used by this Searcher. creates a weight for query new weight Creates a searcher searching the index in the named directory. Creates a searcher searching the index in the provided directory. Creates a searcher searching the provided index. Return the {@link IndexReader} this searches. Note that the underlying IndexReader is not closed, if IndexSearcher was constructed with IndexSearcher(IndexReader r). If the IndexReader was supplied implicitly by specifying a directory, then the IndexReader gets closed. A query that matches all documents. John Wang MultiPhraseQuery is a generalized version of PhraseQuery, with an added method {@link #Add(Term[])}. To use this class, to search for the phrase "Microsoft app*" first use add(Term) on the term "Microsoft", then find all terms that have "app" as prefix using IndexReader.terms(Term), and use MultiPhraseQuery.add(Term[] terms) to add them to the query. Anders Nielsen 1.0 Sets the phrase slop for this query. Sets the phrase slop for this query. Add a single term at the next position in the phrase. Add multiple terms at the next position in the phrase. Any of the terms may match. Allows to specify the relative position of terms within the phrase. Returns the relative positions of terms in this phrase. Prints a user-readable version of this query. Returns true if o is equal to this. Returns a hash code value for this object. Creates a searcher which searches searchables. Return the array of {@link Searchable}s this searches. Returns index of the searcher for document n in the array used to construct this searcher. Returns the document number of document n within its sub-index. Create weight in multiple index scenario. Distributed query processing is done in the following steps: 1. rewrite query 2. extract necessary terms 3. collect dfs for these terms from the Searchables 4. create query weight using aggregate dfs. 5. distribute that weight to Searchables 6. merge results Steps 1-4 are done here, 5+6 in the search() methods rewritten queries Document Frequency cache acting as a Dummy-Searcher. This class is no full-fledged Searcher, but only supports the methods necessary to initialize Weights. A scorer that matches no document at all. Creates a searcher which searches searchables. 
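A sketch of the "Microsoft app*" recipe described above for MultiPhraseQuery, assuming the field is named body, reader is an open IndexReader, and the index was built with a lowercasing analyzer:

    MultiPhraseQuery query = new MultiPhraseQuery();
    query.add(new Term("body", "microsoft"));                  // exact term at the first position

    java.util.ArrayList prefixed = new java.util.ArrayList();  // every indexed term starting with "app"
    TermEnum te = reader.terms(new Term("body", "app"));
    try {
      do {
        Term t = te.term();
        if (t == null || !t.field().equals("body") || !t.text().startsWith("app"))
          break;
        prefixed.add(t);
      } while (te.next());
    } finally {
      te.close();
    }
    query.add((Term[]) prefixed.toArray(new Term[prefixed.size()]));  // any of these at the second position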
TODO: parallelize this one too A search implementation which spawns a new thread for each Searchable, waits for each search to complete and merges the results back together. A search implementation allowing sorting which spawns a new thread for each Searchable, waits for each search to complete and merges the results back together. A thread subclass for searching a single searchable Support class used to handle threads This interface should be implemented by any class whose instances are intended to be executed by a thread. This method has to be implemented so that starting the thread causes the object's run method to be called in that separately executing thread. Contains conversion support elements such as classes, interfaces and static methods. Support class used to handle threads The instance of System.Threading.Thread Initializes a new instance of the ThreadClass class Initializes a new instance of the Thread class. The name of the thread Initializes a new instance of the Thread class. A ThreadStart delegate that references the methods to be invoked when this thread begins executing Initializes a new instance of the Thread class. A ThreadStart delegate that references the methods to be invoked when this thread begins executing The name of the thread This method has no functionality unless the method is overridden Causes the operating system to change the state of the current thread instance to ThreadState.Running Interrupts a thread that is in the WaitSleepJoin thread state Blocks the calling thread until a thread terminates Blocks the calling thread until a thread terminates or the specified time elapses Time of wait in milliseconds Blocks the calling thread until a thread terminates or the specified time elapses Time of wait in milliseconds Time of wait in nanoseconds Resumes a thread that has been suspended Raises a ThreadAbortException in the thread on which it is invoked, to begin the process of terminating the thread. Calling this method usually terminates the thread Raises a ThreadAbortException in the thread on which it is invoked, to begin the process of terminating the thread while also providing exception information about the thread termination. Calling this method usually terminates the thread. An object that contains application-specific information, such as state, which can be used by the thread being aborted Suspends the thread; if the thread is already suspended it has no effect Obtain a String that represents the current Object A String that represents the current Object Gets the currently running thread The currently running thread Gets the current thread instance Gets or sets the name of the thread Gets or sets a value indicating the scheduling priority of a thread Gets a value indicating the execution status of the current thread Gets or sets a value indicating whether or not a thread is a background thread. Represents the methods to support some operations over files. Returns an array of abstract pathnames representing the files and directories of the specified path. The abstract pathname whose children are to be listed. An array of abstract pathnames for the children of the specified path, or null if the path is not a directory A simple class for number conversions. Min radix value. Max radix value. Converts a number to System.String in the specified radix. A number to be converted. A radix. A System.String representation of the number in the specified radix. Parses a number in the specified radix. An input System.String. A radix. The parsed number in the specified radix.
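The radix conversion the number-support class describes can be illustrated with a standalone C# sketch. The method names below (ToString, Parse) are placeholders, not the actual support-class signatures; only non-negative values and radixes 2 through 36 are handled.

    using System;
    using System.Text;

    public static class RadixConversion
    {
        private const string Digits = "0123456789abcdefghijklmnopqrstuvwxyz";

        // Converts a non-negative number to its textual form in the given radix (2..36).
        public static string ToString(long value, int radix)
        {
            if (radix < 2 || radix > 36) throw new ArgumentOutOfRangeException("radix");
            if (value < 0) throw new ArgumentOutOfRangeException("value");
            if (value == 0) return "0";

            StringBuilder sb = new StringBuilder();
            while (value > 0)
            {
                sb.Insert(0, Digits[(int)(value % radix)]);
                value /= radix;
            }
            return sb.ToString();
        }

        // Parses a number written in the given radix (2..36).
        public static long Parse(string s, int radix)
        {
            long result = 0;
            foreach (char c in s)
            {
                int digit = Digits.IndexOf(char.ToLowerInvariant(c));
                if (digit < 0 || digit >= radix)
                    throw new FormatException("Invalid digit '" + c + "' for radix " + radix);
                result = result * radix + digit;
            }
            return result;
        }
    }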
Performs an unsigned bitwise right shift with the specified number Number to operate on Number of bits to shift The resulting number from the shift operation Returns the index of the first bit that is set to true that occurs on or after the specified starting index. If no such bit exists then -1 is returned. The BitArray object. The index to start checking from (inclusive). The index of the next set bit. Mimics Java's Character class. Use for .NET 1.1 Framework only. Constructs an empty phrase query. Returns the slop. See setSlop(). Adds a term to the end of the query phrase. The relative position of the term is the one immediately after the last term added. Adds a term to the end of the query phrase. The relative position of the term within the phrase is specified explicitly. This allows e.g. phrases with more than one term at the same position or phrases with gaps (e.g. in connection with stopwords). Returns the set of terms in this phrase. Returns the relative positions of terms in this phrase. Prints a user-readable version of this query. Returns true iff o is equal to this. Returns a hash code value for this object. A Query that matches documents containing terms with a specified prefix. A PrefixQuery is built by QueryParser for input like app*. Constructs a query for terms starting with prefix. Returns the prefix of this query. Prints a user-readable version of this query. Returns true iff o is equal to this. Returns a hash code value for this object. Constructs a filter which only matches documents matching query. The original list of terms from the query, can contain duplicates A Filter that restricts search results to a range of values in a given field.
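A standalone sketch of the two bit helpers just described, assuming the semantics given above (Java-style >>> and a first-set-bit scan over a System.Collections.BitArray). The names URShift and NextSetBit are illustrative, not the actual support-class member names.

    public static class BitSupport
    {
        // Unsigned (logical) right shift for a signed 32-bit value,
        // equivalent to Java's number >>> bits.
        public static int URShift(int number, int bits)
        {
            return (int)((uint)number >> bits);
        }

        // Index of the first bit set to true at or after fromIndex, or -1 if none.
        public static int NextSetBit(System.Collections.BitArray bits, int fromIndex)
        {
            for (int i = fromIndex; i < bits.Count; i++)
            {
                if (bits[i]) return i;
            }
            return -1;
        }
    }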

This code borrows heavily from {@link RangeQuery}, but is implemented as a Filter

The field this range applies to The lower bound on this range The upper bound on this range Does this range include the lower bound? Does this range include the upper bound? IllegalArgumentException if both terms are null or if lowerTerm is null and includeLower is true (similar for upperTerm and includeUpper) Constructs a filter for field fieldName matching less than or equal to upperTerm. Constructs a filter for field fieldName matching greater than or equal to lowerTerm. Returns a BitSet with true for documents which should be permitted in search results, and false for those that should not. Returns true if o is equal to this. Returns a hash code value for this object. A Query that matches documents within an exclusive range. A RangeQuery is built by QueryParser for input like [010 TO 120]. $Id: RangeQuery.java 329381 2005-10-29 09:26:21Z ehatcher $ Constructs a query selecting all terms greater than lowerTerm but less than upperTerm. There must be at least one term and either term may be null, in which case there is no bound on that side, but if there are two terms, both terms must be for the same field. Returns the field name for this query Returns the lower term of this range query Returns the upper term of this range query Returns true if the range query is inclusive Prints a user-readable version of this query. Returns true iff o is equal to this. Returns a hash code value for this object. A remote searchable implementation. $Id: RemoteSearchable.java 351472 2005-12-01 21:15:53Z bmesser $ Constructs and exports a remote searcher. Exports a searcher for the index in args[0] named "//localhost/Searchable". Construct a ReqExclScorer. The scorer that must match, except where the exclusion scorer indicates exclusion. Returns the score of the current document matching the query. Initially invalid, until {@link #Next()} is called the first time. The score of the required scorer. The scorers passed from the constructor. These are set to null as soon as their next() or skipTo() returns false. Construct a ReqOptScorer. The required scorer. This must match. The optional scorer. This is used for scoring only. Returns the score of the current document matching the query. Initially invalid, until {@link #Next()} is called the first time. The score of the required scorer, possibly increased by the score of the optional scorer when it also matches the current document. Explain the score of a document. Also show the total score. See BooleanScorer.explain() on how to do this. Special comparator for sorting hits according to computed relevance (document score). Special comparator for sorting hits according to index order (document number). Represents sorting by computed relevance. Using this sort criteria returns the same results as calling {@link Searcher#Search(Query) Searcher#search()} without a sort criteria, only with slightly more overhead. Represents sorting by index order. Sorts by computed relevance. This is the same sort criteria as calling {@link Searcher#Search(Query) Searcher#search()} without a sort criteria, only with slightly more overhead. Sorts by the terms in field then by index order (document number). The type of value in field is determined automatically. Sorts possibly in reverse by the terms in field then by index order (document number). The type of value in field is determined automatically. Sorts in succession by the terms in each field. The type of value in field is determined automatically. Sorts by the criteria in the given SortField. Sorts in succession by the criteria in each SortField.
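As a sketch, the same term range can be expressed either as a RangeQuery (which contributes to scoring) or as a RangeFilter (which only restricts the result set). The field name "date" and the bound values are hypothetical; the constructor shapes follow the descriptions above, so confirm them against your Lucene.Net version.

    using Lucene.Net.Index;
    using Lucene.Net.Search;

    public class RangeExamples
    {
        // Inclusive range query, the kind QueryParser builds for input like [x TO y].
        public static Query DateRangeQuery()
        {
            return new RangeQuery(new Term("date", "20050101"),
                                  new Term("date", "20051231"),
                                  true /* inclusive */);
        }

        // The same range expressed as a filter; it does not affect scoring,
        // it only decides which documents are permitted in the results.
        public static Filter DateRangeFilter()
        {
            return new RangeFilter("date", "20050101", "20051231",
                                   true /* includeLower */, true /* includeUpper */);
        }
    }

Either form would then be passed to one of the search overloads described above, for example a query together with a filter.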
Sets the sort to the terms in field then by index order (document number). Sets the sort to the terms in field possibly in reverse, then by index order (document number). Sets the sort to the terms in each field in succession. Sets the sort to the given criteria. Sets the sort to the given criteria in succession. Representation of the sort criteria. Array of SortField objects used in this sort criteria Creates a comparator for the field in the given index. Index to create comparator for. Field to create comparator for. Comparator of ScoreDoc objects. IOException If an error occurs reading the index. Sort by document score (relevancy). Sort values are Float and higher values are at the front. Sort by document number (index order). Sort values are Integer and lower values are at the front. Guess type of sort based on field contents. A regular expression is used to look at the first term indexed for the field and determine if it represents an integer number, a floating point number, or just arbitrary string characters. Sort using term values as Strings. Sort values are String and lower values are at the front. Sort using term values as encoded Integers. Sort values are Integer and lower values are at the front. Sort using term values as encoded Floats. Sort values are Float and lower values are at the front. Sort using a custom Comparator. Sort values are any Comparable and sorting is done according to natural order. Represents sorting by document score (relevancy). Represents sorting by document number (index order). Creates a sort by terms in the given field where the type of term value is determined dynamically ({@link #AUTO AUTO}). Name of field to sort by, cannot be null. Creates a sort, possibly in reverse, by terms in the given field where the type of term value is determined dynamically ({@link #AUTO AUTO}). Name of field to sort by, cannot be null. True if natural order should be reversed. Creates a sort by terms in the given field with the type of term values explicitly given. Name of field to sort by. Can be null if type is SCORE or DOC. Type of values in the terms. Creates a sort, possibly in reverse, by terms in the given field with the type of term values explicitly given. Name of field to sort by. Can be null if type is SCORE or DOC. Type of values in the terms. True if natural order should be reversed. Creates a sort by terms in the given field sorted according to the given locale. Name of field to sort by, cannot be null. Locale of values in the field. Creates a sort, possibly in reverse, by terms in the given field sorted according to the given locale. Name of field to sort by, cannot be null. Locale of values in the field. Creates a sort with a custom comparison function. Name of field to sort by; cannot be null. Returns a comparator for sorting hits. Creates a sort, possibly in reverse, with a custom comparison function. Name of field to sort by; cannot be null. Returns a comparator for sorting hits. True if natural order should be reversed. Returns the name of the field. Could return null if the sort is by SCORE or DOC. Name of field, possibly null. Returns the type of contents in the field. One of the constants SCORE, DOC, AUTO, STRING, INT or FLOAT. Returns the Locale by which term values are interpreted. May return null if no Locale was specified. Locale, or null. Returns whether the sort should be reversed. True if natural order should be reversed. A Query that matches documents containing a term. This may be combined with other terms with a {@link BooleanQuery}. 
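A hedged example of composing the sort criteria described above. The field name "price" is hypothetical, and the constant and constructor names (SortField.INT, SortField.FIELD_DOC, Sort(SortField[])) follow the 1.9-era API summarized here, so check them against your version.

    using Lucene.Net.Search;

    public class SortingExample
    {
        // Primary key: the "price" field interpreted as encoded integers,
        // in reverse (descending) order; ties broken by index order.
        public static readonly Sort ByPriceThenDoc = new Sort(new SortField[]
        {
            new SortField("price", SortField.INT, true),  // reverse = true
            SortField.FIELD_DOC                           // then by document number
        });

        public static Hits Run(Searcher searcher, Query query)
        {
            // Overload described above: documents matching query, sorted by sort.
            return searcher.Search(query, ByPriceThenDoc);
        }
    }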
Constructs a query for the term t. Returns the term of this query. Prints a user-readable version of this query. Returns true iff o is equal to this. Returns a hash code value for this object. Expert: A Scorer for documents matching a Term. Returns the current document number matching the query. Initially invalid, until {@link #Next()} is called the first time. Returns a string representation of this TermScorer. Construct to collect a given number of hits. the maximum number of hits to collect The total number of documents that matched this query. The top-scoring hits. Expert: Returned by low-level search implementations. Expert: The total number of hits for the query. Expert: The top hits for the query. Expert: Stores the maximum score value encountered, needed for normalizing. Expert: Returns the maximum score value encountered. Expert: Sets the maximum score value encountered. Expert: Constructs a TopDocs. Construct to collect a given number of hits. the index to be searched the sort criteria the maximum number of hits to collect The fields which were used to sort results by. Creates one of these objects. Total number of hits for the query. The top hits for the query. The sort criteria used to find the top hits. The maximum score encountered. Implements the wildcard search query. Supported wildcards are *, which matches any character sequence (including the empty one), and ?, which matches any single character. Note this query can be slow, as it needs to iterate over many terms. In order to prevent extremely slow WildcardQueries, a Wildcard term should not start with one of the wildcards * or ?. String equality with support for wildcards. Determines if a word matches a wildcard pattern. Work released by Granta Design Ltd after originally being done on company time. Base implementation class for buffered {@link IndexOutput}. Abstract base class for output to a file in a Directory. A random-access output stream. Used for all Lucene index output operations. Writes a single byte. Writes an array of bytes. the bytes to write the number of bytes to write Writes an int as four bytes. Writes an int in a variable-length format. Writes between one and five bytes. Smaller values take fewer bytes. Negative numbers are not supported. Writes a long as eight bytes. Writes a long in a variable-length format. Writes between one and nine bytes. Smaller values take fewer bytes. Negative numbers are not supported. Writes a string. Writes a sequence of UTF-8 encoded characters from a string. the source of the characters the first character in the sequence the number of characters in the sequence Forces any buffered output to be written. Closes this stream to further operations. Returns the current position in this file, where the next write will occur. Sets current position in this file, where the next write will occur. The number of bytes in the file. Writes a single byte. Writes an array of bytes. the bytes to write the number of bytes to write Forces any buffered output to be written. Expert: implements buffer write. Writes bytes at the current position in the output. the bytes to write the number of bytes to write Closes this stream to further operations. Returns the current position in this file, where the next write will occur. Sets current position in this file, where the next write will occur. The number of bytes in the file. Straightforward implementation of {@link Directory} as a directory of files.
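The variable-length int format described for IndexOutput (one to five bytes, smaller values in fewer bytes, negatives unsupported) can be sketched in isolation. This is not the library's own code, just an illustration of a seven-bits-per-byte encoding with a continuation bit; the real IndexOutput writes directly to its buffer rather than returning an array.

    using System.Collections.Generic;

    public static class VIntExample
    {
        // Encodes a non-negative int: seven data bits per byte,
        // high bit set on every byte except the last.
        public static byte[] WriteVInt(int value)
        {
            List<byte> bytes = new List<byte>();
            uint v = (uint)value;
            while (v >= 0x80)
            {
                bytes.Add((byte)((v & 0x7F) | 0x80));
                v >>= 7;
            }
            bytes.Add((byte)v);
            return bytes.ToArray();
        }

        // Decodes a value written by WriteVInt.
        public static int ReadVInt(byte[] data)
        {
            int result = 0;
            int shift = 0;
            foreach (byte b in data)
            {
                result |= (b & 0x7F) << shift;
                if ((b & 0x80) == 0) break;
                shift += 7;
            }
            return result;
        }
    }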
Doug Cutting This cache of directories ensures that there is a unique Directory instance per path, so that synchronization on the Directory can be used to synchronize access between readers and writers. This should be a WeakHashMap, so that entries can be GC'd, but that would require Java 1.2. Instead we use refcounts... Set whether Lucene's use of lock files is disabled. By default, lock files are enabled. They should only be disabled if the index is on a read-only medium like a CD-ROM. Returns whether Lucene's use of lock files is disabled. true if locks are disabled, false if locks are enabled. Directory specified by Lucene.Net.lockDir or java.io.tmpdir system property The default class which implements filesystem-based directories. A buffer optionally used in renameTo method Returns an array of strings, one for each file in the directory. Returns true iff a file with the given name exists. Returns the time the named file was last modified. Returns the time the named file was last modified. Set the modified time of an existing file to now. Returns the length in bytes of a file in the directory. Removes an existing file in the directory. Renames an existing file in the directory. Creates a new, empty file in the directory with the given name. Returns a stream writing this file. Returns a stream reading an existing file. So we can do some byte-to-hexchar conversion below Constructs a {@link Lock} with the specified name. Locks are implemented with {@link File#createNewFile()}. the name of the lock file an instance of Lock holding the lock Closes the store to future operations. For debug output. IndexInput methods Method used for testing. Returns true if the underlying file descriptor is valid. output methods: Random-access methods A memory-resident {@link Directory} implementation. $Id: RAMDirectory.java 351779 2005-12-02 17:37:50Z bmesser $ Constructs an empty {@link Directory}. Creates a new RAMDirectory instance from the {@link FSDirectory}. a File specifying the index directory Creates a new RAMDirectory instance from the {@link FSDirectory}. a String specifying the full index directory path Returns an array of strings, one for each file in the directory. Returns true iff the named file exists in this directory. Returns the time the named file was last modified. Set the modified time of an existing file to now. Returns the length in bytes of a file in the directory. Removes an existing file in the directory. Removes an existing file in the directory. Creates a new, empty file in the directory with the given name. Returns a stream writing this file. Returns a stream reading an existing file. Construct a {@link Lock}. the name of the lock file Closes the store to future operations. A memory-resident {@link IndexInput} implementation. $Id: RAMInputStream.java 150537 2004-09-28 20:45:26Z cutting $ A memory-resident {@link IndexOutput} implementation. $Id: RAMOutputStream.java 150537 2004-09-28 20:45:26Z cutting $ Construct an empty output buffer. Copy the current contents of this buffer to the named output. Resets this to an empty buffer. Optimized implementation of a vector of bits. This is more-or-less like java.util.BitSet, but also includes the following:
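A short sketch of loading an on-disk index into a memory-resident RAMDirectory using the String-path constructor described above, then searching it. The index path is hypothetical, and the constructor and member names should be checked against your Lucene.Net version.

    using Lucene.Net.Search;
    using Lucene.Net.Store;

    public class RamIndexExample
    {
        // Copies the files of an on-disk index into a RAMDirectory and
        // opens a searcher over the in-memory copy.
        public static IndexSearcher OpenInMemory(string indexPath)
        {
            RAMDirectory inMemory = new RAMDirectory(indexPath);  // String-path constructor described above
            return new IndexSearcher(inMemory);
        }
    }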
  • a count() method, which efficiently computes the number of one bits;
  • optimized read from and write to disk;
  • inlinable get() method;
Doug Cutting $Id: BitVector.java 150536 2004-09-28 18:15:52Z cutting $
Constructs a vector capable of holding n bits. Sets the value of bit to one. Sets the value of bit to zero. Returns true if bit is one and false if it is zero. Returns the number of bits in this vector. This is also one greater than the number of the largest valid bit number. Returns the total number of one bits in this vector. This is efficiently computed and cached, so that, if the vector is not changed, no recomputation is done for repeated calls. Writes this vector to the file name in Directory d, in a format that can be read by the constructor {@link #BitVector(Directory, String)}. Constructs a bit vector from the file name in Directory d, as written by the {@link #write} method. Some useful constants. Doug Cutting $Id: Constants.java 189792 2005-06-09 18:58:30Z bmesser $ True iff this is Java version 1.1. True iff this is Java version 1.2. True iff this is Java version 1.3. True iff running on Linux. True iff running on Windows. True iff running on SunOS. Floating point numbers smaller than 32 bits. yonik $Id$ Converts an 8 bit float to a 32 bit float. byteToFloat(b, mantissaBits=3, zeroExponent=15) byteToFloat(b, mantissaBits=5, zeroExponent=2) Methods for manipulating strings. $Id: StringHelper.java 150248 2004-03-25 13:39:59Z otis $ Compares two strings, character by character, and returns the first position where the two strings differ from one another. The first string to compare The second string to compare The first position where the two strings differ. Lucene's package information, including version.
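The string-difference helper just described can be sketched as follows; FirstDifference is an illustrative name, not the actual StringHelper member. If one string is a prefix of the other, the returned position is the length of the shorter string.

    public static class StringDifference
    {
        // Returns the first index at which the two strings differ.
        public static int FirstDifference(string s1, string s2)
        {
            int len = System.Math.Min(s1.Length, s2.Length);
            for (int i = 0; i < len; i++)
            {
                if (s1[i] != s2[i]) return i;
            }
            return len;
        }
    }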