org.opencms.search
Class CmsSearchSimilarity

java.lang.Object
  extended by org.apache.lucene.search.Similarity
      extended by org.apache.lucene.search.DefaultSimilarity
          extended by org.opencms.search.CmsSearchSimilarity
All Implemented Interfaces:
java.io.Serializable

public class CmsSearchSimilarity
extends org.apache.lucene.search.DefaultSimilarity

Reduces the importance of the lengthNorm(String, int) factor for the CmsSearchField.FIELD_CONTENT field, while keeping the Lucene default for all other fields.

This implementation was added because the default length norm appears to be heavily biased toward small documents. With the default, even if a term occurs the same number of times in two documents, the shorter document (the one containing fewer terms) can easily score 3x as high as the longer one. This implementation reduces the influence of the term count on the score for the content field.
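To make the size of this bias concrete, the following minimal sketch reproduces the formula used by Lucene's DefaultSimilarity for the length norm, 1 / sqrt(numTerms), in a standalone class. The class name DefaultLengthNormDemo and the document sizes of 100 and 900 terms are assumptions chosen for illustration; the sketch ignores the precision lost when Lucene stores norms in the index.

```java
// Standalone illustration; does not depend on Lucene or OpenCms.
final class DefaultLengthNormDemo {

    // Same formula as Lucene's DefaultSimilarity: 1 / sqrt(numTerms).
    static float defaultLengthNorm(int numTerms) {
        return (float) (1.0 / Math.sqrt(numTerms));
    }

    public static void main(String[] args) {
        float shortDoc = defaultLengthNorm(100); // hypothetical short document
        float longDoc = defaultLengthNorm(900);  // hypothetical long document
        // With equal term frequency, the score ratio follows the norm ratio.
        System.out.println("short=" + shortDoc
                + " long=" + longDoc
                + " ratio=" + (shortDoc / longDoc));
    }
}
```

With these sizes the shorter document receives a norm about three times that of the longer one, which matches the factor described above.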

Inspired by Chuck Williams' WikipediaSimilarity.

Since:
6.0.0
Version:
$Revision: 1.12 $
Author:
Alexander Kandzior
See Also:
Serialized Form

Constructor Summary
CmsSearchSimilarity()
          Creates a new instance of the OpenCms search similarity.
 
Method Summary
 float lengthNorm(java.lang.String fieldName, int numTerms)
          Special implementation for "length norm" to reduce the significance of this factor for the CmsSearchField.FIELD_CONTENT field, while keeping the Lucene default for all other fields.
 
Methods inherited from class org.apache.lucene.search.DefaultSimilarity
coord, idf, queryNorm, sloppyFreq, tf
 
Methods inherited from class org.apache.lucene.search.Similarity
decodeNorm, encodeNorm, getDefault, getNormDecoder, idf, idf, scorePayload, setDefault, tf
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Constructor Detail

CmsSearchSimilarity

public CmsSearchSimilarity()
Creates a new instance of the OpenCms search similarity.

Method Detail

lengthNorm

public float lengthNorm(java.lang.String fieldName,
                        int numTerms)
Special implementation for "length norm" to reduce the significance of this factor for the CmsSearchField.FIELD_CONTENT field, while keeping the Lucene default for all other fields.

Overrides:
lengthNorm in class org.apache.lucene.search.DefaultSimilarity
See Also:
Similarity.lengthNorm(java.lang.String, int)
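
The override pattern described above can be sketched as follows. This is not the OpenCms source: BaseSimilaritySketch is a hypothetical stand-in for DefaultSimilarity so that the example compiles without Lucene, the field name "content" is an assumed value for CmsSearchField.FIELD_CONTENT, and the logarithmic dampening formula is an illustrative choice, not necessarily the one this class uses.

```java
// Hypothetical stand-in for org.apache.lucene.search.DefaultSimilarity.
class BaseSimilaritySketch {
    public float lengthNorm(String fieldName, int numTerms) {
        // Default behavior: 1 / sqrt(numTerms).
        return (float) (1.0 / Math.sqrt(numTerms));
    }
}

// Sketch of a similarity that dampens the length norm for one field only.
class DampenedSimilaritySketch extends BaseSimilaritySketch {

    // Assumed value; the real constant is CmsSearchField.FIELD_CONTENT.
    static final String FIELD_CONTENT = "content";

    @Override
    public float lengthNorm(String fieldName, int numTerms) {
        if (FIELD_CONTENT.equals(fieldName)) {
            // Illustrative dampening: the norm shrinks only logarithmically
            // with document length, so long documents are penalized less.
            return (float) (3.0 / Math.log10(1000.0 + numTerms));
        }
        // All other fields keep the default length norm.
        return super.lengthNorm(fieldName, numTerms);
    }
}
```

Under these assumptions, a 100-term and a 900-term content field receive norms that differ by less than 10 percent, while any other field still shows the default threefold difference.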