getMoreLikeThis logic in a SearchComponent (with Solrj)

Recently I needed to search using MoreLikeThis, but neither through the MoreLikeThisHandler request handler nor through the MoreLikeThis search component (which runs an MLT query for every document in the result set, an expensive operation). What I wanted was to execute a standard search for a given query and then use the top result as the input for an MLT search. My final requirement was to provide this functionality inside a search handler itself, so that I could add my own logic.

With a bit of work I arrived at the following design. Note that the code below was cobbled together for the benefit of this blog entry: it is not tested and is only meant to share the lessons I learnt from the exercise. The crux of the solution is MoreLikeThisHelper, a helper class for MoreLikeThis that can be called from other request handlers.
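To make the two-stage idea concrete outside Solr, here is a toy, in-memory sketch in plain Java. It is not Solr's or Lucene's implementation: documents are plain strings, the "standard search" simply picks the document sharing the most distinct terms with the query, and similarity is the number of distinct shared terms. The class and method names (TwoStageMlt, topHit, moreLikeThis) are hypothetical.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Toy model of the two-stage flow: standard search -> top hit -> "more like this".
// Not Solr's implementation: documents are strings and similarity is simply the
// number of distinct terms two texts share.
public class TwoStageMlt {

    // Lowercase the text and split it into a set of distinct terms.
    public static Set<String> terms(String text) {
        Set<String> out = new HashSet<>();
        for (String t : text.toLowerCase().split("\\W+")) {
            if (!t.isEmpty()) {
                out.add(t);
            }
        }
        return out;
    }

    // Number of distinct terms the two sets have in common.
    public static int overlap(Set<String> a, Set<String> b) {
        int n = 0;
        for (String t : a) {
            if (b.contains(t)) {
                n++;
            }
        }
        return n;
    }

    // Stage 1: the "standard search", reduced to picking the single best match.
    // Returns -1 when no document shares a term with the query.
    public static int topHit(List<String> docs, String query) {
        Set<String> q = terms(query);
        int best = -1;
        int bestScore = 0;
        for (int i = 0; i < docs.size(); i++) {
            int s = overlap(terms(docs.get(i)), q);
            if (s > bestScore) {
                best = i;
                bestScore = s;
            }
        }
        return best;
    }

    // Stage 2: rank the other documents by overlap with the seed document,
    // dropping those with no shared terms and keeping at most `rows` ids.
    public static List<Integer> moreLikeThis(List<String> docs, int seedId, int rows) {
        Set<String> seed = terms(docs.get(seedId));
        List<Integer> ids = new ArrayList<>();
        for (int i = 0; i < docs.size(); i++) {
            if (i != seedId && overlap(terms(docs.get(i)), seed) > 0) {
                ids.add(i);
            }
        }
        // List.sort is stable, so documents with equal overlap keep index order.
        ids.sort(Comparator.comparingInt((Integer i) -> -overlap(terms(docs.get(i)), seed)));
        return ids.subList(0, Math.min(rows, ids.size()));
    }

    public static void main(String[] args) {
        List<String> docs = List.of(
                "solr search component tutorial",
                "cooking pasta at home",
                "writing a custom solr search handler",
                "solr more like this component");
        int top = topHit(docs, "solr component");
        System.out.println("top hit: " + top + ", similar: " + moreLikeThis(docs, top, 2));
    }
}
```

Running main prints `top hit: 0, similar: [2, 3]`: the tutorial document wins the initial search, the two other Solr documents come back as similar, and the cooking document shares no terms and is dropped.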

First, register your handler (called /test below) in solrconfig.xml:

<requestHandler name="/test" class="solr.SearchHandler">
  <lst name="defaults">
    <str name="defType">dismax</str>
    <str name="q.alt">*:*</str>
    <int name="start">0</int>
    <int name="rows">2000</int>
    <str name="echoParams">all</str>
    <str name="fl">id score</str>
    <str name="qf">content</str>

    <!-- mlt.match.include is read by the custom component itself;
         mlt.fl, mlt.mintf and mlt.mindf configure MoreLikeThisHelper -->
    <str name="mlt.match.include">true</str>
    <str name="mlt.fl">content</str>
    <int name="mlt.mintf">3</int>
    <int name="mlt.mindf">1</int>
  </lst>

  <!-- runs after the standard components (query, facet, etc.) -->
  <arr name="last-components">
    <str>customComponent</str>
  </arr>
</requestHandler>
<searchComponent name="customComponent" class="com.abc.customComponent"/>
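For a feel of what mlt.mintf and mlt.mindf do, here is a simplified sketch of the term-selection step in plain Java. It assumes the general approach of Lucene's MoreLikeThis (keep the seed document's terms whose term frequency is at least mintf and whose document frequency is at least mindf, score them by tf × idf, keep the top ones) and assumes the classic TF-IDF idf formula; the real implementation applies further filters such as word-length limits and the mlt.maxqt cap. The class name, method name and statistics are hypothetical.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Simplified model of how MoreLikeThis picks "interesting terms" from the seed
// document. Assumptions: classic TF-IDF idf, and only the mintf/mindf cut-offs
// plus a cap on the number of terms (the real code applies more filters).
public class InterestingTerms {

    // termFreqs: term -> frequency in the seed document (field mlt.fl)
    // docFreqs:  term -> number of documents in the index containing the term
    public static List<String> select(Map<String, Integer> termFreqs,
                                      Map<String, Integer> docFreqs,
                                      int numDocs, int minTf, int minDf, int maxTerms) {
        Map<String, Double> scores = new HashMap<>();
        for (Map.Entry<String, Integer> e : termFreqs.entrySet()) {
            int tf = e.getValue();
            int df = docFreqs.getOrDefault(e.getKey(), 0);
            if (tf < minTf || df < minDf) {
                continue; // below mlt.mintf or mlt.mindf: not a candidate
            }
            double idf = 1 + Math.log((double) numDocs / (df + 1)); // assumed classic idf
            scores.put(e.getKey(), tf * idf);
        }
        List<String> terms = new ArrayList<>(scores.keySet());
        // Highest tf*idf first; ties broken alphabetically so the order is deterministic
        terms.sort((a, b) -> {
            int c = Double.compare(scores.get(b), scores.get(a));
            return c != 0 ? c : a.compareTo(b);
        });
        return terms.subList(0, Math.min(maxTerms, terms.size()));
    }

    public static void main(String[] args) {
        // Hypothetical statistics for one seed document in a 100-document index
        Map<String, Integer> tf = Map.of("solr", 4, "mlt", 3, "handler", 1, "query", 3);
        Map<String, Integer> df = Map.of("solr", 50, "mlt", 2, "handler", 10, "query", 30);
        // mintf=3 and mindf=1, as in the /test defaults
        System.out.println(select(tf, df, 100, 3, 1, 25)); // prints [mlt, solr, query]
    }
}
```

With these numbers, handler is dropped because it occurs only once in the seed document (below mintf=3), and the rare term mlt outranks solr, which occurs more often in the seed document but appears in half the index.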

Next, define the component itself by extending SearchComponent (public class Classname extends SearchComponent) and put the following logic in either the overridden prepare() or process() method (see the source of MoreLikeThisHandler for how the handler implements the same logic):

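// Imports assumed by the snippet below (Solr 1.4/3.x-era API, as in the original,
// e.g. SolrPluginUtils.setReturnFields)
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

import org.apache.lucene.search.Query;
import org.apache.solr.common.SolrException;
import org.apache.solr.common.params.CommonParams;
import org.apache.solr.common.params.MoreLikeThisParams;
import org.apache.solr.common.params.MoreLikeThisParams.TermStyle;
import org.apache.solr.common.params.SolrParams;
import org.apache.solr.handler.MoreLikeThisHandler.InterestingTerm;
import org.apache.solr.handler.MoreLikeThisHandler.MoreLikeThisHelper;
import org.apache.solr.handler.component.ResponseBuilder;
import org.apache.solr.search.DocIterator;
import org.apache.solr.search.DocList;
import org.apache.solr.search.DocListAndSet;
import org.apache.solr.search.QParser;
import org.apache.solr.search.QParserPlugin;
import org.apache.solr.search.QueryParsing;
import org.apache.solr.search.SolrIndexSearcher;
import org.apache.solr.util.SolrPluginUtils;

// Belongs inside the class that extends SearchComponent
@Override
public void process(ResponseBuilder rb) throws IOException
{
    SolrParams params = rb.req.getParams();
    String q = params.get(CommonParams.Q);
    SolrIndexSearcher searcher = rb.req.getSearcher();
    // Filter queries (fq) as parsed by the standard query component
    List<Query> filters = rb.getFilters();

    // Number of similar documents to return: the custom "vectorSize" parameter,
    // falling back to rows (and to 10 if rows is not set either)
    int rows = params.getInt(CommonParams.ROWS, 10);
    int vectorSize = params.getInt("vectorSize", rows);
    int start = params.getInt(CommonParams.START, 0);

    // Parser for the initial search (dismax in the config above)
    String defType = params.get(QueryParsing.DEFTYPE, QParserPlugin.DEFAULT_QTYPE);

    String fl = params.get(CommonParams.FL);
    int flags = 0;
    if (fl != null) {
        flags |= SolrPluginUtils.setReturnFields(fl, rb.rsp);
    }

    // Hold on to the interesting terms if requested (mlt.interestingTerms);
    // this snippet collects them but does not add them to the response
    TermStyle termStyle = TermStyle.get(params.get(MoreLikeThisParams.INTERESTING_TERMS));
    List<InterestingTerm> interesting = (termStyle == TermStyle.NONE)
        ? null : new ArrayList<InterestingTerm>();

    MoreLikeThisHelper mlt = new MoreLikeThisHelper(params, searcher);

    // Matching options
    boolean includeMatch = params.getBool(MoreLikeThisParams.MATCH_INCLUDE, true);
    int matchOffset = params.getInt(MoreLikeThisParams.MATCH_OFFSET, 0);

    // 1. Run the standard search, keeping only the single top hit
    Query query;
    try {
        query = QParser.getParser(q, defType, rb.req).parse();
    } catch (Exception e) {
        // handle error logic: here a query that fails to parse is reported as a bad request
        throw new SolrException(SolrException.ErrorCode.BAD_REQUEST, e);
    }
    DocList tophit = searcher.getDocList(query, filters, null, matchOffset, 1, flags);
    if (includeMatch) {
        rb.rsp.add("match", tophit);
    }

    // 2. Use the top hit (and only the top hit) as the input for MoreLikeThis
    DocIterator iterator = tophit.iterator();
    if (!iterator.hasNext()) {
        throw new SolrException(SolrException.ErrorCode.BAD_REQUEST,
            "The query matched no documents, so there is nothing to find similar documents for.");
    }
    int id = iterator.nextDoc();
    DocListAndSet mltDocs = mlt.getMoreLikeThis(id, start, vectorSize, filters, interesting, flags);

    // Note: the standard query component has already added a "response" entry;
    // the response NamedList accepts a second entry under the same name
    rb.rsp.add("response", mltDocs.docList);
}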
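On the client side, the component is called like any other handler: send the query to /test. The sketch below only builds the GET URL with the JDK (URLEncoder.encode with a Charset needs Java 10+). The base URL (Solr's usual http://localhost:8983/solr), the fq value type:article and the class name MltRequestUrl are assumptions; vectorSize is the custom parameter read by the component. A SolrJ client could send the same parameters.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringJoiner;

// Builds a GET URL for the /test handler; the base URL is an assumption.
public class MltRequestUrl {

    // Encode each name=value pair and append them to <baseUrl>/test
    public static String build(String baseUrl, Map<String, String> params) {
        StringJoiner query = new StringJoiner("&");
        for (Map.Entry<String, String> e : params.entrySet()) {
            query.add(URLEncoder.encode(e.getKey(), StandardCharsets.UTF_8) + "="
                    + URLEncoder.encode(e.getValue(), StandardCharsets.UTF_8));
        }
        return baseUrl + "/test?" + query;
    }

    public static void main(String[] args) {
        Map<String, String> params = new LinkedHashMap<>(); // keeps insertion order
        params.put("q", "solr component");   // initial query; its top hit seeds MLT
        params.put("vectorSize", "10");      // custom parameter; falls back to rows
        params.put("fq", "type:article");    // hypothetical filter, passed to both searches
        System.out.println(build("http://localhost:8983/solr", params));
    }
}
```

This prints `http://localhost:8983/solr/test?q=solr+component&vectorSize=10&fq=type%3Aarticle`. Because the component passes the same filter list to both getDocList and getMoreLikeThis, an fq restricts the seed document and the similar documents alike.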

Comments welcome.
