
Performance issue on ResultSet.getNodeRef

Question asked by amenel on Jul 13, 2010
Hi all,
We've been encountering a performance problem in one of our webscripts that has been pinned down to the getNodeRef(i) and getNodeRefs() functions in the ResultSet obtained after a Lucene query for fetching objects of a specific content type.

Basically, what our "list" webscript does is:
- search for the given type in SpacesStore using Lucene and the default searcher
- get the noderefs for each item in the result set
- for each result node, build a label from properties of the node as per a format given as a parameter to the webscript
- sort the result list by computed label.
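
For readers who want to picture the last two steps, here is a minimal, self-contained sketch. It is not our actual webscript code: SketchNode is a hypothetical stand-in for a node and its properties (the real code works on NodeRef values), and the {propertyName} placeholder syntax is an assumption made only for this illustration.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for a repository node: an id plus a property map.
// In the real webscript these would be NodeRef values and their properties.
class SketchNode {
    final String id;
    final Map<String, String> props;

    SketchNode(String id, Map<String, String> props) {
        this.id = id;
        this.props = props;
    }
}

class ListWebscriptSketch {

    // Builds a label by substituting {propertyName} placeholders in the format.
    // The placeholder syntax is assumed; the real format parameter may differ.
    static String buildLabel(SketchNode node, String format) {
        String label = format;
        for (Map.Entry<String, String> e : node.props.entrySet()) {
            label = label.replace("{" + e.getKey() + "}", e.getValue());
        }
        return label;
    }

    // Computes one label per result node, then sorts the labels alphabetically.
    static List<String> labelsSorted(List<SketchNode> nodes, String format) {
        List<String> labels = new ArrayList<>();
        for (SketchNode n : nodes) {
            labels.add(buildLabel(n, format));
        }
        labels.sort(Comparator.naturalOrder());
        return labels;
    }
}
```

With a format such as "{lastName}, {firstName}", each node yields one label and the list comes back in alphabetical order of those labels.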

I've done some basic profiling and came up with the following for a result set of 3049 items:
- the instruction NodeRef nodeRef = luceneResultSet.getNodeRef(i);, with i = 0..49, takes up to 7.9 seconds, which is 85.1593% of the total execution time (9.3s). That was bad enough, so I moved the noderef retrieval out of the loop, and it got worse:
- the single instruction (executed only once) List<NodeRef> resultSet = luceneResultSet.getNodeRefs(); takes up to a staggering 13.5 seconds, which is 92.4239% of the total webscript execution time (14.6s).
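
In case it helps anyone reproduce the measurements, this is roughly the kind of timing involved, as a minimal sketch rather than the exact profiling code. StepTimerSketch and its methods are hypothetical names made up for this illustration; only System.nanoTime() is standard Java.

```java
import java.util.function.Supplier;

class StepTimerSketch {

    // Holds a step's return value together with its elapsed wall-clock time.
    static final class Timed<T> {
        final T value;
        final long nanos;

        Timed(T value, long nanos) {
            this.value = value;
            this.nanos = nanos;
        }
    }

    // Runs the step once and measures it with System.nanoTime().
    static <T> Timed<T> time(Supplier<T> step) {
        long start = System.nanoTime();
        T value = step.get();
        return new Timed<>(value, System.nanoTime() - start);
    }

    // Share of the total execution time taken by one step, as a percentage.
    static double sharePercent(double stepSeconds, double totalSeconds) {
        return 100.0 * stepSeconds / totalSeconds;
    }
}
```

In the webscript, the measured step would be something like StepTimerSketch.time(() -> luceneResultSet.getNodeRefs()). Plugging in the rounded figures, sharePercent(13.5, 14.6) comes out at roughly 92.5%, in line with the percentage quoted above, which was computed from the unrounded timings.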

Here luceneResultSet = ADMLuceneSearcherImpl.getSearcher(xxxx).query(searchParameters),
where xxxx stands for the non-customized parameters for SpacesStore and the default searcher with the default configuration.

In both runs, only the first 50 items are returned. The problem holds whether the content type is indexed or not.

We turned to profiling because we were investigating how fetching 50 cm:person objects among 6000 could take up to 3 or 4 minutes. In the beginning, we thought that the label computation based on node properties could be the culprit, but it topped out at a meager 5% of the total time. The sorting never went above 0.3%. On indexed content types, the actual search (i.e. getting a ResultSet) took between 2.5% and 5% of the total (the 2.5% case is the one where getting the noderefs alone took 92% of the total 14 seconds).

I'm wondering what could be causing the getNodeRef functions to take so long to complete. Does anyone have some insight? Is it the searcher, or some bad configuration? I must also add that, although our custom "list" webscript has changed very little from 2.9B to 3.2, we had never used it with data sizes beyond a few hundred items, so this problem is new to us.