
Indexing XHTML

Question asked by hbf on Dec 6, 2007
Latest reply on Nov 6, 2008 by hbf
Hi,

I see that currently only the cm:content of nodes of type HTML, but not of type XHTML, is indexed by Lucene.

I'd like to contribute a transformer from XHTML to plain text. My only question is: which library should I use?

I see from HtmlParserContentTransformer.java (in the Alfresco SVN) that Alfresco currently uses http://htmlparser.sourceforge.net/ for HTML-to-plain-text conversion. However, I could not find any information on whether this library works for XHTML, too.
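
To make the idea concrete: since XHTML is well-formed XML, one option that needs no extra library would be the SAX parser that ships with the JDK. Below is a rough sketch of that approach, not a finished transformer. The class name XhtmlTextExtractor and the method extractText are made up for illustration and are not part of Alfresco; the sketch is also not wired into Alfresco's transformer framework.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical helper (not an Alfresco class): extracts plain text from
// a well-formed XHTML string using the JDK's built-in SAX parser.
public class XhtmlTextExtractor {

    public static String extractText(String xhtml) throws Exception {
        SAXParserFactory factory = SAXParserFactory.newInstance();
        factory.setNamespaceAware(true);
        SAXParser parser = factory.newSAXParser();

        final StringBuilder text = new StringBuilder();

        DefaultHandler handler = new DefaultHandler() {
            // Greater than zero while inside <script> or <style>.
            private int skipDepth = 0;

            private boolean isSkipped(String localName) {
                return "script".equals(localName) || "style".equals(localName);
            }

            @Override
            public void startElement(String uri, String localName,
                                     String qName, Attributes atts) {
                if (isSkipped(localName)) {
                    skipDepth++;
                }
            }

            @Override
            public void endElement(String uri, String localName, String qName) {
                if (isSkipped(localName)) {
                    skipDepth--;
                } else {
                    // Crude simplification: put a space after every element so
                    // that adjacent blocks do not run together. This can split
                    // a word that contains inline markup.
                    text.append(' ');
                }
            }

            @Override
            public void characters(char[] ch, int start, int length) {
                if (skipDepth == 0) {
                    text.append(ch, start, length);
                }
            }

            @Override
            public InputSource resolveEntity(String publicId, String systemId) {
                // Assumption: we do not want to fetch the XHTML DTD over the
                // network, so every external entity resolves to empty content.
                // Named entities defined by that DTD may then not be expanded.
                return new InputSource(new StringReader(""));
            }
        };

        parser.parse(new InputSource(new StringReader(xhtml)), handler);

        // Collapse runs of whitespace into single spaces.
        return text.toString().replaceAll("\\s+", " ").trim();
    }
}
```

Calling XhtmlTextExtractor.extractText(...) with the XHTML source as a string would return the text content with script and style elements dropped. Input that is not well-formed XML makes the parser throw an exception, so this only covers real XHTML, not tag-soup HTML.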

Any suggestions?

Kaspar
