Pounds and other characters

Question asked by samuel.penn on Jan 7, 2009
I'm having trouble understanding how the WCM TinyMCE/Web Forms/FreeMarker stack handles special characters. When a user creates content for the web form using TinyMCE, they enter text into a field such as:

John's book cost £1.

This gets stored in the XML as:

John&#39;s book cost £1.

The apostrophe is escaped, but the pound isn't. This is then extracted using the metadata extractor into the cm:description property and stored as:

John&#39;s book cost £1.

It is then rendered into a JSP as something like:

<p>John&#39;s book cost £1.</p>

When the final page is displayed in the browser, it appears with a stray accented A before the pound sign:

John's book cost Â£1.

If we try to escape the content using abstract?xml or abstract?html, it doesn't fix the pound sign. Instead it escapes the ampersand of the existing entity, so the apostrophe is rendered as &amp;#39; in the JSP and appears as the literal text &#39; in the browser.
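To make the double escaping concrete, here is a minimal sketch of what I believe is happening. It uses Python's standard `html` module purely as a stand-in for the `?html` built-in; the variable names are mine and nothing here is Alfresco or FreeMarker code.

```python
import html

# Text as the editor stores it: the apostrophe is already an entity,
# the pound sign (U+00A3) is a raw character.
stored = "John&#39;s book cost \u00a31."

# Escaping text that is already escaped turns "&" into "&amp;",
# so the browser shows the literal characters "&#39;".
double_escaped = html.escape(stored, quote=False)

# Decoding the stored entities first and then escaping exactly once
# avoids the double escape.
raw = html.unescape(stored)
escaped_once = html.escape(raw, quote=True)
```

Note that neither step touches the pound sign: HTML escaping only deals with markup characters, so it cannot repair a character that has been decoded with the wrong charset.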

To confuse things further, we have a webscript which generates a directory listing: it searches for documents and outputs a list of them with their title and description (pulled from the properties on the object, not from the XML). This output also fails to display pound signs correctly, showing an accented A (Â) before each one. I've tried to strip the £ in the template, but the following FTL:

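<#-- "node" and "description" are placeholder names; the original variable names were lost from this line -->
<#assign description = node.properties["cm:description"]?replace("£", "") />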

actually fails to find the pound sign. Changing it to look for other characters works; FreeMarker just seems unable to match the pound character. I assume the problem isn't limited to currency symbols but applies to other non-ASCII characters as well. We set UTF-8 as the encoding everywhere we can, including the defaultEncoding for freeMarkerProcessor in template-services-context.xml (as suggested in another forum posting).
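One possible explanation for the failed match, which I have not confirmed, is that the template file itself is read with a different encoding from the one it was saved in, so the literal inside ?replace is no longer a single pound character. The sketch below models that in Python; `read_template_literal` is a hypothetical helper, not part of FreeMarker.

```python
POUND = "\u00a3"  # the pound sign

def read_template_literal(saved_as: str, read_as: str) -> str:
    """Return the pound literal as a template engine would see it if the
    file was saved in one encoding and read back in another."""
    return POUND.encode(saved_as).decode(read_as)

# The property value, assumed here to be decoded correctly.
description = "John's book cost " + POUND + "1."

# Template saved and read as UTF-8: the literal is one character.
good_literal = read_template_literal("utf-8", "utf-8")

# Template saved as UTF-8 but read as ISO-8859-1: the literal becomes
# two characters (an accented A followed by the pound sign).
bad_literal = read_template_literal("utf-8", "iso-8859-1")

stripped = description.replace(good_literal, "")
unchanged = description.replace(bad_literal, "")  # nothing matches
```

If this is the cause, writing the character as an escape in the template, for example "\xA3" in a FreeMarker string literal, should sidestep the file encoding entirely, since the template then contains only ASCII.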

If we call the webscript directly, the output displays correctly (no accented A before the pound sign), although Mozilla renders it in quirks mode in that case (according to the page properties). It is only when the content is displayed as part of a UTF-8 encoded XHTML JSP page that it is handled wrongly (presumably because the page expects strict content, so the character should be escaped).
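The difference between the direct call and the included output would be consistent with the webscript bytes being decoded with the wrong charset somewhere on the way into the JSP page. This is only a guess at the mechanism; `include_output` below is a hypothetical stand-in for whatever layer pulls the webscript output into the page.

```python
POUND = "\u00a3"  # the pound sign

def include_output(body: bytes, assumed_charset: str) -> bytes:
    """Hypothetical include step: decode the webscript bytes using the
    charset the including layer assumes, then re-encode for a UTF-8 page."""
    return body.decode(assumed_charset).encode("utf-8")

# The webscript emits the pound sign as two UTF-8 bytes: C2 A3.
webscript_body = POUND.encode("utf-8")

# Assumed charset matches: the bytes pass through unchanged.
correct = include_output(webscript_body, "utf-8")

# Assumed charset is ISO-8859-1: each byte becomes its own character
# and the pound sign ends up double encoded as C3 82 C2 A3.
broken = include_output(webscript_body, "iso-8859-1")

# What a browser decoding the page as UTF-8 would then display.
shown = broken.decode("utf-8")
```

Comparing the raw response bytes of the direct call and of the final page (for instance with a hex dump) should show which of the two cases applies.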

I think the correct fix would be for TinyMCE to store the content correctly escaped, but it only seems to handle a small subset of characters properly. We could fix it ourselves in the FreeMarker templates, except that attempts to find these characters fail (as per the ?replace example above).
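For reference, this is the kind of escaping I would expect from the editor or from a template-side workaround: every non-ASCII character replaced by a numeric character reference, so the output no longer depends on the page charset. It is a sketch in Python with a hypothetical function name, `to_char_refs`, and it assumes the string has been decoded correctly by the time it is escaped.

```python
def to_char_refs(text: str) -> str:
    """Replace every non-ASCII character with a decimal numeric
    character reference, leaving ASCII (including existing entities
    and markup) untouched."""
    return text.encode("ascii", "xmlcharrefreplace").decode("ascii")

# The stored form from the example: the apostrophe entity is left
# alone, the pound sign (U+00A3, decimal 163) becomes a reference.
escaped = to_char_refs("John&#39;s book cost \u00a31.")
```

Because the result is pure ASCII, it reads the same whether a later layer treats it as UTF-8 or ISO-8859-1.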

Has anyone had similar problems and got it working correctly?