File upload fails because of UTF8 strings in metadata

Question asked by don_eros on May 22, 2013
Latest reply on May 25, 2013 by don_eros
Hi there,

I'm having trouble uploading files (via the "Add content" button and via WebDAV). I think the problem is the metadata embedded in the files: I have the same problem with DOC, PDF and HTML files.

In all cases I managed to fix the problem by simply removing non-ASCII characters from the file's metadata. The problem seems to be in the "title" element and the "description" meta tag for HTML files, and in the "Title" property for both PDF and DOC files.
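To check in advance which metadata fields of an HTML file contain non-ASCII characters, a minimal sketch like the following could be used. It relies only on Python's standard `html.parser`. The names `MetadataParser` and `non_ascii_metadata` are hypothetical, and the sketch only looks at the `<title>` element and the `description` meta tag mentioned above (PDF and DOC properties would need a different parser).

```python
from html.parser import HTMLParser


class MetadataParser(HTMLParser):
    """Hypothetical helper: collects <title> text and <meta name="description"> content."""

    def __init__(self):
        super().__init__()
        self.fields = {}
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "title":
            self._in_title = True
            self.fields.setdefault("title", "")
        elif tag == "meta" and (attrs.get("name") or "").lower() == "description":
            self.fields["description"] = attrs.get("content") or ""

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        # Title text may arrive in several chunks, so append rather than assign.
        if self._in_title:
            self.fields["title"] += data


def non_ascii_metadata(html_text):
    """Return {field: [non-ASCII characters]} for each field that contains any."""
    parser = MetadataParser()
    parser.feed(html_text)
    parser.close()
    result = {}
    for name, value in parser.fields.items():
        chars = [c for c in value if ord(c) > 127]
        if chars:
            result[name] = chars
    return result
```

For example, `non_ascii_metadata('<title>La clausola è stata</title><meta name="description" content="plain">')` returns `{'title': ['è']}`, pointing at exactly the kind of character that had to be removed.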

Attached to this post you'll find the log file of a very simple case: I fixed this HTML file (a newspaper article) by removing the "è" just before the string "clausola di salvaguardia" (you'll see it on line 5 of the log).
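The "è" is a single character (U+00E8), but it takes two bytes in UTF-8. One possible explanation, not confirmed by the log, is that some component between the upload and the database reads those bytes with a different encoding such as latin1. The sketch below only illustrates the byte-level difference; `describe_chars` and `misread_as_latin1` are hypothetical helper names.

```python
def describe_chars(text):
    """List each non-ASCII character with its code point and UTF-8 bytes (hex)."""
    return [
        (ch, f"U+{ord(ch):04X}", ch.encode("utf-8").hex(" "))
        for ch in text
        if ord(ch) > 127
    ]


def misread_as_latin1(text):
    """Simulate UTF-8 bytes being decoded as latin1 (classic mojibake)."""
    return text.encode("utf-8").decode("latin-1")
```

Here `describe_chars("è clausola di salvaguardia")` returns `[('è', 'U+00E8', 'c3 a8')]`, and `misread_as_latin1("è")` returns `'Ã¨'`. Garbled text of that shape in the database or in logs would hint at an encoding mismatch somewhere along the way.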

The only workaround I found is exactly this: removing the troublesome characters from the meta attributes. Needless to say, this is not a viable solution, because a) it's a lot of work that I cannot ask my users to do, and b) my repository contains files in dozens of different languages (including Hebrew, Russian and Arabic) and I would like to keep the metadata intact.

Is there a more obvious solution I'm overlooking?

The specs of my system are:

- Ubuntu 12.04.2 LTS
- Alfresco 4.2.c (binary installer)
- MySQL 5.5.31 (instead of the bundled PostgreSQL)
- the default OS locale is en_US.UTF-8
- most tables in the MySQL database are UTF8, but some of them are latin1; the system default is UTF8, so I'm not sure how that happened
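Mixed latin1/UTF8 tables are worth checking because latin1 is a single-byte character set: it can hold "è", but not Hebrew, Russian or Arabic text. The sketch below uses Python's `latin-1` codec as an approximation (MySQL's `latin1` is actually closer to Windows cp1252). It also checks whether text fits MySQL 5.5's `utf8`, which stores at most 3 bytes per character. The function names and sample strings are hypothetical. Note that "è" does fit in latin1, so this alone may not explain the failing HTML upload; it only shows which content a latin1 column could never store.

```python
SAMPLES = {
    "Italian": "è clausola di salvaguardia",
    "Russian": "Привет",
    "Hebrew": "שלום",
    "Arabic": "مرحبا",
}


def fits_in_latin1(text):
    """True if every character has a single-byte latin-1 encoding."""
    try:
        text.encode("latin-1")
        return True
    except UnicodeEncodeError:
        return False


def fits_in_mysql_utf8(text):
    """True if every character needs at most 3 UTF-8 bytes (MySQL 5.5 'utf8')."""
    return all(len(ch.encode("utf-8")) <= 3 for ch in text)


if __name__ == "__main__":
    for lang, text in SAMPLES.items():
        print(lang, fits_in_latin1(text), fits_in_mysql_utf8(text))
```

Only the Italian sample fits in latin1. All four fit in a 3-byte `utf8` column, which suggests converting the remaining latin1 tables to UTF8 rather than stripping characters from the files.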