Friday, August 14, 2009

Parsing Parsing Parsing...

After the release of LogicalDOC EE 4.5, our partners reported a small problem with the automatic extraction of tags from certain types of documents.
We verified that when tags were extracted from OpenOffice documents, accented letters (common in most Latin-based languages) were dropped.
Our development team then set to work fixing the problem and reviewing LogicalDOC's various parsers/extractors, so that text is extracted from documents as accurately as possible and correctly encoded in UTF-8.
The extracted text is then indexed by LogicalDOC, which makes the document archive full-text searchable.
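
To illustrate the kind of fix involved, here is a minimal Java sketch (not LogicalDOC's actual parser) that pulls paragraph text out of an OpenOffice (ODF) file. It relies on the fact that an ODF document is a ZIP package whose main text lives in the content.xml entry. The class OdfTextExtractor and its method are hypothetical names. One common way to lose accented letters is decoding bytes with the wrong charset. Here the XML parser receives raw bytes, so it decodes them using the encoding declared in the document (UTF-8) rather than the platform default.

```java
import java.io.InputStream;
import java.nio.file.Path;
import java.util.zip.ZipEntry;
import java.util.zip.ZipFile;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;

/** Hypothetical sketch: extract paragraph text from an ODF (OpenOffice) file. */
public class OdfTextExtractor {

    // Namespace of the <text:p> paragraph elements in ODF content.xml.
    static final String TEXT_NS = "urn:oasis:names:tc:opendocument:xmlns:text:1.0";

    public static String extractText(Path odfFile) throws Exception {
        try (ZipFile zip = new ZipFile(odfFile.toFile())) {
            ZipEntry entry = zip.getEntry("content.xml");
            if (entry == null) {
                throw new IllegalArgumentException("No content.xml in " + odfFile);
            }
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true); // needed for getElementsByTagNameNS
            DocumentBuilder builder = factory.newDocumentBuilder();

            Document doc;
            // Pass raw bytes, not a Reader: the parser then honors the encoding
            // declared in the XML prolog instead of the platform default charset.
            try (InputStream in = zip.getInputStream(entry)) {
                doc = builder.parse(in);
            }

            // Simplification: only <text:p> paragraphs are collected; headings
            // (<text:h>) and space markers such as <text:s/> are ignored.
            NodeList paragraphs = doc.getElementsByTagNameNS(TEXT_NS, "p");
            StringBuilder text = new StringBuilder();
            for (int i = 0; i < paragraphs.getLength(); i++) {
                if (text.length() > 0) {
                    text.append('\n');
                }
                text.append(paragraphs.item(i).getTextContent());
            }
            return text.toString();
        }
    }
}
```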

The Community Edition of LogicalDOC includes parsers for Microsoft Office 2003 applications (Word, Excel, PowerPoint), AbiWord, compressed AbiWord (.zabw files), OpenOffice 2.3/3.0, StarOffice, KOffice 1.6, HTML, XML, TXT, PDF, PS (PostScript) and WordPerfect (versions 4, 5 and 6).

In addition to the parsers of the open-source release, the licensed LogicalDOC Enterprise edition can index the text content of Microsoft Office 2007 documents (.docx, .xlsx, .pptx) and AutoCAD DWG drawings, and can perform optical character recognition (OCR) on PDF files (raster PDF), TIFF (multipage), JPG and PNG images.

Our developers have also been working on support for extracting the text content of .eml files (emails saved from Thunderbird or MS Outlook Express, and forwarded emails) and of .msg files (Microsoft Office Outlook 2007).
These new parsers will be available in the next release of LogicalDOC Enterprise, scheduled for this coming autumn.

Tuesday, August 4, 2009

New Translation Program for LogicalDOC

About a year after the release of LogicalDOC 3.5, and after several intermediate versions, a problem began to emerge:
keeping the translations of LogicalDOC aligned with the current version of the program.

At the same time, we needed to know how up to date a given translation was with respect to a particular release of LogicalDOC.
There was also a third problem: how to simplify the translation process, so that even a
person with no particular knowledge of Java could take part in localization, without having to install special software or an IDE to work on a translation.

At this point we began looking around for a solution to our problems.

The first open-source solution we found was NickelsWebtranslator, but it did not fully satisfy us: the interface was rather clunky, and it required installing a Python interpreter on our server.

Continuing our search, we evaluated the translation system offered by Launchpad.
In the end we chose Launchpad because it is easy to use: even a person with no programming knowledge and no special tools can work on a translation. It also lets several contributors work on the same translation at the same time, speeds up translating by offering useful suggestions, and shows the completion percentage of each translation graphically.

There are also drawbacks, such as the need to convert the input resource bundles (.properties files) into gettext format and then convert the returned files back to .properties, but this is largely offset by the convenience of Launchpad's translation platform, Rosetta.
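
The conversion step can be sketched roughly as follows. This is a hypothetical helper, not the tool the LogicalDOC team actually used: it reads a resource bundle with java.util.Properties and writes a gettext PO template. The mapping is an assumption: each property key becomes the msgctxt, so identical English strings under different keys stay separate entries, and the header declares UTF-8 so accented letters survive the round trip. The reverse step (reading translated msgstr values back into a .properties file) would follow the same mapping.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.Properties;
import java.util.TreeSet;

/** Hypothetical sketch: convert a Java resource bundle into a gettext PO template. */
public class PropertiesToPo {

    // Escape a Java string for use inside a double-quoted PO string.
    static String escape(String s) {
        StringBuilder sb = new StringBuilder();
        for (char c : s.toCharArray()) {
            switch (c) {
                case '\\': sb.append("\\\\"); break;
                case '"':  sb.append("\\\""); break;
                case '\n': sb.append("\\n"); break;
                case '\r': sb.append("\\r"); break;
                case '\t': sb.append("\\t"); break;
                default:   sb.append(c);
            }
        }
        return sb.toString();
    }

    static String toPo(Properties props) {
        StringBuilder po = new StringBuilder();
        // Header entry (empty msgid) declaring the file's charset.
        po.append("msgid \"\"\n");
        po.append("msgstr \"\"\n");
        po.append("\"Content-Type: text/plain; charset=UTF-8\\n\"\n\n");

        // Sorted keys keep the output stable between runs.
        for (String key : new TreeSet<>(props.stringPropertyNames())) {
            String value = props.getProperty(key);
            if (value.isEmpty()) {
                continue; // an empty msgid is reserved for the header, so skip it
            }
            po.append("msgctxt \"").append(escape(key)).append("\"\n");
            po.append("msgid \"").append(escape(value)).append("\"\n");
            po.append("msgstr \"\"\n\n"); // left empty for translators
        }
        return po.toString();
    }

    public static void main(String[] args) throws IOException {
        Properties props = new Properties();
        props.load(new StringReader("login.title=Sign in\nlogin.user=User name\n"));
        System.out.print(toPo(props));
    }
}
```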

The LogicalDOC wiki has a page (Translate LogicalDOC) with detailed instructions on how to start working with the new translation platform.