ProZ.com - https://www.proz.com/translation-articles
Translating PDF Files with Free Tools
https://www.proz.com/translation-articles/articles/2264/1/Translating-PDF-Files-with-Free-Tools
Author: Eric Le Carre
France
English to French translator
http://proz.com/pro/92597 
Published on 03/4/2009
 
This article will show you how to count words in your PDF files, extract the texts and keep their formatting for later processing in a translation memory system.



If, like me, you receive many requests for quotation on PDF documents, especially PDF marketing materials, and you don't know where to start because you don't own PDF conversion software such as Abbyy PDF Transformer, Nuance PDF Converter or Solid Converter PDF, this article is for you. It shows how to count the words in your PDF files, extract the text and keep its formatting.

This approach is more of a workaround than a complete solution for translating PDF files, as it may require some manual editing. However, it is well suited to short and mid-sized PDF documents, especially marketing materials.

Please note that this solution doesn't work if your PDF file is password-protected and has PDF security options turned on.
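
If you want to check a file before starting, the following Python sketch may help. It relies on one fact about the PDF format: an encrypted document declares an /Encrypt entry in its trailer. The function names are hypothetical, and the byte scan is only a rough heuristic: the same characters could appear elsewhere in a file, so treat a positive result as a hint rather than proof.

```python
def looks_encrypted(pdf_bytes: bytes) -> bool:
    """Heuristic: report whether the raw PDF data mentions /Encrypt.

    Encrypted PDFs reference an encryption dictionary through an
    /Encrypt key. Scanning the raw bytes is an assumed shortcut, not
    a real PDF parse, so false positives are possible.
    """
    return b"/Encrypt" in pdf_bytes


def file_looks_encrypted(path: str) -> bool:
    """Read a PDF file from disk and apply the same heuristic."""
    with open(path, "rb") as handle:
        return looks_encrypted(handle.read())
```

The whole file is read into memory, which is acceptable for the short to mid-sized documents discussed here.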

The tools you need, all used in the steps below, are Adobe Reader, AbracadabraCompteur 2, Translator's Abacus and AutoUnbreak.


For information on how and where to install these programs, especially AbracadabraCompteur 2, read the accompanying documentation.

Counting Words

Counting words is the first step: you need to know how many words your PDF file contains before you can give your customer a quote.

To count words with AbracadabraCompteur 2:

  1. In Adobe Reader, select Tools > Word Counter > Current Page to count the words on the currently displayed page, or Tools > Word Counter > Document to count the words in a PDF file with more than one page.

To count words with Translator's Abacus:

  1. Double-click WordCount.exe, the executable file for Translator's Abacus. I put it under C:\Program Files\Translator'sAbacus3.1 and created a shortcut on the Windows Desktop.
  2. In the Translator's Abacus window, click Add files.
  3. In the Open File window, select the PDF file or files whose words you want to count.
  4. Click Report Word Count.
  5. The word count is displayed in your web browser.
  6. In the Translator's Abacus window, click Exit to quit the application.
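
To illustrate how a word count turns into a quote, here is a minimal Python sketch that works on text already extracted from a PDF. The counting rule and the function names are assumptions made for this example. Neither AbracadabraCompteur 2 nor Translator's Abacus is known to count this way, so expect small differences from their totals.

```python
import re

# A "word" here is a run of letters or digits, optionally joined by
# apostrophes or hyphens (e.g. "don't", "mid-sized"). This is an assumed
# rule for illustration, not the rule used by either tool above.
WORD_RE = re.compile(r"[^\W_]+(?:['’-][^\W_]+)*")


def count_words(text: str) -> int:
    """Return the number of words found in a block of plain text."""
    return len(WORD_RE.findall(text))


def estimate_quote(word_count: int, rate_per_word: float) -> float:
    """Return a price estimate, rounded to two decimals."""
    return round(word_count * rate_per_word, 2)


if __name__ == "__main__":
    sample = "Don't worry: this mid-sized PDF is short."
    words = count_words(sample)
    # The rate of 0.10 per word is a made-up figure for the example.
    print(words, estimate_quote(words, 0.10))
```

Because different counters treat hyphens, numbers and apostrophes differently, it is worth telling the customer which tool produced the figure in your quote.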

Extracting the Text...

The text is extracted from the PDF file by copying it through the Windows Clipboard.

In Adobe Reader, select Edit > Select All, then select Edit > Copy. You can also use the keyboard shortcuts Ctrl+A (Select All) and Ctrl+C (Copy). All the selected text is then copied to the Windows Clipboard.

...and Keeping its Formatting

Using AutoUnbreak, you can keep the basic formatting attributes of the original PDF file (font names, sizes, colors, etc.) and remove most of the carriage returns and line breaks that you get when you simply copy and paste the contents of a PDF file into an empty RTF or MS Word document.

To keep the format of your original PDF file:

  1. Double-click AutoUnbreak.exe, the executable file for AutoUnbreak, to start the application.
  2. In the AutoUnbreak main window, click 1. Paste to paste the contents of the Windows Clipboard into AutoUnbreak.
  3. When the contents of the Windows Clipboard are in the AutoUnbreak main window, click 2. Unbreak! to remove the carriage returns/line breaks.
  4. In the Processing done! message window that appears, click OK.
  5. Back in the AutoUnbreak main window, click 3. Copy results.
  6. In the Text copied to clipboard message window that appears, click OK.
  7. Back in the AutoUnbreak main window, click Quit to close AutoUnbreak.
  8. Start MS Word.
  9. In an empty MS Word document, press Ctrl+V to paste the resulting text from your AutoUnbreak session.

You can now compare your MS Word text with the original PDF file to check for any remaining carriage returns, line breaks or other formatting issues. These have to be fixed manually.
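
For readers curious about what removing line breaks involves, here is a rough Python sketch that works on plain text only. It is not AutoUnbreak's actual method, which is not documented here, and it does not preserve fonts or colors. The heuristic (a blank line marks a paragraph, any other line break is wrapping) and the function name are assumptions.

```python
import re


def unbreak(text: str) -> str:
    """Join lines that were hard-wrapped inside a paragraph.

    Assumed heuristic: a blank line separates paragraphs; any other
    line break is treated as wrapping and replaced by a space.
    """
    # Normalise Windows and old Mac line endings to "\n".
    text = text.replace("\r\n", "\n").replace("\r", "\n")
    paragraphs = re.split(r"\n\s*\n", text)
    cleaned = []
    for para in paragraphs:
        # Rejoin words hyphenated across a line break ("trans-\nlation").
        # Caveat: a genuine hyphen at a line end ("mid-\nsized") is lost.
        para = re.sub(r"(\w)-\n(\w)", r"\1\2", para)
        # Replace the remaining line breaks with single spaces.
        para = re.sub(r"\s*\n\s*", " ", para).strip()
        if para:
            cleaned.append(para)
    return "\n\n".join(cleaned)
```

Text copied from a PDF often has no blank lines between paragraphs, so a heuristic like this can merge paragraphs that should stay apart. That is one reason a manual comparison with the original remains necessary.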

When you are happy with your new MS Word document, you can start translating it with the translation memory system of your choice.

Happy translating!

This article was written with KompoZer, an open source WYSIWYG (What You See Is What You Get) HTML editor.


Copyright ProZ.com, 1999-2006. All rights reserved.