Translating documents paragraph-wise
Thread poster: CafeTran Trainer
CafeTran Trainer
Netherlands
Member (2006)
Jan 31

Sometimes it is better to translate a document paragraph by paragraph, so that you can move sentences around within a paragraph. CafeTran Espresso can do this with plain text documents, but not with MS Word documents: you can't segment them by paragraph.

So you have to convert the document to plain text first. But you don't want to lose the inline character formatting, such as bold, italic, and underline.

This is the sample document:

Image-000301

First, you run a macro to mark the character formatting (bold, italic, and underline) in the text itself. I use a kind of Markdown for this: *italic*, **bold** and _underline_ (kind of, because Markdown doesn't have a tag for underline):


Sub CharacterFormattingToMarkdown()
    'Mark character formatting with a kind of Markdown.
    'Each pass finds whole words (wildcard pattern <?@>) that carry one
    'kind of formatting and wraps them in markers.
    '^& in the replacement stands for the text that was found.

    'Pass 1: italic words become *word*
    Selection.Find.ClearFormatting
    Selection.Find.Font.Italic = True
    Selection.Find.Replacement.ClearFormatting
    With Selection.Find
        .Text = "<?@>"
        .Replacement.Text = "*^&*"
        .Forward = True
        .Wrap = wdFindContinue
        .Format = True
        .MatchCase = False
        .MatchWholeWord = False
        .MatchAllWordForms = False
        .MatchSoundsLike = False
        .MatchWildcards = True
    End With
    Selection.Find.Execute Replace:=wdReplaceAll

    'Pass 2: bold words become **word**
    Selection.Find.ClearFormatting
    Selection.Find.Font.Bold = True
    Selection.Find.Replacement.ClearFormatting
    With Selection.Find
        .Text = "<?@>"
        .Replacement.Text = "**^&**"
        .Forward = True
        .Wrap = wdFindContinue
        .Format = True
        .MatchCase = False
        .MatchWholeWord = False
        .MatchAllWordForms = False
        .MatchSoundsLike = False
        .MatchWildcards = True
    End With
    Selection.Find.Execute Replace:=wdReplaceAll

    'Pass 3: underlined words become _word_
    Selection.Find.ClearFormatting
    Selection.Find.Font.Underline = wdUnderlineSingle
    Selection.Find.Replacement.ClearFormatting
    With Selection.Find
        .Text = "<?@>"
        .Replacement.Text = "_^&_"
        .Forward = True
        .Wrap = wdFindContinue
        .Format = True
        .MatchCase = False
        .MatchWholeWord = False
        .MatchAllWordForms = False
        .MatchSoundsLike = False
        .MatchWildcards = True
    End With
    Selection.Find.Execute Replace:=wdReplaceAll
End Sub


Then you copy the content of the document into a plain text document and import that into CafeTran Espresso, segmenting per paragraph. You get long segments in which you can move sentences around, and you can still see which words need to be bold, italic or underlined. Translate the document, export it, paste the translation into MS Word and run another macro to replace the Markdown with real character formatting.

Image-000296

Image-000299

Image-000298

https://share.cleanshot.com/xRBKvDKC
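
The second macro is not shown in this post. Below is a minimal sketch of what it could look like. It assumes the markers produced by CharacterFormattingToMarkdown (**bold**, *italic*, _underline_) and a translation that contains no other literal asterisks or underscores. The names MarkdownToCharacterFormatting and MarkerToFormatting are made up for this example.

```vba
' Hypothetical helper: turns one kind of marker back into real formatting.
' kind    : "bold", "italic" or "underline"
' pattern : Word wildcard pattern; group \1 is the text between the markers
Private Sub MarkerToFormatting(ByVal kind As String, ByVal pattern As String)
    With Selection.Find
        .ClearFormatting
        .Replacement.ClearFormatting
        ' Formatting to apply to the replacement text
        Select Case kind
            Case "bold": .Replacement.Font.Bold = True
            Case "italic": .Replacement.Font.Italic = True
            Case "underline": .Replacement.Font.Underline = wdUnderlineSingle
        End Select
        .Text = pattern
        .Replacement.Text = "\1"   ' keep the text, drop the markers
        .Forward = True
        .Wrap = wdFindContinue
        .Format = True
        .MatchWildcards = True
        .Execute Replace:=wdReplaceAll
    End With
End Sub

Sub MarkdownToCharacterFormatting()
    ' Bold first, so that ** is consumed before the single * pass.
    ' [!\*^13]@ means: one or more characters that are neither an
    ' asterisk nor a paragraph mark, so a match never spans paragraphs.
    MarkerToFormatting "bold", "\*\*([!\*^13]@)\*\*"
    MarkerToFormatting "italic", "\*([!\*^13]@)\*"
    MarkerToFormatting "underline", "_([!_^13]@)_"
End Sub
```

Because the patterns accept any text between a pair of markers, this also handles the case where you wrap several translated words in one pair, such as **two words**. Text that legitimately contains underscores (file names, for instance) would be converted by mistake, so check the result.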


 
CafeTran Trainer
Netherlands
Member (2006)
TOPIC STARTER
HTML Jan 31

On second thought, I think it's better to use HTML tags instead of Markdown: that way the character formatting will show up in the grid. You can define some non-translatables to hide the HTML tags.

Image-000303

Image-000302

Image-000305

Image-000304
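
The HTML variant of the macro is not shown in this post. One possible way to get there, sketched here under assumptions, is to run CharacterFormattingToMarkdown first and then convert its markers to tags with plain wildcard replacements. The names MarkdownMarkersToHtmlTags and ReplaceMarkerWithTag are made up for this example, and the choice of the tags b, i and u is an assumption.

```vba
' Hypothetical helper: one wildcard find-and-replace over the whole document.
' No formatting is involved here, only text.
Private Sub ReplaceMarkerWithTag(ByVal pattern As String, ByVal replacement As String)
    With Selection.Find
        .ClearFormatting
        .Replacement.ClearFormatting
        .Text = pattern
        .Replacement.Text = replacement
        .Forward = True
        .Wrap = wdFindContinue
        .Format = False
        .MatchWildcards = True
        .Execute Replace:=wdReplaceAll
    End With
End Sub

Sub MarkdownMarkersToHtmlTags()
    ' Run this after CharacterFormattingToMarkdown.
    ' \1 is the text between the markers; bold goes first so that **
    ' is consumed before the single * pass.
    ReplaceMarkerWithTag "\*\*([!\*^13]@)\*\*", "<b>\1</b>"
    ReplaceMarkerWithTag "\*([!\*^13]@)\*", "<i>\1</i>"
    ReplaceMarkerWithTag "_([!_^13]@)_", "<u>\1</u>"
End Sub
```

Converting the markers as text, rather than searching for formatted words a second time, avoids a pitfall: the letters inside tags such as <i> are words themselves and could be picked up by a later formatting pass if they inherited bold or underline from the word they wrap.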


 
Epameinondas Soufleros
Greece
Member (2008)
English to Greek
OmegaT to the rescue Jan 31

Alternatively, you can use the free and open-source OmegaT, which makes the task very easy: all you need to do is select a checkbox.

CafeTran Trainer
Netherlands
Member (2006)
TOPIC STARTER
Nice Feb 1

Thank you for pointing this out. I just had a look at it and you are right.

From the manual:
Segments are by default paragraphs defined by the file format itself.

Not using sentence segmentation on a document is equivalent to using paragraph segmentation. In that case, each paragraph (as defined in the original document format) is displayed as a single segment, and the translator is free to reorganize the sentences within the segment in the translation.

Paragraph segmentation works well with more literary or creative texts, as well as, more generally, with documents for which translation memory matches are not so important.

Maybe CafeTran Espresso will one day get segmentation per paragraph for MS Word files. This would be especially useful when using AI.


[Edited at 2025-02-01 09:27 GMT]


 

