Word/character count for bilingual word file
Thread poster: Egija Zarina
Egija Zarina
Latvia
Latvian to English
Aug 12, 2015

Hi,

I need to count the characters/words of one language in a bilingual file. Is there a way to do this without creating a separate single-language document? Copying every segment of the language I am interested in takes too much time (the document has many pages). The file is an original Word (.doc) file that was not created with Trados or any other program.

Perhaps you know of a program that could do it (Trados, etc.), or you have some other suggestions. I would really appreciate your answer. Thank you in advance.


 
Cilian O'Tuama  Identity Verified
Germany
Local time: 13:43
German to English
Different formats? Aug 13, 2015

If the two languages are formatted differently, you could search for all text in one particular format, delete it, and then just count the remaining text.

 
Soonthon LUPKITARO(Ph.D.)  Identity Verified
Thailand
Local time: 18:43
English to Thai
Bilingual file format? Aug 13, 2015

egya123 wrote:
I have an issue regarding the need to count one language characters/words in a bilingual file. I would like to know how and whether it is possible to be done without making a separate one language document as it takes too much time to copy each segment of the language interested (the document has many pages). The document I want to get the info about characters/words is a word (.doc) file that is original and has not been created via Trados or some other program.


I cannot picture the format of your bilingual MS Word file. I guess it is like a Wordfast Classic bilingual file (in which the tags are formatted as hidden text) or some other bilingual text arranged in paragraphs. If so, it is quite easy to count the words as follows:
1. Select All in Word.
2. Convert the text to a table, separating at tabs or at the bilingual file's tags.
3. Delete the language you do not want to count (source or target).
4. Use the MS Word word count function.
5. Count the tag words only and subtract that number from the result of step 4.
The result is the exact count.
Note: If the source and target texts are on different lines, use an MS Excel macro to split them, then count with Word as above.
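
As a rough sketch of steps 2 to 5 outside Word: assuming the bilingual text has been saved as plain text with source and target separated by a tab on each line (an assumption about the file, not something stated in the thread), one column can be counted directly. The function name `count_column` and the optional `tag_pattern` regex are made up for this illustration; tags are stripped before counting rather than subtracted afterwards.

```python
import re

def count_column(text, column, sep="\t", tag_pattern=None):
    """Count words and characters (without spaces) in one column.

    column: 0 for the text before the separator, 1 for the text after it.
    tag_pattern: optional regex for tags to strip before counting
                 (hypothetical; depends entirely on the file).
    """
    words = 0
    chars = 0
    for line in text.splitlines():
        parts = line.split(sep)
        if column >= len(parts):
            continue  # line has no such column, e.g. a one-language heading
        segment = parts[column]
        if tag_pattern:
            segment = re.sub(tag_pattern, " ", segment)
        tokens = segment.split()
        words += len(tokens)
        chars += sum(len(t) for t in tokens)
    return words, chars

# Invented sample: language A, a tab, then language B on each line.
sample = "Labdien pasaule\tHello world\nViens divi\tOne two\n"
print(count_column(sample, 0))  # (4, 23)
print(count_column(sample, 1))  # (4, 16)
```

Word's own counter may treat hyphens, numbers and punctuation differently, so the figures can differ slightly from what Word reports.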

Soonthon L.


 
Valerijs Svincovs  Identity Verified
Latvia
Local time: 14:43
English to Latvian
Deleting the unnecessary is all I can think of as well Aug 13, 2015

From what I understand from the topic starter's other posting (http://www.proz.com/forum/right_to_left_language_technical_forum/289493-word_character_count_for_bilingual_word_file.html), it is a Word document that happens to be in multiple languages.
Now, assuming that is the case, I can only think of deleting the paragraphs you are not interested in (make a backup copy of the two-language file first if needed) and then counting what remains using the word/character count function. I understand that, depending on the file, this may be time-consuming and far from optimal. Maybe someone cleverer can come up with something involving language auto-recognition and styles...

To help you locate the "other" language sentences, you can mark all the text as the language you are interested in and then run the spell check, which should take you to the next "other" language instance.

Another option, probably, would be to translate the document with Trados 2007 without touching the segments you are not interested in, and then get the word count from the cleanup log (Cleaned) AFTER the translation is done. That is, if this works in your case.

[Edited at 2015-08-13 19:31 GMT]


 
Tony M
France
Local time: 13:43
Member
French to English
SITE LOCALIZER
IF the original doc was created properly! Aug 13, 2015

It would be reasonable to expect that the language attribute would have been correctly set for each language.

IF this is the case, then you can simply take a copy of your file and do a search & replace on 'any character' plus the language attribute set to the language you don't want, put nothing in the replace box, and perform 'replace all'. This will simply delete all text in your unwanted language, after which you can just do a straight word count.
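
If the language attribute is set, the same split can also be sketched in Python without deleting anything. This is a minimal illustration, not a tested tool: it assumes a copy of the file has been saved as .docx (the original is .doc), and it reads only the `w:lang` setting stored directly on each run, so text whose language comes from a style or from the document defaults lands under the made-up label `unmarked`. The function name `count_by_language` is hypothetical.

```python
import xml.etree.ElementTree as ET

# WordprocessingML namespace used inside word/document.xml
W = "{http://schemas.openxmlformats.org/wordprocessingml/2006/main}"

def count_by_language(document_xml, default_lang="unmarked"):
    """Return {language tag: word count} for the given document XML."""
    root = ET.fromstring(document_xml)
    texts = {}
    for para in root.iter(W + "p"):
        for run in para.iter(W + "r"):
            lang_el = run.find(W + "rPr/" + W + "lang")
            lang = default_lang
            if lang_el is not None:
                lang = lang_el.get(W + "val", default_lang)
            run_text = "".join(t.text or "" for t in run.iter(W + "t"))
            texts.setdefault(lang, []).append(run_text)
        # A paragraph break ends a word; a run boundary does not.
        for parts in texts.values():
            parts.append(" ")
    return {lang: len("".join(parts).split()) for lang, parts in texts.items()}

# Invented two-paragraph sample with the language set on each run.
sample_xml = (
    '<w:document xmlns:w="http://schemas.openxmlformats.org/wordprocessingml/2006/main">'
    "<w:body>"
    '<w:p><w:r><w:rPr><w:lang w:val="en-GB"/></w:rPr>'
    "<w:t>Good morning everyone</w:t></w:r></w:p>"
    '<w:p><w:r><w:rPr><w:lang w:val="lv-LV"/></w:rPr>'
    "<w:t>Labdien visiem</w:t></w:r></w:p>"
    "</w:body></w:document>"
)
print(count_by_language(sample_xml))  # {'en-GB': 3, 'lv-LV': 2}
```

For a real file, the XML would come from `word/document.xml` inside the .docx archive, which Python's `zipfile` module can read. A large `unmarked` count is a sign that the languages were not set run by run.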

Of course, the original document may not have been correctly formatted! I guess it all depends on how it originated; if it was cobbled together by copying and pasting chunks from (say) an EN document into (say) a FR document, then the languages of those individual documents MIGHT have been set correctly and it will work. However, if this is just a document in which people have typed in the 2 different languages, chances are they won't have bothered to change the language between each chunk.

If not, is there any OTHER attribute that differentiates the text, like colour or font, for example? If one of the languages uses a foreign script, chances are you could differentiate on the basis of the font that was used.
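
Where the two languages use different scripts, the split can be approximated from the characters themselves rather than from the font. The sketch below swaps font detection for a check on Unicode character names, which is an assumption of this example and not something Word does; it cannot separate two languages that share a script (Latvian and English, say). The names `script_of` and `count_by_script` are invented for the example, and the text is assumed to have been copied out of the document as plain text.

```python
import unicodedata

def script_of(word):
    """Guess a word's script from the Unicode name of its first letter."""
    for ch in word:
        if ch.isalpha():
            name = unicodedata.name(ch, "")
            # Names look like "LATIN SMALL LETTER A" or "CYRILLIC CAPITAL LETTER PE".
            return name.split(" ")[0] if name else "UNKNOWN"
    return None  # no letters at all (numbers, punctuation)

def count_by_script(text):
    """Return {script name: word count}, skipping tokens without letters."""
    counts = {}
    for word in text.split():
        script = script_of(word)
        if script is not None:
            counts[script] = counts.get(script, 0) + 1
    return counts

print(count_by_script("Hello world Привет мир 2015"))
# {'LATIN': 2, 'CYRILLIC': 2}
```

Mixed tokens are classified by their first letter only, so the result is an estimate and worth checking against a manual count on a page or two.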

Good luck!

Oh, and by the way, this has been discussed before in the forums; if you try a search, you may find some other, better suggestions.


 
Egija Zarina
Latvia
Latvian to English
TOPIC STARTER
Thanks Aug 14, 2015

Thank you all for your suggestions. For now it stays at the manual level, but I will also try checking other forums.

 

