How to accurately calculate word count and repetitions?
Thread poster: Ine Spee
Ine Spee
Ine Spee
United Kingdom
English to Dutch
+ ...
Jan 10, 2023

Hi, sorry if I posted this in the wrong board, I wasn't sure where my question should go.

I have been sent an Excel file to translate by an agency, but the client does not want to pay for repetitions. So I have been asked to calculate the word count and the amount of repetitions in the file.

I have tried putting it through a few different CAT tools (Phrase/Memsource, Matecat and Smartcat), and I get very different numbers each time.
I am also unsure how exactly the repetitions are calculated. Would it be sentences, or singular words it counts?

Does anyone know the best way to find out these numbers, without me losing out too much on so-called repetitions?


 
Samuel Murray
Samuel Murray  Identity Verified
Netherlands
Local time: 10:01
Member (2006)
English to Afrikaans
+ ...
@Ine Jan 10, 2023

Ine Spee wrote:
I have been sent an Excel file to translate by an agency, but the client does not want to pay for repetitions.

Right, so the client isn't going to pay for repetitions, but they do want you to translate the repetitions. This means that it would be best if you translated this file in a CAT tool that helps to automatically translate the repetitions that you've already translated.

I have tried putting it through a few different CAT tools (Phrase/Memsource, Matecat and Smartcat), and I get very different numbers each time.

Yes, different tools will produce different results, because they all have their own ways of counting "words". Also, it's possible that the Excel file contains elements that you cannot see (e.g. hidden cells, or additional worksheets) that some CAT tools count and others do not. Finally, different tools have different definitions of what a "repetition" is. And some of these tools may also be calculating internal fuzzy matches (which are not useful to you but which would have been useful to a regular user of a CAT tool).

I am also unsure how exactly the repetitions are calculated. Would it be sentences, or singular words it counts?

Usually, a "repetition" is a whole sentence (or some smaller piece of text that stands on its own) that is identical to another whole sentence. However, with Excel files, clients consider something a repetition if there is an identical *cell* somewhere, i.e. even if some cells contain more than one sentence, only whole cells are taken into consideration.

Does anyone know what the best way is to find out these numbers, without me losing out too much on so called repetitions?

If possible, try to set the CAT tools' segmentation to "paragraph segmentation" so that it only considers whole cells. Then, on the analyses that you get from all those CAT tools, subtract the word count for "repetitions" from the complete word count.

Added: Some clients (but not many) use the term "repetition" also to refer to fuzzy matches. Let's hope your client isn't one of those.

[Edited at 2023-01-10 17:42 GMT]


 
Stepan Konev
Stepan Konev  Identity Verified
Russian Federation
Local time: 11:01
English to Russian
Your client should count reps Jan 10, 2023

It is your client who has to define the scope of work for you and remove all the content they don't want to pay for. They should say "please translate this" and "please don't translate that". Not you. Their request sounds like "please rip yourself off for us". If they don't want to pay for repetitions, you should not translate those repetitions. Lock them in your CAT tool as is before translation and let them insert those reps on their own. No pay, no translation.

[Edited at 2023-01-10 18:16 GMT]


 
Ine Spee
Ine Spee
United Kingdom
English to Dutch
+ ...
TOPIC STARTER
Thanks for the replies Jan 10, 2023

Samuel Murray wrote:

If possible, try to set the CAT tools' segmentation to "paragraph segmentation" so that it only considers whole cells. Then, on the analyses that you get from all those CAT tools, subtract the word count for "repetitions" from the complete word count.



Thank you for your reply!

I've tried looking for different settings when calculating the word count, but it doesn't seem these three tools let me set it to paragraph segmentation. I'm using the free version of Smartcat and Matecat, and the trial version of Phrase, so it might be that these settings are simply not available on these versions. I am not sure if there is another tool I can use to calculate the word count.

Here's what I get from all three:

Smartcat word count



Phrase word count



Matecat word count, which gives me three different reports when I select/deselect a few options...



 
Noura Tawil
Noura Tawil  Identity Verified
Syria
Local time: 11:01
Member (2013)
English to Arabic
Repetitions should be paid for Jan 10, 2023

Ine Spee wrote:

but the client does not want to pay for repetitions.


My clients pay 20-25% for repetitions. Because we WILL at least read through all repetitions (especially if they are not Context Matches (CM)), and spend time on them to make sure they fit well.
In some languages, although the segment is a repetition, you may still need to make some edits. For example, if the target language uses a masculine-feminine noun classification.


 
Samuel Murray
Samuel Murray  Identity Verified
Netherlands
Local time: 10:01
Member (2006)
English to Afrikaans
+ ...
IMO Jan 10, 2023

In my opinion there is nothing wrong with not paying for repetitions, if repetitions are not reviewed. Nor is there anything wrong with expecting the translator to insert the repeating translations, if the translator is able to do this using e.g. a CAT tool in a way that doesn't cost him any effort.

 
Christopher Schröder
Christopher Schröder
United Kingdom
Member (2011)
Swedish to English
+ ...
Easy Jan 10, 2023

Charge them by the hour!

 
Stepan Konev
Stepan Konev  Identity Verified
Russian Federation
Local time: 11:01
English to Russian
RE: nothing wrong with not paying Jan 10, 2023

Samuel Murray wrote:
In my opinion there is nothing wrong with not paying for repetitions, if repetitions are not reviewed. Nor is there anything wrong with expecting the translator to insert the repeating translations, if the translator is able to do this using e.g. a CAT tool in a way that doesn't cost him any effort.
Having and using a CAT tool already costs effort in itself (investment, learning curve, further mastering, etc.). It is probably OK when the agency gives you a tool for their job. But when they don't do anything to provide you with a tool and still expect CAT-grid discounts, it is just the next level of insolence, and you don't mind...

[Edited at 2023-01-10 22:04 GMT]


 
Philip Lees
Philip Lees  Identity Verified
Greece
Local time: 11:01
Greek to English
A question Jan 11, 2023

Ine Spee wrote:

I have tried putting it through a few different CAT tools (Phrase/Memsource, Matecat and Smartcat), and I get very different numbers each time.



How much (unpaid) time have you already wasted on this potential job? How much could you have earned by spending that time doing something else?

OK, that's two questions.


 
Hans Lenting
Hans Lenting
Netherlands
Member (2006)
German to Dutch
Lock Jan 11, 2023

Ine Spee wrote:

I have been sent an Excel file to translate by an agency, but the client does not want to pay for repetitions.


Ask them to lock the reps.


 
Samuel Murray
Samuel Murray  Identity Verified
Netherlands
Local time: 10:01
Member (2006)
English to Afrikaans
+ ...
@Ine Jan 11, 2023

I forgot to clarify something: when CAT tools analyze a text, the term "repetition" refers not to all duplicate sentences but to all duplicate sentences *except* the first one. So, in this example:

One two three. One two three. One two three. Four five six seven eight nine.

The total word count is 15, the repetition word count is 6, and the non-repetition word count is 9. The sentence "One two three" occurs three times, which means that it is repeated two times, so the "repetition" count is the total word count of the two repetitions (i.e. excluding the first instance of it).
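The arithmetic in that example can be reproduced with a short script. This is only a rough sketch: the sentence splitting and the whitespace-based word count are simplifying assumptions, the function names are made up for illustration, and real CAT tools each apply their own segmentation and counting rules.

```python
import re

def count_words(text):
    # Naive word count: whitespace-separated tokens (an assumption;
    # every CAT tool has its own definition of a "word").
    return len(text.split())

def split_sentences(text):
    # Naive segmentation: break after ., ! or ? when followed by whitespace.
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]

def repetition_counts(segments):
    # The first occurrence of a segment is NOT a repetition;
    # every later identical segment is.
    seen = set()
    total = 0
    repeated = 0
    for segment in segments:
        words = count_words(segment)
        total += words
        if segment in seen:
            repeated += words
        else:
            seen.add(segment)
    return {
        "total": total,
        "repetitions": repeated,
        "non_repetitions": total - repeated,
    }

sample = "One two three. One two three. One two three. Four five six seven eight nine."
counts = repetition_counts(split_sentences(sample))
print(counts)  # {'total': 15, 'repetitions': 6, 'non_repetitions': 9}
```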

Ine Spee wrote:
I'm using the free version of Smartcat and Matecat, and the trial version of Phrase ... Here's what I get from all three...

All three analyses from Matecat tell you the same thing, namely that the repetition count is 425 and that the total count is 10097, and from there you can calculate that the non-repetition count is 10097-425=9672. The other information in them is not useful to you, because Matecat analyzed your text against a translation memory (specifically the public one) and/or against the text itself for internal fuzzy matches (which isn't what the client asked you for).

This means that the word counts that you got were:

Smartcat: 6895 non-repetitions + 3116 repetitions = 10011
Memsource aka Phrase: 6560 non-repetitions + 3082 repetitions = 9642
Matecat: 9672 non-repetitions + 425 repetitions = 10097

Smartcat's and Memsource's counts are close enough to each other to be considered the same (they're within 15% of each other). I'm really not sure what happened to Matecat (perhaps Matecat counted whole cells? it's difficult to guess without seeing the text).
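For anyone who wants to check the figures, here is a small sketch that adds up the three analyses quoted above and shows what share of each total is counted as repetition. The numbers are the ones from this thread; the dictionary layout and the function name are just for illustration.

```python
# Figures as reported in this thread for the same Excel file.
analyses = {
    "Smartcat": {"non_repetitions": 6895, "repetitions": 3116},
    "Memsource/Phrase": {"non_repetitions": 6560, "repetitions": 3082},
    "Matecat": {"non_repetitions": 9672, "repetitions": 425},
}

def summarize(analysis):
    # Total words and the fraction of them that the tool calls repetitions.
    total = analysis["non_repetitions"] + analysis["repetitions"]
    share = analysis["repetitions"] / total
    return total, share

summary = {tool: summarize(a) for tool, a in analyses.items()}

for tool, (total, share) in summary.items():
    print(f"{tool}: total {total}, repetitions {share:.1%} of total")
```

The totals are fairly close across all three tools; it is the repetition share where Matecat differs sharply from the other two.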

However, I strongly suspect that Smartcat's and Matecat's counts are sentence-based, not cell-based, and if you don't use a CAT tool to do the translation, it is rather important that you get the cell-based count.

I tried to figure out if it's possible to hide duplicate cells in Excel, but the solutions that I found removed *all* duplicate cells and didn't leave the first instance of the duplicate cell.

It's possible to do it in MS Word, if you know how to do a wildcard search. First, copy all text to MS Word, and turn the table into text. Then sort the paragraphs alphabetically. Then do this search:

Find what: (*^13)@
Replace with: \1
(wildcards enabled)

This will remove all duplicate paragraphs except for the first instance. I would trust this counting method more, because everyone has MS Word and because it counts whole cells.
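If you are comfortable with a bit of scripting, the same cell-based count can be sketched in Python. This assumes you have already copied the cell texts into a list (getting them out of the Excel file is not shown here), and it uses a plain whitespace word count, which will not match any particular CAT tool exactly. The function names are hypothetical.

```python
def dedupe_cells(cells):
    # Collapse runs of whitespace so that cells differing only in spacing
    # compare as equal (an assumption; adjust if spacing matters to you).
    normalised = (" ".join(cell.split()) for cell in cells)
    # dict.fromkeys keeps the FIRST instance of each cell, in original order.
    return list(dict.fromkeys(cell for cell in normalised if cell))

def word_count(cells):
    # Whitespace-separated tokens, summed over all cells.
    return sum(len(cell.split()) for cell in cells)

# Made-up example cells; note that whole cells are compared, not sentences,
# so "Save file." inside a longer cell is not treated as a repetition.
cells = [
    "Save file. Close window.",
    "Save file.",
    "Save file. Close window.",
    "Open file",
    "Save file.",
]

unique_cells = dedupe_cells(cells)
total_words = word_count(cells)
non_repetition_words = word_count(unique_cells)
repetition_words = total_words - non_repetition_words

print(unique_cells)
print(total_words, non_repetition_words, repetition_words)  # 14 8 6
```

Unlike the Word approach, this does not need the cells to be sorted first, so the remaining cells stay in their original order.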

Again, I find it odd that the agency can't just give you this information themselves. All agencies worth their salt will have at least one working CAT tool on their computers, capable of doing this sort of analysis for you... unless this is a very complex Excel file.

Since the client has left it up to you to do the analysis, and since you're not being dishonest if you use Smartcat or Matecat to do the analysis, feel free to give the Smartcat or the Matecat analysis to the client, and tell the client in which tool you did the analysis. The client knowingly took the risk of a slightly less favourable word count when they passed the job of counting the words on to you instead of just spending 2 minutes of their own time to do the analysis themselves. It's not dishonest if there is no specific agreement about which tool you're supposed to use for the analysis.

[Edited at 2023-01-11 08:46 GMT]


 

