MemoQ extensions to segmentation rules
Thread poster: Piotr Bienkowski
Piotr Bienkowski
Poland
Member (2005)
English to Polish
Apr 18, 2012

Is there a help topic that thoroughly explains the MemoQ extensions to segmentation rules (I mean all those strings with # at both ends and words with underscores in the middle)? I can't find any!

I am not happy with the way MemoQ segments Polish originals, although it appears to have some segmentation rules for Polish.

I would appreciate your help.

Regards,

Piotr


 
Yasmin Moslem
Egypt
English to Arabic
Segmentation Rules Apr 18, 2012

Dear Piotr,

Piotr Bienkowski wrote:

I mean all those strings with # at both ends and words with underscores in the middle...


For example, the first rule, #end##!#[\s]+#cap#, can be read as follows:

#end# end of segment mark
#!# segment here
[\s]+ white space, one or more
#cap# capital letter


Both #end# and #cap# (as well as the other names wrapped in # signs) are lists that can be found under the "Custom lists" tab.
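For readers more used to "standard" regular expressions, here is a minimal Python sketch of how the rule #end##!#[\s]+#cap# could be approximated. The contents of `END` and `CAP` are assumptions standing in for the memoQ custom lists (the real lists may contain different entries), and `segment` is a hypothetical helper, not a memoQ function.

```python
import re

# Assumed stand-ins for the memoQ custom lists; the real lists may differ.
END = r"[.!?]"               # plays the role of #end#
CAP = r"[A-ZĄĆĘŁŃÓŚŹŻ]"      # plays the role of #cap# (Latin + Polish capitals)

# #end##!#[\s]+#cap#: match an end mark, but only if whitespace and a
# capital letter follow. The break point (#!#) is the end of the match.
RULE = re.compile(rf"{END}(?=\s+{CAP})")

def segment(text):
    """Split text at every break point found by RULE."""
    parts, start = [], 0
    for m in RULE.finditer(text):
        parts.append(text[start:m.end()].strip())
        start = m.end()
    parts.append(text[start:].strip())
    return [p for p in parts if p]

print(segment("Ala ma kota. Kot ma Alę! a pies?"))
# ['Ala ma kota.', 'Kot ma Alę! a pies?']
```

The text is broken after "kota." because a capital letter follows, but not after "Alę!", because the next word starts with a lowercase letter.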


Most importantly, could you please give us some examples of segmentation you do not like, and show how you expect them to be segmented?

Kind regards,
Yasmin


 
Piotr Bienkowski
Poland
Member (2005)
English to Polish
TOPIC STARTER
This helps a bit, but Apr 18, 2012


I am looking for a comprehensive list that explains the use of ALL such symbols.

Examples of wrong segmentation:


wykonując badania geologiczne terenu i przeprowadzając uzgodnienia z Zarządem Fabryki Mydła S.A. Przygotowywany jest również projekt budowlany inwestycji, trwają pierwsze prace przygotowawcze przed wszczęciem właściwej procedury środowiskowej.


New segment should start after S.A.


Może podlegać nieznacznym zmianom po przeprowadzeniu oceny o oddziaływaniu projektu na środowisku i uzyskaniu pozwolenia na budowę, które planowane są odpowiednio na czerwiec 2012 r. i listopad 2012 r. Prace powinny rozpocząć się we grudniu 2012 r. i zakończyć w sierpniu 2013 r. Realizacja zadania jest ważna z uwagi na powiększenie miejsca składowania pustych kontenerów przeładowywanych w większej liczbie po zakupie urządzeń przeładunkowych i zwiększenia zdolności przeładunkowej.


New segments should start before Prace and before Realizacja.

P.S. I changed the company name in my quotes. I hope my client will not take me to court.


 
Yasmin Moslem
Egypt
English to Arabic
abbreviation + end of segment + capital letter Apr 18, 2012

Dear Piotr,

In this case, you can go to the Segmentation Rules section of your project, click "Edit" and accept creating a copy; then select the copy and click "Edit".

On the "Segmentation" tab, select the first rule in the "Rules" pane, #end##!#[\s]+#cap#, then move to the "Exceptions" pane, select the second rule, [\s]#abbr_onlyabbr##!#[\s]+#cap#, and click "Delete". Then click "OK" to save the settings.

Now, try to reimport the document.

For your information, the meaning of the exception rule that you have deleted, [\s]#abbr_onlyabbr##!#[\s]+#cap#, is as follows:


[\s] white space
#abbr_onlyabbr# any abbreviation from the list of the same name under the "Custom lists" tab
#!# segment here
[\s]+ white space, one or more
#cap# capital letter


For your information again, memoQ segmentation rules consist of:
1- custom list names, found under the "Custom lists" tab.
2- regular expressions, which require some prior knowledge. You should find the main ones here:
http://kilgray.com/memoq/50/help-en/index.html?regular_expressions.html
Also, here is a useful video by Denis Hay: http://vimeo.com/36075095
3- the mark #!#, which means "segment here" and separates what should come before the break from what should come after it.
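To make the interplay between the rule and the deleted exception more concrete, here is a rough Python sketch. It is only an approximation under stated assumptions: `END`, `CAP` and `ABBR` are hypothetical stand-ins for the custom lists (the real #abbr_onlyabbr# list may hold different entries), and `break_points` and `segment` are illustrative helpers, not memoQ functions. An exception is modelled as a position where the rule would break but is not allowed to.

```python
import re

# Assumed stand-ins for the memoQ custom lists; the real lists may differ.
END = r"[.!?]"                    # plays the role of #end#
CAP = r"[A-ZĄĆĘŁŃÓŚŹŻ]"           # plays the role of #cap#
ABBR = ["r.", "S.A.", "im."]      # hypothetical #abbr_onlyabbr# entries

# Rule: #end##!#[\s]+#cap#
RULE = re.compile(rf"{END}(?=\s+{CAP})")

# Exception: [\s]#abbr_onlyabbr##!#[\s]+#cap#
abbr_alt = "|".join(re.escape(a) for a in ABBR)
EXCEPTION = re.compile(rf"\s(?:{abbr_alt})(?=\s+{CAP})")

def break_points(text, use_exception=True):
    """Positions where the rule breaks, minus positions vetoed by the exception."""
    points = {m.end() for m in RULE.finditer(text)}
    if use_exception:
        points -= {m.end() for m in EXCEPTION.finditer(text)}
    return sorted(points)

def segment(text, use_exception=True):
    parts, start = [], 0
    for pos in break_points(text, use_exception):
        parts.append(text[start:pos].strip())
        start = pos
    parts.append(text[start:].strip())
    return [p for p in parts if p]

sample = "uzgodnienia z Zarządem Fabryki Mydła S.A. Przygotowywany jest projekt."
print(segment(sample))                       # exception active: one segment
print(segment(sample, use_exception=False))  # exception deleted: two segments
```

With the exception in place the sample stays in one piece; with the exception removed it is broken after "S.A.". The same removal would also break after any other listed abbreviation followed by a capital letter, which is the trade-off of deleting the exception outright.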


HTH,
Yasmin



[Edited at 2012-04-18 14:45 GMT]


 
Piotr Bienkowski
Poland
Member (2005)
English to Polish
TOPIC STARTER
The rules don't work consistently Apr 24, 2012

Today I had this Polish piece of text:

Klaster obliczeniowy został uruchomiony na potrzeby konsorcjum utworzonego w kwietniu 2007 r. przez Politechnikę Śląską, Centrum Onkologii (Instytut im. Marii Skłodowskiej-Curie Oddział w Gliwicach), Śląski Uniwersytet Medyczny (wówczas Śląską Akademię Medyczną) oraz Uniwersytet Śląski.

The segment was broken after "im." and I had to merge it. I did not modify the seg rules at all, so I did not introduce the change you suggested, and yet sometimes text is segmented in these places, and sometimes not. Oh well, maybe it is because "im" in Polish is also an actual word in addition to being an abbreviation ("im.").

I know that maybe I grumble too much, and I can always split or merge (well, almost always: I can't do that in online MemoQ projects), but I would like MemoQ to get it right the first time. Also, I am much more familiar with "standard" regexes than with MemoQ's extensions.

Regards,

Piotr


 
Piotr Bienkowski
Poland
Member (2005)
English to Polish
TOPIC STARTER
And the seg rules are really good for nothing when... Apr 29, 2012

I want to align the HTML format in Livedocs. Paragraphs are stuck together even if the end of paragraph tag and the start paragraph tag are separated by a "real" CRLF, e.g.


z tego względu istnieją podstawy do
wyłączenia takich urządzeń z zakresu niniejszej dyrektywy.{/p}
{p}(7) W
odniesieniu do urządzeń ciśnieniowych objętych konwencjami
międzynarodowymi,


The parts before and after (7) are lumped together. Angle brackets were changed to braces because the forum interprets them as tags and they go away.
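For illustration only, the expected behaviour can be sketched in Python, outside MemoQ: treat a closing paragraph tag followed by an opening one (with or without a CRLF in between) as a hard boundary, and treat line breaks inside a paragraph as plain spaces. The sketch uses real angle brackets instead of the braces above. `split_paragraphs` is a hypothetical helper, and nothing here reflects how the MemoQ HTML filter or LiveDocs works internally.

```python
import re

# Boundary: closing </p>, optional whitespace (including CRLF), opening <p ...>.
PARA_BOUNDARY = re.compile(r"</p>\s*<p\b[^>]*>", re.IGNORECASE)
# Any remaining opening or closing p tag.
P_TAG = re.compile(r"</?p\b[^>]*>", re.IGNORECASE)

def split_paragraphs(html):
    """Return paragraph texts, with inner line breaks collapsed to spaces."""
    cleaned = []
    for chunk in PARA_BOUNDARY.split(html):
        text = P_TAG.sub("", chunk)       # drop leftover <p> / </p> tags
        text = " ".join(text.split())     # collapse CRLF and runs of spaces
        if text:
            cleaned.append(text)
    return cleaned

html = (
    "z tego względu istnieją podstawy do\r\n"
    "wyłączenia takich urządzeń z zakresu niniejszej dyrektywy.</p>\r\n"
    "<p>(7) W\r\n"
    "odniesieniu do urządzeń ciśnieniowych objętych konwencjami\r\n"
    "międzynarodowymi,"
)
for paragraph in split_paragraphs(html):
    print(paragraph)
```

Here the text before (7) and the text starting with (7) come out as two separate paragraphs, which is the split I would expect.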


 
ahmadwadan.com
Saudi Arabia
English to Arabic
Similar issue here... Oct 30, 2016

Hello,

I have a similar issue.

I hope you can help me with a segmentation rule for company names like the one below, so that MemoQ treats it as ONE segment instead of two:

XXX Company K.S.C. (Closed)

instead of:

XXX Company K.S.C.
(Closed)

Thank you


 

