I have many documents containing text in which words have become incorrectly hyphenated, like:

be-come; mons-ter; any-thing, etc.

I can search for the places where this has happened using:

\l-\l

That is, any lowercase character followed by another lowercase character, with a '-' in between.

How do I code the replacement so that it just removes the '-' character?

Thanks for reading.

Ramses505
  • Looks like Scribus (open-source DTP software) has a dehyphenation feature (search this doc for `dehyphenate`): https://fossies.org/linux/scribus/scribus/doc/en/hyphenator.html Scribus is scriptable, I think. – Yorik Apr 07 '21 at 15:14
  • Hi Scribus, thanks for the input, will be checking that out. – Ramses505 Apr 10 '21 at 08:23
  • If any answer helped to solve the problem please check the ✓ symbol next to the answer. – ZygD Apr 15 '21 at 07:32
  • @Yorik When I paste text into Scribus and press dehyphenate, nothing happens. It looks like Scribus can only dehyphenate text that it has itself hyphenated, so it is basically something like an undo function. I haven't looked at the implementation though. – Stefan Schmidt Jun 21 '22 at 15:46
  • I guess it is looking for a "soft hyphen" and treats typed hyphens as inviolate – Yorik Jun 21 '22 at 19:22

2 Answers

1

Find what: (\l)-(\l)
Replace with: $1$2
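
For anyone who would rather script this than use the Replace dialog, the same substitution can be sketched in Python. This is only an illustration, not part of the answer above: Python's `re` module has no `\l` class, so `[a-z]` is assumed as a stand-in for "any lowercase letter", and `remove_hyphens` is a made-up helper name.

```python
import re

# Python's re has no \l class, so [a-z] stands in for "any lowercase letter"
# (an assumption: it covers ASCII letters only).
PATTERN = re.compile(r"([a-z])-([a-z])")


def remove_hyphens(text: str) -> str:
    """Drop every hyphen that sits directly between two lowercase letters."""
    # \1 and \2 play the role of $1 and $2 in the Replace dialog.
    return PATTERN.sub(r"\1\2", text)


print(remove_hyphens("be-come; mons-ter; any-thing"))
# prints: become; monster; anything
```

Like the dialog version, this also changes intentionally hyphenated words, so it is best suited to text that has been checked first.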

ZygD
  • ZygD, thanks so much for that answer. Of course, I have a follow-up. What can I do about cases such as two-eyes-and-a-mouth, where I don't want to make the change? I suppose what I am asking is: how do I limit the replacement to words that contain only a single '-' character? Thanks again for such a fast answer, appreciated. – Ramses505 Apr 07 '21 at 14:50
  • Minor note: you won't really solve your problem by limiting to only a single hyphen, though you will limit the potential damage. The hyphenation you are attempting to remove is defined by syllable breaks, which really requires some sort of dictionary cross-reference (typographic software normally will use a hyphenation dictionary to find the breakpoints) – Yorik Apr 07 '21 at 15:11
  • I believe the OP will not do "Replace All". I assume he will manually "Replace" after inspecting every occurrence. – ZygD Apr 08 '21 at 06:20
  • Hi ZygD, after testing and using it, I think you're right: I will have to inspect and change the occurrences one by one. I actually think I have to frame my problem more tightly so I can automate it more. Thanks for the input and for taking the time to help. – Ramses505 Apr 10 '21 at 08:21
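
The dictionary cross-check suggested in the comments above could be sketched in Python roughly as follows. Everything here is an assumption made for illustration: `KNOWN_WORDS` is a tiny hypothetical word list (a real run would load a full dictionary file), `join_if_known` is a made-up helper name, and `[a-z]` stands in for `\l`, which Python's `re` module does not support.

```python
import re

# Hypothetical word list for illustration only; a real run would load
# a complete dictionary from a file.
KNOWN_WORDS = {"become", "monster", "anything"}

# A run of lowercase letters, a hyphen, and another run of lowercase letters.
CANDIDATE = re.compile(r"\b([a-z]+)-([a-z]+)\b")


def join_if_known(text: str, known_words=KNOWN_WORDS) -> str:
    """Remove a hyphen only when the joined result is a known word."""

    def decide(match: re.Match) -> str:
        joined = match.group(1) + match.group(2)
        # Leave the original text untouched when the joined form is unknown.
        return joined if joined in known_words else match.group(0)

    return CANDIDATE.sub(decide, text)


print(join_if_known("be-come a well-known mons-ter"))
# prints: become a well-known monster
```

Words that are legitimately hyphenated stay as they are because their joined form is not in the list, which reduces the need to inspect every occurrence by hand.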
1
  • Ctrl+H
  • Find what: (?<!-)\b(\l+)-(\l+)\b(?!-)
  • Replace with: $1$2
  • CHECK Match case
  • CHECK Wrap around
  • CHECK Regular expression
  • Replace all

Explanation:

(?<!-)      # negative lookbehind: make sure there is no hyphen before
\b          # word boundary
(\l+)       # group 1: one or more lowercase letters
-           # a hyphen
(\l+)       # group 2: one or more lowercase letters
\b          # word boundary
(?!-)       # negative lookahead: make sure there is no hyphen after
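
To try the pattern outside Notepad++, here is a minimal Python sketch of the same idea. It is an approximation, not the exact Notepad++ behaviour: Python's `re` module does not support `\l`, so `[a-z]` is assumed instead (ASCII letters only), and `join_single_hyphen` is a made-up helper name.

```python
import re

# Same structure as the Notepad++ pattern, with [a-z] substituted for \l.
SINGLE_HYPHEN = re.compile(r"(?<!-)\b([a-z]+)-([a-z]+)\b(?!-)")


def join_single_hyphen(text: str) -> str:
    """Join words containing exactly one hyphen; leave longer chains alone."""
    # \1\2 corresponds to $1$2 in the Replace dialog.
    return SINGLE_HYPHEN.sub(r"\1\2", text)


print(join_single_hyphen("be-come two-eyes-and-a-mouth mons-ter"))
# prints: become two-eyes-and-a-mouth monster
```

The lookbehind and lookahead are what protect `two-eyes-and-a-mouth`: every pair of parts in that chain has a hyphen immediately before or after it, so no match is made.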

[Screenshots: the text before and after the replacement]

Toto
  • Hi Toto, thanks so much for looking and the answer you posted. I have added a comment with a question below. Thanks again. – Ramses505 Apr 07 '21 at 14:56
  • @Ramses505: see my edit. – Toto Apr 07 '21 at 15:38
  • Hey Toto, thanks for taking the time to help out, appreciated. – Ramses505 Apr 07 '21 at 17:02
  • @Ramses505: You're welcome, glad it helps. Feel free to mark the answer as accepted, [How to accept an answer](https://meta.stackexchange.com/questions/5234/how-does-accepting-an-answer-work) – Toto Apr 07 '21 at 17:11