
I have vision issues and read eBooks with TTS. But a lot of eBooks are only available in PDF format, and PDFs convert badly to ePub because the page headers and footers end up in the text.

Then I found a FOSS tool called BRISS, which crops PDFs to remove headers and footers and also has advanced extraction features. I was overjoyed. But after converting the PDF to ePub, I realized that BRISS only hides the data outside the crop; it doesn't remove it. When the cropped PDF is converted to ePub, the headers and footers that were invisible in the PDF become visible in the ePub.
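The difference between hiding and removing can be shown with a toy model. This is only a sketch of the idea, not how BRISS or any converter is implemented: `Rect`, `Span`, and all three functions are hypothetical names, and a real PDF stores text in content streams rather than in a simple list.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rect:
    # PDF-style coordinates: origin at bottom-left, y grows upward.
    x0: float
    y0: float
    x1: float
    y1: float

    def contains(self, other: "Rect") -> bool:
        return (self.x0 <= other.x0 and self.y0 <= other.y0
                and other.x1 <= self.x1 and other.y1 <= self.y1)


@dataclass(frozen=True)
class Span:
    text: str
    box: Rect


def visible_text(spans, crop):
    """What a viewer shows: only spans that lie inside the crop box."""
    return [s.text for s in spans if crop.contains(s.box)]


def extracted_text(spans):
    """What a converter that ignores the crop box reads: every stored span."""
    return [s.text for s in spans]


def destructive_crop(spans, crop):
    """Actually delete the spans outside the crop box."""
    return [s for s in spans if crop.contains(s.box)]


# Made-up page: a running header, body text, and a page-number footer.
page = [
    Span("Chapter 1 running header", Rect(72, 750, 540, 770)),
    Span("Body paragraph", Rect(72, 100, 540, 700)),
    Span("Page 17", Rect(290, 30, 320, 45)),
]
crop = Rect(60, 90, 552, 710)

print(visible_text(page, crop))    # looks clean in the viewer
print(extracted_text(page))        # header and footer are still stored
print(extracted_text(destructive_crop(page, crop)))
```

In this model a non-destructive crop changes only `visible_text`, while a destructive crop changes what `extracted_text` can find, which is what matters for the ePub conversion.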

Is there software that can destroy all text outside the visible area of a PDF instead of just hiding it? I don't want to fiddle with regexes in Calibre.

I guess that if there is software that does a destructive crop, I can set its crop (on the BRISS-cropped file) to a small amount so that everything outside the view is destroyed.
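To make "everything outside the view" concrete, here is a minimal sketch that computes the margin strips lying between a page's full area and its cropped area. These are the regions a destructive crop would have to erase. The function name `margin_strips` and the sample numbers are assumptions for illustration; boxes are `(x0, y0, x1, y1)` tuples in PDF-style coordinates with the origin at the bottom-left.

```python
def margin_strips(media, crop):
    """Return the rectangles that are inside `media` but outside `crop`."""
    mx0, my0, mx1, my1 = media
    cx0, cy0, cx1, cy1 = crop

    # Clamp the crop box so it never extends past the full page.
    cx0, cy0 = max(cx0, mx0), max(cy0, my0)
    cx1, cy1 = min(cx1, mx1), min(cy1, my1)

    strips = [
        (mx0, cy1, mx1, my1),  # top strip, full page width
        (mx0, my0, mx1, cy0),  # bottom strip, full page width
        (mx0, cy0, cx0, cy1),  # left strip, between top and bottom
        (cx1, cy0, mx1, cy1),  # right strip, between top and bottom
    ]
    # Drop strips with no area (the crop touches that page edge).
    return [s for s in strips if s[2] > s[0] and s[3] > s[1]]


# US Letter page with only the header and footer bands cropped away.
full_page = (0, 0, 612, 792)
cropped = (0, 60, 612, 730)
print(margin_strips(full_page, cropped))
```

With these sample numbers only the top and bottom strips remain, since the crop spans the full page width.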

  • In a PDF, the /CropBox is designed to keep all the peripheral data but not display it in a compatible reader. It is thus NEVER redacted by cropping, since the whole purpose of PDF cropping is to KEEP registration and bleed metadata until the document is reprinted without them. So the simple way to remove marginal body text is to reprint a smaller area at a bigger size! Basically, this is a case of you using Briss incorrectly but Acrobat correctly, so just use Acrobat to reprint! Briss does not edit scanned images; it simply reduces the viewport so the edge aberrations are not shown, and in the same way text lines can be excluded from the view. – K J Aug 15 '23 at 13:22
  • Questions specifically asking for software products are "shopping recommendation" questions, and are off topic on this site. You may be able to rephrase this question to focus on the problem you are trying to solve without explicitly asking for a software recommendation. I recommend that you do that. Also, reading the Help section may be a good idea to learn about the rules of this place and why they exist. – music2myear Aug 29 '23 at 15:22

1 Answer


I found a solution, though it is a paid one. I am still hoping for a free solution.

First use BRISS to crop.

The next steps involve using Adobe Acrobat Pro.

  1. In Acrobat Pro, go to Tools and add Redact, which is listed under Protect and Standardize.
  2. Then open the BRISS-cropped PDF and click the Redact button on the right-hand side of the window. It only appears if you have added Redact from Tools.
  3. Then click Sanitize Document and click OK. This removes all hidden data.

TL;DR: First use BRISS to crop, then use Sanitize Document from Redact (under Tools) in Adobe Acrobat Pro. This removes all hidden content (the content hidden by BRISS).

Then convert the eBook to ePub.