
I have a rather large project that will ultimately benefit society, and I'm looking for all the help I can muster. I have about 130,000 pages that need to be digitized. Many of them are in packages that have staples, or are on paper that is 40-plus years old (and quite thin compared to today’s paper). Some of it is oddly sized (full-size legal, maps, and small postcard sizes). However, we have only ~10 days to process this work once we arrive on site. We could work through the night.

I have a team of 6, and we have a relatively small budget to accomplish this task. We’ve considered modern scanners (such as a feed-tray Fujitsu ScanSnap), which can process pages at ~25 ppm (pages per minute), but we are concerned about pages being torn or caught (we are trying not to jeopardize the originals). There is also the question of the staples (which could be removed...). We could use a flatbed, but whoa, that's a huge job to do manually! We could always do this for the very large pieces.

I'm hoping you folks have some very clever ideas on how to accomplish this... Thank you so much for your time and help.


EDIT: It seems that a combination approach (fine-paper scanner + vertical copy stand) would work best to ensure the required pages per minute. One offline suggestion: a photocopier? What would happen if we simply photocopied the whole collection first, then either had the copier send a digital copy onwards or ran the photocopies through a scanner? It seems like double work to me, but I'm not familiar enough with the guts of the tech to know better.

Gryph
  • There are companies that scan books very cheaply; they may be able to do it for you or have some ideas. If they can scan a book, then perhaps they can scan your material or give you some feedback on how to scan it safely. You could try contacting Fujitsu. I've seen a good Kodak scanner before; you wouldn't put in a large amount at once, maybe 5 sheets at a time, and keep feeding in more manually. I don't know how it would deal with old, extra-thin paper or odd sizes. Odd sizes, maybe not well. – barlop Jan 12 '17 at 19:33
  • Apparently the Panasonic KV series has a "thin paper scanning" feature. I saw that mentioned on a website, and it linked to http://panasonic.net/pcc/products/scanner/kv-s1065c_1046c/features.html "Now you can continuously scan documents that are as thin as 0.04 mm. That makes it easy to scan thin forms or vouchers. And it increases the number of situations where you can use scanning — without worrying about thin paper." So it sounds like it might take thin and oddly sized paper. You could call Panasonic too. – barlop Jan 12 '17 at 19:39
  • Another: http://www.scantastik.com/hardware/kodak/kodak-i2600-scanner.htm "Versatility. Small or large, thick or thin, ID cards, even embossed hard cards — no matter what you’re scanning, paper handling from Kodak comes through. The output tray can easily be adjusted to accommodate a wide range of documents." – barlop Jan 12 '17 at 19:41
  • There are all kinds of companies out there that deal with this kind of work. –  Jan 12 '17 at 21:01
  • Yes, there are a number of companies. Most charge a heaping pile of money for something like this. I think their primary intention is to work in archiving (for instance, on paper where iron gall ink has burned through the page). It's really cool, and there's a nice resource from the Library of Congress on this subject here (https://loc.gov/preservation/care/scan.html). – Gryph Jan 13 '17 at 14:39
  • I should also mention they aren't too keen on helping out from a technical perspective, at least those I've contacted. It could be seen as a ruse to get their IP! @barlop, those **scanners** seem like a great option. I may end up getting one of those and also setting up the **vertical copy stand** mentioned below. – Gryph Jan 13 '17 at 14:42
  • If you are bringing the scanners with you, I suggest scanners with LED lighting, as the older CCFL lamps could break in transit, and that is a headache you don't need. – cybernard Jan 13 '17 at 19:25
  • If **"failure is not an option"**, I would buy a lot of scanners, like 10, because if you have to scan some things at 300, 600, or 1200 dpi it is going to slow way down. Your staff can then keep busy feeding the other scanners. Even if some go unused, most stores have a 30-day return policy. 130,000 pages / 10 scanners = 13,000 pages each; at 20 ppm that is 650 minutes per scanner. However, oddball pages will slow the process down, and you will thank yourself later for the extra time you budgeted. – cybernard Jan 13 '17 at 20:12
  • @cybernard Well, he has a team of 6; I don't think they will be crowded around one scanner. But also, sometimes you can't put that many sheets in at a time; you have to manually feed a few at a time, otherwise it can hiccup. You can get the max speed of the scanner (at the given resolution, I suppose, so a slow speed as you suggest), but if he has 6 people, they still won't be able to use more than 6 scanners simultaneously (if it's the case that you have to manually feed a bunch at a time). Feeding is a slight skill that is easy to develop: as the last sheet is going in, put new ones in. I've seen it done... – barlop Jan 14 '17 at 11:22
  • I've seen it done on a Kodak. – barlop Jan 14 '17 at 11:25
  • @barlop First, you need more scanners because of possible breakage. Second, a person can put 100 sheets in scanner 1 and 100 more in scanner 2. At 25 ppm, if that is sustainable, those 2 scanners are now busy for at least 4 minutes. It is conceivable that you could remove enough staples in that time to have new documents ready to feed a 3rd scanner. Also, the speed of 25 ppm is a physical limit, so you need more scanners; if there was even the slightest chance of using more than 1 scanner each, I would jump at it. I would have many more, but at least +3 for odd-sized documents (high dpi). – cybernard Jan 14 '17 at 15:45
  • My first take was to have multiple feed-tray scanners, maybe even two per person, and a runner or two grabbing documents and handing them off. But the possibility of paper damage, machines getting stuck, etc. has some of the team thinking that flatbeds would be better. I'm feeling like that would be a nightmare (even clicking "scan" 130,000 times!). – Gryph Jan 14 '17 at 16:57
  • My current take (and the answer will be, if I don't see anything else come up) is at least a few of the **thin-paper scanners** listed above, and a vertical copy stand for the odd formats or particularly sensitive items. This is aside from the other gear (staple puller, multiple hard drives (possibly RAID), a couple of desktops, etc.). – Gryph Jan 14 '17 at 17:00
  • @cybernard Don't say that I need it, you'll just confuse people; it's not me asking the question. The term is "somebody would", "one would", or "the OP would". Also, I've seen a Kodak scanner that was fast and reliable, but while it could take lots of sheets, it would eat 2 at a time if you put in more than a small amount at once. And I recall a printer, the HP Deskjet 895cxi, which was a flagship model meant to be very good (and HP is a great make for printers), but it would eat more than one sheet at a time unless not much paper was loaded. Maybe it's likewise with scanners, as in my example. – barlop Jan 14 '17 at 22:38
  • What about photocopying? Any thoughts on that (see the edit above)? – Gryph Jan 16 '17 at 16:45
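
As a rough sanity check on the scanner-count arithmetic in the comments above, here is a minimal Python sketch. The figures (130,000 pages, 10 scanners at 20 ppm, 100-sheet batches at 25 ppm) come from the comments; the function names are made up for illustration, and the result assumes every scanner runs nonstop, which real feeding and jams will not allow.

```python
import math


def pages_and_minutes_per_scanner(total_pages, scanners, ppm):
    """Split the job evenly and return (pages per scanner, minutes per scanner)."""
    pages_each = math.ceil(total_pages / scanners)
    return pages_each, pages_each / ppm


def batch_minutes(sheets, ppm):
    """Time one loaded batch keeps a scanner busy, ignoring jams and reloads."""
    return sheets / ppm


pages_each, minutes = pages_and_minutes_per_scanner(130_000, 10, 20)
print(f"{pages_each} pages per scanner, {minutes:.0f} minutes ({minutes / 60:.1f} hours)")
print(f"A 100-sheet batch at 25 ppm lasts {batch_minutes(100, 25):.0f} minutes")
```

The second figure is the window an operator has to pull staples and prepare the next stack before a scanner goes idle.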

3 Answers


If you simply need facsimiles of these and do not care so much about perfect presentation, consider a camera attached to a vertical copy stand.

Guaranteed not to jam, easily adjusted for different media, reasonably straight for OCR, and far faster than a consumer flatbed.

A homemade one can be quite cheap and you can then simply drop the stack under the camera, adjust the camera so that the frame is maximally filled, and then start flipping the pages, taking a shot of each.

Auto-focus should handle any depth change, and you would never need to remove the staples/binders/etc.

Might be cheap enough you can get all 6 people working cameras.

Two things to bear in mind:

An 8.5 x 11 in page at 150 ppi, RGB, filled with random noise (the worst case for compression) is going to be about 1 MB as a compressed JPEG. 130,000 such pages come to roughly 130 GB, so you are going to need at least 200 GB of free storage.

130,000 pages / 6 people / 10 days / 8 hours a day / 60 minutes per hour ≈ 4.5, so roughly 5 scans per minute per person. I think this is doable for a camera, but not a consumer-grade flatbed scanner.
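
The two estimates above can be checked with a short Python sketch. The inputs (130,000 pages, 6 people, 10 days, 8-hour days, about 1 MB per page) are the ones quoted in this answer; the function names and the 1.5x storage headroom factor are assumptions added for illustration, not part of the original advice.

```python
def scans_per_minute(pages, people, days, hours_per_day):
    """Required capture rate per person, in pages per minute."""
    working_minutes = days * hours_per_day * 60
    return pages / people / working_minutes


def storage_gb(pages, mb_per_page, headroom=1.5):
    """Storage estimate in GB (1 GB = 1000 MB).

    headroom is a hypothetical safety factor for retakes and larger files.
    """
    return pages * mb_per_page / 1000 * headroom


rate = scans_per_minute(130_000, 6, 10, 8)
print(f"{rate:.2f} pages/min per person, about {60 / rate:.0f} s per page")
print(f"{storage_gb(130_000, 1.0, headroom=1.0):.0f} GB raw, "
      f"{storage_gb(130_000, 1.0):.0f} GB with headroom")
```

Each operator gets a bit over 13 seconds per page, including page turning and stack changes. Working longer days, as the question suggests is possible, loosens that budget proportionally.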


Yorik
  • Probably any camera with 8 megapixels or more is going to work. – Yorik Jan 12 '17 at 22:06
  • Now we're talking! Not sure if you know that this is actually quite similar to Google's approach for scanning books (in form at least); here's an NPR article about [that](http://www.npr.org/sections/library/2009/04/the_granting_of_patent_7508978.html). I'm not sure yet, but I'm wondering if this approach (for the trickiest or large-format pieces), plus a high-efficiency scanner or two, might be the best balance between speed and efficiency (as the text will require **OCR** processing). – Gryph Jan 13 '17 at 14:50
  • I use a camera for everything now. I even use DSLR and a light table as a backlight for capturing 4x5 and 8x10 transparencies. – Yorik Jan 13 '17 at 15:02
  • A really large item with text you want to OCR will probably need to be captured in sections and stitched together. Otherwise the text will fall below a reasonable size. You probably want a minimum of 150 ppi when capturing. Luckily, you can test your whole workflow before you get there. The OCR can wait until after the scanning window, so long as you test and ensure you are capturing good data. – Yorik Jan 13 '17 at 15:08
  • An option that produces lower-quality results but requires significantly less setup is equipping everyone with a smartphone with a scanning app. I use [Office Lens](https://blogs.office.com/2014/03/17/office-lens-a-onenote-scanner-for-your-pocket/) on Windows Phone to produce PDFs with searchable OCR text, and I'm fairly sure there are good equivalents on Android and iPhone, such as CamScanner. – Micah Lindström Jan 13 '17 at 18:54
  • @MicahLindström: I agree. Two things a DSLR setup on a scaffold has that might work in its favor for this particular job: (1) a remote switch cable, so the operator can turn pages with one hand and click the shutter release with the other (this is more a speed consideration); (2) since the camera never moves, compositional framing need only be done once per stack (so faster). The second one can still be achieved with a smartphone simply by using rubber bands etc. to mount it on an armature. – Yorik Jan 13 '17 at 21:57
  • Has anyone in this answer considered the use of a photocopier? I am wary of the idea because of a drop in quality, but I can see the advantage of the consistent output format. Sorry to re-post similar comments; I'm not sure the threads would keep the messages maintained between answers. – Gryph Jan 16 '17 at 16:51
  • I just noticed that Adobe Acrobat DC ([rent for $25/month](https://acrobat.adobe.com/us/en/acrobat/pricing.html)) can process JPEGs into PDFs, including auto-cropping page borders and OCR. See ["Convert JPEG to PDF for archiving" video](https://acrobat.adobe.com/us/en/acrobat/how-to/convert-jpeg-tiff-scan-to-pdf.html) and [also this](https://helpx.adobe.com/acrobat/using/enhance-camera-images.html). Then using the [Action Wizard](https://helpx.adobe.com/acrobat/using/action-wizard-acrobat-pro.html), you could probably post-process all those images very quickly. – Micah Lindström Jan 31 '17 at 01:47

I can't answer what scanner to get. I can, however, speak from experience as an ex-worker who prepared, scanned and archived documents of all shapes and sizes: paper is rarely fragile, and any tears are hard to spot in the digital copy.

Staples are a pain to deal with, depending on how important the corners are. If it is important that the corners not be damaged, it can take 4-15 seconds to remove a staple, depending on how stubborn it is. Some also like to explode, so please cover the staple with your hand to avoid eye damage.
There are two different kinds of tools for removing staples: one with metal teeth, and one that is just a kind of stick that you slide under the staple and keep sliding until the staple is out.
The toothed one is way slower but rarely tears the paper; the sliding one is fast but is more likely to tear the corner.

An experienced team would handle 130K papers in 150-225 man-hours; an inexperienced team might take double that, depending on how the paper load needs to be handled. But the important part is to always keep the scanner running.

The advice I would give about the scanner and scanning is that it's very important to deliver the workload to the person who is scanning in an efficient way. Collect the papers and run them together with some separators between the different documents. Split the documents in post if the scanner can't do it live.
You're really going to need a "paper jogger" in order to keep papers from messing up their orientation in the machine. It is way faster and gives better results than a human simply shaking the papers. But I only have experience with one machine, so I don't know how to tell a good one from a bad one without using it (if there are bad ones).
It's more important to have a scanner which is easy to load than it is to have a high ppm rate (everything is relative). If you can't load a 25 ppm scanner at 25 ppm, then you're not really getting 25 ppm worth of work. You really want to be able to load hundreds of papers at once to keep the machine rolling.
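
To illustrate the point about loading speed, here is a minimal Python sketch in which a scanner's useful throughput is capped by whichever is slower, the rated speed or the rate at which prepared paper reaches it. The 130,000-page total and the 25 ppm rating come from this thread; the 15 ppm load rate and the three-scanner setup are hypothetical values chosen only for the example, as are the function names.

```python
def effective_ppm(rated_ppm, load_ppm):
    """A scanner cannot go faster than paper is fed to it."""
    return min(rated_ppm, load_ppm)


def hours_to_finish(total_pages, scanners, rated_ppm, load_ppm):
    """Hours of continuous running needed across all scanners."""
    combined_ppm = effective_ppm(rated_ppm, load_ppm) * scanners
    return total_pages / combined_ppm / 60


# Hypothetical: 3 scanners rated at 25 ppm, but operators can only feed 15 ppm each.
slow_feed = hours_to_finish(130_000, scanners=3, rated_ppm=25, load_ppm=15)
# Same scanners when preparation keeps up with the rated speed.
full_feed = hours_to_finish(130_000, scanners=3, rated_ppm=25, load_ppm=40)
print(f"{slow_feed:.1f} h when feeding is the bottleneck, {full_feed:.1f} h at rated speed")
```

In this made-up case the job takes about two thirds longer purely because of feeding, which is why stack preparation and a jogger can matter more than the number on the spec sheet.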

If there are any more things you're wondering about, I'll try to answer those too.

  • This is great advice. Do you have experience scanning fragile paper (think receipts from 20 years ago, that thin paper)? I see some options above that might work, but I figured I'd ask directly first. The **paper jogger** seems like a good idea, though I was a bit depressed at the **price** (about 2K). I will look into a **rental** option for it. – Gryph Jan 13 '17 at 14:45
  • @Gryph I didn't handle old receipts, but once in a while we did get paper that was of phone-book quality, and there's nothing special about it. If there were small receipts or such, we taped them onto a standard-sized sheet and ran that through the scanner. If the small paper had information on the back, we first copied the paper and then taped the original, other side up, to the copy. – Gustav Eriksson Jan 14 '17 at 10:13
  • That's great to know; I'm quite concerned about that aspect. Did you ever consider just photocopying the work, then dealing with the copies? My gut says that the double work and drop in quality would be a headache, but I can see why there's appeal: the output format would be consistent, and we could scan each copy at our leisure. – Gryph Jan 16 '17 at 16:47
  • @Gryph I'm not quite sure I follow your train of thought. Most of the papers (almost all) were of enough legal importance that the originals had to be stored, although most of the times we had to retrieve the originals it was because they were supposed to have been sent somewhere else (sender's error 95% of the time). Cheap personal copiers and even photos from phones will yield more detail than you can see with your eyes, so I don't think quality will be an issue. You can always spot a print of a copy in color, but text in black and white always looks sharp. – Gustav Eriksson Jan 19 '17 at 22:42

A few thoughts on removing staples

For standard document scanners you need to remove staples.

If the paper edge next to the staple does not contain any information, you could consider just cutting the edge off together with the staple. The simplest and fastest way is to use a paper cutter with a lever. Rotary paper cutters are less ergonomic and slower for that purpose. With your amount of stapled documents, you will soon get sore fingers if you use scissors, especially if you have thicker stapled documents.

If you want to retain the edges, you can choose among quite a number of different shapes of staple removers. To remove hundreds of staples, a plier-shaped staple remover probably offers the best ergonomics and is the safest for the paper originals. The advantage is that it has a lever, so you need less force. Jaw-shaped removers do not have a lever. As a consequence, you need much more force and will soon get a cramp in your hand and sore muscles in your arm; the same goes for tongue-shaped staple removers. The risk of damaging the paper is very high with jaw-shaped ones and a bit lower with tongue-shaped ones. With jaw-shaped ones you often need to "bite" under the staple from both sides of the paper pile, especially if the pile is thick and the staple long. In that case, it will take you a long time to get the staple out.

With a good plier-shaped staple remover, one "bite" from the top side of the paper pile is often enough to remove the staple in one go. With the remover I use (Skrebba skre-klick), the risk of paper damage is minimal, as is the force needed. But there might be others out there that are as good. With such a staple remover you are easily twice as fast as with the other two types mentioned, and you will rarely damage the paper.

Examples of staple removers mentioned above:

"Plier-shaped" [image]

"Jaw-shaped" [image]

"Tongue-shaped" [image]

user291737