From: legg@nospam.magma.ca   
      
   On Sun, 7 Dec 2025 19:06:21 -0700, Don Y    
   wrote:   
      
   >On 12/7/2025 6:24 PM, Waldek Hebisch wrote:   
   >> Don Y wrote:   
   >>>   
   >>> I've learned to become less "discriminating" in my collections. As long as   
   >>> I can *read* the content, it's good enough for me (given that the   
   >>> alternative was *paper*!).   
   >>   
   >> Well, I can read low quality text, but I read faster if quality is   
   >> better. Sometimes it is hard to decide if there is a speck or   
   >> a dot (or comma, apostrophe etc) on a page. And if you scan   
   >> at larger quantity you probably do not want to proofread each   
   >> page, so either you have generous margin for possible disturbances   
   >> or you risk badly scanned pages.   
   >   
   >If I can find something, on-line, that has already been scanned,   
   >it saves me the trouble of destroying a book to chop it into   
   >individual pages and feed them through a scanner. So, I can   
   >spend THAT time scanning something that I *can't* find on-line.   
   >   
   >E.g., I'd rather spend time scanning copies of _Chronobiologia_,   
   >service manuals for various pieces of kit or other documents   
   >than scanning a bunch of research papers that I can download,   
   >regardless of their quality.   
   >   
   >My "library" is VERY big. Chances are, many of these "electronic   
   >versions of paper documents" will not be "opened", again, in my   
   >lifetime. *But*, I'd hate to be forced to choose which to
   >preserve (regardless of quality) and which to discard.   
   >   
   >> Another thing is OCR: it seems to work quite well with actual
   >> text (as opposed to a mixture that also contains figures and formulae)
   >> when scan quality is high enough. OCR is valuable if only to do   
   >> searches. Scanning at low/medium quality throws out information   
   >> that is hard or impossible to restore.   
   >   
   >About 600 dpi is the point where OCR gets dubious. Below that,   
   >you take your chances (of course, depends on typeface, size,   
   >etc.). And, you typically want to preserve page layout, tables,   
   >illustrations, photos, formulae, etc. in documents that have them.   
   >   
   >A better time investment is thinking about how you would categorize   
   >the document so you know where to start looking for it. E.g.,   
   >"old projects", "reliability", "algorithms", etc.   
      
      
   With text font variations on the same page, or embedded images
   from other sources, like newspaper articles, I found I had to
   manually bury plain text beneath an image of the reproduction.
   Very time consuming.
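
   For anyone wanting to script the "bury plain text beneath an image"
   step, here is a minimal sketch using only the Python standard
   library. It writes a one-page PDF in which the text is drawn in the
   invisible text rendering mode (3 Tr) and the page image is painted
   over it, so the text is searchable but never seen. The function
   name, the blank white bitmap standing in for a real scan, the single
   fixed text position and the ASCII-only text are all assumptions made
   for illustration; this is not the procedure I actually used.

```python
def build_searchable_pdf(text: str, width: int = 200, height: int = 50) -> bytes:
    """Return a one-page PDF: invisible text first, page image painted on top."""
    # Hypothetical stand-in for a real scan: blank white 8-bit grayscale bitmap.
    pixels = b"\xff" * (width * height)

    # PDF literal strings need backslashes and parentheses escaped.
    escaped = text.replace("\\", r"\\").replace("(", r"\(").replace(")", r"\)")

    content = (
        f"BT /F1 12 Tf 3 Tr 10 20 Td ({escaped}) Tj ET\n"  # 3 Tr = invisible text
        f"q {width} 0 0 {height} 0 0 cm /Im1 Do Q\n"         # image covers the page
    ).encode("ascii")  # assumption: the OCR text is plain ASCII

    def stream(dictionary: str, data: bytes) -> bytes:
        head = f"<< {dictionary} /Length {len(data)} >>\nstream\n".encode("ascii")
        return head + data + b"\nendstream"

    objects = [
        b"<< /Type /Catalog /Pages 2 0 R >>",
        b"<< /Type /Pages /Kids [3 0 R] /Count 1 >>",
        (f"<< /Type /Page /Parent 2 0 R /MediaBox [0 0 {width} {height}] "
         f"/Contents 4 0 R /Resources << /Font << /F1 5 0 R >> "
         f"/XObject << /Im1 6 0 R >> >> >>").encode("ascii"),
        stream("", content),
        b"<< /Type /Font /Subtype /Type1 /BaseFont /Helvetica >>",
        stream(f"/Type /XObject /Subtype /Image /Width {width} /Height {height} "
               f"/ColorSpace /DeviceGray /BitsPerComponent 8", pixels),
    ]

    out = bytearray(b"%PDF-1.4\n")
    offsets = []
    for number, body in enumerate(objects, start=1):
        offsets.append(len(out))  # byte offset of each object for the xref table
        out += f"{number} 0 obj\n".encode("ascii") + body + b"\nendobj\n"

    xref_at = len(out)
    out += f"xref\n0 {len(objects) + 1}\n".encode("ascii")
    out += b"0000000000 65535 f \n"  # each xref entry is exactly 20 bytes
    for offset in offsets:
        out += f"{offset:010d} 00000 n \n".encode("ascii")
    out += (f"trailer\n<< /Size {len(objects) + 1} /Root 1 0 R >>\n"
            f"startxref\n{xref_at}\n%%EOF\n").encode("ascii")
    return bytes(out)
```

   Writing the returned bytes to a file opened in "wb" mode should
   give a page that a PDF viewer can search. A real page would need
   the scanned bitmap in place of the blank one, and each word placed
   at its own position on the page, which is the time consuming part.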
      
   RL   
      
   --- SoupGate-Win32 v1.05   
    * Origin: you cannot sedate... all the things you hate (1:229/2)   