Friday, March 21, 2008

Followup on gscan2pdf

I have spent quite a bit of time over the past 24 hours working with gscan2pdf. One document was unworkable, and another worked out with an estimated 95+% accuracy. My goal was to OCR the documents; the ability to save a PDF is a bonus. I could have typed in all the pages in the time it took me to get through this. As it is, I will be able to clean up the result in minutes. See the note at the end of this post about Ubuntu.

Document 1: 0% OCR success.
An ancient printout from a 9-pin dot matrix printer, faint due to a worn-out ribbon. I tried various settings for scanning (not many are available from the gscan2pdf interface), for unpaper (the options of which I understood but little, if at all), and for the OCR, where I specified tesseract.

0% isn't good. I will attempt to use the methods described online, using a TIFF file and some tweaks. Much too much work.


Document 2: a dark, inkjet-printed copy, about 9 pages, with handwritten edits on the page.


Discussion and Results:
After spending some hours working with Document 1, with NO effect observed, I was pleased that Document 2 ran through gscan2pdf with almost perfect OCR results. I was displeased that I had to select the text and paste it into a file using an editor. Did I miss something?

I did nothing to the options this time around: mostly defaults, except setting the language to English and setting the scan resolution to 500 dpi. A lower resolution would perhaps work. It didn't take too long, though.
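
One possible way around the select-and-paste step is to run tesseract directly on saved page images and let it write the text files itself. What follows is only a minimal sketch, not anything gscan2pdf does internally: it assumes the pages have been saved as TIFF files, that the tesseract command is installed and on the PATH, and the function names and directory names are made up for illustration.

```python
import shutil
import subprocess
from pathlib import Path


def tesseract_command(image, output_base, lang="eng"):
    """Build the command line: tesseract IMAGE OUTPUTBASE -l LANG."""
    return ["tesseract", str(image), str(output_base), "-l", lang]


def ocr_pages(images, out_dir, lang="eng"):
    """Run tesseract on each image; return the text files it should have written."""
    if shutil.which("tesseract") is None:
        raise RuntimeError("tesseract not found on PATH")
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    written = []
    for image in images:
        # Output base has no extension; tesseract appends ".txt" itself.
        base = out_dir / Path(image).stem
        subprocess.run(tesseract_command(image, base, lang), check=True)
        written.append(Path(str(base) + ".txt"))
    return written
```

With scans saved in a (hypothetical) directory named scans, something like ocr_pages(sorted(Path("scans").glob("*.tif")), "ocr_text") would leave one text file per page in ocr_text, ready for cleanup in an editor.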

This is more work than one would like to have to do to get editable copies of a stack of pages. Still, it is not bad at all, and the next time around, I won't even try with faint dot matrix copies.

NOTE ABOUT UBUNTU:

This is another instance where Ubuntu has it right, or at least right enough to make my life easier. Ubuntu does share the Debian concept regarding compiling kernels and packages that gives me fits and starts once in a while. Productivity is improved at least over the short haul. I need to reflect on this a bit.

Monday, March 17, 2008

I have entered my Ubuntu period

I have installed Ubuntu on all of my machines. I installed 7.10 first. Now I have been running Hardy Heron, 8.04, for a month or so on two machines: my main (home) machine and my main school machine. A laptop and a newer dual core Intel machine at school are running 7.10, and the latter is dual booting with Windoze XP because I need to use Windoze for official school business (a perfect BANE). On the laptop I am running VMWare with Windoze XP, for school business.

More later on this. A few remarks are pertinent, however:

Ubuntu is easy. Maintaining several Gentoo machines can be a bore and a drudge. My original reasons for choosing Gentoo were to learn GNU/Linux better, and to achieve better stability after so long with Debian-based systems that were unstable in real terms. Ubuntu is a Debian-based system, and nowadays, it works. The test packages were avidemux and other video programs. In the old days, complicated programs were complicated to keep running, with lots of problems. These programs and others are running out of the box, with few glitches. The constant upgrade knots that I had with Debian, Knoppix, and actually the Ubuntu of old (a year and more ago) are not a serious problem, although I have had to intervene once or twice. And now VMWare: was I dreaming? It installed with a single "apt-get install vmware-server" and a few tweaks per a well-written howto, including a simple XP install.

The install was still tricky in terms of getting the partitions right. I have been preserving the /home partition with home directories for years, through Knoppix, Debian, Ubuntu, Gentoo, etc. Ubuntu works OK for this but requires methodical and careful intervention at install time.
