RE: More about PDF


Much of what Rich has said in his two messages is true, but some clarification is needed. First, to paste text from a multicolumn PDF document into Word, you need to use the column select tool (I think that's what it's called; it's a hidden tool under the Text Select button on the tool palette).

PDF files can store a page of text in three ways: as an image, as character-coded text (as in an ASCII or .doc file), or as a combination of the two. Which way is used is decided when the file is distilled.

When a PDF file is produced simply by scanning a printed document and distilling the scans, the resulting file is huge and the text is unsearchable. An agency like EPA should never use this method of making PDF files.

A second technique is to run the scans through an OCR program (this is basically what Adobe Capture does). The resulting file is much smaller and is searchable. The trouble with this technique is that it requires an enormous amount of effort to proofread and correct the resulting files. For example, superscripts are usually botched. The Office of Technology Assessment used this technique in its dying moments (using the beta version of Adobe Capture) to preserve all its reports on two CDs. Thank goodness they did; I use the CDs often and will be eternally grateful, but the files are riddled with errors.

The third and optimal method is for the author or art director to use Word, Excel, QuarkXPress, PageMaker, Ventura, or whatever software was used to create the document to print it to a PostScript (.prn or .ps) file, and then distill that file. The resulting file is small, searchable, and free of recognition errors, and it can be created in minutes, a fraction (a hundredth?) of the time the second technique requires. This is what EPA should require of contractors and grantees producing reports. EPA should also require font embedding and the use of fonts from foundries that permit embedding (Adobe and many others).

Rich's point about making HTML files is well taken, especially for documents simple enough that a word processor suffices to create them, but many HTML files do not print very well. No one format is optimal for all documents.

I am sorry to interrupt the discussion with these technical matters, but they do bear heavily on ready dissemination of information.
