Dual output: PostScript files are sent from the Harris XP-21 both to the imagesetters and to a Pentium running Adobe Acrobat. The Pentium then sends the resulting PDF files to a '486 PC that runs Acrobat Cataloger.

Tiny type: Alameda Newspapers makes finding display ads in its Acrobat archive easier by including the ad number in 5-point type on the right margin.

Archiving ads without jumping through hoops

PLEASANTON, Calif. -- With most publishing technology, you either get it right away or it takes a long, long time. "I guarantee you that within the first hour, I knew we had something," said Grady Cooper, the pagination pioneer from the Alameda Newspaper Group, based here in the San Francisco Bay Area.

Cooper, who masterminded the full-page output of the five titles that are produced here, was talking about the potential for using Adobe Acrobat technology for full-page archiving. A former newsroom type who is now director of systems management, Cooper built the organization's pagination system out of a front-end system he designed himself, linked to components from Harris Publishing Systems Corp. (see The Cole Papers, February 1994). The five newspapers that are all paginated in the Herald's second-floor "copy desk" -- the Alameda Times-Star, The Argus of Fremont, the Daily Review of Hayward, the Oakland Tribune and the Tri-Valley Herald, based here -- now produce in excess of 2,300 PostScript-based full-page negatives a week.

As with most of the technical types in the industry, Cooper had heard about Acrobat when it debuted in 1993, so he knew it was a collection of applications that allow PostScript files to be "distilled" into a document format that can be read on a variety of computer systems, regardless of whether the creating application or its fonts are available (see sidebar). In fact, Cooper had copies of Acrobat Distiller and Acrobat Exchange on his shelf -- he had attended a conference co-sponsored by Adobe Systems Inc. of Mountain View, Calif., where the applications had been distributed.

But Cooper didn't take the applications off the shelf until his publisher got a letter from Lou Boccardi, the chief executive of the Associated Press, regarding the AP's plan to introduce AP Adsend last summer. The AP was proposing to use Acrobat as a format in which to deliver digital advertising. Cooper opened his Acrobat package and began experimenting. Within an hour Cooper knew he had a solution to a nagging problem.
Ad, ad, who's got the ad?

How long do you keep an ad available on disk? Cooper said the display ad sales representatives were "pushing management about getting access to old ads." The newspaper group had purchased a 2 gigabyte hard drive to hold old ads "to see how long we could keep them," said Cooper. "If you keep [an ad] for 30 days," said Cooper, the ad sales reps "want 60 days. If you keep it for 60 days, they want 90."

Further, Cooper had been experimenting with keeping the whole page in PostScript format. Though saving pages in binary rather than ASCII had saved some space, the PostScript files were just too big. "We were constantly looking for smaller" files, said Cooper. "The reality is that storage is a problem, whether it's paper, negative or in the computer."

Then the AP started talking about Acrobat and Cooper had found his solution -- except there were a number of problems with Version 1 of Acrobat. It didn't handle font embedding well and the search capabilities were tied to Version 2, which wasn't scheduled for release until fall 1994. But Cooper was persistent with Adobe and his local computer supplier, and got an early release of Version 2. "We got 2.0 two months before the AP got it," he said. "We installed it and we immediately realized it not only met our expectations, but exceeded them," said Cooper.

The next step was to integrate Acrobat into the pagination operation. Alameda Newspapers uses the Harris XP-21 pagination server that's based on a Sun SPARCstation to handle page elements and output PostScript code to four Harlequin RIPs through Output Manager, a routing and tracking application from Information International Inc. of Los Angeles. Cooper elected to have the XP-21 create a second PostScript file, which was shunted off to another directory in the Sun. Unfortunately, the file names generated by the XP-21 were meaningless (and didn't conform to Windows naming conventions).
Cooper wrote a small UNIX shell script that extracts the page and edition information from the PostScript header and appends it to the UNIX file name. At this point, one of Cooper's assistants sits down at the Sun terminal and manually renames each file (though Cooper anticipates Harris will be able to help him automate this task).

The Acrobat Distiller program runs on a 90 MHz Pentium PC ("We determined a '486 wouldn't do it") with 32 megabytes of RAM. The Pentium sits on the same network as the Sun; using a program called PC/TCP from FTP Software of North Andover, Mass., the files are sent to the Pentium's directory using File Transfer Protocol.

Once the files enter the Pentium machine, they are distilled quite quickly. "The Distiller runs about six to eight hours," said Cooper, to handle a normal day's 200 pages (there are few common pages among the five newspapers, excepting classified). The resulting PDF files are moved to a directory monitored by Acrobat Cataloger; the program runs on a standard 66 MHz '486 PC with 16 megabytes of RAM. "Actually, we just recycled an old PC," said Cooper. It takes about 30 minutes to index every word of an entire day's 200-plus pages.

But even compressed and indexed, the files take up quite a bit of space. What to do with them? "CDs seemed like a good way to store the files," said Cooper. "If you believe Kodak," he said, the theoretical shelf life of a CD-ROM is 100 years. Alameda Newspapers acquired a CD recorder from Philips and has been getting the best blank CDs it can find for about $10.50 each, in lots of 100. The newspaper group gets about six days of all five titles on a CD; a backup CD is made and put in a secure location.

The papers started saving indexed PDF files of every page on Dec. 7, 1994. But then they found a problem. With all the text on a page indexed, it was difficult, if not impossible, to find a specific ad.
Then Cooper realized that because all the text on a page was indexed, the ad number could simply be printed in 5-point type in the ad's margin, where the index would pick it up. This solved the problem of finding a specific ad, and so far no advertiser has complained. "The retail world lives by numbers," said Cooper. "Maybe they just understand."
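The article does not reproduce Cooper's shell script, so the following Python sketch only illustrates the general idea of the renaming step described earlier in this section: read the comment header of a PostScript file, pull out page and edition, and build a file name short enough for a DOS/Windows machine. The `%%Title:` and `%%EndComments` comments are standard PostScript structuring conventions, but the assumption that the title holds "page edition" (for example `A01 TRIB`) is hypothetical, as are the function names.

```python
import re


def header_fields(ps_text):
    """Collect '%%Key: value' header comments up to %%EndComments."""
    fields = {}
    for line in ps_text.splitlines():
        if line.startswith("%%EndComments"):
            break  # everything after this is page content, not header
        match = re.match(r"%%(\w+):\s*(.*)", line)
        if match:
            fields[match.group(1)] = match.group(2).strip()
    return fields


def dos_name(ps_text, extension="PS"):
    """Build an 8.3-style name from a hypothetical '%%Title: <page> <edition>'."""
    parts = header_fields(ps_text).get("Title", "").split()
    if len(parts) < 2:
        raise ValueError("no page/edition found in header")
    page, edition = parts[0], parts[1]
    # Keep only letters and digits, and at most eight characters before the dot.
    base = re.sub(r"[^A-Z0-9]", "", (page + edition).upper())[:8]
    if not base:
        raise ValueError("page/edition contained no usable characters")
    return f"{base}.{extension}"


# Hypothetical header, for illustration only.
sample = "%!PS-Adobe-3.0\n%%Title: A01 TRIB\n%%EndComments\n"
print(dos_name(sample))  # A01TRIB.PS
```

A real header from the XP-21 would almost certainly be laid out differently, so the parsing in `dos_name` is the part that would need to change.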
OK, now what?

Part of the beauty of PDF is that it is just a different version of PostScript. Adobe has developed a "plug-in" for its Illustrator 5.5 drawing package that will allow you to open a PDF file (Macromedia's FreeHand is scheduled to have a similar plug-in available this summer).

Cooper says the process of recreating the ad is relatively simple: search for the ad number in Exchange and, once the page is found, open it in Illustrator. Then crop the ad out and save it in the Illustrator format, where it can be handled in a normal fashion (images are dropped into a folder or directory as EPS files in the Photoshop format, so they too can be manipulated as necessary). Cooper's "only concern" with this process is that it can take quite a while to open up an ad in Illustrator if there are lots of images. "It's extremely slow if there are a lot of bit maps in the file," said Cooper.

The next step in the process will be to buy some sort of system that will support large numbers of CD-ROMs on-line at the same time. "The technology is out there and it's just a matter of doing the research and making the right choice," said Cooper. And once multiple CD-ROMs are mounted on the network, anyone with the Exchange software can do a sophisticated search and anyone with the free Reader software can do a "find" (similar to the "find" command in a word processor).

Does this mean that the newsroom will then have access to the full-page text of the paper? Yep. Though Alameda Newspapers doesn't have a history of newsroom libraries, when the group bought the Oakland Tribune in 1993, it inherited that paper's clip-and-file library and the librarians who maintained it. But before Cooper unleashes the Acrobat archive to the newsroom masses, he has one other thing in the budget: "I have to have a person on staff" to maintain the system, he said.
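To make the ad-number trick concrete, here is a toy full-text index in Python. It is not how Acrobat Cataloger or Exchange work internally; it only sketches the general technique of indexing every word on a page, so that a number printed in tiny type becomes a search key like any other word. The page identifiers, page text and ad number are invented for illustration.

```python
import re
from collections import defaultdict


def build_index(pages):
    """Map each lowercased word or number to the set of pages containing it."""
    index = defaultdict(set)
    for page_id, text in pages.items():
        for word in re.findall(r"[a-z0-9]+", text.lower()):
            index[word].add(page_id)
    return index


def search_all(index, *terms):
    """Return the pages that contain every term (an AND at page granularity)."""
    sets = [index.get(term.lower(), set()) for term in terms]
    return set.intersection(*sets) if sets else set()


# Hypothetical pages: the ad number rides along with the rest of the page text.
pages = {
    "A3": "Clinton speaks in Oakland. Sofa sale this weekend 4471023",
    "B1": "Dole visits Fremont",
    "A1": "Clinton budget story. Separate story: Dole responds.",
}
index = build_index(pages)
print(search_all(index, "4471023"))          # {'A3'}
print(search_all(index, "Clinton", "Dole"))  # {'A1'}
```

The second query shows the flip side of indexing whole pages: page A1 matches because both names appear somewhere on it, even though they sit in different stories.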
What's it all mean?

Certainly a key missing element is the page renaming function, but since Cooper has had numerous inquiries about his process from existing Harris customers, that problem probably will be addressed soon. The bottom line here is that because the Alameda Newspaper Group handles all of its pages in PostScript, it has inexpensively -- Cooper said the whole thing "cost less than $10,000" -- developed a display ad archiving system that with a little tweaking will be a fully functional editorial library system.

Missing from Cooper's setup are the niceties that traditional electronic libraries provide: the attachment of additional terms (known as keywords) and the identification of elements (byline, headline, dateline, etc.). Cooper postulates that such element flagging could be easily handled by embedding these codes (maybe in Standard Generalized Markup Language?) into the pagination system's typesetting formats and setting them as "white type," making them invisible on the printed page but available for indexing in the PDF file.

Another problem is that Acrobat Search deals with whole pages. If you type "Clinton and Dole" into a search prompt, the system will retrieve every page on which both words appear, not necessarily in the same story. But for the base cost of a system such as this, a newspaper could invest quite a bit of money in programming to overcome these shortcomings and still come out far ahead. -- dmc
Also: The portability circus: a family of Acrobats

From THE COLE PAPERS, June 1995, Copyright © 1995, All Rights Reserved.
Copyright © 1990-2012, The Cole Group. All Rights Reserved. Modified date: 06/7/1995, 11:19:44 PM. URL: http://www.colepapers.net/TCP.archive/Cole_Papers_95/TCP_95_06/ad_archiving.HTML