Librarian mode: After material has been imported into Digital Collections, an "archivist template" can be called up so that librarians can add descriptive information, such as source or copyright information.

Gannett picks and deploys a text, image and page archive

When Gannett -- the largest newspaper publisher in the United States, with almost 100 newspapers now that it has acquired Multimedia -- decides to implement new technology at most of its properties, it's news. As I was pondering how to cover Gannett's move to the Digital Collections PaperDesk multimedia archiving system, I ran across an article by former Gannett Advanced Systems Lab Technology Editor Kerry Northrup, who left Gannett last summer to join IFRA -- the INCA-FIEJ Research Association, which represents newspaper publishers worldwide. Northrup's article appeared in the October 1995 issue of IFRA's newspaper techniques. His unique perspective on the choice and initial implementations of PaperDesk made the article well worth reprinting, which I do here with IFRA's permission.

The cries of anguished photojournalists started even before their new electronic picture desks were plugged in and switched on throughout Gannett newsrooms in 1989. How, they asked in warning, are we going to preserve our pictures? It was a simple question that started the largest newspaper publisher in the United States on a four-year, worldwide technology quest. The result is now in use, or soon will be, at a majority of Gannett's nearly 100 newspapers. That result is Digital Collections Systems/NA, a fully integrated digital multimedia archive and information workflow system based on the PaperDesk software developed by Digital Collections of Hamburg, Germany. Gannett executives expect Digital Collections Systems/NA to be a keystone of their newsrooms' technology and revenue well into the 21st century.
In fact, their expectations are so strong that they purchased a minority holding in the German firm and are setting up their own North American marketing and development operation for it.
Photo desk storage too limited
Using bits and bytes rather than paper and chemicals to create photographs doesn't leave behind the easily filed "hard copies" that for decades have been stuffed into manila folders and row after row of storage drawers at virtually every newspaper in the world. And the electronic picture desks designed by The Associated Press and Leaf Systems Inc. of Southborough, Mass., were not intended to provide long-term, high-volume storage. Originally, a single AP-Leaf server could hold no more than about 400 photos. With larger hard drives and compression boards installed, that number has increased to about 1,600 today. Even so, most of a server's storage space at any one paper is needed just to accommodate a daily satellite-delivered photo feed that can run as high as 400 photos. The server must also handle more than one day's images at a time, hold advance photos that a wire service often transmits a week ahead of their publication dates, and take in a newspaper's locally scanned photos. That leaves little if any room to protect older electronic picture files from routine deletion whenever a newly arriving photo demands space on the drive. The AP server is also an extremely closed system, operating on a dedicated fiber-optic token-ring network that is difficult to access externally. The aging server is well known at newspapers for its underpowered central processors, easily corrupted software and other reliability problems. (Incidentally, the AP is in the process of developing a new server system.)
Newsroom archiving requirements pervasive
First, Gannett's Advanced Systems Lab checked all available archive and image cataloging systems in the U.S. market, several of which were in use at individual Gannett sites. None proved flexible enough to meet the workflow requirements of all newsrooms, which are not one-size-fits-all environments. None was robust enough to satisfy the demands of daily newspapers, at least not right off the shelf. Several of the popular image catalogers of the day had potential. These included Fetch, from Aldus (now Adobe Systems of Mountain View, Calif.), and Search, from Multi-Ad Services Inc. of Peoria, Ill., the developer of Creator. They had easy-to-use interfaces and handled all the file types a newsroom usually dealt with. But they needed to be more: more multi-user, more client-server, more cross-platform, more robust. Most significantly, the lab's inquiry showed that the need for archiving in most newsrooms reaches beyond photographers and their pictures. For instance, most Fetch users are in graphics departments, and most reporters want to be able to search full-text digital story libraries. So the lab set out to determine definitively what the American newspaper industry needed and could use in a newsroom archive system. The plan was to work with Apple Computer and one of its development partners, RWD Technologies of Columbia, Md., to generate a specific, all-encompassing list of requirements. Then the lab would see who might want to build the perfect newspaper archive, either from scratch or by changing and adding to an existing product. The engineers' study confirmed the lab's findings: newspaper archiving needs are pervasive, yet similar across all editorial departments. From a functional standpoint, there is almost no difference in how an archive needs to handle photos, graphics, text, video, sound or any other file type with which journalists might want to work.
The conclusion: Why implement a single-purpose archive when a newspaper can have a multipurpose system for the same effort?
The 'ideal' archive -- specifying it ...
The search interface needs to be simple, since the primary users of the archive will be editors and reporters who are not necessarily trained in library sciences or data research techniques. While comprehensive search functions should be available, most users tend to employ basic word-lookup without complicated Boolean constructions.
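PaperDesk's full-text database is proprietary, and the article does not describe its internals. The sketch below illustrates the plain word lookup most users rely on. It assumes a simple inverted index in which stories and photo captions share one word list. Stop words are dropped, and every remaining query word must match (an implicit AND, with no Boolean syntax required of the user). The `WordIndex` class, the stop-word list and the sample items are all hypothetical.

```python
import re
from collections import defaultdict

# Hypothetical stop-word list; not PaperDesk's actual American English list.
STOP_WORDS = {"a", "an", "and", "the", "of", "to", "in", "on", "for", "is", "at"}


def tokenize(text):
    """Lowercase the text, split it into alphanumeric words and drop stop words."""
    return [w for w in re.findall(r"[a-z0-9]+", text.lower()) if w not in STOP_WORDS]


class WordIndex:
    """Toy inverted index: each word maps to the set of item IDs containing it."""

    def __init__(self):
        self.postings = defaultdict(set)

    def add(self, item_id, text):
        # Stories and photo captions go into the same index, so one search covers both.
        for word in tokenize(text):
            self.postings[word].add(item_id)

    def lookup(self, query):
        """Plain word lookup: every non-stop word in the query must appear (implicit AND)."""
        words = tokenize(query)
        if not words:
            return set()
        # dict.get avoids creating empty entries for words that were never indexed
        result = set(self.postings.get(words[0], set()))
        for word in words[1:]:
            result &= self.postings.get(word, set())
        return result


index = WordIndex()
index.add("story-1", "Mayor opens the new river bridge")
index.add("photo-7", "Caption: crowds on the river bridge at dawn")
index.add("story-2", "School board approves budget")
print(sorted(index.lookup("river bridge")))  # ['photo-7', 'story-1']
```

Because the index does not care whether an entry is a story or a caption, a single search box can cover every file type. This matches the single-interface goal described in the requirements.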
Since all front-end system purchases and expansions in future years should involve replacing dumb terminals with off-the-shelf PCs, it was reasoned that eventually this issue would go away.
However, the Advanced Systems Lab had found that SQL databases usually run into a performance wall as they grow in a full-text, multimedia environment like the one envisioned for future newsrooms. So, instead of demanding a nonproprietary archive, Gannett would require only that it be open in the sense of allowing adequate interaction with external sources such as the AP-Leaf server and pagination programs.
... and selecting it
Most of the products offered had been designed to handle only text or photos, not both. So Gannett decided to cross the ocean in its search and renewed contacts with Digital Collections of Hamburg, Germany. The lab had tracked this small firm through Drupa and IFRA expos over the years as it developed and marketed its PaperDesk archive system. Digital Collections has many installations in Europe, and PaperDesk is being adopted by the German Press Agency (DPA), among other national news agencies. Digital Collections was invited to make a presentation to Gannett corporate executives and to a specially convened committee of editors, photographers, systems managers and librarians from various Gannett newspapers. The result was a contract for at least 50 installations, the first in North America. PaperDesk was originally developed under the NeXTStep operating system but was ported to a variety of UNIX platforms when NeXT's fortunes declined. Its full-text proprietary database holds stories, photos, graphics, video, audio and full-page images RIPped from QuarkXPress files, and it offers a single interface for searching any or all file types. PaperDesk also offers hypertext-like links between files and drag-and-drop placement of text and photos into a Macintosh QuarkXPress page. It adds a variety of photo workflow management tools, so it can be used in the daily newspaper production cycle rather than just for post-production archiving. Client access software is available for Macintosh and Windows PCs, and a World-Wide Web interface is nearing completion. Gannett expects the Web interface to all but replace the computer-specific client software and eliminate most cross-platform headaches. Through a Digital Collections-developed QuarkXPress XTension, text can be archived directly from an XPress page on the Mac.
Otherwise, the software accepts any file that can reach its UNIX directories, either through network connections and directory-mounting software or through a Xylogics Communications Annex. For instance, a photo editor can send a locally scanned picture to the archive by copying it into a folder on his or her Macintosh, where the folder actually represents a holding directory on the archive server. In another case, a proprietary front-end editorial system can be set up to output a copy of every file that reaches its typesetter post-set queue. The copy travels over a standard serial (printer) connection to the Xylogics annex, which then routes the incoming text feed to an archive holding directory. Similarly, PaperDesk can take direct feeds from any number of wire service text or photo modems through the communications annex. This has led some Gannett editors to use the system as a high-capacity wire browser that can retain several weeks or even months of raw wire feed at a time. To deal with the proprietary AP photo feed in the United States, Gannett uses a GPIB-equipped Macintosh and the PhotoWeb software package from Tecmark. Together they transfer files off the Leaf server onto the standard Ethernet network and then to the archive's directories. Specialized import filters can then index the photos automatically by reading their embedded IPTC header data. Newspaper photo editors are urged to add IPTC headers to locally scanned photos to take advantage of the same automatic indexing.
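The article does not document the import filters themselves. The following is a minimal sketch of the idea, assuming the raw IPTC IIM header bytes have already been extracted from the image file or wire transmission. Each IIM dataset starts with the 0x1C marker, followed by a record number, a dataset number and a two-byte big-endian length. Record 2 datasets 5 (object name), 25 (keywords), 80 (by-line) and 120 (caption/abstract) are standard IIM fields. The index field names, the Latin-1 decoding and the `parse_iim`/`index_fields` helpers are hypothetical.

```python
import struct

# IPTC IIM Application Record (record 2) dataset numbers -> hypothetical index field names.
FIELDS = {5: "object_name", 25: "keywords", 80: "byline", 120: "caption"}


def parse_iim(data: bytes) -> dict:
    """Parse a raw IPTC IIM byte stream into {(record, dataset): [raw values]}."""
    datasets = {}
    pos = 0
    while pos + 5 <= len(data):
        if data[pos] != 0x1C:  # every dataset begins with the 0x1C tag marker
            raise ValueError(f"bad tag marker at offset {pos}")
        record, dataset, length = struct.unpack(">BBH", data[pos + 1:pos + 5])
        if length & 0x8000:
            # Extended-length datasets exist in the IIM spec but are not handled here.
            raise ValueError("extended-length dataset not supported in this sketch")
        start = pos + 5
        # Repeatable datasets such as keywords are collected into a list.
        datasets.setdefault((record, dataset), []).append(data[start:start + length])
        pos = start + length
    return datasets


def index_fields(data: bytes, encoding: str = "latin-1") -> dict:
    """Map the record 2 fields listed above to archive index fields (assumed Latin-1 text)."""
    fields = {}
    for (record, dataset), values in parse_iim(data).items():
        if record == 2 and dataset in FIELDS:
            fields[FIELDS[dataset]] = [v.decode(encoding) for v in values]
    return fields


def make_dataset(record: int, dataset: int, value: bytes) -> bytes:
    """Build one IIM dataset (test helper only)."""
    return bytes([0x1C, record, dataset]) + struct.pack(">H", len(value)) + value


sample = (make_dataset(2, 5, b"FLOOD") + make_dataset(2, 25, b"weather")
          + make_dataset(2, 25, b"Iowa") + make_dataset(2, 120, b"Water covers Main Street."))
print(index_fields(sample))
# {'object_name': ['FLOOD'], 'keywords': ['weather', 'Iowa'], 'caption': ['Water covers Main Street.']}
```

The same filter works whether a photo arrives from the AP feed or from a local scanner, as long as someone has filled in the header. This is why editors are urged to add IPTC data to local scans.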
New York newspaper hosts prototype
The Rochester, N.Y., installation was to be more than a prototype for follow-on installations. Extensive work was needed to convert the software, dictionary, thesaurus and stop-word list to American English, and to configure the system to work with American newspaper production techniques. Gannett used the installation to develop expertise in writing import and export filters so PaperDesk could interact with Atex and SII systems, AP modems and a variety of other file sources. Digital Collections programmers from Germany spent many months in Rochester during 1994, working with Gannett and with integration specialists from RWD Technologies, the same company that had been involved in the Apple-run archive needs assessment. The Rochester installation was also used to determine the appropriate class of server for newspapers of various sizes and archiving demands. Hewlett-Packard of Atlanta won the contract to provide servers for the Gannett installations. The Advanced Systems Lab eventually pared the various server and other hardware/software options down to two standard packages it would offer Gannett sites. Generally, Gannett installs PaperDesk on an HP 9000 Series 800 server of a class appropriate for the newspaper involved. The baseline server configuration that Gannett orders includes a 20-gigabyte disk farm set up to provide data redundancy, which makes its effective storage space 10 gigabytes. After system and application software installation, about eight gigabytes remain for archive storage. How many stories, photos, graphics and page images can be kept on the hard disk depends on the mix the newspaper wants to introduce. As the photo, graphic and page partitions fill up, older files are written to recordable CD-ROMs. The CD-ROMs are then loaded into a 100-platter CD-ROM jukebox, which is connected directly to and controlled by the archive server.
After the CD-ROM is verified by the system, the original files are automatically erased from the hard disk to free up space for new files. Searchable text, on the other hand, is always kept on the hard disk to speed searches. This includes all stories, all text associated with photos and graphics, and all the archive indexes. PaperDesk also keeps thumbnails and low-resolution representations of photos on the hard disk, since these appear in search results. An editor is presented with the low-resolution image on screen for cropping and editing without any delay. When the editing is done and the editor moves on to other tasks, the archive software retrieves the high-resolution original from wherever it is stored. It then applies the edits and outputs a new high-resolution file to an imagesetter, OPI or networked directory.
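The article does not say how PaperDesk chooses which files go to CD-ROM. The following is a minimal sketch of the policy it describes, assuming oldest-first selection, a 650-megabyte disc and a caller-supplied write-and-verify step. Only photos, graphics and page images are candidates, and searchable text stays on disk. Originals are marked as erased only after the write has been verified. `ArchiveFile`, `plan_cd_batch`, `migrate` and the capacity figure are hypothetical.

```python
from dataclasses import dataclass

CD_CAPACITY_MB = 650  # assumed capacity of one recordable CD-ROM


@dataclass
class ArchiveFile:
    name: str
    kind: str          # "story", "photo", "graphic", "page", ...
    size_mb: float
    age_days: int
    on_disk: bool = True


# Per the article, searchable text and indexes always stay on the hard disk
# (thumbnails and low-resolution previews also stay, but are not modeled here).
MIGRATABLE = {"photo", "graphic", "page"}


def plan_cd_batch(files, capacity_mb=CD_CAPACITY_MB):
    """Greedily pick the oldest migratable files that still fit on one disc."""
    candidates = sorted(
        (f for f in files if f.on_disk and f.kind in MIGRATABLE),
        key=lambda f: f.age_days,
        reverse=True,  # oldest first
    )
    batch, used = [], 0.0
    for f in candidates:
        if used + f.size_mb <= capacity_mb:
            batch.append(f)
            used += f.size_mb
    return batch


def migrate(files, write_and_verify):
    """Write one batch to CD-ROM; erase originals only if the write verifies."""
    batch = plan_cd_batch(files)
    if batch and write_and_verify(batch):
        for f in batch:
            f.on_disk = False  # original erased; file is now served from the jukebox
    return batch


files = [
    ArchiveFile("story-1", "story", 0.01, 400),
    ArchiveFile("photo-1", "photo", 300, 300),
    ArchiveFile("page-1", "page", 200, 200),
    ArchiveFile("photo-2", "photo", 300, 100),
]
batch = migrate(files, lambda b: True)  # stand-in for a real burn-and-verify step
print([f.name for f in batch])  # ['photo-1', 'page-1']
```

Erasing only after verification means a failed burn leaves every original in place. The old story stays on disk regardless of age because text is never migrated.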
Storage capacity continually increases
Gannett newspapers are told to budget for expanding their hard disk farms by four to eight gigabytes (two to four gigabytes of effective storage after data redundancy) every one to two years. An operation the size of Rochester's burns a new CD-ROM about once a month to free up photo and page image space on its hard disk. When the CD-ROM jukebox reaches its limit, additional and even larger-capacity jukeboxes can be daisy-chained on the system. To master the CD-ROMs, a standard Gannett installation includes a Kodak CD-ROM writer and a simple drag-and-drop third-party CD-ROM authoring application that runs on a Macintosh. The installation also includes a PC configured to run the Display PostScript-oriented NeXTStep operating system. This PC RIPs QuarkXPress or other page layout files into bitmapped page images, complete with photos and graphics, which are compressed and stored in the archive. Depending on a site's existing computer network, a Gannett installation will include routers and other networking gear. The Digital Collections system has not been found to cause an inordinate increase in network traffic. Even so, Gannett recommends that newspapers isolate the archive server, the communications annex, the AP-Leaf transfer Macintosh and the page-RIPping PC on a separate network segment. A Gannett installation is rounded out with an appropriate number of Macintosh and/or Windows PCs, and with both high- and low-resolution printers. The computers serve as archive access workstations for newsroom staffers who do not already have a computer able to run a PaperDesk client application. Usually this means reporters, since editors often already work on Macs or PCs for pagination or photo selection and editing. Macs with larger monitors are provided for the archivists in the library.
There are methods for bringing text and images back from the archive into the newspaper's production system. Usually, though, an editor or reporter simply wants to print out the results of a search for later reference, so newsrooms are often supplied with 600-dpi printers that can also make photo proofs. Despite the effort Gannett put into standardizing its PaperDesk installations, every newspaper is different and has to be configured individually. A preinstallation on-site evaluation determines exact equipment requirements. More importantly, the evaluation includes a thorough review of newsroom workflows and organization. Changes are recommended to help the newspaper get the most out of its new digital capabilities. Dealing with the organizational and management aspects of the new technology is quite often more of a challenge than resolving technical issues. So far, Gannett has installed the archive at these newspapers: News Journal, Wilmington, Del. (127,000 circulation); News-Press, Fort Myers, Fla. (95,000); Florida Today, Melbourne, Fla. (85,000); Honolulu Advertiser (101,000), which uses the system jointly with the non-Gannett Honolulu Star-Bulletin (83,000); Des Moines (Iowa) Register (188,000); Courier-Journal, Louisville, Ky. (236,000); The Times, Shreveport, La. (81,000); Clarion-Ledger, Jackson, Miss. (109,000); Democrat and Chronicle/Times Union, Rochester, N.Y. (210,000 combined circulation); Cincinnati Enquirer (201,000); and The Tennessean, Nashville, Tenn. (142,000), which uses the system jointly with the non-Gannett Nashville Banner (57,000).
Payback seen more in the future
In the near term, the multimedia archive is saving some production costs. For instance, it reduces the need to rescan mug shots for republication, and it eliminates paper supply costs for the AP DataPhoto Picture Receivers that some sites still use to print photos for filing. But most of the immediate payback is claimed in the form of enhanced staff productivity and quality improvements in the news report, both of which are hard to measure. So far, no Gannett newspaper has asked for additional librarians to deal with the new responsibilities of a digital archive. But several sites with library staffs of only one or two people are concerned, and they say they will carefully monitor their libraries' workloads for future budget requests. Further down the digital archiving road, the installations have the potential to generate new revenue for the newspapers. PaperDesk operates on a TCP/IP network internal to the newspaper, and Digital Collections is nearing completion of a World-Wide Web interface to the database. So the infrastructure will be in place at Gannett's PaperDesk sites to put the newspapers' archives on the Internet and possibly sell access, once various legal and copyright issues are resolved. In Shreveport, The Times has already launched a limited experiment, offering modems and copies of the Macintosh client software to local schools so that students can access the paper's news files, photos and pages. Uniformity in the archiving system used at Gannett newspapers, combined with Internet access, also raises the potential for linking all Gannett sites into one nation-spanning information resource. From the view of an individual newspaper, a PaperDesk installation today could be a major factor in tomorrow's decisions about expanding or replacing editorial systems.
Since PaperDesk incorporates interfaces to OPIs and scanners, and provides a customizable workflow management environment, about the only thing needed to turn it from an archive into a multimedia-capable front-end system is revision text handling. Digital Collections is headed by some of the original developers of the P.INK editorial system. The firm has not indicated any plans to expand its product to the level of a full editorial system, but Gannett has noted the potential. None of this means Gannett has to wait to make money from the PaperDesk archive. It has purchased a minority stake in Digital Collections and set up a new subsidiary, called Digital Collections Systems/NA, to co-develop, market and support Digital Collections installations in North America, both inside and outside the Gannett group itself. Digital Collections Systems/NA has been made part of Gannett Media Technologies International in Cincinnati, a new Gannett arm charged with generating revenue by marketing a variety of Gannett technology initiatives. In addition to installation and support specialists, GMTI will have several programmers on staff who will work hand-in-hand with the Hamburg-based Digital Collections developers to refine, update and enhance PaperDesk. -- Kerry J. Northrup
Gannett Media Technologies International, From THE COLE PAPERS, December 1995, Copyright © 1995, All Rights Reserved.
Copyright © 1990-2012, The Cole Group. All Rights Reserved. Modified date: 12/9/1995, 1:17:34 AM. URL: http://www.colepapers.net/TCP.archive/Cole_Papers_95/TCP_95_12/Gannett.HTML