The Cole Papers April 2001

Hitting the highlights: Cold North Wind's implementation of an archive with graphic fidelity highlights search terms on the page image.

Newspapers get graphically accurate digital archiving

Most newspapers can provide searchable on-line content only for the last 10-20 years and offer virtually no access -- or only limited access -- to older archives. As it stands, the only way to find a specific newspaper article from several decades ago is the time-consuming route of traveling to a library (whether in the newspaper's building or in a public building) and searching the publication on microfilm.

Historians are sanguine about the task. "You begin to learn which pages papers use for specific types of articles," said Bruce MacGregor, an amateur historian who has recently completed a 600-page book on 19th-century West Coast railroad-car building for Stanford University Press.

Newspapers have been trying for the last several years to find ways to make their back issues available on-line to readers (see The Cole Papers, December 2000). Digitizing archives has become something of a fad in recent months, with the introduction of on-line information services from companies like Cold North Wind and Bell & Howell.

These services let readers view information in context, exactly as it was originally published, even in issues from previous centuries, in an effort to provide complete access to the historical record.

At the beginning of this year, Bell & Howell Information and Learning, an Ann Arbor, Mich.-based division of the Skokie, Ill.-based Bell & Howell Co., announced the launch of its digital archiving system.

The Historical Newspapers project enables publications to digitize their microfilm from the beginning of the 19th century to the most current issue. The archive digitally reproduces each issue in its entirety, cover to cover, including not only the news stories and editorials but also the photos, advertisements and graphics.

As with a "traditional" newspaper database -- in the past known as an electronic library -- a user may define a search by entering a specific keyword, date, author's name or article type. Search results include metadata such as the date, page number and author's name.

To view the text, the reader clicks on the article to display the entire page image. One of the biggest benefits of these new archives is that users can display a full-page representation of any specific newspaper page while conducting a search. The databases are also easily browsable by issue, which allows readers to leaf through a digital issue exactly as they would a print issue.

Monitor and Bell & Howell
The Christian Science Monitor just announced plans to digitize all of its back issues using Bell & Howell's ProQuest service. Although the system has not yet been implemented, plans call for digitizing nearly one million pages of back issues for distribution to libraries worldwide.

"We have been doing business with [Bell & Howell] for years, but the ProQuest service digitization contract is now in effect with them," said Steve Gray, managing publisher for the Monitor. "They take the master microfilm and run it through 'optical blade' equipment and shoot a digital image of each page. This is a highly automated process because each frame has to be checked to be sure it is centered properly."

The Monitor believes this digitization process will prove extremely beneficial to historical researchers.

"Bell & Howell's ProQuest system will take an optical character reader [OCR] scan and turn it into what they call 'dirty ASCII' text," Gray said, using the acronym synonymous with plain text. "This simply means that the text that is searched by a searching tool goes and identifies the specific hits. It then queries up a digital image so a user can view an article on a page-by-page basis, article basis, area of a specific page, etc. We were looking for a way to bring our 92-year history to a single database, and through this ProQuest system we hope to be able to provide just that."

The Monitor's first issue appeared in 1908, and the ProQuest database will bring the entire publication into digital form. Even though the publication's articles have been available on-line for the last several years through the paper's web site and other electronic databases, the ProQuest project will offer the digital page images for the first time.

"We are unlocking the resources available to us on microfilm," Gray said. "When we were looking at how much it would cost us to digitize these issues dating back to 1908, we found the cost to be approximately $2-3 million. But Bell & Howell's system is provided at no cost to us."

Gray said that the new system would provide an "added dimension" to the publishing process. "Although we had been providing public archives since 1980, we wanted to digitize our publications dating back to 1908, but we were not sure how we were going to afford it."

The Monitor is one of the first publications being offered through Bell & Howell's Historical Newspapers project, which, in turn, will be sold as an annual subscription to libraries and educational institutions.

Over the last decade, the news business has seen dramatic growth in the distribution of on-line news and recent archives. As a result, more and more publishers recognize a growing demand for historical news content.

"Newspapers are realizing this as a way to create unprecedented access to historical information," said Tina Creguer, director of communications for Bell & Howell's Information and Learning group. "The Historical Newspapers project is the largest one of its kind right now. We are starting with national papers and expect hundreds of newspapers to become involved. We will eventually target regional and international papers as well."

Over the next few years, ProQuest will deliver more than 4,000 full-text publications, some dating back to 1475. Its intelligent document linking capability highlights people, places and companies in full-text articles, and the databases can be tied into a library's full-text search resources. Once a search is performed, the results are delivered to the reader and can be viewed on screen, printed, saved and even e-mailed.

"We have had relationships with these publications already because we have already completed their microfilm, so now we just have to take the microfilm and convert it into a digital format," Creguer added. "The film goes through a special digitization operation with a scanning machine, which creates a high-quality image. Each article is then indexed to make searching more efficient."

Cold North archives
Cold North Wind Inc. (CNW) has a system similar to Bell & Howell's to digitize and publish microfilm archives and make them available on the Internet.

"We have 71 papers now under construction for digitization," said R.J. Huggins, president and chief executive officer of the Toronto-based company. "We work from microfilm and use digital optical character recognition to publish to the Web."

Huggins said that CNW is currently processing "50,000 page images per day and we expect to be up to 200,000 images shortly."

CNW's digital imaging process creates and publishes an image of every full page of a newspaper, not just the top headlines.

"The images are scanned into TIFF [Tagged Image File Format] images, which are then sent to the software to detect any problems and clean up the image for display," Huggins added. "We work solely with newspapers on a B2B [business-to-business] basis to offer a faster process. This system has enabled us to bring digital collections on-line for those looking for more content."

The New York Post, the oldest continuously published daily newspaper in the United States, recently signed an agreement with CNW to digitize its archives. Readers will be able to search the archive and receive results as images of the original printed pages. CNW will bring the paper's 200-year archive into digital form.

CNW uses complete OCR to make digitized archive stories searchable. Building the digital archive involves scanning second-generation silver halide microfilm negatives to create high-quality TIFF images of the original pages. CNW also provides an e-commerce platform for selling access to the digitized content.

CNW's Paper of Record is an Internet-based collection of searchable images of newspaper pages dating back to the 1700s. The project includes birth and death notices, editorial and opinion pieces, and advertisements and want ads, in addition to the headlines that announced the events of the day.

Over the next year, Bell & Howell will include CNW's Canadian Library Association Collection in its offering of historical newspapers. CNW has licensed Bell & Howell to distribute products exclusively to the education market (K-12 schools, colleges, universities and libraries) and non-exclusively in all other markets worldwide.

CNW has also entered into an agreement with Torstar Media Corp. to digitize the Toronto Star's newspaper archives, dating back to 1892.

Three-and-a-half million pages
These new digitizing systems are enabling newspapers to turn their complete archives into an additional revenue source. In addition to transforming the way people conduct research, the papers are responding to growing demand from readers who want these archives at their fingertips.

Through this digital process, researchers and even genealogists are now able to use the Internet to access original copies of archived issues of many daily newspapers.

The New York Times has been working with Bell & Howell for the last 17 years to produce the Times on microfilm and is now under agreement to reproduce every issue, cover to cover, from 1851 to 1998. Bell & Howell will also offer subscriptions to the full text in ASCII format from Jan. 1, 1999, to the present. The agreement allows Bell & Howell to digitize the Times' back file -- nearly 3.5 million pages in total -- and to distribute the resulting database to educational institutions and libraries throughout the world.

Martin Nisenholtz, chief executive officer of New York Times Digital, the Internet division of the Times Co., said in a recent press release, "This digitization project promises to increase the availability and accessibility of the New York Times archive to scholars and researchers of all types. In addition, we look forward to enhancing our web sites by utilizing various elements of the digital archive in a number of different ways."

The Wall Street Journal is also responding to this growing need to digitize. Dow Jones & Co. recently signed on to Bell & Howell's ProQuest Historical Newspapers project to digitize the Journal. Plans call for converting nearly 1.7 million print pages into full-page digital images.

"The ProQuest system is a way to give a different approach to accessing information in its original context," Bell & Howell's Creguer added. "Readers are able to view specific stories of the day during a particular year. It's also interesting to see the different typefaces that reflect the era."

The Journal's electronic database will be released in segments covering 10 years each. As one of the first publications to sign an agreement with Bell & Howell, the Journal aims to complete the entire digitization process in just 15 months, and monthly releases have already begun.

The completed database, which will be sold as an annual subscription to libraries and schools, will feature full-text searching, with results displayed as article images or full-page images.

The Journal will take part in Bell & Howell's Digital Vault initiative to reproduce each issue in its entirety.

In the second release of the service, scheduled for June, the Journal's database will be fully browsable by issue for the years 1889-1986. The entire back file will also be available in microform, along with ASCII full text from 1986 to the present via ProQuest.

-- Kellie K. Speed, kkspeed@colepapers.net

Bell & Howell Co., (734) 761-4700
Cold North Wind Inc., (613) 722-9886

From THE COLE PAPERS, April 2001, Copyright © 2001, All Rights Reserved.

Modified date: 07/22/2002, 11:42:49 AM.
URL: http://www.colepapers.net/tcp.archive/cole_papers_01/TCP_01_04/microfilm.html