The Cole Papers

The third Merlin: The image enhancement screen from Merlin 3, which has an Internet search system.

Buying bylines: The Lexis-Nexis products offer byline management, to prevent accidental posting.








Supplier reaction to Intranet connectivity runs the gamut

The users:

Hey, suppose we came up with a way to provide an internal Internet that people from any platform could use to get a look at the photo archive, the text archive, the company Rolodex, stuff like that.

Since everything would use the same Web browser as we use for the Internet, we wouldn't have to spend megabucks buying proprietary client software to search archives -- it'd be built in, easy to use, and cheap.

The suppliers:

Great. Just great.



Hey, it's no fun being a supplier these days, though as we've said before, maybe it's never been.

The latest (though by no means final) insult is the Intranet, the internal information-sharing network that's now all the rage among computer types in the Real World, with the Newspaper World not far behind.

Intranets, in the land of open platforms and off-the-shelf terminals, promise to be the all-purpose workgroup vehicle journalists always wanted but never had -- one-stop shopping for electronic mail, photo, text and page archive searches, phone books, addresses of co-workers and sources.

And a mouse-click away, on the other side of the firewall, is the Internet itself, so you can look at dirty pictures between deadlines.

Everything is just a cut-and-paste away from the front-end system on which you write and edit.

In fact, it's but a short step from the present Intranet to a new model that takes in wire, has hooks into a production database and otherwise supplants any current editorial front-end system.

(And as always, The Cole Papers will be glad to hold your coat while you build the first one, then critique the hell out of it when you're done.)

But back in the present, suppliers of text and image databases are scrambling to be sure they're not the only one on the NEXPO floor in Las Vegas next month without an Intranet browser.

Late in April, we called around the country, looking for new developments in archiving and checking on who was naughty and nice in the Intranet department.

We found reactions running the gamut, from enthusiasm to acceptance to a dogged refusal, at least for now, to give up proprietary search clients.

The Associated Press
"We got caught with our pants down, just like everybody else," admitted Fady Khairallah, director of research and development for the New York-based news cooperative.

But it turns out AP was moving its Preserver archive clients away from an Informix SQL search engine to a new PLS (Personal Library Software) engine that already had browser support, "so we were able to kill two birds with one stone."

The new AP Preserver is making the leap from an image archiving system to a full-blown multimedia system, which will avoid the slowness of SQL databases by tying together different media databases to one user interface.

"Text, photos, audio or graphics all keep information in different places," Khairallah said. "The challenge was to present a single face to the person searching it."

So the AP utilized four of the Journalism 101 questions: Who, what, where and when.

"We went into a virtual mapping of those four categories," Khairallah said, pointing data to the browser from different information types: photo captions, text, audio clips and the like.

The result is that the user now can do a "who" search, for example, on Hillary Rodham Clinton, and call out pictures, video, audio, text or pages. Since PLS supports relevance ranking, you can get a rating of how close to the search each citation is.

And a "related items" search could also call up Whitewater, Rose Law Firm and cattle futures.

The result is fast, bells-and-whistles text searching that looks to the user like a multimedia search.
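
For the technically inclined, the four-question mapping Khairallah describes can be sketched in a few lines of Python. This is our illustration, not AP's code: the field names, the sample records and the crude word-match score (standing in for PLS relevance ranking) are all assumptions.

```python
from dataclasses import dataclass

# Hypothetical field maps: each media type keeps its facts under different
# names, so each gets a table pointing the four questions at its own fields.
FIELD_MAPS = {
    "photo": {"who": "caption", "what": "caption", "where": "dateline", "when": "shot_on"},
    "text": {"who": "body", "what": "headline", "where": "dateline", "when": "filed_on"},
    "audio": {"who": "speaker", "what": "summary", "where": "location", "when": "recorded_on"},
}


@dataclass
class Hit:
    media: str
    record_id: str
    score: float


def search(records, question, term):
    """Search every media type through one of the four questions.

    records is a list of (media, record_id, fields) tuples. The score is
    the share of words in the mapped field that match the search term,
    a stand-in for real relevance ranking.
    """
    wanted = term.lower().split()
    hits = []
    for media, record_id, fields in records:
        field_name = FIELD_MAPS[media][question]
        words = str(fields.get(field_name, "")).lower().split()
        if not words:
            continue
        matches = sum(1 for word in words if word.strip(".,") in wanted)
        if matches:
            hits.append(Hit(media, record_id, matches / len(words)))
    # Best match first, whatever the media type.
    return sorted(hits, key=lambda hit: hit.score, reverse=True)


# Made-up sample records, one per media type.
RECORDS = [
    ("photo", "p1", {"caption": "Hillary Rodham Clinton speaks in Little Rock."}),
    ("text", "t1", {"headline": "Cattle futures", "body": "Clinton declined to comment on the trades."}),
    ("audio", "a1", {"speaker": "Hillary Rodham Clinton", "summary": "Press conference"}),
]

results = search(RECORDS, "who", "Hillary Rodham Clinton")
for hit in results:
    print(hit.media, hit.record_id, round(hit.score, 2))
```

The point of the design is that the user asks one question and never needs to know that a photo keeps its "who" in a caption while an audio clip keeps it in a speaker field.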

The disparate databases can be on one server, different servers or in different locations connected by a WAN (that pesky Intranet again), Khairallah said, but they all have to be Preserver -- AP can tie together only its own databases.

The Intranet change, so far, has been a good one, Khairallah said.

"We're so happy, we're abandoning the Mac interface. We're still offering the Windows interface, but we're concentrating on the Web interface, because of the hot demand and because you get more bang for the buck."

An example is the Java-enabled lightbox feature, which AP has just completed and will show at NEXPO June 15-19.

You can search for images, add them to the lightbox like food into a shopping cart, and then go back for more. You can have public or private lightboxes, and if you're a librarian, you could create personal lightboxes for specific projects or users.

The whole browser concept is about to get a big workout as well -- at press time, the AP was set to put its photo library on-line both for members and commercial customers (with the appropriate access privileges controlled by user profiles).

That's 150,000 pictures, 30,000 of them historical. And just after that, the internal text archive will go public, also running the new software. It's already being used by AP in-house, and Khairallah reports between 8000 and 15,000 hits a day.

Iota USA Inc.
This Israeli company, which has been profiled in these pages to a fare-thee-well because of its unique ability to search TIF files, is finally developing some marketing muscle in the United States, resulting in the appearance of three products:

  • MyDesk, an entry-level product that could cost "under 50 bucks," according to Gérard Lelièvre-Laferté, Iota USA's president. Iota is negotiating with several scanner manufacturers, among them Visioneer, maker of the popular PaperPort desktop scanners, to include MyDesk on an OEM basis, Lelièvre-Laferté said.

  • Further up the food chain, the full MyDesk product incorporates SQL support, as well as Microsoft Mail, forms, Boolean logic, proximity, fuzzy-logic searching and something called Apps, which helps index particularly bad-quality documents (think: old microfilm).

    It'll probably cost about $100 on a stand-alone basis, but as a client on a network it could get up to about $600, plus another $4000 to $5000 for the server.

  • At the high end of the in-house product line is Capture Pro, a regular Iota assembly line that runs on a PC running Windows NT and hooks up to a high-volume scanner such as those from Ricoh or Bell & Howell that can capture 3000 images an hour and index them on the fly.

    All of this is allegedly unattended, though experience leads us to suspect that the main attendant will be a service technician. But we'll wait and see.

    This level will set you back about $5000 for the software alone and a gazillion dollars for the hardware.

  • On the Web front, Iota has a problem, trafficking as it does in huge, ungainly TIFF images which tend to ooze, not spurt, through a network. Initiate your Hillary Rodham Clinton search on this baby and then go out for a three-martini lunch.

    But hold on, the ever-crafty Israelis have come up with a solution they call Intersite: Send a Netscape browser search off to an Iota database and you'll get back not a bunch of pages, but a bunch of strips (Headline: Hillary Rodham Clinton Strips on Israeli Computer System!), each containing a hit found by the search engine, extracted from the page and encoded to HTML format.

    That means you'll get citations, in a few lines of context, returned to your screen relatively quickly. When you find the one you want, double-click on it and the full page will start oozing across.

    What you have when you get done is a full page, with jumps, pictures, sidebars, etc., that you can view, print or store on your own machine or server.

    The historic Iota problem persists: Interesting, even potentially watershed technology, but no one is running it out in the field.

    But this might be the year; stay tuned.
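
The strip idea behind Intersite, as described above, can be sketched roughly as follows. This is a guess at the general technique, not Iota's code: the function name, the /page/ link pattern and the context window are assumptions, and the sketch searches plain ASCII text rather than scanned TIFF pages.

```python
import html


def make_strips(page_id, page_text, term, context=40):
    """Return one small HTML snippet per hit instead of the whole page.

    Each strip holds a few characters of context around the hit and a
    link back to the full page (the /page/ URL pattern is hypothetical).
    """
    strips = []
    lowered = page_text.lower()
    needle = term.lower()
    start = lowered.find(needle)
    while start != -1:
        end = start + len(needle)
        before = page_text[max(0, start - context):start]
        hit = page_text[start:end]
        after = page_text[end:end + context]
        strips.append(
            '<p><a href="/page/{0}">{1}<b>{2}</b>{3}</a></p>'.format(
                html.escape(page_id),
                html.escape(before),
                html.escape(hit),
                html.escape(after),
            )
        )
        # Look for the next hit after the end of this one.
        start = lowered.find(needle, end)
    return strips


# Made-up page text for illustration.
PAGE = "Mrs. Clinton spoke. Later, Clinton & aides left."
strips = make_strips("a1", PAGE, "clinton", context=5)
for strip in strips:
    print(strip)
```

Only the strips cross the network at first; the heavy page image moves when the reader follows a link.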

SRA International
    "Everything we're doing these days is web-based," said Frank Roche, media systems marketing director, clearly reading the tea leaves strewn over the shoulder of the Infobahn.

    SRA, which used to cozy up to intelligence agencies before government was reinvented and it ran out of money, has been shopping its data-manipulation services for the last few years to newspapers and other giant warehouses of nouns and verbs.

    This year, Fairfax, Va.-based SRA is particularly proud of four products:

  • Nametag, which reads text and -- without a lookup table or dictionary -- figures out from contextual analysis that "Hillary" is a name and "hilly" describes some Arkansas land. Once identified, the names are tagged, can be boldfaced or otherwise marked up, and indexed.

    So you can put together a "who was in the news this week" list, dynamically, for your web page, for instance. It works with a search engine, but doesn't require one. Think of this weapon in the hands of the New York Times crossword puzzle editor.

  • NetOwl, which eats URLs (addresses of World-Wide Web pages) and indexes the contents of that location for you. Then those tags can be linked to other resources. This is fundamentally what directories such as Yahoo or Magellan do for the larger Internet. Think of being able to do it for your Intranet, however: You have an automatically created phone directory, for example.

    In the outside Internet world, you could feed it a URL from, say, city hall and keep the search engine on your newspaper's web site. The possibilities are interesting.

  • Intellisearch, which is a product that will search across databases indexed by such engines as Conquest, Wais and Fulcrum, essentially becoming a server of servers, capable of displaying different object types such as photos, graphics and text.

  • Integrated Newspaper Library System (Inls), which is a web-based technology for creating archives of images and/or text. It will use the Conquest search engine and is based on the Oracle SQL database, but will support both Boolean and natural language searches.

    One of the unique things SRA claims is its ability to mix apples and oranges -- to use the above tools to pull together, for example, a BRS-based text archive with a Sybase photo archive and make them searchable through the same browser.

    SRA is installed at the Chicago Tribune and U.S. News & World Report, but a couple of big newspapers are in negotiations for the latest software and could be announcing contracts soon. That should provide the industry with a real-world test.

Gannett Media Technologies International
    This offshoot of the media giant has embraced the Web with open arms.

    "We don't ever expect to write another client again," said Bill Toner, vice president for operations. "This whole [Intranet] thing is fine with us."

    "We're using Java, windowing, putting toolbars along the side of the window, even using a crawler to put messages up" on the Netscape browser, Toner said. "As soon as we can do drag-and-drop, we will."

    The DiGiCol archiving system that Cincinnati-based Gmti sells is growing, too -- in addition to web browser support, DiGiCol (finally) supports Adobe Acrobat's PDF format for storing, indexing and searching full pages.

    But Toner's been talking to the Iota folks, too.

    "We're talking about integrating the Iota indexing utility into our system for the specific things that use bit-mapping, where OCR [optical character recognition] doesn't do a good job.

    "We want to be able to handle hard copy a little more elegantly. With some large drawings, you can't index the text -- Iota will solve that," Toner added.

    He also pointed to an interesting Web advertising function that DiGiCol will be able to support: Say you're running a web site. You sell some ads, store them in the DiGiCol database.

    The archiving system can read the Internet addresses of people calling up your site, determine their locations and send a targeted ad out to the web site, just for that "customer."

    It's kind of an Internet Caller ID program, which should make your newspaper's database marketing people salivate.

Lexis-Nexis
    The giant database company based in Dayton, Ohio, is really a front for tiny Tribune Solutions, the R&D division of the Morning Tribune, a 27,000-circulation newspaper in Lewiston, Idaho.

    Glenn Cruickshank, manager of Tribune Solutions, takes great delight in describing his firm as a Chevy among a fleet of Lexuses.

    "We started off as a text archive company in the late '80s," Cruickshank said, after developing a system for in-house use. About three years ago, when images needed to be added to the NewsView system, "we already had the Folio engine, which had the ability to display text, pictures, sound and video," Cruickshank said, so it wasn't hard to add photos.

    Now, like everyone else, he's doing web things, but not browsers -- Lexis-Nexis Connections software sets up text and photos for other on-line suppliers, for example.

    This year it's evolved into a "complete drag-and-drop electronic newspaper system," Cruickshank said. "You just move stories around on the page, hit the Build button and it builds the whole paper for you."

    As proof, he offers his own paper's web site: http://www.lmtribune.com.

    Instead of dedicating two to 10 people to process stories for the Web as most papers do, Cruickshank claims one person at the Tribune can place 70 stories a day, and do it in a half-hour.

    The Detroit News, Cruickshank said, is doing the same thing at http://www.detnews.com.

    Another new wrinkle in Connections is byline rights management, which sounds like something anybody could add to a database but nobody did, pre-Cruickshank.

    "When you're sending stories out to other on-line services, you can't just throw everything out there that they want because they may not own it -- if a stringer wrote it, for example," Cruickshank said. "Connections manages that on a byline-by-byline basis."
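
A byline-by-byline filter of the kind Cruickshank describes might look something like this minimal sketch. It is not the Connections software: the rights table, the story layout and the names in it are invented for illustration.

```python
# Hypothetical rights table: may the paper resell this byline's work on-line?
BYLINE_RIGHTS = {
    "pat staffer": True,    # staff writer, paper owns the copy
    "sam stringer": False,  # stringer, rights stay with the writer
}


def cleared_for_syndication(stories, rights, default=False):
    """Keep only stories whose byline is cleared for outside services.

    Bylines missing from the table fall back to `default`, which is
    False here so that nothing goes out by accident.
    """
    return [
        story for story in stories
        if rights.get(story["byline"].strip().lower(), default)
    ]


# Made-up stories for illustration.
STORIES = [
    {"slug": "council", "byline": "Pat Staffer"},
    {"slug": "rodeo", "byline": "Sam Stringer"},
    {"slug": "flood", "byline": "Unknown Freelancer"},
]

cleared = cleared_for_syndication(STORIES, BYLINE_RIGHTS)
print([story["slug"] for story in cleared])
```

Defaulting unknown bylines to "not cleared" is a design choice in this sketch; it errs on the side of holding a story back.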

    Also new this year: a completely re-engineered version of PhotoView that now stores EPS files in addition to TIF, Iptc and JPEG formats.

    Tribune Solutions also has redesigned the interface, and added about 40 fields supported by the SLA (Special Libraries Association) guidelines. Before, there were 18.

    All of this is in the same IBM database that can sit on a small, Chevy-type front-end system and purr along contentedly, Cruickshank said.

T/One Inc.
    T/One's Merlin photo archiving system is Netscape-ready for anything the 'Net can throw at it, and is hanging onto its old browser at the same time.

    "We see advantages to having both," declared Pete Leabo, director of marketing.

    "The browser is great for the occasional user who just wants to see if the photo exists," said Leabo, "and if it's there retrieving it from the archive. From the user's point of view, it's wonderful."

    But in the library, Leabo sees the dedicated client as the better choice: "When you're doing extensive work with the archive itself, for example, a librarian who is going to be appending keywords and adding published captions, correcting information and doing that quickly and effectively, that's where the client pays off.

    "But we're doing both and will continue to do both," Leabo said.

    New this year is Merlin 3, which is already in production at the Boston Globe and The Record of Kitchener-Waterloo, Ontario.

    It is the first complete redesign of the Merlin application, which is about 2½ years old, and it is, Leabo said, faster than a speeding bullet.

    Merlin 3 has autopurge and autoprotect features which enable it to be used as a repository for incoming wire photos.

    "Some papers like The Virginian-Pilot [of Norfolk, Va.] were already flowing wire photos into the older software," said Leabo, "but that puts an undue burden on the librarian to manually delete about 300 pictures a day. Now, with autopurge, any image not protected can be purged from the database."

    Similarly, you could set the system to autoprotect any locally scanned images by looking at the credit field, so you'd know you'd never accidentally let the system purge them, Leabo said.
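
The two features fit together roughly as in the sketch below. It is our own illustration rather than Merlin's code: the record layout, the credit-line value that marks a local scan and the two-day age limit are all assumptions.

```python
from datetime import datetime, timedelta

# Hypothetical credit-line values that identify locally scanned images.
LOCAL_CREDITS = {"staff photo"}


def autoprotect(images, local_credits=LOCAL_CREDITS):
    """Mark images as protected when the credit field shows a local scan."""
    for image in images:
        if image["credit"].strip().lower() in local_credits:
            image["protected"] = True
    return images


def autopurge(images, now, max_age=timedelta(days=2)):
    """Drop unprotected images older than max_age; keep everything else."""
    return [
        image for image in images
        if image.get("protected") or now - image["received"] <= max_age
    ]


# Made-up archive contents for illustration.
NOW = datetime(1996, 5, 1, 12, 0)
IMAGES = [
    {"id": "wire-old", "credit": "AP", "received": NOW - timedelta(days=5)},
    {"id": "wire-new", "credit": "AP", "received": NOW - timedelta(hours=3)},
    {"id": "local-old", "credit": "Staff photo", "received": NOW - timedelta(days=30)},
]

kept = autopurge(autoprotect(IMAGES), NOW)
print([image["id"] for image in kept])
```

Running the protect pass before the purge pass is what keeps the month-old local scan while the stale wire photo is dropped.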

    Another new feature is the ability to save searches. It is similar to AP's lightbox feature, but what gets saved is the search itself rather than a set of results.

    "You could have a search you'd be repeating regularly -- say, all sports photos of the 49ers in the last six hours, using multiple search criteria," Leabo said.

    "You'd create it once, execute the search, save it as a 'Query Shortcut' and give it a name. Then it becomes a menu item."
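
A saved search of this sort can be sketched as a named set of criteria that is re-run on demand. The class below is a hypothetical illustration, not T/One's design: the class name, the three criteria and the photo record layout are assumptions.

```python
from datetime import datetime, timedelta


class QueryShortcuts:
    """Store named searches (the criteria, not the results) for re-use."""

    def __init__(self):
        self._saved = {}

    def save(self, name, category, keyword, last_hours):
        self._saved[name] = {
            "category": category,
            "keyword": keyword,
            "last_hours": last_hours,
        }

    def names(self):
        """Names to show as menu items."""
        return sorted(self._saved)

    def run(self, name, photos, now):
        """Re-run a saved search against the archive as it stands now."""
        criteria = self._saved[name]
        cutoff = now - timedelta(hours=criteria["last_hours"])
        return [
            photo for photo in photos
            if photo["category"] == criteria["category"]
            and criteria["keyword"].lower() in photo["caption"].lower()
            and photo["received"] >= cutoff
        ]


# Made-up archive contents for illustration.
NOW = datetime(1996, 5, 1, 18, 0)
PHOTOS = [
    {"id": "s1", "category": "sports", "caption": "49ers quarterback throws", "received": NOW - timedelta(hours=2)},
    {"id": "s2", "category": "sports", "caption": "49ers celebrate", "received": NOW - timedelta(hours=9)},
    {"id": "n1", "category": "news", "caption": "49ers owner at city hall", "received": NOW - timedelta(hours=1)},
]

shortcuts = QueryShortcuts()
shortcuts.save("49ers, last six hours", category="sports", keyword="49ers", last_hours=6)
found = shortcuts.run("49ers, last six hours", PHOTOS, NOW)
print([photo["id"] for photo in found])
```

Because only the criteria are stored, the same shortcut returns fresh pictures each time it is run.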

    And all of this with faster throughput as well? How did they do it?

    "We were not re-engineering for speed," insists Leabo, "because we have the fastest archiving system anywhere to start with."

    But as they reprogrammed to give the system a nicer user interface and add features, they tightened the code, optimized some of the search routines, and presto!

    "We got a speed increase of at least fourfold and in some cases up to tenfold," Leabo said.

    By NEXPO, he adds, his team may have another goodie: The ability to issue a "Save As" command on a graphic and have the Merlin system save it in separate text, photo and graphics databases, while maintaining the same links to the original.

    -- John Bryan

    The Associated Press,
    (212) 621-1732;
    Gannett Media Technologies International,
    (513) 665-3777,
    e-mail: dzito@gmti.gannett.com,
    http://www.gmti.com/;
    Iota USA Inc.,
    (203) 227-5602,
    e-mail: glellevre@aol.com,
    http://www.iota.col.il/;
    Lexis-Nexis,
    (800) 227-9597, Ext. 1819;
    SRA International,
    (703) 803-1500,
    e-mail: frank_roche@sra.com,
    http://www.sra.com/;
    T/One Inc.,
    (617) 328-6645,
    e-mail: pbl@crl.com,
    http://www.t-1.com/.

    From THE COLE PAPERS, May 1996, Copyright © 1996, All Rights Reserved.

    Modified date: 05/ 2/1996, 6:52:36 PM.
    URL: http://www.colepapers.net/TCP.archive/Cole_Papers_96/TCP_96_05/Intranet.HTML