The Cole Papers

Dream pages: At the News-Sentinel, users configure a process for batch preparation of images for the Web. Once ready, those images and their captions can be dropped from a Macromedia Dreamweaver library onto an HTML article page.

Digging into processes, uses and technology of shovelware

If you've been a newspaper's on-line manager for more than a year, you may recall the days when it was fun to bash "shovelware."

That's the impolite term applied to both the craft and technology of moving print-edition content, largely unchanged, onto the Internet.

Check your notes from any Connections or Interactive Newspapers trade show held in, say, 1996 or 1997. Remember the podium pundits' ruminations, including:

"People expect more from your site than just shovelware."

Or:

"If all you're doing is shovelware, you're not taking advantage of the medium."

Both are understandable comments. Not all newspaper content plays well on the Web; for example, some longer stories beg to be redeveloped into non-linear components. And a typical once-a-day print edition cycle pretty much misses the point of a medium that's primed to deliver instantaneous updates.

But follow the logic of those two remarks carefully. Both imply that a newspaper's site begins with shovelware, even as they poke at its shortcomings.

In other words, the pundits said shovelware isn't good enough, but they never said shovelware isn't good.

Besides, it's easy to want more than shovelware if you already have it -- that is, if you happen to run a site where simple-to-use technology provides automatic, reliable file transfers, filtering, sorting, reformatting, image handling and packaging of content from the print publishing systems to the on-line serving systems.

If you're one of those people, you are free to leave the room. Other on-line managers -- in large organizations and small alike -- still struggle with one or more of those degrees of automation. They might be thrilled with having shovelware, if only to put it down among friends.

You see, site managers no longer feel the need to hate shovelware, but they sure hate to admit they need it.

Big site, big shovel
At Tribune Interactive, the new media division of Tribune Co. of Chicago, the importance of having newspaper content available as a web site foundation isn't lost on site developers.

TI produces chicagotribune.com, the web site of the 674,000-circulation Chicago Tribune, among other Internet businesses. The site offers the full editorial text of the newspaper as a launching pad for a suite of robust web services.

That text had to get on-line somehow. In the Tribune's case, it ventures from the newsroom's front-end system built by CText Inc. of Ann Arbor, Mich., into the hands of library archivists, then lands in a database that is managed using the StoryServer toolset for the Web from Vignette Corp. of Austin, Texas.

Jerry Busser, data architect at Tribune Interactive, explained the process:

  • At about 9 p.m., Tribune archivists browse a CText queue of typeset items, find all the unique story slugs and begin assigning archive keywords and data relationships for use in the newspaper's library system.

  • By about 4:30 the next morning, the archivists have organized the entire news report for that publishing cycle into a single, formatted text file, ready to insert in the library database. The file contains 300 to 500 unique articles with fielded information and body text. Even so, it usually totals less than 1 megabyte.

  • A copy also moves via File Transfer Protocol to one of the Tribune's web development servers, where a script written in Perl parses the articles for insertion in the web publishing database.

  • Another script acts like a mechanical coin sorter, Busser said, putting articles in appropriate "slots" on the site. The script looks at fielded data with each article -- bylines, print page numbers, sections -- and determines where to put the article in the site. Every shovelware story fits in only one place by design.
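The parse-and-sort steps Busser described can be sketched in Python. This is not the Tribune's Perl code. The record layout is hypothetical: header lines of the form NAME: value, then a blank line, then the body, with a line containing only %% between articles. The field names, the columnist list and the routing rules are hypothetical too. An ordered rule list where the first match wins is one simple way to guarantee that each story lands in exactly one slot.

```python
# Hypothetical sketch: parse a fielded feed file, then route each
# article to exactly one site "slot" (not the Tribune's actual code).
from dataclasses import dataclass


@dataclass
class Article:
    fields: dict  # e.g. {"SLUG": ..., "SECTION": ..., "PAGE": ..., "BYLINE": ...}
    body: str


def parse_feed(text):
    """Split the feed on lines containing only '%%' (assumed separator)."""
    records, current = [], []
    for line in text.splitlines():
        if line.strip() == "%%":
            records.append(current)
            current = []
        else:
            current.append(line)
    records.append(current)

    articles = []
    for lines in records:
        chunk = "\n".join(lines).strip("\n")
        if not chunk:
            continue
        # Header lines come first, then a blank line, then the body text.
        header, _, body = chunk.partition("\n\n")
        fields = {}
        for line in header.splitlines():
            name, sep, value = line.partition(":")
            if sep:
                fields[name.strip().upper()] = value.strip()
        articles.append(Article(fields, body.strip()))
    return articles


COLUMNISTS = {"A. Columnist"}  # hypothetical byline list

# Ordered rules: the first predicate that matches decides the slot,
# so every article lands in exactly one place.
RULES = [
    (lambda f: f.get("BYLINE") in COLUMNISTS, "columns"),
    (lambda f: f.get("SECTION") == "NEWS" and f.get("PAGE") == "1", "news/front"),
    (lambda f: f.get("SECTION") == "NEWS", "news"),
    (lambda f: f.get("SECTION") == "SPORTS", "sports"),
    (lambda f: f.get("SECTION") == "BUSINESS", "business"),
]
DEFAULT_SLOT = "unsorted"


def sort_articles(articles):
    """Return {slot: [articles]} using the first matching rule."""
    slots = {}
    for article in articles:
        slot = next((s for test, s in RULES if test(article.fields)), DEFAULT_SLOT)
        slots.setdefault(slot, []).append(article)
    return slots
```

A story that matches no rule falls into an "unsorted" slot where a producer could pick it up, rather than being dropped.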

    "In some cases, we don't beat the papers onto the street, but in other cases we do," Busser said. "We publish the shovelware content by about 6 a.m. most days."

    The system works, he said, but expectations have outgrown the original specifications.

    "This system was designed when our bias was against shovelware," Busser said. "We figured it was second-class content, just another feed. We'd just dump it into the database and put it in the appropriate holes and producers never touched it."

    That's easy enough, he said, but not flexible enough.

    "It's more rigid than it should be," Busser said. "If a picture was attached to a story in print, in our shovelware process we don't even know that. We'd have to add it by hand."

    Folks at the Tribune and TI plan to seize an upcoming opportunity -- replacement of the CText front-end with a new printside publishing system from CCI Europe of Denmark and Kennesaw, Ga. -- to improve the flexibility of shovelware.

    High on the to-do list, Busser said, is an easier way of moving visuals, especially pictures, from print to Web.
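Linking pictures to stories automatically generally depends on a shared identifier that both exports carry. The following is a minimal Python sketch. It assumes each exported story includes a line of the form IMAGE: filename naming its picture. That convention and the function names are hypothetical, not CCI's actual export format.

```python
# Hypothetical sketch: attach exported image files to the stories that
# reference them by file name. The "IMAGE: name" line is an assumed convention.
import re
from pathlib import PurePath

IMAGE_REF = re.compile(r"^IMAGE:[ \t]*(\S+)[ \t]*$", re.MULTILINE)


def attach_images(stories, image_files):
    """stories: {slug: exported text}; image_files: paths of exported images.

    Returns ({slug: [matched paths]}, [(slug, missing file name), ...]).
    """
    # Index the exported files by bare file name, case-insensitively.
    available = {PurePath(p).name.lower(): p for p in image_files}
    attached, missing = {}, []
    for slug, text in stories.items():
        attached[slug] = []
        for ref in IMAGE_REF.findall(text):
            path = available.get(ref.lower())
            if path is None:
                missing.append((slug, ref))  # referenced but never exported
            else:
                attached[slug].append(path)
    return attached, missing
```

Reporting missing references, rather than silently skipping them, gives a producer a short list to chase.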

    "When someone exports a story from CCI, we'll be able to know about pictures attached to that story," he said. "When they publish in [the library system] format, they'll export binary images, too, with the same file name referenced."

    Smaller site, smaller shovel
    Image handling is the fanciest piece of the shovelware routines deployed by The News-Sentinel of Fort Wayne, Ind., as described by Keith Hitchens, manager of the paper's web site (http://www.news-sentinel.com/).

    The 46,000-circulation afternoon newspaper relies on a Mac OS application called AutoStripper, distributed by Photo Systems Inc. of Dexter, Mich., to ease the conversion of high-resolution printside images to low-resolution web files.

    Hitchens said his lone web site producer can drag and drop groups of photographs from a library archive into AutoStripper. Then the fun begins.

    AutoStripper "resizes and sharpens the images to a pre-set size ready for web use," he said. "It also makes a separate text file containing all the caption information."
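AutoStripper's internals aren't described here. The bookkeeping half of such a batch job can still be sketched with Python's standard library:

- fitting each picture into a preset web size without distorting it;
- naming the web file;
- writing a caption file alongside.

The 400-by-300 box, the naming scheme and the tab-delimited caption format are assumptions. Actually resampling and sharpening the pixels would need an imaging library and is not shown.

```python
# Hypothetical sketch of batch bookkeeping for web images; pixel resampling
# and sharpening (which need an imaging library) are deliberately left out.
import re
from pathlib import PurePath


def fit_within(width, height, max_w=400, max_h=300):
    """Scale to fit inside max_w x max_h, keeping aspect ratio; never enlarge."""
    scale = min(max_w / width, max_h / height, 1.0)
    return max(1, round(width * scale)), max(1, round(height * scale))


def web_name(original):
    """'Fire Scene 02.TIF' -> 'fire_scene_02.jpg' (assumed naming scheme)."""
    stem = re.sub(r"[^a-z0-9]+", "_", PurePath(original).stem.lower()).strip("_")
    return (stem or "image") + ".jpg"


def caption_file(photos):
    """photos: iterable of (original name, caption).

    Returns tab-delimited text, one line per picture, with whitespace in each
    caption collapsed so tabs or line breaks can't break the format.
    """
    lines = []
    for original, caption in photos:
        lines.append(f"{web_name(original)}\t{' '.join(caption.split())}\n")
    return "".join(lines)
```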

    Once the images are processed, a PC-based script places each image-caption combination in a container document formatted in Hypertext Markup Language (HTML), then places those containers in a Macromedia Dreamweaver library file.
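The container step might be sketched as follows. It assumes Dreamweaver library items are stored as .lbi files, each holding an HTML fragment, in a Library folder at the site root. The markup and the function names are hypothetical, not the News-Sentinel's script.

```python
# Hypothetical sketch: wrap each image and caption in an HTML fragment and
# save it as a Dreamweaver library item (assumed layout: Library/<name>.lbi).
import html
from pathlib import Path


def container_html(src, width, height, caption):
    """Return an image-plus-caption HTML fragment with text safely escaped."""
    cap = html.escape(caption)
    return (
        f'<p><img src="{html.escape(src)}" width="{width}" height="{height}" '
        f'alt="{cap}"><br>\n'
        f"<i>{cap}</i></p>\n"
    )


def write_library_items(photos, site_root):
    """photos: iterable of (src, width, height, caption). Returns written paths."""
    library = Path(site_root) / "Library"
    library.mkdir(parents=True, exist_ok=True)
    written = []
    for src, width, height, caption in photos:
        item = library / (Path(src).stem + ".lbi")
        item.write_text(container_html(src, width, height, caption), encoding="utf-8")
        written.append(item)
    return written
```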

    The producer opens article pages in Dreamweaver (a WYSIWYG HTML editor from Macromedia Inc. of San Francisco), chooses the appropriate image-caption packages from the Dreamweaver library (for the desktop publishers among you, it's much like a QuarkXPress library), then drags and drops the image HTML onto the article page.

    "This is as simple as it sounds," Hitchens said. "The hardest part of the process is getting the captions edited. The rest ... takes just a few minutes."

    On the text side of the equation, Hitchens said, the paper moves files from its MS-DOS-based front-end editorial system through a batch search-and-replace process. It strips newspaper typesetting markup and reformats the articles in HTML for the Web.

    That process "does nine-tenths of the work for us," he said. The only hitch comes when printside designers build pages in QuarkXPress from Quark Inc. of Denver.

    That means an extra step, Hitchens said. A Quark XTension called HexWeb pulls the text from the page layout, then another batch search-and-replace script prepares it for the Web.
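A batch search-and-replace conversion of this kind boils down to an ordered table of patterns. In this sketch the bracketed typesetting codes are invented for illustration: [HL] marks a headline, [BY] a byline, and a few other codes control layout only. A real front-end system uses its own codes. Text is HTML-escaped first so that characters like & survive as valid HTML.

```python
# Hypothetical sketch of a batch search-and-replace converter. The bracketed
# typesetting codes are invented for illustration.
import html
import re

RULES = [
    (re.compile(r"\[HL\](.*?)\[/HL\]", re.S), r"<h2>\1</h2>"),      # headline
    (re.compile(r"\[BY\](.*?)\[/BY\]", re.S), r"<p><b>\1</b></p>"),  # byline
    (re.compile(r"\[(?:QL|QC|QR|SZ\d+|FT\d+)\]"), ""),               # layout-only codes
]


def convert(raw):
    """Turn one marked-up story into simple HTML."""
    text = html.escape(raw, quote=False)  # escape &, <, > before adding tags
    for pattern, replacement in RULES:
        text = pattern.sub(replacement, text)
    # Blank lines separate blocks; untagged blocks become body paragraphs.
    blocks = [b.strip() for b in re.split(r"\n\s*\n", text) if b.strip()]
    return "\n".join(
        b if b.startswith(("<h2>", "<p>")) else f"<p>{b}</p>" for b in blocks
    )
```

Order matters. Headline and byline codes are converted before the leftover layout codes are stripped, and any block not already tagged is treated as a body paragraph. Running `convert` over every exported file is what makes it a batch process.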

    The site itself is hosted remotely by InfiNet Co. of Norfolk, Va. Once all the batch processes are done, the News-Sentinel staff simply sends the files to the InfiNet hosting farm via File Transfer Protocol.
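The final transfer is the sort of step that's easy to script. This sketch uses Python's standard ftplib module. The host, credentials, remote directory and list of file types are placeholders, and nothing here reflects InfiNet's actual setup.

```python
# Sketch: push finished files to a remote host over FTP using Python's
# standard ftplib. Host, login, directory and file types are placeholders.
import ftplib
from pathlib import Path

WEB_SUFFIXES = {".html", ".htm", ".jpg", ".gif"}  # assumed file types to send


def upload_files(ftp, local_dir, remote_dir):
    """Send matching files from local_dir to remote_dir on an open session."""
    ftp.cwd(remote_dir)
    sent = []
    for path in sorted(Path(local_dir).iterdir()):
        if path.is_file() and path.suffix.lower() in WEB_SUFFIXES:
            # Binary mode: images arrive intact, and HTML is sent
            # byte-for-byte with no line-ending conversion.
            with path.open("rb") as fp:
                ftp.storbinary(f"STOR {path.name}", fp)
            sent.append(path.name)
    return sent


def publish(host, user, password, local_dir, remote_dir):
    """Open a session, log in, upload, and close the connection."""
    with ftplib.FTP(host) as ftp:
        ftp.login(user, password)
        return upload_files(ftp, local_dir, remote_dir)
```

A call such as `publish("ftp.example.com", "user", "secret", "outbox", "/www")` uses only placeholder values. Keeping `upload_files` separate from the connection makes it easy to check against a stand-in object before pointing it at a real host.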

    Unlike the Tribune's process, though, shovelware in the self-proclaimed Summit City doesn't have a staff of archivists to organize and prioritize the articles.

    "Converting is easy," Hitchens said. "Our major time is spent in the 'find' process: finding the stories on our print system and pulling them for the Web."

    Finders keepers
    Newsroom managers at the St. Louis Post-Dispatch, and staffers producing the paper's postnet.com site, started with the same problem: tracking down stories in development and production.

    According to Virgil Tipton, deputy editor at the Post-Dispatch, story budgets for each section used to be separate, flat files in the paper's old front-end system from Atex Media Solutions of Bedford, Mass. Those budget files, scattered and inconsistently formatted, could be updated by only one user at a time.

    The 330,000-circulation Post-Dispatch eased out Atex in favor of an editorial front-end system from Harris Publishing Systems Corp. of Melbourne, Fla. Tipton said the paper's staff also developed an integrated story budget database on its Lotus Notes-Domino groupware network, from IBM's Lotus Development Corp. division of Cambridge, Mass.
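The core idea of an integrated budget can be modeled in a few lines: one set of records that print and on-line staff all read and annotate. This is a plain in-memory Python sketch, not the Lotus Notes-Domino API. The fields and the "special-play" tag are hypothetical.

```python
# Plain in-memory model of a shared story budget (not the Notes API).
from dataclasses import dataclass, field


@dataclass
class BudgetEntry:
    slug: str
    section: str
    summary: str
    top_story: bool = False                     # set by print editors
    web_tags: set = field(default_factory=set)  # set by on-line editors


class StoryBudget:
    """One budget that print and on-line staff all read and update."""

    def __init__(self):
        self._entries = {}

    def add(self, entry):
        self._entries[entry.slug] = entry

    def tag_for_web(self, slug, tag="special-play"):
        self._entries[slug].web_tags.add(tag)

    def section(self, name):
        return [e for e in self._entries.values() if e.section == name]

    def top_stories(self):
        return [e.slug for e in self._entries.values() if e.top_story]

    def web_tagged(self, tag="special-play"):
        return [e.slug for e in self._entries.values() if tag in e.web_tags]
```

Because every role queries the same records, a tag added by an on-line editor is visible to anyone reading that section's budget. There is no second list to keep in sync.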

    "Now all the budgets are available to everyone, in one place," he said. "And we integrate it with the on-line stuff so the on-line staff, section editors and reporting teams all work off the same budget.

    "A reporting team leader looking at the budget can see that an on-line editor has tagged a story for special play on postnet.com," Tipton added. "On-line producers can see what the editors deemed to be top stories. It's really cool that you can sit at this Notes database and see what's going to be happening in every section of the paper and every section of postnet.com."

    Cooler still, said Tipton, is the way the Notes-Domino environment also pulls duty as a shovelware back-end and web publishing platform.

    It helps that Notes client software -- combining electronic mail, calendaring and publishing tools -- is commonplace on desktop PCs in the Post-Dispatch newsroom. It also helps that the newsroom develops content for both print and on-line editions.

    "We made the decision early on to assume that our newsroom staff is our on-line staff, as well," Tipton said. "Meanwhile, we wanted to get the computers to do as much of the work as they could. For us, that meant, 'HTML? Bah!'"

    The Post-Dispatch, like the Tribune, assigns a dual role to its library archivists. They prepare articles and images for insertion in library storage systems, and at the same time annotate and extend those records for the Internet.
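Preparing one record that serves both the library and the Web can be sketched as a single function. The archivist classifies a story once, and that classification yields both a fielded library record and a web page with its directory path. The record layout, the category-to-directory rule and the names here are assumptions for illustration, not the Post-Dispatch's tool.

```python
# Hypothetical sketch: one archivist classification drives both outputs.
import html


def format_both(slug, headline, body, category, keywords):
    """Return (library_record, web_path, web_html) from a single classification.

    category: the archivist's path-like category, e.g. "Sports/Baseball" (assumed).
    """
    # Library side: a fielded text record with keywords in alphabetical order.
    library_record = "\n".join([
        f"SLUG: {slug}",
        f"HEADLINE: {headline}",
        f"CATEGORY: {category}",
        "KEYWORDS: " + "; ".join(sorted(keywords, key=str.lower)),
        "",
        body,
    ])
    # Web side: the same category becomes the site directory.
    parts = [p.strip().lower().replace(" ", "-") for p in category.split("/") if p.strip()]
    web_path = "/" + "/".join(parts + [slug + ".html"])
    paragraphs = "\n".join(
        f"<p>{html.escape(p.strip())}</p>" for p in body.split("\n\n") if p.strip()
    )
    web_html = f"<h1>{html.escape(headline)}</h1>\n{paragraphs}\n"
    return library_record, web_path, web_html
```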

    Tipton said the decision to have archivists prepare shovelware content for the Web was simply good use of resources. Some staff members of the paper's old bulletin-board service had acted as on-line traffic cops, flagging down stories and parking them in directories. It was redundant work.

    "We thought, 'Wait a minute.' A staff of people at our newspaper already is touching every piece of copy and organizing it -- the archivists," he said. "So what we did was shift their schedules from daytime to a 1 a.m. daily start to get the content on-line sooner."

    The archiving job -- and, by extension, the shovelware job from newsroom system to Notes database -- is done by 5 or 6 a.m. daily, Tipton said.

    "We were able to drop five positions in [on-line] production without adding anyone to the archives. Instead, what we did was build a smart tool that allowed archivists to format for both places at once."

    The tool permitted the archivists to categorize the articles more precisely for the extensive directory structure of postnet.com, Tipton said.

    "It simply puts the initial classification process in the hands of people who are our experts at classifying things."

    Paper's on-line. Now what?
    Even at sites that have deployed mature shovelware solutions, time freed up by automation isn't always reinvested in extending the news report on the Web. And some organizations struggle to find a return on their investment even when they do develop journalistic content beyond the print-edition feed.

    In recent years, the chicagotribune.com site has featured numerous Tribune series and story packages that were thoroughly enhanced and specially organized with interactive components for the Web.

    But TI's Busser said the "golden days are probably past" for developing such packages.

    "We know it's very expensive to participate in this space, and banner ads aren't getting the job done," Busser said. "People don't seem to be willing to pay for all that embellished content."

    If not, then what do they want?

    "We ask them what they expect from an on-line newspaper, and they say they expect to see the newspaper," he added. "How do you change that perception, and by the same token justify more of these interactive packages? The page views on some of them are in the hundreds."

    TI is an on-line organization big enough to support all the properties of a national media conglomerate, with dozens of producers, technicians, sales executives and managers. If TI is scaling back efforts to embellish the shovelware feed of its flagship newspaper, it's little wonder that a much smaller organization would take pride in simply reaching the shovelware threshold.

    "Let's be honest," said Fort Wayne's Hitchens. "I have one web person to do our site each day. Just one.

    "Take a good long look at our site, and you can see how deep it is," he added. "All this with one person. You know there has to be a lot of automation to make this happen."

    -- Jay Small

    CCI Europe Inc., (770) 419-1588, e-mail: edeasley@mindspring.com;
    CText Inc., (313) 677-4700, e-mail: sales@ctext.com;
    Harris Publishing Systems Corp., (407) 242-5000, e-mail: hpscmktg@harris.com;
    InfiNet Co., (800) 391-8760, e-mail: solutions@infi.net;
    Lotus Development Corp., (617) 577-8500, e-mail: info@lotus.com;
    Macromedia Inc., (415) 252-2000, e-mail: info@macromedia.com;
    Photo Systems Inc., (800) 521-4042, e-mail: sales@photosys.com;
    Quark Inc., (303) 894-8888, e-mail: quarktech@aol.com;
    Vignette Corp., (888) 608-9900, e-mail: info@vignette.com.

    From THE COLE PAPERS, November 1999, Copyright © 1999, All Rights Reserved.

    Copyright © 1990-2012, The Cole Group. All Rights Reserved.
    Modified date: 07/22/2002, 11:43:34 AM.
    URL: http://www.colepapers.net/tcp.archive/Cole_Papers_99/TCP_99_11/shovelware.html