The Cole Papers, October 2000
The envelope please: NewsML provides a wrapper -- or an envelope -- which can bind together a number of disparate parts of a news story. Here, the "news item" binds together three major news components -- a print package, a TV package and a radio package. The print package has three content items: the main text, the photos and the sidebar. The sidebar is then broken down into multiple components as well: text and graphs. The content items can be in multiple languages as well. Source: IPTC.

Suppliers skeptical about new wire service transmission

By the time you read this, after months and months of debate, the group that sets the international technical standards for the digital transmission of news will have finally approved a specification that will bring the delivery of news firmly into the 21st century.

The new standard, called News Industry Text Format (NITF), will supplant a specification first published in 1979, known variously as ANPA 1312 and IPTC 7901.

Combined with last summer's approval of the beta of another news transmission standard -- NewsML -- NITF will allow newspapers and other news media to determine who owns the copyright to a specific news item and who may republish it; what subjects, organizations and events the news item covers; when the item was reported, issued and revised; where the item was written and where its events took place; and why the item is newsworthy, based on keywords an originating editor places into the news item's format. A vast array of other characteristics can also travel with the news item, allowing newspapers and news web sites to automatically manage the thousands upon thousands of news items that pour in each day.

Today that management is largely a manual process, because the current specifications for news transmission are tied to a world where 1200 bits per second was considered "high-speed" delivery.

And though the standards process has taken almost a decade to produce these new formats, none of the participants seems particularly interested in implementing the new way of doing things.

A little history
ANPA 1312 and IPTC 7901 were crafted by the American Newspaper Publishers Association (ANPA, the predecessor to today's Newspaper Association of America) and the International Press Telecommunications Council (IPTC) in the late 1970s to ensure that newspapers, wire services and publishing systems suppliers all agreed on the format of a wire service story. And while the IPTC began work on a new format in the early 1990s, it has taken almost a decade to bring that format to the point where all concerned agree on its content.

The long and winding road from ANPA 1312 to what are now known as NewsML and NITF was influenced greatly by the advent of the World-Wide Web in the mid- and late-'90s. When the process began there were no news web sites and there certainly weren't any news aggregators. As they are being completed, the new complementary standards have the blessing not only of wire services, newspapers and suppliers, but also of news web sites. (See The Cole Papers, August 1993, February 1999 and March 2000 for more background on ANPA 1312 and NITF.)

Glenn Cruickshank, a senior manager in the digital asset management practice of KPMG Consulting LLC of McLean, Va., who has participated in the development group, said that the problem was that the new standards were developed "first in SGML, then in XML."

SGML, or Standard Generalized Markup Language, was a good place to start developing a news markup language in the early '90s. SGML is a coding language like Quark XTags or the composition code you would find in an editorial front-end system, but it's more as well: it separates structure from presentation, allowing a document to be portable.

In other words, in Hypothetical Typesetting System No. 1, a code in a document called <byline> only denotes, for example, bold face with a rule above and below. To move that document to Hypothetical Typesetting System No. 2, where there is no <byline> code but there is a <credit> code, you would have to do a find-and-replace to change all the instances of <byline> to <credit> -- not to mention changing all the other composition codes as well.

This makes moving documents around rather difficult. In the 21st century, though, we're always moving documents around -- from one publishing system to another, from one web site to another, from print to wireless, from newspapers to TV.

In SGML, <byline> only denotes that the following text is a byline -- it defines the structure rather than the presentation. The SGML document type definition (DTD) spells out which structural elements a document may contain; when the document is moved between systems, each system applies its own mapping, which translates <byline> to whatever coding -- or presentation -- that system needs.
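The difference between the two approaches can be sketched in a few lines of Python. This is only an illustration: the sample story, the two style tables and the presentation codes in them are invented for the example and do not come from any real typesetting system. The story is tagged once by structure, and each hypothetical system supplies its own table for turning those tags into its own codes.

```python
import xml.etree.ElementTree as ET

# A story marked up by structure only: the tags say what each piece is,
# not how it should look. (Sample content is invented.)
STORY = """<story>
  <headline>Council approves budget</headline>
  <byline>By Pat Example</byline>
  <p>The city council voted Tuesday night.</p>
</story>"""

# Hypothetical per-system style tables. Each system decides for itself
# how a structural element is presented.
SYSTEM_ONE = {"headline": "<hed36>", "byline": "<bold><rule>", "p": "<body>"}
SYSTEM_TWO = {"headline": "[H1]", "byline": "[credit]", "p": "[text]"}


def render(story_xml: str, style_table: dict[str, str]) -> list[str]:
    """Prefix each structural element's text with one system's code."""
    root = ET.fromstring(story_xml)
    return [f"{style_table[child.tag]}{child.text}" for child in root]


if __name__ == "__main__":
    for line in render(STORY, SYSTEM_ONE):
        print(line)
    for line in render(STORY, SYSTEM_TWO):
        print(line)
```

The story itself never changes; moving it to a different system means swapping the table, not running a find-and-replace over the document.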

In the latter part of the '90s a subset of SGML, the eXtensible Markup Language (XML), was created specifically with the World-Wide Web in mind. XML was much better suited to the problem of news markup than SGML.

This, in essence, required the NITF developers to scrap much of their previous work and start over.

A former newspaper executive who headed the company that makes the NewsView and PhotoView archiving systems (then called Tribune Solutions, now called NewsView Solutions), Cruickshank said that developing international news standards is not a simple process, as it is all volunteer work and requires extensive coordination.

"Thirteen-twelve worked pretty well for text, but with the emergence of the Web, a new standard was required," he said.

Further complicating the process is that while NewsML and NITF were being developed in parallel, they overlapped in many areas. The volunteers had a go at making the standards complementary -- rather than competing -- over the last year and the result is best described by an analogy Cruickshank drew:

"Think of NewsML as the label on the outside of a box of books. It is also the shipping list inside the box. NITF is the books themselves."

The shipping list inside Cruickshank's box of books is an analogy for what is called "metadata" -- or information about the information.

And, in fact, NewsML can envelop a variety of elements to make them a "news item": text, pictures, video or audio. NITF, obviously, can only be applied to text.

Cruickshank said that NewsML is "really an XML wrapper around news items and serves as a mechanism to link different media elements together, including shared metadata. NITF is the XML markup standard for textual elements included in a NewsML file."
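A rough Python sketch of the wrapper idea, mirroring the print, TV and radio packages described in the IPTC illustration, might look like the following. The element names, attribute names and file names are illustrative assumptions chosen for readability; they are not taken from the NewsML specification, and the output is not meant to validate against it.

```python
import xml.etree.ElementTree as ET


def build_news_item() -> ET.Element:
    """Build a nested wrapper: one item, three packages, several parts.

    All element, attribute and file names are hypothetical.
    """
    item = ET.Element("news-item", {"id": "example-001"})

    # Metadata shared by everything inside the wrapper (the "shipping list").
    meta = ET.SubElement(item, "metadata")
    ET.SubElement(meta, "subject").text = "city budget"
    ET.SubElement(meta, "rights-holder").text = "Example Wire Service"

    # The print package carries text, a photo and a sidebar package.
    print_pkg = ET.SubElement(item, "package", {"medium": "print"})
    ET.SubElement(print_pkg, "content", {"type": "text", "href": "main.xml"})
    ET.SubElement(print_pkg, "content", {"type": "photo", "href": "photo1.jpg"})
    sidebar = ET.SubElement(print_pkg, "package", {"role": "sidebar"})
    ET.SubElement(sidebar, "content", {"type": "text", "href": "sidebar.xml"})
    ET.SubElement(sidebar, "content", {"type": "graph", "href": "chart.gif"})

    # Broadcast packages sit alongside the print package.
    tv_pkg = ET.SubElement(item, "package", {"medium": "tv"})
    ET.SubElement(tv_pkg, "content", {"type": "video", "href": "report.mpg"})
    radio_pkg = ET.SubElement(item, "package", {"medium": "radio"})
    ET.SubElement(radio_pkg, "content", {"type": "audio", "href": "report.wav"})
    return item


def media_in(item: ET.Element) -> list[str]:
    """List the top-level packages by medium."""
    return [pkg.get("medium") for pkg in item.findall("package")]


if __name__ == "__main__":
    news_item = build_news_item()
    print(media_in(news_item))
    print(ET.tostring(news_item, encoding="unicode"))
```

The wrapper holds references to the content rather than the content itself, which is what lets one item bind together text, pictures, video and audio.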

David Allen, managing director of the IPTC, which is based in Windsor, Berkshire, United Kingdom, said, "NewsML is based on XML and is an open-standard. It is vendor-independent and will last as long as any other XML-based structure. It extends the principle of decoupling structure from content by using external controlled vocabularies to populate the metadata and other parts of the information tree. This means that the standard can be used for different purposes and to meet diverse business objectives without breaking."

Allen said NewsML knows that news information is no longer principally text with some still images and that news evolves over time. "NewsML recognizes these aspects of information management and provides the necessary features to support them."

Just to make matters a little more complex, the companies that are aggregating news on the Web -- folks like ScreamingMedia Inc. of New York City, and Wavo Corp. of Phoenix -- are now involved in something called the XMLNews initiative, which is based on a subset of NITF and will promote and support the use of open standards for news delivery.

The XMLNews initiative is headed by David Megginson of Megginson Technologies Ltd. of Ottawa, Canada, and allows information providers to distribute their content in a single, standardized format.

"The XMLNews initiative is going to have a lot of momentum," said Megginson, who worked on the NITF standard and is chair of the World Wide Web Consortium's (W3C) XML Information Set Working Group. "The news industry has been waiting a long time for someone to stop just talking about new standards and start actually implementing them -- this is it."

What do the suppliers think?
In the short term, KPMG's Cruickshank said, "publishers will have to build or modify their own systems," to support NewsML and they "will have to build black-box converters [to take NewsML encoded material] back to 1312 until they get a new system."

George Landau, founder of Naberth, Pa.-based NewsEngin, has spent a lot of time recently developing a new product he calls WireTracker, which is based on Lotus Notes and has been implemented at the New York Times to off-load wire processing from the paper's editorial front-end system. Reporters and editors browse the wires available to the Times on an intranet with standard web browsers.

"Right now we fully support NITF," said Landau, delineating between the wrapper and markup standard for text. "To my knowledge, the only XML wire feed currently available outside of a test environment is the Associated Press' NITF feed, which is being delivered over the Internet using NNTP [Network News Transfer Protocol] -- the tried-and-true protocol for disseminating the threaded discussions of Usenet Newsgroups."

Landau said that WireTracker supports the AP's NITF-NNTP feed.

"Stories from the XML feed have a more defined structure," said Landau, "with datelines and unambiguous headlines. In addition, the AP is using NNTP to move screen-resolution versions of their photos, which display quite well in our 100 percent Web-based WireTracker. As soon as the AP begins to include information linking text stories with particular photos, we will modify our app[lication] so that the two automatically display together," Landau said.

Despite this, Landau believes he can provide support of NewsML quickly. "We will support a true NewsML feed within a few weeks of any customer's request to receive it," he said. Landau also said that his company finds the specifications pretty straightforward, easy to implement and far more satisfying to work with than ANPA 1312.

"At NewsEngin, we love structured data because it's more useful for news-gathering -- the heavily delimited nature of XML is something that we're excited about," said Landau.

"Wire delivery using NewsML is a big deal, but not nearly so big as what I envision in NewsML's eventually becoming the universal wrapper for all news content. Given the obvious need for newspapers to support a broader range of media and devices, NewsML should make it far more feasible for a newspaper newsroom to remain the primary source of knowledge about its community."

He said, "I view the sudden ubiquity of XML as a keystone completing a global network of knowledge in which the majority of recorded information is described and organized in ways that profoundly increase its availability and usefulness."

Landau said that the Internet has "provided the physical connection," and now everyone has agreed on a standard, non-proprietary format for structuring and exchanging knowledge.

"It's a huge development," Landau said, "and I feel extremely lucky to be alive while it's happening. To the extent that the news industry has always been a prominent feature on the information landscape, I suspect NewsML may prove to be among the most important implementations of XML."

Landau also said that the benefits of NewsML include simplified data input -- one source for images and stories -- as well as the promise of more highly described elements that an editorial system can read for a variety of purposes.

"Some of those purposes include automatic formatting, better information sorting for end-users etc. NewsML has the potential to be a big deal but only if the major news service providers adopt it," he said.

Suppliers, Landau said, can provide NewsML import-export features to ease the transition of moving data from one supplier's source to another -- for example, from a front-end system to the Web or an archive.

"But rewriting wire-collection systems only makes sense if organizations like the AP adopt it," Landau said.

More skeptics
Arguably, Baseview Products, the Ann Arbor, Mich., division of Harris Publishing Systems Corp., has sold more wire collection systems than any other supplier in the industry. But Baseview's product line does not currently support NewsML.

"We can support NITF input today," said Victor Cardoso, assistant editorial product manager, "but we do not provide qualified NewsML support. We have international customers accepting Sweden's TT news service, which transmits text in an SGML variant similar to NITF. We expect to be able to fully integrate NewsML input in our next generation editorial system, IQue 4.0."

Cardoso is skeptical about whether newspapers need newswire-collection systems that support NewsML. "Currently, the NewsML specification is still in beta format and is therefore open to change. We don't see a need for newspapers to look into purchasing NewsML wire-collection systems until the specification has been ratified and adopted by the major news service providers."

Nonetheless, Cardoso said, "XML appears to be the future for allowing disparate companies to share data more easily and efficiently.

"But there is a lot of hype in the computer industry that implies anyone using XML is instantly going to get that level of detail. The catch is that to get that level of detail, users either need to manually tag each piece of data within a file or have an application sophisticated enough to understand that Don Smith is a person or a politician."

Cardoso said that's the challenge XML poses to software suppliers -- how to write applications that provide that level of detail without it becoming a burden to the end users.

"If we vendors do not accept that challenge, or fail in the attempt, then XML is nothing more than a pretty header format," he said.

Digital Technology International (DTI) of Springville, Utah, was one of the first suppliers to handle data internally in its system in an "SGML-like" way, so it isn't surprising that the company's products currently support NITF.

"We will expand our current support of NITF to include additional NewsML options based upon customer demand for them," said Alyson Oldham, the company's marketing director.

"DTI has been supporting tagged data formats for many years, so we have worked out many of the problems. Supporting both NITF and NewsML was more in the way of an upgrade than the adoption of new technology for us. The main problem has simply been the ever-changing specifications."

"We do have legacy wire-collection systems," Oldham said. "We have been converting legacy wire to tagged wire for many years and the latest wire version of our WireSpeed software converts legacy wire feeds to NITF as well as receives already tagged NITF wire feeds."

Oldham said the company will support full NewsML because it now uses XML-tagged data internally in its editing and pagination software. "We do not simply use XML as an import or export format," Oldham continued. "Newspapers should get new systems that save them money by automating the print pagination process and by enabling Web publishing from the same system. With the amount of money newspapers are losing on Web publishing these days, we are finding enormous interest in this method of combined publishing capability. Wire collection supporting NewsML is fundamental to this approach."

What's the impact?
According to the IPTC's Allen, "NewsML will be a valuable tool for information providers and publishers, especially those who operate in several publishing domains. Although it is largely Internet-centric, it is also designed for use in a broadcast system. The speed of take-up will depend on the [news organization's] ability to handle XML and its associated standards."

Allen said that with the final version 1.0 of NewsML, scheduled to be adopted this month at the IFRA Exposition in Amsterdam, "I anticipate a number of IPTC members and others making announcements about its adoption soon after this."

In order to allow NITF to be used in a stand-alone service, Allen said, it has to have some metadata and hypertext linking capability. However, it is the recommended text format for inclusion with NewsML and is fully compatible.

He continued, "One of the major advantages of NewsML is its system neutrality. Through the use of transformers, it is possible to convert NewsML to older formats. Although this will result in a loss of functionality, it means that legacy systems can accept NewsML in the short-term. There are encouraging signs that vendors are now starting to offer XML-capable editorial systems and this should assist in the introduction of NewsML and other XML-based formats."
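The kind of lossy conversion Allen describes can be sketched in Python as follows. This is a minimal illustration under stated assumptions: the input markup is invented for the example, and the plain-text output is only a stand-in for an older format, not the actual ANPA 1312 or IPTC 7901 layout.

```python
import xml.etree.ElementTree as ET

# A structured item with metadata. Element names are hypothetical.
RICH_ITEM = """<item>
  <metadata>
    <subject>city budget</subject>
    <rights-holder>Example Wire Service</rights-holder>
  </metadata>
  <headline>Council approves budget</headline>
  <byline>By Pat Example</byline>
  <p>The city council voted Tuesday night.</p>
  <p>The vote was 5-2.</p>
</item>"""


def to_legacy_text(item_xml: str) -> str:
    """Flatten a structured item to plain text, discarding metadata."""
    root = ET.fromstring(item_xml)
    lines = [
        root.findtext("headline", default=""),
        root.findtext("byline", default=""),
    ]
    # Body paragraphs survive; the <metadata> block is never read,
    # so subject and rights information are lost in the conversion.
    lines.extend(p.text or "" for p in root.findall("p"))
    return "\n".join(lines)


if __name__ == "__main__":
    print(to_legacy_text(RICH_ITEM))
```

The legacy system receives a story it can display, but the subject and rights information that made automated handling possible does not survive the trip.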

XML -- fad or future? The IPTC's Allen summed it up: "The eXtensible Markup Language is the latest open standard for the information age. The IPTC selected it as the basis for its own standardization work last year and is committed to use XML for the foreseeable future."

Allen said, "We see the XML family of standards to be key to the long-term management of information as it removes systems-format dependency. It is now becoming widely adopted across many different business domains and it is only sensible that our prime information distribution industry makes full use of its power and flexibility as soon as possible. There has been major investment in XML functionality and this will ensure that it is likely to remain an important tool for information exchange for many years to come."

KPMG's Cruickshank said that trying to predict the future of XML is too big a question. "It's an international standard, and will be the interchange standard for data," he said.

A lot of opinions, many of them similar, with the theme being that there is finally a single markup language that will enable the news industry to share news items, whether they are destined for print, broadcast or Internet, thus sparing costly translation time.

-- Aimee Beck, ab@colepapers.net

Baseview Products Inc.,
(734) 662-5800,
e-mail: marketing@baseview.com;
Digital Technology International,
(801) 853-5000,
e-mail: dtinfo@dtint.com;
International Press Telecommunications Council,
(011) (44) 1753 705051,
e-mail: m_director_iptc@dial.pipex.com;
KPMG Consulting LLC,
(703) 747-3000,
e-mail: kpmgweb@kpmg.com;
Megginson Technologies Ltd.,
(613) 722-8770,
e-mail: info@megginson.com;
NewsEngin Inc.,
(636) 537-8548,
e-mail: info@NewsEngin.com;
NewsView Solutions,
(801) 257-8803,
e-mail: info@newsviewsolutions.com;
ScreamingMedia Inc.,
(212) 691-7900,
e-mail: info@screamingmedia.com;
Wavo Corp.,
(602) 952-5500.

From THE COLE PAPERS, October 2000, Copyright © 2000, All Rights Reserved.

URL: http://www.colepapers.net/tcp.archive/cole_papers_00/TCP_00_10/newsml.html