The Cole Papers

Struggling for one language, news groups attempt conformity

"Soup is good friends."

The alphabet soup I grew up with has been advertised lately. The letters in the soup are now bigger and, presumably, better.

And the newspaper -- or information -- business, always on top of the trends in its own unique ways, has moved on this one as well. While its own version of alphabet soup may not feature bigger letters, it has nonetheless been busy for several years adding more letters to the soup.

Over the years, we have regaled you with the behind-the-scenes story about the politics of standards (see The Cole Papers, August 1993 and February 1999 and NewsInc., May 11, 1998) -- the case in point being the ongoing development of the News Information Text Format (NIFT).

In June 1992, the International Press Telecommunications Council (IPTC), a group based in the United Kingdom that represents news agencies and newspapers worldwide, began committee work on an industry standard for the interchange of textual material. The new standard was designed to replace two standards, both developed in the mid-1970s: IPTC 7901 and ANPA 1312. (ANPA stands for the American Newspaper Publishers Association, which merged with other newspaper groups in 1992 to become the Newspaper Association of America, or NAA, of Vienna, Va.)

The New Text Subcommittee had four tasks: analyze the components and characteristics of text transmissions; articulate the problems and requirements the new text format should handle; investigate and select from among alternative representation schemes; and write, review, revise, approve, issue and implement the standard.

From the new text format, the committee required device independence, data intelligence, robust markup, ease of customization, markup minimization, availability of cost-effective off-the-shelf tools and compatibility with the new coded character set standards. The format would also have to provide a better method of delivering tabular material and include information identifiers.

For example, company names or organizations might be one class of proper noun, each member of which could be assigned a "normalized identifier" that would then become the basis of indices or customized reports. The format would also include linkages: for example, between a brief and a full story, between a photo or audio clip and the associated text, or between temporally separated versions of a story. Lastly, it would provide for the delivery of specialized formats, such as TV listings.
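To make normalized identifiers and linkages concrete, here is a minimal Python sketch. It uses XML through Python's standard `xml.etree.ElementTree` module for convenience (the format the committee went on to develop was SGML-based), and every element and attribute name in it (`story`, `org`, `value`, `link`, `rel`, `ref`) is hypothetical rather than taken from any IPTC specification. Each organization mention carries a normalized identifier, so variant spellings collapse into one index entry, and a link element ties a full story to its brief.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

# Hypothetical markup: each <org> carries a normalized identifier in "value",
# and <link> records a relationship to another item. Names are illustrative only.
STORY = """<story id="story-2">
  <p><org value="ORG:ACME">Acme Corp.</org> said Tuesday that
     <org value="ORG:ACME">Acme</org> would buy
     <org value="ORG:WIDGETCO">WidgetCo Inc.</org>.</p>
  <link rel="brief" ref="story-1"/>
</story>"""


def index_orgs(xml_text: str) -> dict:
    """Map each normalized identifier to the surface forms used in the text."""
    root = ET.fromstring(xml_text)
    index = defaultdict(set)
    for org in root.iter("org"):
        index[org.get("value")].add(org.text.strip())
    return dict(index)


def linked_items(xml_text: str) -> list:
    """Return (relationship, target id) pairs for every link element."""
    root = ET.fromstring(xml_text)
    return [(link.get("rel"), link.get("ref")) for link in root.iter("link")]


if __name__ == "__main__":
    print(index_orgs(STORY))
    print(linked_items(STORY))
```

Because the index is keyed on the identifier rather than on the spelling, "Acme Corp." and "Acme" land in the same entry, which is the property that would let such identifiers serve as the basis for indices.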

It's elemental
The Universal Text Format (UTF) had four major categories of Standard Generalized Markup Language (SGML) elements: tables, a structural element set, a non-structural element set and the Information Interchange Model (IIM) datasets. These elements, which became the Document Type Definition (DTD), were derived in part from work begun by Mead Data Central (then owner of the Dayton, Ohio-based on-line archive Lexis-Nexis, which is now a division of Reed Elsevier Inc.). Other elements were borrowed from previously published work of the Text Encoding Initiative, or TEI, which was sponsored by the Association for Computational Linguistics, the Association for Literary and Linguistic Computing and the Association for Computers and the Humanities. The work of the Association of American Publishers, the European Capsnews project and the International Committee for Accessible Document Design contributed provisions for rendering text into Braille, large print or computer voice. The third source was Computer-aided Acquisition and Logistic Support (CALS), an SGML initiative of the U.S. Department of Defense.

The element sets are what newsrooms are most likely to see. The structural element set contains the IIM/UTF element (the text data object together with its IIM datasets) and the file element, the basic publishable chunk of text in the UTF. Then come the text blocks (there may be several in a single transmission, usually containing related stories) and paragraph-level elements. Categories in the non-structural element set include text characteristics, temporal markup, names, language, locations, numbers, annotations and references, embedded quotes and optional material.
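The split between structural and non-structural elements can be illustrated with a small Python sketch. The real UTF was defined in SGML with its own DTD. This example substitutes XML and hypothetical tag names (`file`, `block` and `p` for structure; `location`, `date` and `name` for non-structural markup), so treat it as an analogy rather than UTF syntax. Structural elements nest a file into text blocks and paragraphs, while non-structural elements tag spans inside a paragraph without changing that hierarchy.

```python
import xml.etree.ElementTree as ET

# Hypothetical tag names standing in for UTF's element sets.
STRUCTURAL = {"file", "block", "p"}

DOC = """<file>
  <block>
    <p>Storms hit <location>Nice</location> on <date>1999-11-02</date>.</p>
    <p>Officials urged caution.</p>
  </block>
  <block>
    <p>A related sidebar quoted <name>Jane Doe</name>.</p>
  </block>
</file>"""


def outline(xml_text: str) -> list:
    """For each text block, return its paragraphs as plain text (inline tags dropped)."""
    root = ET.fromstring(xml_text)
    return [["".join(p.itertext()) for p in block.iter("p")]
            for block in root.iter("block")]


def inline_markup(xml_text: str) -> list:
    """List (tag, text) for every non-structural element, in document order."""
    root = ET.fromstring(xml_text)
    return [(el.tag, el.text) for el in root.iter() if el.tag not in STRUCTURAL]


if __name__ == "__main__":
    print(outline(DOC))
    print(inline_markup(DOC))
```

A receiver that ignores the non-structural tags still gets readable paragraphs from `outline`, while one that understands them can pull out names, places and dates.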

The envelope, please
Introduced in 1994 and designed as an SGML standard for the "News Distribution Industry," the UTF was the precursor to NIFT, according to Dave Becker, who then worked for Lexis-Nexis and wrote a paper on the topic (the full text of which is at http://www.NIFT.org/docs/anatomy.HTML).

According to Becker's paper, the advent of digital photo technology in the late '80s and early '90s was the motivation for a new standard: the old transmission formats were simply inadequate. From that realization was born the IIM, the intent of which was the production of "an electronic envelope capable of transferring files derived from different sources of information between any of the various computer systems used in the news distribution industry."

That envelope concept was based on the seven-layer Open Systems Interconnection (OSI) model for the transfer of information between computers. While the lower layers in that model are filled in by products and standards generated by "other organizations," the top (application) layer was reserved for definition by the organizations that actually send and receive the information. That information includes all types of data, from text and photos to graphics, audio and video. The goal was to provide "universal communications" on a single network or storage medium.

The product of that thinking was the IIM. Drawing again from Becker's report: at the application level of the OSI model, the IIM provides a generic database in which a data object, kept in a binary format appropriate to the object itself, is surrounded by an "envelope" of self-identifying datasets. Some datasets are required for any given application; others can be used at the option of sender and receiver.
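The "self-identifying" part of the envelope is easiest to see at the byte level. The Python sketch below follows the standard dataset layout as we understand it from the IIM specification: a tag marker octet (0x1C), a record number, a dataset number, a two-octet length and then the data. It deliberately skips the IIM's extended-length form for large datasets, and the example dataset numbers (2:05, which we believe is Object Name, and 2:120) are assumptions to check against the specification.

```python
import struct

TAG_MARKER = 0x1C                # first octet of every IIM dataset
HEADER = struct.Struct(">BBBH")  # marker, record, dataset number, 2-octet length


def encode_dataset(record: int, number: int, data: bytes) -> bytes:
    """Encode one standard (non-extended) dataset."""
    if len(data) >= 0x8000:
        raise ValueError("data too long for a standard dataset; extended form not shown")
    return HEADER.pack(TAG_MARKER, record, number, len(data)) + data


def decode_datasets(stream: bytes):
    """Yield (record, number, data) for each standard dataset in the stream."""
    pos = 0
    while pos < len(stream):
        marker, record, number, length = HEADER.unpack_from(stream, pos)
        if marker != TAG_MARKER:
            raise ValueError(f"missing tag marker at offset {pos}")
        if length & 0x8000:
            raise ValueError("extended dataset; not handled in this sketch")
        pos += HEADER.size
        data = stream[pos:pos + length]
        if len(data) != length:
            raise ValueError("truncated dataset")
        yield record, number, data
        pos += length


if __name__ == "__main__":
    # Hypothetical example: dataset 2:05 holding an object name.
    encoded = encode_dataset(2, 5, b"STORM")
    print(encoded.hex())  # 1c0205000553544f524d
    print(list(decode_datasets(encoded)))
```

Because each dataset announces its own record, number and length, a receiver can step over datasets it does not recognize, which is what lets sender and receiver agree on optional datasets beyond the required ones.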

The IIM organizes the data object and its companion datasets into nine records, three of which are unused. Record One is the envelope record, containing routing, identity and envelopment data. Records Two through Six are reserved for application records: Two and Three contain, respectively, pertinent editorial information about the object and image parameters for digitized image objects, while Four through Six remain undefined. Record Seven is a pre-object descriptor containing an estimate of object length, Record Eight is the data object itself and Record Nine is the post-object descriptor, which contains the actual object length as transmitted.

These versions of the IIM and Digital Newsphoto standards were approved in January 1992. Revisions were approved in April 1993.
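The nine-record layout can also be sketched in Python. This is an illustration under stated assumptions, not a conforming IIM encoder: it writes only Records 1, 2, 7, 8 and 9, in ascending order, and the dataset numbers it uses (1:00 for the model version, 7:90 for the announced object size, 8:10 for the object data and 9:10 for the confirmed size) reflect our reading of IIM version 4 and should be verified against the specification. The four-octet size fields are likewise an assumption.

```python
import struct

HEADER = struct.Struct(">BBBH")  # 0x1C marker, record, dataset number, length


def dataset(record: int, number: int, data: bytes) -> bytes:
    """One standard IIM dataset (32,768 octets or more would need the extended form)."""
    if len(data) >= 0x8000:
        raise ValueError("extended datasets are not handled in this sketch")
    return HEADER.pack(0x1C, record, number, len(data)) + data


def build_iim_object(envelope, editorial, obj: bytes, estimated_size: int) -> bytes:
    """Lay out Records 1, 2, 7, 8 and 9 in ascending order.

    envelope and editorial are lists of (dataset number, bytes) for Records 1 and 2.
    Dataset numbers 7:90, 8:10 and 9:10 are assumptions to check against the spec.
    """
    parts = [dataset(1, n, d) for n, d in envelope]                   # Record 1: envelope
    parts += [dataset(2, n, d) for n, d in editorial]                 # Record 2: editorial
    parts.append(dataset(7, 90, struct.pack(">I", estimated_size)))   # pre-object estimate
    parts.append(dataset(8, 10, obj))                                 # the data object
    parts.append(dataset(9, 10, struct.pack(">I", len(obj))))         # actual length sent
    return b"".join(parts)


def record_sequence(stream: bytes) -> list:
    """List the record number of each dataset, in stream order."""
    records, pos = [], 0
    while pos < len(stream):
        _, record, _, length = HEADER.unpack_from(stream, pos)
        records.append(record)
        pos += HEADER.size + length
    return records


if __name__ == "__main__":
    model_version = struct.pack(">H", 4)  # assumed 1:00 value for IIM version 4
    stream = build_iim_object([(0, model_version)], [(5, b"STORM")], b"photo bytes", 16)
    print(record_sequence(stream))  # [1, 2, 7, 8, 9]
```

One plausible reading of the design is that the Record 7 estimate lets a receiver prepare for the object before it arrives, while the Record 9 figure lets it confirm how much was actually sent.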

What happens at those meetings?
In the last couple of years, the IPTC/NAA standards committee has stayed busy with frequent meetings around the world. Although the standard has been ratified by the range of groups involved (the IPTC, the NAA and the RTNDA, or Radio-Television News Directors Association), the NIFT remains an evolving beast. But it is also the only XML (eXtensible Markup Language) standard recognized by the IPTC and the NAA, says Walt Baranger, a committee member and assistant to the editor at the New York Times. "The Associated Press is going to go live with it in a few days. Dow Jones will likely follow."

The establishment of a new working group to provide maintenance and upkeep is one of the newest developments to come out of the meetings. Baranger says the "working party was formed to identify changes to NIFT that are needed for implementation. This is mainly because the emerging XML standard has forced certain changes." Though he anticipates no major changes, he expects that "several tags will be either clarified or deprecated." The deprecation is a consequence of events described below.

Yet another meeting of the NIFT Maintenance Working Party was in progress in New York at press time. An Ad Hoc Metadata meeting is set for Paris in March 2000 and, of course, the IPTC's annual spring meeting will be held at the end of March in Nice.

"The standards are going in two different directions now," says Glenn Cruikshank, a senior manager with KPMG Consulting and a member of the NIFT committee. Much of the time at recent meetings has been devoted to discussion of the Reuters-led NewsML proposed standard.

In a 1999 news release, Reuters described its NewsML as an open, standards-based format for the creation, transfer and delivery of news. It is based on XML, the emerging Internet standard for data sharing between applications, developed by the World Wide Web Consortium (W3C), the coordinating body for Web developments.

An April 1999 IPTC press release described the NIFT in similar terms: "The NIFT is an XML-compliant set of tags. ... By settling upon a single markup language, news organizations can share news articles and graphics among print, broadcast, electronic, Internet and archive systems without the need for costly translations and manual editing. Using a language that embraces the latest internationally accepted standards assures newspapers and broadcasters that stories can flow unimpeded between their news systems and the Internet."

In short, the NIFT will be an updated, more richly formatted version of ANPA 1312.

Expanding the scope
But the capabilities that NIFT could open up extend beyond the Web to cellular phones and PDAs (personal digital assistants). The AP has developed NIFT products, and Reuters has been a key player in formulating the NewsML standard.

Baranger describes NewsML as contentious. "Reuters seems determined to go ahead with it as soon as possible, but many members have serious questions about its complexity, database implications and user interface. The March meeting in Nice, France, will be the place where NewsML will be more fully defined. NewsML is still in the conceptual stage, and is currently envisioned as a multimedia standard. ... [T]he entire IPTC membership will get its first real look at the NewsML outline at the Nice meeting."

Baranger's chief concern about NewsML is that it will upstage NIFT at its crucial rollout, which he expects early this year, leaving text-based information providers short of the support and attention they need while they work out the issues associated with implementing a new standard. "NIFT is worth the effort to implement," he said, "but we have to have the tools." (Another related, though somewhat tangential, concern is the absence, with notable exceptions, of supplier involvement in developing the new standard.)

During the early part of that rollout period, the NIFT wire will look about the same as the old ANPA 1312 wire. But within a reasonably short time, Baranger expects to see more in-line markup. Eventually, he expects the quality of the "metadata" that wire services include to differentiate them, just as their reporting and editing traditionally have.

Cruikshank also expressed some concern over the possibility of "Babelization," with its attendant demands for translating between standards rather than having one standard that seamlessly addresses the entire range of data objects. Kevin Roche, the Dow Jones representative to the committee, who is news systems manager for the Wall Street Journal and chair of the Subject Codes subcommittee, agrees with Baranger that, at least for the rollout period, the focus should be on the NIFT.

John Iobst, vice president for technical research at the NAA and chair of the committee, believes that the NIFT is "still a central part of the whole thing. NewsML will make it easier to support non-text."

The functionality and requirements specifications that will be the basis of NewsML have been released, as has an updated edition of the NewsML Encoding Decisions document; all are available through the IPTC Web site.

The report from the fall 1999 IPTC meeting in Sydney suggests that the metadata -- for example, the Subject Code System -- should be allowed to develop separately from the standard itself. Cruikshank described metadata as a component which "adds knowledge and value to words."
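The Sydney report's point about letting metadata such as subject codes evolve separately from the standard can be illustrated with a short Python sketch. The story carries only codes; the human-readable labels live in a separately versioned vocabulary that a receiver can update without any change to the markup format. The eight-digit codes and labels below are illustrative, modeled on the style of the IPTC subject codes, and the rule that the first two digits name the top-level subject is an assumption of this sketch.

```python
# Illustrative vocabularies; codes and labels are examples, not an official list.
VOCAB_V1 = {
    "04000000": "economy, business and finance",
    "15000000": "sport",
}
# A later vocabulary release adds a subject without touching the text format.
VOCAB_V2 = {**VOCAB_V1, "17000000": "weather"}


def top_level(code: str) -> str:
    """Assumed rule: the first two digits identify the top-level subject."""
    return code[:2] + "000000"


def describe(codes: list, vocab: dict) -> list:
    """Resolve a story's codes against whichever vocabulary the receiver holds."""
    return [vocab.get(top_level(c), "unknown subject") for c in codes]


story_codes = ["04006000", "17000000"]  # what the story markup itself carries
print(describe(story_codes, VOCAB_V1))  # ['economy, business and finance', 'unknown subject']
print(describe(story_codes, VOCAB_V2))  # ['economy, business and finance', 'weather']
```

The same story yields better descriptions once the receiver loads the newer vocabulary, which is the practical benefit of keeping the two on separate development tracks.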

In the IPTC's newsletter, Reuters representative Tony Allday, writing after that meeting, said that the technical and commercial landscape has changed dramatically and that, though the IPTC is alive and well, the standards development process needs to move fast enough to respond to changes in the market. "The present system with three main working meetings a year was probably not enough -- there was a need for more activity from ad hoc groups, more use of e-mail and teleconferences and possibly for some outside effort (from consultants)," Allday wrote.

As described in that newsletter, Reuters "did not consider that the NIFT (the existing IPTC standard using XML) offered the flexibility they needed and so was not a complete answer. Business needs had forced Reuters to produce an experimental DTD and work with XML in the real-world environment."

Reuters' Jo Rabin wrote a paper describing the NewsML prototype, saying that he considered it an "XML version of the IIM, but one that does much more." Rabin's vision is that one working group would investigate the "basic news container standard," another would look at "a news content standard for text, and a third would deal with news content metadata."

Relatedly, Allday agreed to chair a new IPTC Public Relations Committee to promote IPTC achievements and activities, and foster IPTC participation in the activities of other relevant standards bodies.

According to the report, "the NIFT itself will be little changed, continuing as a versatile text format alongside the new standard." Ultimately, three formal IPTC 2000 working groups were established at that meeting. The IPTC 2000 project, under the auspices of the Standards Committee, will continue to develop an XML-based standard to represent and manage news throughout its life cycle, including production, interchange and consumer use. The work will be undertaken by three working parties. The News Structure and Management Working Party, chaired by Reuters' Rabin, is charged with developing an XML-based framework standard for structuring, relating and managing news objects and processes in a multimedia environment, drawing from IPTC published standards and other standards development (including news management and intellectual property rights). The News Metadata Working Party, chaired by Stephane Guerillot of Agence France Presse, will identify and define descriptive data, with associated structure and values, about news objects and their relationships. The News Text Working Party, chaired by Alan Karben of the Wall Street Journal Interactive Edition, is charged with producing XML-based components for structuring and marking up news text, drawing from IPTC published standards and other standards development (including NIFT, other text, tables and special tables).
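The division of labor among the three working parties (a framework for structuring and relating news objects, descriptive metadata about them, and components for text) mirrors the container, content and metadata split in Rabin's vision. A minimal Python sketch of that separation follows. The class and field names are hypothetical and do not represent NewsML syntax, which was still at the conceptual stage at press time.

```python
from dataclasses import dataclass, field


@dataclass
class Metadata:
    """Descriptive data kept apart from both container and content."""
    subject_codes: list = field(default_factory=list)
    rights: str = ""


@dataclass
class Component:
    """One piece of content: text, a photo, an audio clip and so on."""
    media_type: str  # e.g. "text/xml" or "image/jpeg"
    payload: bytes
    metadata: Metadata = field(default_factory=Metadata)


@dataclass
class NewsContainer:
    """Groups related components and relates them as one news item."""
    item_id: str
    components: list = field(default_factory=list)
    metadata: Metadata = field(default_factory=Metadata)

    def of_type(self, prefix: str) -> list:
        """Return the components whose media type starts with prefix."""
        return [c for c in self.components if c.media_type.startswith(prefix)]


item = NewsContainer(
    item_id="item-1",
    components=[
        Component("text/xml", b"<p>Story text</p>"),
        Component("image/jpeg", b"...jpeg bytes...",
                  Metadata(rights="hypothetical photo credit")),
    ],
    metadata=Metadata(subject_codes=["04000000"]),
)
print([c.media_type for c in item.of_type("image/")])  # ['image/jpeg']
```

In this arrangement a text-only receiver can take just the text components, which is roughly the role the report foresees for the NIFT as a text format alongside the broader multimedia standard.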

The can of soup that is the newspaper industry's attempt to standardize the transmission of information, whether between news agencies and newspapers or between newspapers and consumers, may not have the bigger (and presumably better) letters of its culinary counterpart, but it does have lots of letters, and they continue to be swirled in new and creative ways.

-- L. Carol Christopher

The IPTC is on the Web at http://www.IPTC.org;
the NIFT working group is on the Web at http://www.NIFT.org;
the Newspaper Association of America is on the Web at http://www.naa.org/,
and the RTNDA is on the Web at http://www.RTNDA.org/.


From THE COLE PAPERS, March 2000, Copyright © 2000, All Rights Reserved.

Copyright © 1990-2010, The Cole Group. All Rights Reserved.
URL: http://www.colepapers.net/tcp.archive/cole_papers_00/TCP_00_03/standards.html