Grammarians start to spell out language for marking up 'news'
DALLAS -- When last we left the American Press Institute's grammar conferees/task force/futurists, they had vowed to convene a subcommittee to develop a prototype of a "News Markup Language (NML)." The heterogeneous group had come together in November to explore the grammar of, and for, on-line media -- a grammar being the common, necessary tools of any communications medium, be it cinema or Morse code (see The Cole Papers, December 1998). In early January, 15 on-line journalists, programmers, information specialists and journalism professors assembled here to take up the challenge.
Not an easy task, that. Such a markup language calls for embedding tags in digital content of all sorts, tags that will assist in the management, searching and repackaging of all editorial and ad content, as well as audio and video files. Such a system, ideally, will be applicable to any medium, and ease the migration from one channel to another.
As a technical matter, embedding such tags is relatively easy. To see this, load a web page in your browser, then open up the page of code behind it by going to the View menu and clicking on Page Source in Netscape Navigator, or Source in Internet Explorer. Note the tags that look something like <Style Type="text/css">. That's a generic HTML markup tag, designed to describe the structure of a web page. On the other hand, XML, the tool for the News Markup Language, describes file content.
Easy to say -- but what purpose should those NML tags serve? What should those tags be? Who should enter them? How can the system be flexible enough to meet the customization needs of any organization or medium? These are just some of the first-pass, relatively obvious and easy questions. Yet to come are such head-scratchers as: How can a system be created to link news content with transactions? How will we handle specifications driven by still-to-be-contemplated privacy or copyright laws?
The Grammarians think the task is do-able. Hyperbole? Hubris? Possibly, but the need can't be ignored. Consider:
Clearly, the need for such a markup tool is becoming ever more obvious, and the technical skills and tools are ready for it. Taking on the challenge of creating a grammar was the ad hoc team in Dallas, which included Dale Peskin, vice president for interactive media of the publishing division of Dallas-based A.H. Belo Corp.; Neil Chase, Northwestern University; Chris Feola, the Media Center at the American Press Institute of Reston, Va.; Kathy Foley, San Antonio (Texas) Express-News; J.T. Johnson, San Francisco State University; Alan Karben, Dow Jones & Co.; Retta Kelley, Cox Interactive Media of Atlanta; B.C. Krishna, FutureTense of Acton, Mass.; Chris Ryan, University of Kansas; Jay Small, Indianapolis Newspapers Inc.; Mike Steele, Media General Inc. of Richmond, Va.; Dave Swint, API's Media Center; Dennis Walsh, Miami University of Ohio; Chris Willis, Dallasnews.com; Joe Wilson, WashingtonPost.com, and Steve Yelvington, StarTribune.com of Minneapolis.
(The alert reader will note that, yes, that's the same Chase, Feola, Johnson, Small and Swint who contribute to various Cole Group publications; an alarming number of the others are subscribers to our works. What can we tell you? We're interested in this stuff.)
Innovation on Wall Street
For a couple of years, Dow Jones has been using its roll-your-own version -- christened, of course, Dow Jones Markup Language (DJML). Alan Karben, Dow Jones' associate director for interactive development, told the Dallas group, "The DJML is the central document format for text in the Wall Street Journal Interactive Edition." Unlike typical word-processing or desktop-publishing formats, which separate elements using formatting or typographic codes, elements marked up using DJML are distinguished by their role or function. "This separation between content and formatting is designed to help documents created for the Interactive Edition survive trends in delivery formats and platforms," he said.
For example, by marking the name of an article's author with the tag "byline," rather than with codes for bold and italic, the publisher keeps the flexibility to change how and where bylines are presented. Editors also tag stories to reflect things like company name or person, which facilitates pre- and post-publishing searching. Such tags can help draw a distinction between <person>Thomas Jefferson</person> and Thomas Jefferson High School, which may not be tagged. Still, should an editor wish, a <school> tag could be created or, if desired, a tag could be created for <president>. Karben's editors take stories destined for the ink-on-paper version of the Wall Street Journal and apply the appropriate tags. Then the complete story is translated into HTML for the web site.
The Dallas Morning News also is moving vigorously to build its content with an eye toward increased utility and future products. It is using a software product called Dynabase "to relieve editors from the labors of tagging 200 stories a night to go into our 900,000-story archive," said Belo's Peskin. "We just try to simplify things as much as possible to release editors to do the good work they can do," he said.
Peskin's colleague, Willis, came up with a tagging system that allows editors to "tag stories in a way that made sense and then put those tags in the workflow." A project in the works for the Belo on-line operation is the development of a CitySearch site for Arlington, Texas, a major market for the Morning News. "We need to work with CitySearch, which has its own set of markup tags," Willis said, "but the question is, how do we use CitySearch's information, but give the site the look-and-feel of the Dallas Morning News?"
The Morning News is not yet using XML tags because Dynabase (http://www.inso.com/dynabase/index.htm#extensive) lets editors use XML-like tags, which convert smoothly to HTML code. Consequently, this strategy adequately transforms the digital content coming in from CitySearch databases into web pages that reflect the Morning News' look-and-feel. The process also permits easy updating of content and fits well into the content-management systems of other Belo media properties, Peskin said. Some of the company's television stations are starting to build a web presence; the ultimate goal is to be able to smoothly integrate data coming from newspapers and broadcast stations.
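
The approach Karben describes, tagging by role and translating to HTML only at the end, can be sketched in a few lines of Python. This is a minimal illustration and not Dow Jones' actual system: the story text, the <story>, <headline> and <p> tag names, and the HTML rules are all assumptions made for the example. Only the "byline" and <person> tags come from the discussion above.

```python
import xml.etree.ElementTree as ET
from html import escape

# Hypothetical story marked up by role. Only "byline" and "person" appear
# in the article; every other tag name here is an assumption.
STORY = """<story>
  <headline>School board votes on new campus</headline>
  <byline>Pat Example</byline>
  <p>A biography of <person>Thomas Jefferson</person> was cited at Thomas Jefferson High School.</p>
</story>"""

# Presentation rules live outside the content, so a redesign changes this
# table and leaves the tagged story untouched.
HTML_RULES = {
    "headline": ("<h1>", "</h1>"),
    "byline": ('<p class="byline"><b><i>', "</i></b></p>"),
    "p": ("<p>", "</p>"),
    "person": ('<span class="person">', "</span>"),
}

def render(elem):
    """Recursively translate a role-tagged element into an HTML string."""
    open_tag, close_tag = HTML_RULES.get(elem.tag, ("", ""))
    parts = [open_tag, escape(elem.text or "")]
    for child in elem:
        parts.append(render(child))
        parts.append(escape(child.tail or ""))  # text that follows the child
    parts.append(close_tag)
    return "".join(parts)

def people(root):
    """Return the text of every <person> element, for search or indexing."""
    return [p.text for p in root.iter("person")]

root = ET.fromstring(STORY)
html_page = render(root)
names = people(root)
print(html_page)
print(names)
```

Because the high school's name is not wrapped in a <person> tag, a search on people finds the president and skips the school, which is the distinction the tagging is meant to preserve.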
Plan your work ...
The first of two teams was dispatched to consider a conceptual structure for the tags, and to make a fast pass at a first-draft list of tags. A second gang brainstormed on how to implement such a system across the journalism industry. The, ahem, tag-team produced a conceptual scheme that recognizes the need for tags that describe a particular file or collection of related files. In the database world, these tags are metatags, which carry metadata. (Metadata are data about data.) Beneath the metatags are three overlapping sets of tags related to process, content and archiving.
Content in digital journalism tends to fall into three areas: news/editorial, commercial (ads and transactions) and community (content that is generated -- and largely maintained -- by individuals or community groups). The proposed system is malleable enough to accommodate all these data types. A well-designed tagging plan can help distinguish between the time and/or date a story was published and the time and/or date of an event. These tags can be used to mark up headlines, subheads, bylines and even such industry-specific components as ledes, nut grafs and kickers. Because it is built on XML, the system is flexible enough that each newspaper, magazine or TV station can customize the tags to fit its unique newsroom vocabulary, yet the data can be shared among all those organizations.
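
One way to picture that flexibility is a newsroom that tags in its own jargon and maps those tags to a shared vocabulary when it exchanges stories. The sketch below is an assumption-laden illustration, not part of any NML draft: the local tag names, the shared names, the "published" attribute and the <event> tag are all hypothetical.

```python
import copy
import xml.etree.ElementTree as ET

# Hypothetical story in one newsroom's local vocabulary. The "published"
# attribute and the <event> tag are assumptions used to show how a
# publication date can be kept apart from the date of the event itself.
LOCAL_STORY = """<story published="1999-01-12">
  <hed>Council approves stadium plan</hed>
  <lede>The city council <event date="1999-01-11">voted Monday night</event> to approve the plan.</lede>
  <nutgraf>The vote ends two years of debate.</nutgraf>
  <kicker>Construction could start in the spring.</kicker>
</story>"""

# Hypothetical mapping from local newsroom tags to shared tag names.
LOCAL_TO_SHARED = {
    "hed": "headline",
    "lede": "lead",
    "nutgraf": "summary",
    "kicker": "closing",
}

def to_shared(root, mapping):
    """Return a copy of the tree with local tag names swapped for shared ones."""
    shared = copy.deepcopy(root)  # leave the newsroom's own copy untouched
    for elem in shared.iter():
        elem.tag = mapping.get(elem.tag, elem.tag)
    return shared

local_root = ET.fromstring(LOCAL_STORY)
shared_root = to_shared(local_root, LOCAL_TO_SHARED)

published = shared_root.get("published")
event_date = shared_root.find("lead/event").get("date")
print(ET.tostring(shared_root, encoding="unicode"))
print("published:", published, "event:", event_date)
```

Each organization would maintain its own mapping table; the stories it sends out or takes in all pass through the shared names.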
Typical archiving tags include identifier types, which may come from standard coded vocabularies of title types familiar to librarians (Original Title or Headline, Series Title, Subtitle, Translated Title, etc.) and code types (ISBN, ISRC, ISWC, DOI, UPC, Publisher Catalogue Number, Edition, Zone, Section).
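
The conceptual scheme, a record-level wrapper with process, content and archiving tags beneath it, might look something like the following when built in code. This is only a sketch under stated assumptions: the element names (<record>, <process>, <edited-by>, <title>, <code>) and the sample values are hypothetical, and for simplicity the three tag sets are kept in separate branches even though the team described them as overlapping. The "Headline" title type and the "Edition" and "Section" code types are taken from the lists above.

```python
import xml.etree.ElementTree as ET

def build_record(headline, editor, edition, section):
    """Build one record with process, content and archiving branches."""
    record = ET.Element("record")  # wrapper describing the file as a whole

    process = ET.SubElement(record, "process")
    ET.SubElement(process, "edited-by").text = editor

    content = ET.SubElement(record, "content")
    ET.SubElement(content, "headline").text = headline

    archive = ET.SubElement(record, "archive")
    # Title and code types drawn from the librarians' vocabularies.
    ET.SubElement(archive, "title", type="Headline").text = headline
    ET.SubElement(archive, "code", type="Edition").text = edition
    ET.SubElement(archive, "code", type="Section").text = section
    return record

def archive_codes(record):
    """Collect the archiving codes as a {type: value} dictionary."""
    return {code.get("type"): code.text for code in record.iter("code")}

rec = build_record(
    headline="Council approves stadium plan",
    editor="night desk",  # hypothetical sample values
    edition="Final",
    section="Metro",
)
codes = archive_codes(rec)
print(ET.tostring(rec, encoding="unicode"))
print(codes)
```

A library system could read only the archive branch while a production system reads only the process branch, which is the practical benefit of grouping the tags by purpose.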
... Work your plan
First, the implementation team suggested a common-sense guideline, the "five-minute rule." It posits that NML can't add more than five minutes to the editing cycle for a single article. The team also highlighted the need to determine how to include content providers such as the Associated Press and other wires, along with the software and front-end vendors. Finally, the team emphasized the need to "keep it simple," and to ensure that this not become a two-year-long standards project.
API's Feola and WashingtonPost.com's Wilson are already creating a Document Type Definition (DTD) of the proposed markup language. On-line test sites are being signed up, including the Washington Post, Belo properties, Indianapolis Star and News, some Media General properties, the Medill School of Journalism at Northwestern University and the School of Journalism at the University of Kansas.
All the time and effort to create a News Markup Language is laudable. Still, the journalism industry is somewhat late in recognizing and acting on the need for such a language. Librarians and geographers, for example, have been working on creating indexing or tagging schema for their respective disciplines for more than five years. The Europeans, especially the British information community, already have done much of the conceptual work that is directly transferable to an NML. Their efforts are addressing inevitable future questions of intellectual property rights, uniform geo-codes and codes to facilitate transactions.
While it is likely that the API Grammarians will develop a coding plan, test it and implement it in at least some newsrooms, NML's long-term success is dependent not only on speed of implementation, but on the breadth of thought and research underlying the strategy. If NML serves only the newsroom and publication library, instead of the entire journalistic enterprise, much of its potential will be lost.
The coding plan cannot ignore, for example, the relatively mature systems developed by librarians, the International Press Telecommunications Council, or the geo-code schemes coming from the U.S. Geological Survey that are being adopted by governments at all levels. If it does, it runs the risk of being just another good idea that "coulda beena contenda." In the last 36 months, an encouraging degree of cooperation has developed among professional and academic journalists as they struggle to learn how to survive in the new information environment. NML is another indicator of that cooperation.
-- J.T. "Tom" Johnson
Inso Corp., (617) 753-6500.
See also: Classify this
From THE COLE PAPERS, February 1999, Copyright © 1999, All Rights Reserved.
Copyright © 1990-2012, The Cole Group. All Rights Reserved. URL: http://www.colepapers.net/tcp.archive/cole_papers_99/TCP_99_02/grammarians.html