XML Arrives in Word 2003

By William H. DuBay

The XML train is finally pulling into the station. It brings a sea change in the way we create, store, and manage information. In October of last year, Microsoft released Office 2003, which brings the promise of XML to the desktop. Previously, Word 2000 saved only the Properties of documents in an XML module in files converted to HTML.

In this new edition, you can save or export all Office documents as XML documents. Using XML tags, we can now identify various elements of our documents for manipulation, storage, and retrieval, as we would data in a data bank. XML also lets us share the information in those documents more easily across other applications (including Web applications), networks, and operating systems.

You Have to Use a Schema

The Office 2003 implementation of XML requires the strict use of XML schemas to control and validate XML files. You cannot create or use an XML file without a schema. A schema is an XML file that, replacing the older DTD, defines each of the tagged elements in your XML files. Word, Access, and Excel each have a default schema: WordML for Word, ReportML for Access, and XMLSS for Excel. Access, Excel, and the stand-alone or Professional version of Word also let you use your own custom schemas.

The version of Word that comes with the Office Standard Edition 2003 supports only the default schema. With it, you can save any Word document as an XML file, tagged automatically using the WordML schema. Unlike the other versions, it does not let you tag elements in a file manually. It does enable you, however, to create and use Smart Documents (see below).

Because you can use more than one schema with an XML file, the stand-alone and Professional versions of Word 2003 let you create an XML file with any combination of schemas, including the default. When you attach a custom schema to your file, the Task Pane shows the structure of the XML document.
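To make the idea of a schema concrete, here is a minimal sketch in Python. The memo schema, its namespace (`urn:example:memo`), and its element names are invented for illustration and do not come from Microsoft's documentation. The script only lists the elements the schema declares, roughly the kind of structure a tool could display once a schema is attached; it does not validate anything.

```python
import xml.etree.ElementTree as ET

# Hypothetical custom schema for a memo. The namespace and element names
# are assumptions made up for this example.
MEMO_XSD = """<?xml version="1.0"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
           targetNamespace="urn:example:memo"
           xmlns="urn:example:memo"
           elementFormDefault="qualified">
  <xs:element name="memo">
    <xs:complexType>
      <xs:sequence>
        <xs:element name="author" type="xs:string"/>
        <xs:element name="subject" type="xs:string"/>
        <xs:element name="body" type="xs:string"/>
      </xs:sequence>
    </xs:complexType>
  </xs:element>
</xs:schema>
"""

# Namespace of the XML Schema language itself, in ElementTree's {uri} form.
XS = "{http://www.w3.org/2001/XMLSchema}"


def declared_elements(xsd_text):
    """Return the names of all elements declared in the schema, in document order."""
    root = ET.fromstring(xsd_text)
    return [el.get("name") for el in root.iter(XS + "element") if el.get("name")]


print(declared_elements(MEMO_XSD))  # ['memo', 'author', 'subject', 'body']
```

Because a schema is itself an XML file, an ordinary XML parser is enough to inspect it. Checking a document against the schema would need a validating parser, which this sketch leaves out.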
You can apply a tag by first selecting text in the document and then selecting the element available for that text in the Task Pane. In the main window, you can turn the markup tags on or off.

The default WordML schema supports all the rich-text formatting and objects that we are used to in Word documents. If you have created a regular Word document, you can use a custom schema that tags only certain elements of a file. When you save the file as an XML document, your special elements will be tagged and validated according to your own schema. You can also choose to have WordML automatically tag all the items you did not manually tag using your own schema. If you do not choose this option, the items you did not manually tag will not be saved in the XML document.

The Professional and stand-alone versions of Word also support XSLT files, which you can use to transform and format your XML files into other formats, such as HTML.
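As a rough illustration of how custom tags and WordML can sit side by side in one file, the Python sketch below pulls the custom-tagged data out of a mixed document. The sample XML is hand-written and much simpler than what Word actually saves, and the `urn:example:memo` namespace and its `memo`, `author`, and `subject` elements are hypothetical. The WordML namespace URI is the one used by Word 2003.

```python
import xml.etree.ElementTree as ET

W = "http://schemas.microsoft.com/office/word/2003/wordml"  # WordML namespace
MEMO = "urn:example:memo"  # hypothetical custom-schema namespace

# Hand-written, simplified sample: custom elements wrap ordinary WordML
# paragraphs (w:p), runs (w:r), and text (w:t).
SAMPLE = f"""<w:wordDocument xmlns:w="{W}" xmlns:m="{MEMO}">
  <w:body>
    <m:memo>
      <m:author><w:p><w:r><w:t>Johnson</w:t></w:r></w:p></m:author>
      <w:p><w:r><w:t>Untagged paragraph, kept only as WordML.</w:t></w:r></w:p>
      <m:subject><w:p><w:r><w:t>XML in Word 2003</w:t></w:r></w:p></m:subject>
    </m:memo>
  </w:body>
</w:wordDocument>"""


def custom_fields(xml_text, custom_ns):
    """Map each innermost custom element's local name to the text inside it."""
    root = ET.fromstring(xml_text)
    prefix = "{" + custom_ns + "}"
    fields = {}
    for el in root.iter():
        if not el.tag.startswith(prefix):
            continue  # a WordML element, not one of ours
        # Skip containers such as <memo> that hold other custom elements.
        if any(d.tag.startswith(prefix) for d in el.iter() if d is not el):
            continue
        # Join the text of every WordML text run inside the custom element.
        text = "".join(t.text or "" for t in el.iter("{" + W + "}t"))
        fields[el.tag[len(prefix):]] = text
    return fields


print(custom_fields(SAMPLE, MEMO))
# {'author': 'Johnson', 'subject': 'XML in Word 2003'}
```

The untagged paragraph is ignored here because it carries only WordML markup. That mirrors the choice described above: content you did not tag manually survives in the file only if it is saved with WordML tags.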

Fig 1. Custom-tagged data displayed in a Word 2003 document. You have the choice of saving or not saving in the XML file those items not manually tagged with the custom schema. If you choose to save them, the items not manually tagged will be automatically tagged according to the default WordML schema.

Smart Documents and InfoPath

All versions of Word 2003 and Excel 2003 feature Smart Documents, which use XML-enabled Smart Tags. A smart document can automatically retrieve and enter related data in the correct places. When the smart document recognizes a name, for example, it can place a related address, telephone number, and other information in appropriate places elsewhere in the file. This database-like efficiency reduces the possibility of error. To read about smart documents, go to the Microsoft Smart Document Web site:

You can also use Office's new InfoPath application, which comes with the Professional Edition, to create and use highly structured XML forms. Both technical communicators and IT professionals will find many uses for these new documents.

The full "data mining" functionality that XML makes possible is not yet available in Microsoft Office. Data mining lets you search for items contained in specified elements of a document. You will be able to search, for example, for all the documents that contain "Johnson" in the <author> element. This feature will have to wait for the next version of Indexing Services, which will hopefully arrive in the next edition of Microsoft Windows.

Microsoft has tons of information about the new technology.
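As a side note on the element-level search described above, a short script can approximate it over a handful of XML files. The Python sketch below is only an illustration: the document names, the `memo`, `author`, and `body` elements, and their contents are all hypothetical, and a real script would read files from disk rather than from a dictionary.

```python
import xml.etree.ElementTree as ET

# Hypothetical documents keyed by file name.
DOCS = {
    "memo1.xml": "<memo><author>Johnson</author><body>Budget for Smith</body></memo>",
    "memo2.xml": "<memo><author>Smith</author><body>Reply to Johnson</body></memo>",
    "memo3.xml": "<memo><author>A. Johnson</author><body>Schedule</body></memo>",
}


def find_by_element(docs, tag, needle):
    """Return names of documents where `needle` occurs inside any <tag> element."""
    hits = []
    for name, xml_text in docs.items():
        root = ET.fromstring(xml_text)
        # Look only at the text of the requested element, not the whole file.
        if any(needle in (el.text or "") for el in root.iter(tag)):
            hits.append(name)
    return hits


print(find_by_element(DOCS, "author", "Johnson"))  # ['memo1.xml', 'memo3.xml']
```

Note that memo2.xml is not returned even though it mentions Johnson, because the name appears in the body and not in the author element. That distinction is what separates an element-level search from a plain full-text search.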
You can get a general introduction at:
http://www.microsoft.com/office/editions/prodinfo/technologies/xml.mspx

There is an excellent download, complete with a tutorial and sample XML, schema, and XSLT files, at:
http://msdn.microsoft.com/library/default.asp?url=/downloads/list/office2k3.asp

If you are interested in parsing and accessing the XML files created with the default WordML schema, you can download the complete schema and documentation at: