XML - Extensible Markup Language
The Extensible Markup
Language (XML) is a W3C-recommended general-purpose
markup language for creating special-purpose markup
languages, capable of describing many different kinds of
data. It is a simplified subset of SGML. Its primary purpose
is to facilitate the sharing of data across different
systems, particularly systems connected via the Internet.
Languages based on XML (for example, Geography Markup
Language (GML), RDF/XML, RSS, MathML, Physical Markup
Language (PML), XHTML, SVG, MusicXML and cXML) are defined
in a formal way, allowing programs to modify and validate
documents in these languages without prior knowledge of
their form.
History
By the mid-1990s some practitioners of SGML had gained
experience with the then-new World Wide Web, and believed
that SGML offered solutions to some of the problems the Web
was likely to face as it grew. Jon Bosak argued that the W3C
should sponsor an "SGML on the Web" activity. After some
resistance he was authorized to launch that activity in
mid-1996, albeit with little involvement by or support from
the W3C leadership. Bosak was well-connected in the small
community of people who had experience both in SGML and the
Web. He received support in his efforts from Microsoft.
XML was designed by an eleven-member Working Group,
supported by an Interest Group of approximately 150 members.
Technical debate took place on the Interest Group mailing
list and issues were resolved by consensus or, when that
failed, majority vote of the Working Group. James Clark
served as Technical Lead of the Working Group, notably
contributing the empty-element "<empty/>" syntax and the
name "XML". Other names that had been put forward for
consideration included "MAGMA" (Minimal Architecture for
Generalized Markup Applications), "SLIM" (Structured
Language for Internet Markup) and "MGML" (Minimal
Generalized Markup Language). The co-editors of the
specification were originally Tim Bray and Michael Sperberg-McQueen.
Halfway through the project Bray accepted a consulting
engagement with Netscape, provoking vociferous protests from
Microsoft. Bray was temporarily asked to resign the
editorship. This led to an intense dispute in the Working
Group, eventually resolved by the appointment of Microsoft's
Jean Paoli as a third co-editor.
The XML Working Group never met face-to-face; the design was
accomplished using a combination of email and weekly
teleconferences. The major design decisions were reached in
twenty weeks of intense work between July and November of
1996. Further design work continued through 1997, and XML
1.0 became a W3C Recommendation on February 10, 1998.
XML 1.0 achieved the Working Group's goals of Internet
usability, general-purpose usability, SGML compatibility,
facilitation of easy development of processing software,
minimization of optional features, legibility, formality,
conciseness, and ease of authoring.
Clarifications and minor changes were accumulated in
published errata and then incorporated into a Second Edition
of the XML 1.0 Recommendation on October 6, 2000. Subsequent
errata were incorporated into a Third Edition on February 4,
2004.
XML 1.1 was published on the same day as the XML 1.0 Third
Edition. It is a variant of XML that encourages more consistency
in how characters are represented and relaxes restrictions
on names, allowable characters, and end-of-line
representations.
Both XML 1.0 Third Edition and XML 1.1 are considered
current versions of XML.
Features of XML
XML provides a text-based means to describe and apply a
tree-based structure to information. At its base level, all
information manifests as text, interspersed with markup that
indicates the information's separation into a hierarchy of
character data, container-like elements, and attributes of
those elements. In this respect, it is similar to the LISP
programming language's S-expressions, which describe tree
structures wherein each node may have its own property list.
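
The tree described above can be made concrete with a short sketch.
It uses Python's standard xml.etree.ElementTree module. The sample
document and the outline() helper are invented here for illustration
and do not come from the XML specification.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample document, invented for this illustration.
DOC = """<patient id="p1">
  <name>Jane Doe</name>
  <birthday><month>4</month><day>12</day></birthday>
</patient>"""


def outline(element, depth=0):
    """Return one indented line per element: tag, attributes, character data."""
    text = (element.text or "").strip()
    lines = ["  " * depth + f"{element.tag} {element.attrib} {text!r}"]
    for child in element:  # iterating an element yields its child elements
        lines.extend(outline(child, depth + 1))
    return lines


root = ET.fromstring(DOC)
tree_lines = outline(root)
print("\n".join(tree_lines))
```

Each output line corresponds to one node of the tree: the element
name, its attributes as a dictionary, and any character data it
holds directly. The indentation reflects nesting depth.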
The fundamental unit in XML is the character, as defined by
the Universal Character Set. Characters are combined in
certain allowable combinations to form an XML document. The
document consists of one or more entities, each of which is
typically some portion of the document's characters, encoded
as a series of bits and stored in a text file.
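
The distinction between characters and their encoding as bits can be
sketched as follows, again with Python's standard library. The
<note> element and its content are assumptions made for the example.
The same characters are stored under two different encodings, and a
parser that honors the encoding declaration recovers identical text
from both.

```python
import xml.etree.ElementTree as ET

# Hypothetical one-element document; \u00e9 is the character "e with acute".
body = "<note>caf\u00e9</note>"

# The same characters, stored as two different byte sequences.
utf8_bytes = ('<?xml version="1.0" encoding="UTF-8"?>' + body).encode("utf-8")
latin1_bytes = (
    '<?xml version="1.0" encoding="ISO-8859-1"?>' + body
).encode("iso-8859-1")

# The parser reads the encoding declaration and decodes accordingly.
decoded_utf8 = ET.fromstring(utf8_bytes).text
decoded_latin1 = ET.fromstring(latin1_bytes).text

print(decoded_utf8 == decoded_latin1)
```

The stored bytes differ, but the document, understood as a sequence
of characters, is the same in both cases.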
The ubiquity of text-editing software (plain text editors as
well as word processors) makes XML documents quick to author
and maintain. Before XML, very few data description languages
were general-purpose, Internet protocol-friendly, and easy
to learn and author. Most data interchange formats
were proprietary, special-purpose "binary" formats (based
foremost on bit sequences rather than characters) that could
not be easily shared by different software applications or
across different computing platforms, much less authored and
maintained in common text editors.
By leaving the names, allowable hierarchy, and meanings of
the elements and attributes open and definable by a
customizable schema, XML provides a syntactic foundation for
the creation of custom, XML-based markup languages. The
general syntax of such languages is rigid: documents must
adhere to the general rules of XML, assuring that all
XML-aware software can at least read (parse) and understand
the relative arrangement of information within them. The
schema merely supplements the syntax rules with a set of
constraints. Schemas typically restrict element and
attribute names and their allowable containment hierarchies,
for example by allowing an element named 'birthday' to contain
exactly one element named 'month' and one element named 'day', each
of which must contain only character data. The constraints in
a schema may also include data type assignments that affect
how information is processed; for example, the 'month'
element's character data may be defined as being a month
according to a particular schema language's conventions,
perhaps meaning that it must not only be formatted a certain
way, but also must not be processed as if it were some other
type of data.
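
The 'birthday' constraints above can be approximated with a
hand-written check. This is only a sketch: a real schema language
(a DTD or W3C XML Schema, for example) would express the rules
declaratively, and Python's standard library does not include a
schema validator. The function name birthday_problems and the rule
that a month is an integer from 1 to 12 are assumptions made for the
example.

```python
import xml.etree.ElementTree as ET


def birthday_problems(xml_text):
    """Return a list of constraint violations; an empty list means all checks pass."""
    root = ET.fromstring(xml_text)
    if root.tag != "birthday":
        return ["root element must be 'birthday'"]

    problems = []

    # Containment constraint: exactly one 'month' and one 'day' child.
    if sorted(child.tag for child in root) != ["day", "month"]:
        problems.append("'birthday' must contain exactly one 'month' and one 'day'")

    # Content constraint: the children may hold character data only.
    for child in root:
        if len(child) > 0:  # len() of an element counts its child elements
            problems.append(f"'{child.tag}' must contain only character data")

    # Data type constraint (assumed for illustration): month is an integer 1..12.
    month = root.find("month")
    if month is not None and len(month) == 0:
        try:
            in_range = 1 <= int((month.text or "").strip()) <= 12
        except ValueError:
            in_range = False
        if not in_range:
            problems.append("'month' must be an integer from 1 to 12")

    return problems


print(birthday_problems("<birthday><month>4</month><day>12</day></birthday>"))
print(birthday_problems("<birthday><month>13</month><day>12</day></birthday>"))
```

Both sample documents are well-formed XML, so any XML-aware software
can parse them. Only the second fails the additional constraints,
which is the sense in which a schema supplements the syntax rules
rather than replacing them.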
In this way, XML contrasts with HTML, which has an
inflexible, single-purpose vocabulary of elements and
attributes that, in general, cannot be repurposed. With XML,
it is much easier to write software that accesses the
document's information, since the data structures are
expressed in a formal, relatively simple way.
XML places no restrictions on how it is
used. Although XML is fundamentally text-based, software
quickly emerged to abstract it into other, richer formats,
largely through the use of datatype-oriented schemas and
object-oriented programming paradigms (in which the document
is manipulated as an object). Such software might only treat
XML as serialized text when it needs to transmit data over a
network, and some software doesn't even do that much. Such
uses have led to "binary XML", the relaxed restrictions of
XML 1.1, and other proposals that run counter to XML's
original spirit and have therefore drawn some criticism.
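
The object-oriented style of use described above might look like the
following minimal sketch. The Birthday class and its from_xml and
to_xml methods are hypothetical names chosen for the example. The
program works with a typed object and produces XML text only when
the data has to be serialized, for instance before sending it over a
network.

```python
from dataclasses import dataclass
import xml.etree.ElementTree as ET


@dataclass
class Birthday:
    """Hypothetical typed object standing in for a <birthday> element."""

    month: int
    day: int

    @classmethod
    def from_xml(cls, xml_text):
        # Parse the text once, then keep only typed values.
        root = ET.fromstring(xml_text)
        return cls(month=int(root.findtext("month")), day=int(root.findtext("day")))

    def to_xml(self):
        # Serialize back to text only when it is needed.
        root = ET.Element("birthday")
        ET.SubElement(root, "month").text = str(self.month)
        ET.SubElement(root, "day").text = str(self.day)
        return ET.tostring(root, encoding="unicode")


birthday = Birthday.from_xml("<birthday><month>4</month><day>12</day></birthday>")
birthday.day += 1  # manipulated as an object, not as text
wire_text = birthday.to_xml()
print(wire_text)
```

Between parsing and serialization the program never touches markup,
which is the kind of abstraction the paragraph above refers to.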
Source: Wikipedia contributors (2006). XML. Wikipedia, The
Free Encyclopedia. Retrieved 03:48, January 16, 2006 from
http://en.wikipedia.org/w/index.php?title=XML&oldid=35322569.