Oct 2000
Mrs. Saraswathy Ashok, Technical Editor.
Back in the 1960s, IBM created GML (Generalized
Markup Language) to address the need to structure
documents in a standardized form, so that reports
and other documents could be produced from a single
set of source files. It could not, however, address
the problem on a large scale.
Later, SGML (Standard Generalized Markup Language)
emerged from research done primarily at IBM on
text document representation. It was expanded
and adapted for use in a wide range of industries,
including large aerospace, automotive, and
telecommunications companies. SGML is a metalanguage
used for defining other languages, and each language
so defined is called an application of SGML. Although
SGML is extremely powerful, its complexity and
overhead kept it from being an option for
representing hypertext in the early days of
the Internet.
In 1989, Tim Berners-Lee designed a tag-based
language for marking up technical documents shared
over the net, known as HTML (HyperText Markup
Language). Berners-Lee defined HTML in SGML,
so HTML is an application of SGML.
What exactly is a "markup language"?
Almost all documents have some structure. A markup
language is a mechanism for identifying the
structures in a document. This is best understood
through an example, such as creating a file in a
word processor. Along with the content you create,
the word processor stores extra information, such
as instructions that control the layout and the
appearance of the words themselves. This extra
information is collectively known as the markup.
There are proprietary markup languages, like those
used by word processors, and well-known open,
nonproprietary markup languages, like HTML.
In just a few years, the WWW and HTML took the
world by storm. Despite its popularity, HTML is
severely limited in what it can do. The
ever-increasing demand for more flexibility in
Internet systems created a need for a new
markup technology that had the core benefits of
SGML and the simplicity of HTML.
It was then that XML, the Extensible Markup
Language, appeared on the horizon.
The figure below shows that XML is a simplified
subset of SGML, while HTML is an SGML application.
Figure 1. The relationship among SGML, XML, and HTML
Is XML just another HTML?
You might wonder why, if XML can be used to
generate HTML, you should not simply write the
same HTML directly yourself. The answer is that
HTML, being simple, is not very flexible, and XML
overcomes that limitation.
An XML document looks a lot like an HTML document:
it consists of a mixture of data and markup. With
XML you have the freedom to invent your own tags
and the structural relationships between them,
whereas in HTML both the tag set and the tag
semantics are fixed. For example, in HTML an h1 is
always a first-level heading, and a tag outside
the fixed set is meaningless, whereas in XML you
could have a Description element that begins with
the start-tag <Description> and ends with the
end-tag </Description>.
E-business applications are a new breed of
application in which constant change is expected,
because corporations need to deliver targeted
products and services to their customers faster
than the competition can. XML's extensibility is
the key to making this possible and is thus
revolutionizing e-business. Because it provides
flexibility, extensibility and performance benefits
in an e-commerce environment, the benefits of XML
outweigh those of simple HTML.
Example:
Consider an inventory of cars in which each car
has a make, model and price. Now suppose you also
wish to record the mileage and the color of a car.
This is possible with XML: its extensibility
enables you to create a new record for that car
with the additional fields, without disrupting the
other records, as shown below.
<car>
  <vehicle year="1996" make="Maruthi" model="DX">
    <mileage>76776878</mileage>
    <color>blue</color>
    <price>$32000</price>
  </vehicle>
</car>
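The claim that extra fields do not disturb existing records can be sketched in Python with the standard library's `xml.etree.ElementTree`. This is only an illustration under stated assumptions: the `<inventory>` root element, the first (older) record and the two reader functions are hypothetical and do not come from the article.

```python
import xml.etree.ElementTree as ET

# Hypothetical inventory: the first record has only make, model and price;
# the second also carries the newer mileage and color elements.
INVENTORY = """
<inventory>
  <vehicle year="1995" make="Maruthi" model="STD">
    <price>$25000</price>
  </vehicle>
  <vehicle year="1996" make="Maruthi" model="DX">
    <mileage>76776878</mileage>
    <color>blue</color>
    <price>$32000</price>
  </vehicle>
</inventory>
"""


def read_basic(xml_text):
    """Reader written before mileage and color existed; it never asks for them."""
    root = ET.fromstring(xml_text)
    return [
        (v.get("make"), v.get("model"), v.findtext("price"))
        for v in root.iter("vehicle")
    ]


def read_extended(xml_text):
    """Newer reader; fields absent from an old record come back as None."""
    root = ET.fromstring(xml_text)
    return [
        {
            "model": v.get("model"),
            "mileage": v.findtext("mileage"),
            "color": v.findtext("color"),
        }
        for v in root.iter("vehicle")
    ]


basic = read_basic(INVENTORY)
extended = read_extended(INVENTORY)
print(basic)
print(extended)
```

Both readers work on the same file: the older one skips the elements it does not know, and the newer one treats the missing elements in the old record as absent values.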
Modern Web applications deal with data of very
specific types. Such data is difficult to represent
with HTML, whereas XML provides a means to
separate content (the data) from presentation (how
the data is viewed) in web documents. With the
focus on content, let us see how the term metadata
relates to XML.
Metadata is simply data that describes data. It
defines a common language that allows data to be
shared among people and programs, resulting in
more effective communication.
Markup languages
such as XML provide a means to document metadata
in computing.
Defining XML Documents
The minimal requirements for an XML-based system
are illustrated in the diagram below:
Figure 2. The minimum requirements for an XML-based
system
First, you have the XML document itself. It
consists of XML character strings, collectively
known as markup, along with the actual information
content of the document, known as character data.
This XML document can be associated with a set of
rules that specify what order and occurrence of
markup and character data is permitted. For
instance, you could have a rule stating that every
item must have a unit price.
These rules are housed in the Document Type
Definition, or DTD. The role of the XML processor
(XML parser) in an XML-based system is to split the
XML document (with or without a DTD) into "chunks"
of markup and "chunks" of character data, and to
pass this information on to an XML application.
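The splitting into "chunks" described above can be sketched with Python's standard `xml.parsers.expat` module, which reports markup and character data to callback functions. The sample document and the `split_into_chunks` helper are hypothetical. For brevity the sketch records only element names, not attributes.

```python
import xml.parsers.expat


def split_into_chunks(xml_text):
    """Return (kind, value) pairs in document order."""
    chunks = []
    parser = xml.parsers.expat.ParserCreate()
    # Deliver adjacent character data as one chunk instead of several pieces.
    parser.buffer_text = True

    def on_start(name, attrs):
        chunks.append(("markup", "<%s>" % name))

    def on_end(name):
        chunks.append(("markup", "</%s>" % name))

    def on_text(data):
        # Skip whitespace-only runs between elements.
        if data.strip():
            chunks.append(("character data", data))

    parser.StartElementHandler = on_start
    parser.EndElementHandler = on_end
    parser.CharacterDataHandler = on_text
    parser.Parse(xml_text, True)  # True marks this as the final piece of input
    return chunks


chunks = split_into_chunks(
    "<item><name>Bolt</name><unitprice>0.25</unitprice></item>"
)
for kind, value in chunks:
    print(kind, value)
```

Here the list of chunks plays the part of the information handed to the XML application. No DTD is involved at this stage.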
For a document to conform to the XML standard, it
must be well formed. A well-formed document is one
from which an XML processor can successfully build
a tree structure. This can be done without a DTD.
Well-formed XML documents are further classified
as valid if they meet the constraints spelled
out in an associated DTD.
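The difference between the two levels can be sketched in Python. One caveat: the standard library's `xml.etree.ElementTree` checks well-formedness but does not validate against a DTD, so the "every item must have a unit price" rule is written here as an ordinary function standing in for a DTD. The element names, the sample documents and the DTD line in the comment are all hypothetical.

```python
import xml.etree.ElementTree as ET

# A DTD could state the rule roughly like this (illustrative only, not
# checked by the code below):
#   <!ELEMENT item (name?, unitprice)>


def is_well_formed(xml_text):
    """True if a parser can build a tree from the text."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        return False


def items_missing_unit_price(xml_text):
    """Hand-written stand-in for the DTD rule: list ids of items with no unit price."""
    root = ET.fromstring(xml_text)
    return [
        item.get("id")
        for item in root.iter("item")
        if item.find("unitprice") is None
    ]


GOOD = '<order><item id="a1"><unitprice>0.25</unitprice></item></order>'
NO_PRICE = '<order><item id="a2"><name>Bolt</name></item></order>'
BROKEN = '<order><item id="a3"><unitprice>0.25</item></order>'  # tag never closed

for label, doc in (("GOOD", GOOD), ("NO_PRICE", NO_PRICE), ("BROKEN", BROKEN)):
    print(label, "well formed:", is_well_formed(doc))
print("items breaking the rule:", items_missing_unit_price(NO_PRICE))
```

`NO_PRICE` is well formed but breaks the rule, which corresponds to a document that is well formed but not valid. `BROKEN` fails at the first level, so the rule never gets a chance to apply.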
What could one accomplish with XML?
There are a number of reasons for application
developers to use XML in their applications and
for generating Web content. XML provides a simple,
robust, scalable, maintainable and reusable
programming model for generating static and dynamic
content for web browsers.
As far as browsers are concerned, the capabilities
of XML provide a number of benefits, which include:
The content can be manipulated and rearranged. For
example, calculations can be performed to generate
extra content on the fly.
The same content can be made to look different for
different users.
The content can be intelligently searched within
the browser, based on what it contains.
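These benefits can be sketched in a few lines of Python. Python stands in here for whatever script would run on the browser side, and the `<stock>` document and the two view functions are hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical stock list; prices are plain numbers so they can be summed.
STOCK = """
<stock>
  <vehicle model="DX"><color>blue</color><price>32000</price></vehicle>
  <vehicle model="STD"><color>red</color><price>25000</price></vehicle>
</stock>
"""

root = ET.fromstring(STOCK)
vehicles = list(root.iter("vehicle"))

# Calculation: derive content that is not stored in the document itself.
total = sum(int(v.findtext("price")) for v in vehicles)

# Rearrangement: the same records, cheapest first.
by_price = sorted(vehicles, key=lambda v: int(v.findtext("price")))
cheapest_first = [v.get("model") for v in by_price]


# Different presentation of the same content for different users.
def buyer_view(v):
    return "%s (%s)" % (v.get("model"), v.findtext("color"))


def accounts_view(v):
    return "%s: %d" % (v.get("model"), int(v.findtext("price")))


# Search based on what the content is, not on raw text matching.
blue_models = [v.get("model") for v in vehicles if v.findtext("color") == "blue"]

print(total, cheapest_first, blue_models)
print([buyer_view(v) for v in vehicles])
print([accounts_view(v) for v in vehicles])
```

All four results come from the same unchanged document, which is the point of keeping content separate from presentation.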
Searching the web has become difficult because
search engines rely on full-text searches of
information instead of searches of descriptions of
information. The problem with such a search is that
HTML is weak when it comes to classifying the
information in a web page. To search the web with
a reasonable amount of accuracy, we need an
effective way to catalog and describe data.
Metadata provides a solution to this problem.
The metadata
information about any given document could include
the following:
Authorship information
Keywords and descriptors
Recent updates and revisions
For example, Netscape has concentrated on using
XML vocabularies to describe metadata in a standard
way, known as the Resource Description Framework
(RDF). A metadata-aware browser will thus be able
to integrate RDF descriptions of data on the web to
create a dynamic navigation tool that lets users
navigate the Web easily.
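The contrast between full-text search and searching descriptions can be sketched as follows. The catalog uses a made-up vocabulary (`<document>`, `<author>`, `<keyword>`, `<revised>`, `<text>`) purely for illustration. It is not RDF syntax, and the documents and authors are invented.

```python
import xml.etree.ElementTree as ET

# Hypothetical catalog: each entry carries metadata plus a snippet of page text.
CATALOG = """
<catalog>
  <document href="cars.html">
    <author>A. Writer</author>
    <keyword>cars</keyword>
    <keyword>inventory</keyword>
    <revised>2000-10-01</revised>
    <text>Our banking partners offer loans on every model.</text>
  </document>
  <document href="loans.html">
    <author>B. Writer</author>
    <keyword>banking</keyword>
    <revised>2000-09-15</revised>
    <text>Apply for a loan online.</text>
  </document>
</catalog>
"""

root = ET.fromstring(CATALOG)


def full_text_search(word):
    """Match any page whose text merely mentions the word."""
    return [
        d.get("href")
        for d in root.iter("document")
        if word in d.findtext("text").lower()
    ]


def keyword_search(word):
    """Match only pages whose metadata describes them with the word."""
    return [
        d.get("href")
        for d in root.iter("document")
        if word in [k.text for k in d.findall("keyword")]
    ]


print(full_text_search("banking"))
print(keyword_search("banking"))
```

The full-text search returns the car page because it happens to mention banking, while the keyword search returns the page that is actually described as being about banking.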
XML has numerous advantages, as it exhibits the
following features:
Values are marked at the collection level, the
object level, and the field level to show what
they mean.
Validating XML is not a problem because two levels
of checking are available: the overall structure
of an XML file is well defined, and its content
can be checked using DTDs.
Nested tags allow different kinds of data to be
placed within the same file.
The ordering of the fields need not be a concern,
as the fields are tagged.
If additional data fields need to be added, older
implementations will be able to ignore tags that
have been added for newer implementations.
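The point about field ordering can be sketched by comparing a tagged record with a positional one. The `<vehicle>` records and the `fields` helper are hypothetical, and the sketch assumes a reader that looks fields up by tag name rather than by position.

```python
import xml.etree.ElementTree as ET

# The same record written with its fields in two different orders.
RECORD_A = "<vehicle><color>blue</color><price>32000</price></vehicle>"
RECORD_B = "<vehicle><price>32000</price><color>blue</color></vehicle>"


def fields(xml_text):
    """Map each child tag to its text, so lookups go by name, not position."""
    root = ET.fromstring(xml_text)
    return {child.tag: child.text for child in root}


same_when_tagged = fields(RECORD_A) == fields(RECORD_B)

# For contrast, an untagged positional format changes meaning when reordered.
same_when_positional = "blue,32000".split(",") == "32000,blue".split(",")

print(same_when_tagged, same_when_positional)
```

Because each value carries its own tag, a reader that asks for `price` gets the same answer from either record.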
The example above is just one of the areas in
which XML proves to be a better option. XML can be
seen in action in a diverse set of fields, such as:
Online Banking
Push Technology with Microsoft Active Channels
Concluding thoughts
From the above, it is clear that XML is a neat
technology that allows applications to communicate
using standard data formats. It can thus contribute
to a variety of enterprise application areas, such
as Information Distribution, Searching and Pattern
making, Application Integration, Business
transactions and Data transformation. XML has
survived the initial skepticism and is emerging as
an essential tool for e-business, primarily because
of its extensibility. This enables applications to
leverage structured and unstructured data, thus
relieving one from the headache of managing a major
MIS initiative.