Understanding SOAP helps you make better use of SOAP implementations and, more importantly, allows you to adopt SOAP as a general XML messaging medium. SOAP is a work in progress but is slated to become a W3C recommendation. As of this writing, the latest SOAP specification is the W3C Note available from http://www.w3.org. SOAP is developed by W3C members from various companies, including DevelopMentor, IBM, UserLand, Lotus Development, and Microsoft.
SOAP is an XML-based protocol, and defines three basic concepts:
An envelope that describes a message and how to process it.
Encoding requirements that describe message data types.
Remote Procedure Call conventions that allow for distributed method invocations.
In its most basic form, SOAP is used over HTTP to send a message to a SOAP server. In turn, the server implements some specific functionality and returns a SOAP response message back to the caller. This type of interaction uses HTTP’s inherent request/response design. The original SOAP message may be a method invocation and parameters; the response may be the return values.
A SOAP request may take the form of:
<SOAP-ENV:Envelope
    xmlns:SOAP-ENV="http://schemas.xmlsoap.org/soap/envelope/"
    SOAP-ENV:encodingStyle="http://schemas.xmlsoap.org/soap/encoding/">
  <SOAP-ENV:Body>
    <m:GetLocalTemperature xmlns:m="http://localhost/temperApp">
      <zipcode>90872</zipcode>
    </m:GetLocalTemperature>
  </SOAP-ENV:Body>
</SOAP-ENV:Envelope>
This message is sent over HTTP, and can be posted to a specific URI capable of interpreting and responding to the SOAP message. The return SOAP unit contains the response to the query GetLocalTemperature.
<SOAP-ENV:Envelope
    xmlns:SOAP-ENV="http://schemas.xmlsoap.org/soap/envelope/"
    SOAP-ENV:encodingStyle="http://schemas.xmlsoap.org/soap/encoding/">
  <SOAP-ENV:Body>
    <m:GetLocalTemperatureResponse xmlns:m="http://localhost/temperApp">
      <Fahrenheit>59</Fahrenheit>
    </m:GetLocalTemperatureResponse>
  </SOAP-ENV:Body>
</SOAP-ENV:Envelope>
In this return example, the result of the method invocation is returned to the caller in a SOAP packet.
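The exchange above can be sketched with nothing more than a general-purpose XML library. The following is a minimal illustration in Python, not the API of any particular SOAP toolkit; the function names build_request and first_return_value are hypothetical, and the sketch assumes the response carries a single return value as the first child of the response element.

```python
import xml.etree.ElementTree as ET

SOAP_ENV = "http://schemas.xmlsoap.org/soap/envelope/"
SOAP_ENC = "http://schemas.xmlsoap.org/soap/encoding/"
APP_NS = "http://localhost/temperApp"  # application namespace from the sample


def build_request(zipcode: str) -> bytes:
    """Serialize a GetLocalTemperature call shaped like the sample request."""
    # Ask ElementTree to use the same prefixes as the sample messages.
    ET.register_namespace("SOAP-ENV", SOAP_ENV)
    ET.register_namespace("m", APP_NS)
    envelope = ET.Element(f"{{{SOAP_ENV}}}Envelope")
    envelope.set(f"{{{SOAP_ENV}}}encodingStyle", SOAP_ENC)
    body = ET.SubElement(envelope, f"{{{SOAP_ENV}}}Body")
    call = ET.SubElement(body, f"{{{APP_NS}}}GetLocalTemperature")
    ET.SubElement(call, "zipcode").text = zipcode
    return ET.tostring(envelope, encoding="utf-8")


def first_return_value(response_xml: bytes) -> str:
    """Return the text of the first value inside the response element."""
    body = ET.fromstring(response_xml).find(f"{{{SOAP_ENV}}}Body")
    response = body[0]       # e.g. m:GetLocalTemperatureResponse
    return response[0].text  # assumption: exactly one return value
```

Calling build_request("90872") yields bytes that can be posted to the server, and first_return_value extracts the temperature from the reply.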
The current SOAP specification places no constraints on how a SOAP message is sent across the network. SOAP implementations are allowed to take advantage of any special features of their communications medium, be it HTTP, SMTP, or something yet to be imagined.
However, SOAP does define the concept of a Message Path. This critical concept allows a SOAP packet to be dealt with at intermediate steps along the way to its final destination. While it is tempting to think of message delivery as a single hop to the end-point, the message path works much like network routing. It’s possible to add intelligence to a network to deal with SOAP packets, and distribute them where they need to go. This allows for a much greater level of scalability and traffic management, because multiple, distributed systems can route packets where they need to go instead of forcing them through a central server.
SOAP requires intermediate processors to perform three steps in exact order:
The SOAP processor identifies any part of the SOAP message that’s intended for itself; that is, the application must understand which parts of the SOAP message relate to its own operation, and which parts do not.
The application must make a decision as to whether it can support all of the required processing that the message expects of it. If the application cannot, it must discard the message.
The application must remove the portions of the message that it has processed if it isn’t the end-point of the message, and is in fact just an intermediary or routing point. The removal must occur prior to the application forwarding the message to the next location.
For some middleware and routing applications, no parts of the SOAP message will be intended for them specifically. In these cases, the application may just look at the target URI or SOAPAction value, and route the SOAP packet accordingly without modifying it.
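The three steps an intermediary performs can be sketched as follows. This is an illustrative sketch rather than a complete SOAP processor: the function process_at_intermediary, the CannotProcess exception, and the idea of passing in a set of supported element names are all assumptions made for the example, and matching the actor attribute by exact string comparison is a simplification.

```python
import xml.etree.ElementTree as ET

SOAP_ENV = "http://schemas.xmlsoap.org/soap/envelope/"
ACTOR = f"{{{SOAP_ENV}}}actor"
MUST_UNDERSTAND = f"{{{SOAP_ENV}}}mustUnderstand"


class CannotProcess(Exception):
    """Raised when a required header entry is not supported (hypothetical)."""


def process_at_intermediary(message: bytes, my_uri: str, supported: set) -> bytes:
    envelope = ET.fromstring(message)
    header = envelope.find(f"{{{SOAP_ENV}}}Header")
    if header is None:
        return message  # nothing is addressed to us; forward unchanged

    # Step 1: identify the header entries intended for this application.
    mine = [entry for entry in header if entry.get(ACTOR) == my_uri]

    # Step 2: decide whether all required processing can be supported.
    for entry in mine:
        if entry.get(MUST_UNDERSTAND) == "1" and entry.tag not in supported:
            raise CannotProcess(entry.tag)  # the message must be discarded

    # (Application-specific handling of the entries would happen here.)

    # Step 3: remove the processed parts before forwarding the message.
    for entry in mine:
        header.remove(entry)
    return ET.tostring(envelope, encoding="utf-8")
```

Raising an exception in step 2 is one way to model "discard the message"; a real intermediary would more likely report the problem back to the sender.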
The SOAP specification requires that all SOAP messages be encoded using XML. In addition, namespaces are used on elements and attributes, and any SOAP application must understand these concepts. The specification also dictates that messages with incorrect namespaces must be discarded. It defines two namespaces for use in SOAP. For envelopes, the correct namespace is http://schemas.xmlsoap.org/soap/envelope/. For serialization, the correct namespace is http://schemas.xmlsoap.org/soap/encoding/.
These namespaces are associated with local names and inserted into element and attribute names per the W3C’s Namespaces in XML document. Interestingly enough, given their native development in XML, SOAP messages are not allowed to contain either a Document Type Declaration or a Processing Instruction.
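The ban on Document Type Declarations and Processing Instructions can be checked before a message is handed to the rest of an application. The sketch below uses Python's standard expat bindings; the function name reject_dtd_and_pi and the ForbiddenConstruct exception are hypothetical names chosen for this example. Note that the XML declaration at the top of a document is not a Processing Instruction and is not rejected.

```python
import xml.parsers.expat


class ForbiddenConstruct(ValueError):
    """Raised when a message contains a DTD or a Processing Instruction."""


def reject_dtd_and_pi(message: bytes) -> None:
    """Parse the message and raise if a forbidden construct is found."""

    def on_doctype(name, system_id, public_id, has_internal_subset):
        raise ForbiddenConstruct("Document Type Declaration found")

    def on_processing_instruction(target, data):
        raise ForbiddenConstruct(f"Processing Instruction found: {target}")

    parser = xml.parsers.expat.ParserCreate()
    parser.StartDoctypeDeclHandler = on_doctype
    parser.ProcessingInstructionHandler = on_processing_instruction
    parser.Parse(message, True)  # True marks this as the final chunk
```

A message that passes this check is at least free of the two constructs; it says nothing about whether the envelope itself is valid.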
A composite SOAP message contains three broad parts. The first and outermost part is the envelope. Beneath the envelope is the header. The header is the place where routing information or other nonapplication metadata may be stored. It is permissible in the eyes of the specification to temporarily modify SOAP headers during a routing or transport period, leaving the message in its original state when it finally reaches the destination. The SOAP body is the place where the application-specific payload resides.
Let’s consider the analogy of a physical package delivery. The envelope is obviously the shipping container, and the header data may be added and removed by transport stations regarding check-ins and checkouts. The body is the goods and materials nicely secured inside the box, not to be touched by anyone but the recipient.
Of these constructs, the envelope and body are mandatory, whereas the header is optional. Furthermore, the specification requires that the following constraints be observed when constructing packets:
For envelopes, SOAP requires that the element name be Envelope, with no exceptions. The Envelope element can optionally contain namespace declarations and additional, informative attributes. However, any such attributes must be namespace-qualified. The specification requires that SOAP messages have an Envelope element marked with the http://schemas.xmlsoap.org/soap/envelope/ namespace. If the Envelope is not within this namespace, the specification requires that the message be discarded.
For headers, SOAP requires the element name to always be Header. The header is allowed to have immediate child elements. Any child element must be namespace-qualified.
For SOAP bodies, the element name must always be Body. The Body element must be an immediate descendant of the Envelope element, and if a Header element is present, the Body must immediately follow it.
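These structural constraints lend themselves to a mechanical check. The following sketch tests only the rules listed above and is not a full validator; the function name structure_problems and the wording of the messages it returns are assumptions made for illustration.

```python
import xml.etree.ElementTree as ET

SOAP_ENV = "http://schemas.xmlsoap.org/soap/envelope/"
ENVELOPE = f"{{{SOAP_ENV}}}Envelope"
HEADER = f"{{{SOAP_ENV}}}Header"
BODY = f"{{{SOAP_ENV}}}Body"


def structure_problems(message: bytes) -> list:
    """Return a list of violated constraints (empty if none were found)."""
    root = ET.fromstring(message)
    if root.tag != ENVELOPE:
        # Wrong name or wrong namespace: the message must be discarded.
        return ["root must be Envelope in the SOAP envelope namespace"]

    problems = []
    tags = [child.tag for child in root]
    if BODY not in tags:
        problems.append("Body must be an immediate child of Envelope")
    if HEADER in tags:
        after_header = tags[tags.index(HEADER) + 1:]
        if BODY in tags and (not after_header or after_header[0] != BODY):
            problems.append("Body must immediately follow Header")
        # ElementTree writes namespace-qualified tags as "{uri}local".
        for entry in root.find(HEADER):
            if not entry.tag.startswith("{"):
                problems.append(f"header entry {entry.tag} is not namespace-qualified")
    return problems
```

An empty list means the message passed these particular checks; any other result names the constraints that were broken.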
SOAP allows for different serialization rules for SOAP messages. To that end, the encodingStyle attribute is used to indicate which serialization techniques are used in the message. The SOAP specification defines serialization rules within the document, and utilizes the URI http://schemas.xmlsoap.org/soap/encoding/ to indicate that this encoding style is in use.
SOAP allows for the extension of messages through optional header data. The header data may never be seen by the sending and receiving end-point applications; it may be used only by intermediary and middleware applications along the message’s path. However, nothing forbids end-point applications from using headers.
According to the SOAP specification, headers must follow a few rules. First, a header entry must utilize a fully qualified element name within a namespace URI context. Second, the SOAP encodingStyle attribute may be used to denote the encoding style for header members. Third, the SOAP mustUnderstand attribute and actor attribute may also be used to indicate processing directions.
actor Attribute
The SOAP actor attribute names the recipient of a header element. The recipient is identified by URI.
mustUnderstand Attribute
The mustUnderstand attribute tells an application whether it is required to process the information contained within the element. The mustUnderstand attribute can have a value of either 1 or 0, with 1 indicating that the application is required to understand the element. A missing mustUnderstand attribute is the same as having it set to 0; that is, it represents a false condition.
The Body element is the primary piece of a SOAP packet with which an end-point application is concerned. The Body element represents the SOAP packet’s payload.
Child elements of the Body element are called body entries. Each body entry is encoded as an independent element within the SOAP body element. A body entry requires a namespace URI and local name. The encodingStyle attribute can be used within body entries to indicate their encoding style.
The SOAP Fault element is used to communicate error conditions back to a calling application. The SOAP Fault may be used to communicate any type of failure relevant to your application.
A Fault element may have the following four child elements:
faultcode
The faultcode element is required to appear within Fault elements, and provides a code that applications can use to identify and manage errors programmatically. The SOAP specification defines a few fault codes itself, covered in Section 9.2.7.2.
faultstring
The faultstring element is required within Fault elements, and can be any type of description appropriate for the error.
faultactor
The faultactor element is used to pinpoint which actor caused the fault if the message followed along a message path. If present, this element indicates the origin of the fault. If an intermediary application causes a fault, the specification requires that the intermediary identify itself in the faultactor element. The value of a faultactor element is a URI.
detail
The detail element allows for application-specific information associated with the XML payload in the Body element. For example, if a business logic error occurs in your distributed SOAP-powered application, the business error detail rides in the detail element. On the other hand, if an intermediary causes a problem during the routing process, the detail element is not used to communicate the information. Like the Body element, the detail element allows for detail entries to be present as immediate children of the detail element.
The SOAP specification defines fault codes for four different error conditions. If one of these conditions occurs, the corresponding fault code must be used. These fault codes are in the space defined by the URI prefix http://schemas.xmlsoap.org/soap/envelope/. The specification intends the fault codes to be extensible by developers. By default, the specification includes:
VersionMismatch
Used when an invalid namespace is used for the SOAP Envelope.
MustUnderstand
Used when an element is not understood or processed by an application, but its mustUnderstand attribute is set to 1.
Client
Used when the message is not well formed, or did not contain required information for success.
Server
Used when the message cannot be processed by the server for reasons unrelated to the makeup of the message itself. That is, you may have formatted your GetLocalTemperature call correctly, but the server could be offline momentarily. When this error comes up, the application may try again at a later time.
The Client and Server classes of errors are meant to be extensible, so that a programmer could define a fault such as Client.AccessDenied or Server.Unavailable. The complete URI for Client.AccessDenied is http://schemas.xmlsoap.org/soap/envelope/Client.AccessDenied.
For faults that are not described by the SOAP specification, it is legal to use URIs that begin with a different prefix.
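On the receiving side, a caller typically needs to find the Fault element, read its children, and decide what to do based on the class of the fault code. The sketch below shows one way to do that; the function name parse_fault, the shape of the returned dictionary, and the retry_later flag are assumptions made for this example.

```python
import xml.etree.ElementTree as ET

SOAP_ENV = "http://schemas.xmlsoap.org/soap/envelope/"


def parse_fault(response: bytes):
    """Return a dict describing the Fault in a response, or None if absent."""
    body = ET.fromstring(response).find(f"{{{SOAP_ENV}}}Body")
    fault = body.find(f"{{{SOAP_ENV}}}Fault") if body is not None else None
    if fault is None:
        return None

    code = fault.findtext("faultcode", default="")
    local = code.split(":", 1)[-1]         # drop a prefix such as "SOAP-ENV:"
    fault_class = local.split(".", 1)[0]   # "Client.AccessDenied" -> "Client"
    return {
        "faultcode": local,
        "class": fault_class,
        "faultstring": fault.findtext("faultstring", default=""),
        "faultactor": fault.findtext("faultactor"),  # None when not present
        # A Server fault may be transient, so trying again later is reasonable.
        "retry_later": fault_class == "Server",
    }
```

Splitting on the first dot keeps extended codes such as Client.AccessDenied or Server.Unavailable grouped under their base class.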
SOAP encoding defines a format for data types communicated in SOAP packets. If SOAP is to be used for Remote Procedure Calls (RPC) between applications, then application-specific data must be marshaled to and from the involved parties. These applications must be able to understand the types of data—to be able to distinguish, for example, arrays from strings and numbers from letters.
In the world of SOAP encoding, the SOAP specification sees two types of data: simple scalar types (dog = "foo") and compound types (dog = {"foo" : "bar", "bar" : "foo"}). SOAP encoding uses the namespace URI http://schemas.xmlsoap.org/soap/encoding/.
SOAP does acknowledge that other types of encoding schemes may be used, but for applications to be interoperable, it’s easiest if they use the same encoding.
There are nine golden rules for data serialization using SOAP. These rules establish guidelines for representing both simple and complex data types. The nine rules are listed here, and are explored and illustrated in practice afterward.
All data values must be represented as element content. This means that data goes inside elements, not inside attributes. Use:
<specialSymbol>DataValues</specialSymbol>
and not:
<specialSymbols symbol1="DataValue1" symbol2="DataValue2"/>
When an element contains a data value, the value must have one of the following features:
Have an xsi:type attribute
Be contained within an element with a SOAP-ENC:arrayType attribute
Have a type determinable from a schema
Simple values are represented as character data without any child elements. Simple values must have a type referenced in the XML Schema specification.
Compound values are represented as a sequence of elements. Each accessor is represented by an element with a matching name. Qualified names must be used unless the accessor names are local to their containing types.
Multireference simple or compound values are represented as independent elements with a local attribute id of type ID (the ID type listed in the XML Specification, whose values must be unique within any XML document instance). Any accessor to this simple or compound value must have an attribute named href that points to a URI fragment identifier referencing the element.
Strings and byte arrays should be multireference simple types, but rules exist for efficient representation in common cases. See the specification at http://www.w3.org for details.
Multiple references to a value can all be encoded separately, but only if the meaning of the XML instance is unaltered as a result.
Arrays are compound values. Arrays must have a type of SOAP-ENC:Array or a derived type. SOAP arrays may be multidimensional, with the rightmost index advancing first. SOAP arrays need a SOAP-ENC:arrayType attribute that indicates the contained element’s type and dimensions. In its simplest form, the attribute value is written as:
arrayTypeValue: array-type[array-size]
where array-type is an XML Schema-defined type, and array-size is an integer indicating the size of the array; for example, xsd:int[5] describes an array of five integers. Things get trickier when encoding a multidimensional array. In the case of multidimensionality, array-size is a comma-separated list of integers, as in xsd:int[2,3].
A null value doesn’t require an accessor element. However, a null value may be present and represented with an accessor with an xsi:null attribute set to 1.
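Several of the rules above (element content for values, xsi:type on simple values, named accessors for compound values, SOAP-ENC:arrayType on arrays, and xsi:null for nulls) can be illustrated with a small encoder. This is a sketch under stated assumptions, not a complete implementation of SOAP encoding: the function name encode, the mapping from Python types to XML Schema types, the item element name used for array members, and the 1999 XML Schema instance namespace URI are all choices made for this example. Multireference values and multidimensional arrays are not handled.

```python
import xml.etree.ElementTree as ET

SOAP_ENC = "http://schemas.xmlsoap.org/soap/encoding/"
# Assumption: the schema-instance namespace in use when xsi:null was current.
XSI = "http://www.w3.org/1999/XMLSchema-instance"

# Assumed mapping from Python types to XML Schema simple types.
XSD_TYPES = {int: "xsd:int", float: "xsd:float", str: "xsd:string"}


def encode(name: str, value) -> ET.Element:
    """Encode a Python value as an accessor element called `name`."""
    elem = ET.Element(name)
    if value is None:
        elem.set(f"{{{XSI}}}null", "1")            # null accessor
    elif type(value) in XSD_TYPES:
        elem.set(f"{{{XSI}}}type", XSD_TYPES[type(value)])
        elem.text = str(value)                     # value as element content
    elif isinstance(value, dict):
        for key, member in value.items():          # one element per accessor
            elem.append(encode(key, member))
    elif isinstance(value, list):
        item_types = {type(v) for v in value}
        if len(item_types) != 1 or not item_types.issubset(XSD_TYPES):
            raise TypeError("sketch handles only non-empty lists of one simple type")
        item_type = XSD_TYPES[item_types.pop()]
        elem.set(f"{{{XSI}}}type", "SOAP-ENC:Array")
        elem.set(f"{{{SOAP_ENC}}}arrayType", f"{item_type}[{len(value)}]")
        for v in value:
            elem.append(encode("item", v))
    else:
        raise TypeError(f"unsupported type: {type(value).__name__}")
    return elem
```

The xsd: and SOAP-ENC: prefixes that appear inside attribute values are plain text as far as ElementTree is concerned, so the enclosing document must declare those prefixes itself before the result is serialized.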
While these rules may seem quite complicated, learning more about types helps demystify them. When working with some SOAP APIs (and hopefully all SOAP APIs), such strict data typing is not manually required, and is taken care of by the API.
The SOAP specification declares that it adopts the types found in the XML Schema Part 2: Datatypes specification. In other words, the SOAP drafters are not reinventing the wheel, but utilizing the work done for the XML Schema effort.
Using established data typing makes data encoding far simpler to understand than the list of nine rules presented in the previous section. For example:
<element name="FirstName" type="xsd:string"/>
<element name="LastName" type="xsd:string"/>
<element name="Address1" type="xsd:string"/>
<element name="City" type="xsd:string"/>
<element name="State" type="xsd:string"/>
<element name="Zip" type="xsd:int"/>
<element name="BalanceDue" type="xsd:float"/>
The SOAP specification recognizes two primary types of compound data: structs and arrays. A struct is a compound type in which members are given names, and the names are used to access the values. An array, on the other hand, is an ordered list in which an integer index is used to access the values.
SOAP fits naturally over HTTP. SOAP’s request/response RPC-style transactions are perfect for HTTP’s request/response protocol. When sending SOAP over HTTP, the content-type must be text/xml.
The SOAPAction HTTP request header field is used to indicate the “intent” of the SOAP request. A client is required to supply this header in a request. The value of the header is a URI, but the specification places no restrictions on what the URI represents.
SOAP over HTTP uses a hybrid combination of traditional HTTP response codes coupled with their equivalent meanings for the fate of the SOAP packet. That is, even if the HTTP request itself is okay, if for some reason there is an error on the server side while processing the request, the server must send back an HTTP 500 Internal Server Error. This is a slightly different process than that of HTTP, which only gives such a response when a CGI or ASP page ungracefully bails out of its execution. With SOAP, the execution of the SOAP server may proceed just fine, but if the logical execution of the SOAP message fails, the HTTP 500 error is returned.
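The HTTP binding can be sketched with Python's standard urllib module. The helper names make_soap_request and send are hypothetical, as are any endpoint URL and SOAPAction value passed to them; the sketch only shows where the text/xml content type, the SOAPAction header, and the HTTP 500 handling fit.

```python
import urllib.error
import urllib.request


def make_soap_request(url: str, soap_action: str, envelope: bytes):
    """Build (but do not send) an HTTP POST carrying a SOAP envelope."""
    return urllib.request.Request(
        url,
        data=envelope,
        headers={
            "Content-Type": "text/xml; charset=utf-8",
            "SOAPAction": f'"{soap_action}"',  # URI sent as a quoted string
        },
        method="POST",
    )


def send(request):
    """Send the request and return (status, body) for success or a fault."""
    try:
        with urllib.request.urlopen(request) as response:
            return response.status, response.read()
    except urllib.error.HTTPError as err:
        if err.code == 500:
            # Processing failed on the server; the body should hold a Fault.
            return err.code, err.read()
        raise
```

Treating a 500 response as data rather than as an exception lets the caller inspect the SOAP Fault it carries; other HTTP errors are passed through unchanged.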
Using SOAP for RPC-style development is really nothing different from using SOAP for any other purpose. The semantics of request/response are still present. A SOAP method invocation is just a SOAP envelope with a method name as payload, accompanied by any data parameters. The response is either the return value or the error status, also within a SOAP envelope.
When performing RPC with SOAP, the method calls and return values are stored in the SOAP body.