Cocoon and the Mediator

The NGWMN Mediator is an Apache Cocoon 2.1 application.

Cocoon is not well understood in general at CIDA, and the specifics of how the Mediator uses Cocoon also need to be explicitly described.

Documentation is necessary, or else the Mediator is going to impose a high recurring cost on CIDA: historically, every time a developer has been assigned to this project, just getting up to speed has been a research project in its own right.

Hence this page. Please update it as appropriate.

Cocoon in general is described first. Then, the Mediator is described in terms of the general Cocoon description.

Apache Cocoon

Cocoon TL;DR

Cocoon is a simple idea: a declarative mapping between incoming HTTP Requests (matched by URL) and the actions taken in response to them.

The actions taken are, in Cocoon, implemented as processing pipelines. These pipelines are ordered sequences of actions that construct the HTTP Response to be returned. A pipeline may contain internal decision logic, which means that, although it's a sequence of steps at execution time, it's a tree structure in its definition.

Cocoon pipelines use SAX events, a design decision that favors run-time efficiency and a reduced memory footprint. Cocoon is intended primarily as a high-volume, high-speed request/response processing framework.
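To make the SAX-pipeline idea concrete without Cocoon itself, here is a minimal Java sketch built only on the JDK's JAXP APIs (not Cocoon's component interfaces). A SAX parser stands in for the generator, an XSLT `TransformerHandler` for a transformer, and an identity `TransformerHandler` for the serializer; each stage hands SAX events directly to the next, with no intermediate XML text. The class name, the stylesheet, and the `<sites>`/`<wells>` element names are hypothetical.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.SAXParserFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.sax.SAXResult;
import javax.xml.transform.sax.SAXTransformerFactory;
import javax.xml.transform.sax.TransformerHandler;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;
import org.xml.sax.InputSource;
import org.xml.sax.XMLReader;

// Hypothetical class: mimics generator -> transformer -> serializer with plain JAXP, not Cocoon.
final class SaxPipelineSketch {

    // Hypothetical stylesheet: rewrites <sites><site id=".."/></sites> as <wells><well id=".."/></wells>.
    static final String XSLT =
        "<xsl:stylesheet version='1.0' xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
        + "<xsl:template match='/sites'><wells><xsl:apply-templates/></wells></xsl:template>"
        + "<xsl:template match='site'><well id='{@id}'/></xsl:template>"
        + "</xsl:stylesheet>";

    static String run(String inputXml) {
        try {
            SAXTransformerFactory stf = (SAXTransformerFactory) TransformerFactory.newInstance();

            // Serializer: an identity handler that turns incoming SAX events into XML text.
            StringWriter text = new StringWriter();
            TransformerHandler serializer = stf.newTransformerHandler();
            serializer.getTransformer().setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            serializer.setResult(new StreamResult(text));

            // Transformer: an XSLT stage that consumes SAX events and emits its result as SAX events.
            // (XSLT 1.0 engines, including the JDK's, still build an internal tree of their input.)
            TransformerHandler xslt = stf.newTransformerHandler(new StreamSource(new StringReader(XSLT)));
            xslt.setResult(new SAXResult(serializer));

            // Generator: a namespace-aware SAX parser that turns the input document into SAX events.
            SAXParserFactory spf = SAXParserFactory.newInstance();
            spf.setNamespaceAware(true);
            XMLReader generator = spf.newSAXParser().getXMLReader();
            generator.setContentHandler(xslt);
            generator.parse(new InputSource(new StringReader(inputXml)));

            return text.toString();
        } catch (Exception e) {
            throw new IllegalStateException("pipeline failed", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(run("<sites><site id=\"A\"/><site id=\"B\"/></sites>"));
    }
}
```

Note that the stages are wired back to front (serializer first, generator last) because each stage must know where to send its events before any events flow.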

The Cocoon Application structure

Cocoon applications can be confusing to understand, because configuration, behavior, resources, and source code are all placed in a recursive folder structure.

Cocoon Request/Response Operation

A Cocoon application unit of work is a single HTTP Request.

(Well, yes. It's a... web app.)

Depending on the match characteristics of the Request, work is assigned to a single pipeline. (Note: pipeline definitions, including matchers, are found in the sitemap.xmap file, described below.) For a given Request URL, the matchers in the most local sitemap are evaluated in document order.
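"Evaluated in document order" means the first matcher that fits the URL wins, and everything after it is ignored for that Request. The Java sketch below is a hypothetical, simplified imitation of that dispatch, not Cocoon's implementation. It assumes Cocoon's wildcard convention as I understand it: `*` matches within one path segment, `**` may span `/`, and captured values are what the sitemap refers to as `{1}`, `{2}`, and so on. The paths and pipeline labels are made up, and the path is assumed to be the sitemap-relative URI without a leading slash.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical, simplified imitation of sitemap matching; not Cocoon's implementation.
final class MatcherOrderSketch {

    // One rule stands in for a <map:match pattern="..."> element and the pipeline it contains.
    static final class Rule {
        final Pattern regex;
        final String pipelineName; // hypothetical label for the pipeline inside the match

        Rule(String wildcard, String pipelineName) {
            this.regex = Pattern.compile(toRegex(wildcard));
            this.pipelineName = pipelineName;
        }
    }

    // Assumed wildcard semantics: "*" stays inside one path segment, "**" may cross '/'.
    static String toRegex(String wildcard) {
        StringBuilder re = new StringBuilder();
        for (int i = 0; i < wildcard.length(); i++) {
            char c = wildcard.charAt(i);
            if (c == '*' && i + 1 < wildcard.length() && wildcard.charAt(i + 1) == '*') {
                re.append("(.*)");
                i++; // consume the second '*'
            } else if (c == '*') {
                re.append("([^/]*)");
            } else {
                re.append(Pattern.quote(String.valueOf(c)));
            }
        }
        return re.toString();
    }

    // Tries rules in document order; the first match wins and later rules are never consulted.
    // Returns the pipeline label plus its captured values ({1}, {2}, ... in sitemap terms).
    static String dispatch(List<Rule> rules, String path) {
        for (Rule rule : rules) {
            Matcher m = rule.regex.matcher(path);
            if (m.matches()) {
                List<String> captured = new ArrayList<>();
                for (int g = 1; g <= m.groupCount(); g++) {
                    captured.add(m.group(g));
                }
                return rule.pipelineName + captured;
            }
        }
        return null; // no pipeline matched; Cocoon would report an error instead
    }

    public static void main(String[] args) {
        List<Rule> rules = List.of(
            new Rule("data/*/*", "wellData"),
            new Rule("data/**", "dataFallback"),
            new Rule("**", "catchAll"));
        System.out.println(dispatch(rules, "data/AB/123"));       // wellData[AB, 123]
        System.out.println(dispatch(rules, "data/AB/123/extra")); // dataFallback[AB/123/extra]
        System.out.println(dispatch(rules, "about"));             // catchAll[about]
    }
}
```

The practical consequence: put specific matchers before general ones. A `**` catch-all placed first would swallow every Request.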

Cocoon Request/Response Diagram

This is an extremely simplified, but still conceptually correct, diagram of how Cocoon handles an HTTP Request.

Review it. Seriously, it will make everything else easier to follow.

There is one kind of file that governs the behavior shown in the preceding diagram: sitemap.xmap.

sitemap.xmap: the XML files that define the behavior of your Cocoon application

A sitemap.xmap file defines

All of the conditional logic structures will resolve to:

All in all, remember this:

All of your sitemap.xmap files, taken together, define the application logic.

sitemap.xmap does some other things (like setting up debugging and testing logic, because that is the sane and handy place to do it), but if you want to know about application behavior, your mantra in Cocoon should always be "It's all about the sitemap.xmap files."

sitemap.xmap syntax consists of three kinds of logical component:

Cocoon Pipelines

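A "pipeline" is an ordered sequence of discrete steps: a generator at the beginning, a sequence of transformers in the middle, and a serializer at the end.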

A pipeline may also contain

Note that although a pipeline is an ordered sequence of actions, its definition is, generally speaking, a reentrant tree, because of the conditional logic.

Note also that it is completely OK to define a pipeline with nested matchers. You can read the Cocoon documentation until the cows come home and you won't actually find that mentioned. (True story: I was refactoring the Mediator's sitemap.xmap, and feeling frustrated because the refactoring was topologically inconsistent with the application URL structure. "This might not work. Why couldn't I just nest matchers??" I snarled. Then I thought, "Wait. What if I can nest matchers?" I could, and I did, and it worked great. So, now you don't need to endure that. Moral of the story: the Cocoon documentation, while admirably clear in many respects, omits some very helpful, and in retrospect completely obvious, clarifications. Don't hesitate to try something just because it's not described or shown.)
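One way to picture nested matchers is as a tree: an outer matcher's children are only consulted if the outer matcher fits the URL. The Java sketch below models that idea under stated assumptions: Java regexes stand in for Cocoon's wildcard patterns, every pattern is tested against the full request path, and the paths and pipeline names are hypothetical. It also assumes that a branch whose children all fail lets evaluation fall through to its next sibling; check the Cocoon sitemap documentation before relying on that detail.

```java
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical sketch of a pipeline definition with nested matchers, modeled as a tree.
final class NestedMatchSketch {

    static final class Node {
        final Pattern pattern;     // tested against the full request path
        final String pipeline;     // non-null only for leaves
        final List<Node> children; // consulted, in document order, only when this node matched

        private Node(String regex, String pipeline, List<Node> children) {
            this.pattern = Pattern.compile(regex);
            this.pipeline = pipeline;
            this.children = children;
        }

        static Node leaf(String regex, String pipeline) {
            return new Node(regex, pipeline, List.of());
        }

        static Node branch(String regex, Node... children) {
            return new Node(regex, null, List.of(children));
        }
    }

    // Depth-first walk: a matching branch descends into its children; the first matching leaf wins.
    // Assumption of this sketch: a branch whose children all fail falls through to its next sibling.
    static String resolve(List<Node> nodes, String path) {
        for (Node node : nodes) {
            if (!node.pattern.matcher(path).matches()) {
                continue;
            }
            if (node.pipeline != null) {
                return node.pipeline;
            }
            String inner = resolve(node.children, path);
            if (inner != null) {
                return inner;
            }
        }
        return null;
    }

    public static void main(String[] args) {
        List<Node> sitemap = List.of(
            Node.branch("wells/.*",
                Node.leaf("wells/[^/]+/levels", "waterLevels"),
                Node.leaf("wells/[^/]+", "wellSummary")),
            Node.leaf(".*", "catchAll"));
        System.out.println(resolve(sitemap, "wells/AB123/levels")); // waterLevels
        System.out.println(resolve(sitemap, "wells/AB123"));        // wellSummary
        System.out.println(resolve(sitemap, "status"));             // catchAll
    }
}
```

The benefit is the one the story describes: the tree can mirror the application's URL structure, so related pipelines sit together under one outer matcher.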

Pipeline beginning: the Generator

A generator has one specific responsibility: to emit data as SAX events.

The presumption is that a generator is obtaining XML data from an external source (e.g., a web service) and then parsing it; but a custom-built generator can also receive prebuilt SAX events from some upstream source, or provide static data for test purposes. The point is, the generator will always emit SAX events and nothing else.
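The "static data for test purposes" case shows most clearly that a generator's job is only to emit SAX events. Below is a minimal Java sketch using the standard SAX `ContentHandler` interface (not Cocoon's `Generator` interface, which is not shown here): it pushes a fixed, hypothetical document into whatever handler comes next, without parsing anything. The element names and fixture values are invented, and the `serialize` helper exists only so the events can be inspected as text.

```java
import java.io.StringWriter;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.sax.SAXTransformerFactory;
import javax.xml.transform.sax.TransformerHandler;
import javax.xml.transform.stream.StreamResult;
import org.xml.sax.ContentHandler;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.AttributesImpl;

// Hypothetical test-data generator: emits a fixed document as SAX events without parsing anything.
final class StaticDataGeneratorSketch {

    // The only responsibility: push SAX events into whatever handler comes next in the pipeline.
    static void generate(ContentHandler next) throws SAXException {
        next.startDocument();
        next.startElement("", "sites", "sites", new AttributesImpl());
        for (String id : new String[] {"TEST-1", "TEST-2"}) { // hypothetical fixture values
            AttributesImpl atts = new AttributesImpl();
            atts.addAttribute("", "id", "id", "CDATA", id);
            next.startElement("", "site", "site", atts);
            next.endElement("", "site", "site");
        }
        next.endElement("", "sites", "sites");
        next.endDocument();
    }

    // For inspection only: connects the generator straight to an identity serializer.
    static String serialize() {
        try {
            SAXTransformerFactory stf = (SAXTransformerFactory) TransformerFactory.newInstance();
            TransformerHandler serializer = stf.newTransformerHandler();
            serializer.getTransformer().setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            StringWriter text = new StringWriter();
            serializer.setResult(new StreamResult(text));
            generate(serializer);
            return text.toString();
        } catch (Exception e) {
            throw new IllegalStateException("generation failed", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(serialize());
    }
}
```

Because downstream stages only ever see SAX events, they cannot tell this fixture apart from a generator that parsed a live web-service response.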

Out of the box, generators come in two basic flavors:

Pipeline middle: Transformers

The transformers are an ordered sequence of data transformations (including computations, calculations, inclusion of contingent content, sorting, filtering, etc.). These are commonly written as XSLT transformations. Cocoon is also capable of invoking an XQuery application for executing really complex logic, but that's pretty rare (and doesn't occur in the NGWMN Mediator, AFAIK).
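Because transformers form an ordered sequence, each stage's output SAX events become the next stage's input, so stage order matters. The Java sketch below uses plain JAXP (not Cocoon's API) to chain two hypothetical XSLT stylesheets, a filter followed by a sort, with SAX events flowing between them. The element names, the `status` attribute, and its values are invented for illustration.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.SAXParserFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.sax.SAXResult;
import javax.xml.transform.sax.SAXTransformerFactory;
import javax.xml.transform.sax.TransformerHandler;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;
import org.xml.sax.ContentHandler;
import org.xml.sax.InputSource;
import org.xml.sax.XMLReader;

// Hypothetical sketch: an ordered chain of XSLT stages connected by SAX events (plain JAXP, not Cocoon).
final class XsltChainSketch {

    // Stage 1 (hypothetical): keep only sites whose status is "active".
    static final String FILTER =
        "<xsl:stylesheet version='1.0' xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
        + "<xsl:template match='/sites'>"
        + "<sites><xsl:copy-of select=\"site[@status='active']\"/></sites>"
        + "</xsl:template></xsl:stylesheet>";

    // Stage 2 (hypothetical): sort the remaining sites by id.
    static final String SORT =
        "<xsl:stylesheet version='1.0' xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
        + "<xsl:template match='/sites'>"
        + "<sites><xsl:for-each select='site'><xsl:sort select='@id'/><xsl:copy-of select='.'/></xsl:for-each></sites>"
        + "</xsl:template></xsl:stylesheet>";

    // Runs the stylesheets in the order given, then serializes the final SAX stream as XML text.
    static String apply(String inputXml, String... stylesheets) {
        try {
            SAXTransformerFactory stf = (SAXTransformerFactory) TransformerFactory.newInstance();

            StringWriter text = new StringWriter();
            TransformerHandler serializer = stf.newTransformerHandler();
            serializer.getTransformer().setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            serializer.setResult(new StreamResult(text));

            // Build the chain back to front so each stage forwards its output to the stage after it.
            ContentHandler next = serializer;
            for (int i = stylesheets.length - 1; i >= 0; i--) {
                TransformerHandler stage =
                    stf.newTransformerHandler(new StreamSource(new StringReader(stylesheets[i])));
                stage.setResult(new SAXResult(next));
                next = stage;
            }

            // The parser feeds the first stage of the chain.
            SAXParserFactory spf = SAXParserFactory.newInstance();
            spf.setNamespaceAware(true);
            XMLReader reader = spf.newSAXParser().getXMLReader();
            reader.setContentHandler(next);
            reader.parse(new InputSource(new StringReader(inputXml)));
            return text.toString();
        } catch (Exception e) {
            throw new IllegalStateException("transformation chain failed", e);
        }
    }

    public static void main(String[] args) {
        String input = "<sites><site id=\"C\" status=\"active\"/>"
            + "<site id=\"B\" status=\"retired\"/><site id=\"A\" status=\"active\"/></sites>";
        System.out.println(apply(input, FILTER, SORT));
    }
}
```

Keeping each stylesheet small and single-purpose, as here, is also what makes the shared-versus-case-specific XSLT split described in the next paragraph practical.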

It is common for the XSLT files to be stored in the Cocoon application folder structure at an appropriate scope. Some XSLTs are very specific to a particular logical case; others are shared resources used in multiple cases.

Pipeline end: the Serializer

The final step is an output step, normally a serialization to a wire format suitable for an HTTP message body. The serializer is ordinarily (and in the case of the NGWMN Mediator, always) an XML writer. It can, however, be a custom-built output formatter that creates such outputs as tabular CSV/TSV, JSON, etc.
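A serializer is simply the last consumer of the SAX stream, so a non-XML format such as CSV only requires a different consumer. The Java sketch below is a hypothetical CSV writer implemented as a standard SAX `DefaultHandler`; Cocoon's real serializer interface is different and is not shown. It assumes a flat layout in which each row element carries its columns as attributes, and the `well`, `id`, and `name` names are invented.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical non-XML "serializer": writes one CSV row per row element, reading columns from attributes.
final class CsvSerializerSketch extends DefaultHandler {

    private final StringBuilder csv = new StringBuilder();
    private final String rowElement;
    private final String[] columns;

    CsvSerializerSketch(String rowElement, String... columns) {
        this.rowElement = rowElement;
        this.columns = columns;
        csv.append(String.join(",", columns)).append('\n'); // header row
    }

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts) {
        if (!qName.equals(rowElement)) {
            return; // other elements carry no row data in this sketch
        }
        for (int i = 0; i < columns.length; i++) {
            String value = atts.getValue(columns[i]);
            if (i > 0) {
                csv.append(',');
            }
            csv.append(escape(value == null ? "" : value));
        }
        csv.append('\n');
    }

    // RFC 4180-style quoting: wrap fields containing commas, quotes, or line breaks; double inner quotes.
    static String escape(String field) {
        if (field.contains(",") || field.contains("\"") || field.contains("\n") || field.contains("\r")) {
            return "\"" + field.replace("\"", "\"\"") + "\"";
        }
        return field;
    }

    // Parses the XML (standing in for the upstream pipeline) and returns the accumulated CSV text.
    static String toCsv(String xml, String rowElement, String... columns) {
        try {
            CsvSerializerSketch handler = new CsvSerializerSketch(rowElement, columns);
            SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(xml)), handler);
            return handler.csv.toString();
        } catch (Exception e) {
            throw new IllegalStateException("CSV serialization failed", e);
        }
    }

    public static void main(String[] args) {
        String xml = "<wells><well id=\"A1\" name=\"North, upper\"/><well id=\"B2\" name=\"South\"/></wells>";
        System.out.print(toCsv(xml, "well", "id", "name"));
    }
}
```

For the sample input in `main`, the output is a header line `id,name` followed by `A1,"North, upper"` and `B2,South`.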

Pipeline inspection hatches: Views

A view is primarily a way of inspecting the content of the pipeline at some intermediate point. The usefulness of this is pretty obvious: it's very handy to be able to inspect various inputs and outputs if you're trying to diagnose a bug, or inspect the anticipated input to an XSLT, or something like that.

A view is simply a short-cut to a serializer that is invoked at a specific place in the pipeline. Views are invoked by adding a reserved "cocoon-view" request parameter to the Request URL. The value of the cocoon-view request parameter is the name of the view, as defined in sitemap.xmap.
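Because a view is selected purely by a request parameter, inspecting an intermediate stage of a pipeline needs nothing more than a modified URL. The small Java helper below is a hypothetical convenience for building such URLs. The host, path, and view name `content` are assumptions; use whichever view names your sitemap.xmap actually defines.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

// Hypothetical helper: appends the reserved cocoon-view parameter to a request URL.
final class ViewUrlSketch {

    // Assumes the URL has no #fragment; viewName must match a view defined in sitemap.xmap.
    static String withView(String requestUrl, String viewName) {
        String separator = requestUrl.contains("?") ? "&" : "?";
        return requestUrl + separator + "cocoon-view=" + URLEncoder.encode(viewName, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // Hypothetical Mediator URLs, for illustration only.
        System.out.println(withView("http://localhost:8080/mediator/data/AB/123", "content"));
        System.out.println(withView("http://localhost:8080/mediator/data?site=AB", "content"));
    }
}
```

The same Request without the parameter runs the full pipeline, so comparing the two responses is a quick way to see what a particular stage contributed.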


The Mediator

Particulars of the Mediator/Hub Cocoon Application

Fortunately, the Mediator's behavior is not defined in multiple places. There is one specific sitemap.xmap that governs the Mediator Hub. All of the application logic is there.

The sitemap.xmap that defines the application is Path redacted. (You may now perform your Happy Dance for up to two minutes, because I just saved you a great deal of research.) As a general rule, all Mediator control logic can be found in that single sitemap.xmap; you will not need to look elsewhere unless your task involves more unusual configuration or definition work.

The GWDP Mediator pipelines are pretty straightforward. They, and their data, are accessible via three URLs with different jobs.

NGWMN Mediator: data access diagram