Exercising and Testing the NGWMN
The NGWMN Data Cache Dashboard
How do I get a Mediator result for a particular service for a particular well?
How do I get the input information to the Mediator for a particular service for a particular well?
How do I do a Regression Test on the Mediator?
Installing the required libraries on OS X via Homebrew
The parts of the regression test: what they are and how to use them
Other things that are not in source control, but that may show up in the regression test directory:
The folder structure of regression data
How do I get the actual data stored in the Cache for a particular service for a particular well?
How do I force a data prefetch for a particular well?
How do I find out what the Cache sends to the Web App via AJAX for a particular well?
NGWMN is composed of multiple components that interact in ways that are obvious, and ways that are not obvious.
This page contains explanations, specifications, procedures, tips, tricks, folklore, and arcane voodoo rituals that can help a developer exercise the pieces correctly.
The 3 basic parameters
For any kind of work with NGWMN, you will typically need to know three bits of info:
- The service-alias, which defines which service you're calling.
- The agency-alias, which identifies the source from which the Mediator gets its data.
- The site_id, which is the agency's identifier for the well.
Note that these parameters are actually named different things in different parts of the app and for different data. For instance, in the SOS data service from the Mediator, site_id is called featureId; for QW data it is called siteid. These variations are described on the NGWMN Data and Service Definitions page.
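That naming variation can be captured in a small lookup. A sketch (only the two variations named above are listed; the helper name is invented, and the NGWMN Data and Service Definitions page remains the authoritative source):

```python
# Sketch: map each Mediator service alias to the name its site-identifier
# parameter goes by. Only the two variations mentioned in the text are here.
SITE_ID_PARAM = {
    "sos": "featureId",  # the SOS data service calls site_id "featureId"
    "qw": "siteid",      # the QW data service calls it "siteid"
}

def site_param(service_alias, site_id):
    """Return the (parameter name, value) pair for a given service."""
    return (SITE_ID_PARAM[service_alias], site_id)
```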
The NGWMN Data Cache Dashboard
This is the place to go to:
- Find out what wells are available for the NGWMN on your server.
- View metadata about the Cache and the prefetch operations that populate it.
- Manually execute a prefetch service call to refresh Cache Data for a particular well, for a particular service.
To get there, go to <your server's hostname and port>/ Context redacted .
For example, for the dev server: http: Hostname redacted / Context redacted .
This gets you to a display page that lets you access wells and metadata about cache prefetch operations.
The most useful links are:
- Well List (complete flat list of wells on the NGWMN server)
- Well Prefetch by Agency (wells grouped by providing agency)
- Prefetch list for IL EPA
- Prefetch list for MPCA
- Prefetch list for ISWS
- Prefetch list for NJGS
- Prefetch list for MN DNR
- Prefetch list for TWDB
- Prefetch list for MBMG
- Prefetch list for USGS
The cache can only access wells that are listed in the Well Registry, which is a separately managed component. The Well Registry varies by tier: the Dev and QA registries (and thus those caches) do not contain all the wells. This appears to be by design, so that the non-production servers use less storage and are easier to test. If you are trying to diagnose a problem with a particular well in production, it may not be possible to access it on Dev or QA.
Mediator
The Mediator is an Apache Cocoon application, which means that it's a set of processing pipelines that are mapped to characteristics of an HTTP Request. The output of the selected pipeline is used to generate the corresponding HTTP Response. See the Cocoon and Mediator Technical Overview for more detailed information.
The single most important thing you need to know about the Mediator: Context redacted /sitemap.xmap. In Cocoon, application logic is defined by sitemap.xmap files, which declaratively describe the Request matches and the structures of the pipelines. sitemap.xmap files can be nested recursively in Cocoon; fortunately, the Mediator's logic is defined in a single sitemap.xmap file.
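For orientation, a Cocoon pipeline declaration looks roughly like the following fragment. The matcher pattern, source, and stylesheet names here are invented for illustration; consult the Mediator's actual sitemap.xmap for the real pipelines:

```xml
<map:pipelines>
  <map:pipeline>
    <!-- Hypothetical matcher: requests whose path matches sos/** select this pipeline -->
    <map:match pattern="sos/**">
      <!-- generate: fetch the raw source document (e.g., an agency response) -->
      <map:generate src="cocoon:/agency-data/{1}"/>
      <!-- transform: apply an XSLT to produce the mediated output -->
      <map:transform src="transforms/to-waterml.xsl"/>
      <map:serialize type="xml"/>
    </map:match>
  </map:pipeline>
</map:pipelines>
```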
How do I get a Mediator result for a particular service for a particular well?
Use the Core Service APIs.
The Core Service APIs include the way to call the Mediator directly.
How do I get the input information to the Mediator for a particular service for a particular well?
Source view, documented here, is a Mediator Specialty Service that uses the 'cocoon-view=source' flag on a mediator request.
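In other words, a source-view request is just the normal Mediator request with one extra query parameter. A sketch of building such a URL (the /ngwmn/ context path and parameter names here are placeholders, not the real redacted values):

```python
from urllib.parse import urlencode

def mediator_url(host, service_alias, params, source_view=False):
    """Build a Mediator request URL. With source_view=True, Cocoon is asked
    for the pipeline's input (source view) instead of its transformed output.
    The /ngwmn/ path segment is an illustrative placeholder."""
    params = dict(params)
    if source_view:
        params["cocoon-view"] = "source"
    return "http://{}/ngwmn/{}?{}".format(host, service_alias, urlencode(params))
```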
How do I get the original source data from the provider agency for a particular service for a particular well?
This is unfortunately a bit harder. Go to Context redacted /sitemap.xmap and look up the web service call listed there. You will have to piece together the appropriate URL using the logic in the xmap file. Unless you are paranoid or have reason to suspect something weird is happening, though, the 'cocoon-view=source' flag described above should be all you need.
How do I do a Regression Test on the Mediator?
A scripted Regression Test has been written for the Mediator. It is included in the Mediator source code Internal Project redacted in its own separate subfolder.
The regression test is a brute-force approach: it favors accumulating a lot of information as the default behavior and allows the tester to decide how, and how heavily, to inspect the new data. This approach is potentially very expensive in terms of time and server load. Therefore:
- The regression test is not intended for automated use. It should be executed by a human (preferably one who understands what they are doing).
- The regression test should be used on the dev server, or perhaps QA; never prod unless there's a sufficient reason for doing so. It's intended to catch regressions due to code changes, not server deployment errors.
The regression test works as follows:
- There is a baseline or reference (aka "ref") dataset that contains all of the data for all of the wells, for all of the agencies, for each service that can be called. This is established once and replaced only rarely. (The dev server uses a reduced dataset; it took ~1.5 hours to get everything.) This is in a "ref" folder on the machine from which you are running the tests.
- When you want to perform a regression test, you specify how much of the full dataset you want to test. The parameters (set in a configuration file) are:
- Which agencies
- Which services
- The maximum number of wells per agency.
- Executing the regression test creates a test data folder (which can basically be named anything except "ref") containing the data.
- The regression test offers a superficial canary-in-the-coal-mine utility, quick-scan-new-results.py, that can quickly identify certain kinds of discrepancies. However, it also stores the message body of every successful ("successful" meaning HTTP status code < 400) invocation. You can do what you like with those XML files: compare them manually in applications such as oXygen; script automated inventories in Python or on the command line with xmllint; do simple file compares; look at them in a text editor and gnaw your thumbnail; etc. It's up to you.
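When scripting comparisons of the stored message bodies, canonicalizing both documents first avoids false positives from formatting noise such as attribute ordering. A minimal stdlib sketch (the function name is invented; C14N normalizes attribute order and quoting, but significant whitespace still counts):

```python
# Compare two XML documents after C14N canonicalization, so that
# attribute order and quoting differences are not reported as changes.
from xml.etree.ElementTree import canonicalize

def xml_equivalent(xml_a, xml_b):
    """True if two XML strings agree once canonicalized."""
    return canonicalize(xml_a) == canonicalize(xml_b)
```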
A quick regression test HOWTO
First: Get the latest regression test code and data from source control. Note that, if ref.zip has changed, that will take a little extra time to download.
Next: you will need to install Python and the 'requests' and 'lxml' Python modules.
Installing the required libraries on OS X via Homebrew
If you have OS X with the Homebrew package manager, this is typically a matter of brew install python followed by pip install requests lxml.
Next: Edit test-config.txt to set up your regression test. For example, if you wanted to test sos for MBMG, USGS, and TWDB only, with no more than 15 wells per agency, you would edit as follows:
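The exact file syntax (including the comment convention shown here) is an assumption; the directions written into test-config.txt itself are authoritative. Only the four header names are taken from this document. A sketch of such an edit:

```text
HOST
http://<dev hostname>:<port>

AGENCIES
MBMG
USGS
TWDB
# IL EPA
# MPCA
# (remaining agencies commented out)

SERVICES
sos
# qw
# wfs

MAX WELLS PER AGENCY
15
```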
Next: Unzip the ref.zip file into a folder of the same name (just ref).
Next: at the command line, run get-well-data.py:
Make sure the username & password credentials have sufficient permissions to call the Mediator's web service endpoints. (hint: Hint redacted ). Your selected name for the output subdirectory can be any legal folder name, but must not already exist. For concreteness, let's use "murgatroyd" in this example.
Next: wait until it finishes. This one will be fast (it's configured for at most 45 service calls: no more than 15 for each of the three agencies). Note that the Response HTTP Status Code will be color-coded: anything >= 400 will be red, otherwise green.
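That color-coding rule is just a threshold on the status code. A sketch of the logic (ANSI escape codes are assumed; the real script's exact output format may differ):

```python
def colorize_status(status_code):
    """Wrap a status code in ANSI color codes per the convention above:
    red for >= 400 (an error response), green otherwise."""
    RED, GREEN, RESET = "\033[31m", "\033[32m", "\033[0m"
    color = RED if status_code >= 400 else GREEN
    return color + str(status_code) + RESET
```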
Next: See if there were any discrepancies in status or content-length. Run:
If everything is good, there will be no output. If there are a lot of discrepancies, you may want to rerun the scan and save the readout to a file (e.g., by redirecting stdout).
From this point forward, you can inspect the XML data for the pieces showing discrepancies (or, if you think it's warranted, for all of the new data in murgatroyd, even if it didn't trigger any discrepancies from quick-scan-new-results.py). Because this is simply text data (for the HTTP response header info) and XML (for the response message bodies), you can use any tools you please.
The parts of the regression test: what they are and how to use them
These are the code and data resources that you will see if you list the contents of the regression test directory:
get-well-data.py
This is the main script of the regression test. It's the one that actually executes the HTTP calls and constructs the results folder.
- dependencies:
- wells.txt
- test-config.txt
- usage:
- python get-well-data.py username password newfoldername
- username and password for any Tomcat account (e.g. Example redacted ) that has sufficient permissions on the server.
- newfoldername is the name of the folder that will be created. You can call it pretty much anything - except "ref", which is a reserved folder name. Also, you need to delete any preexisting folder with that particular name: get-well-data.py will not overwrite an existing result directory.
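Those folder-name rules can be sketched as a guard like the one get-well-data.py presumably performs (the function name and messages here are illustrative, not the script's actual code):

```python
import os

def make_results_dir(name):
    """Create the results folder, refusing the reserved name 'ref' and
    refusing to overwrite an existing directory, per the rules above."""
    if name == "ref":
        raise ValueError("'ref' is reserved for the baseline dataset")
    if os.path.exists(name):
        raise FileExistsError("delete or rename existing folder: " + name)
    os.makedirs(name)
    return name
```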
parse-well-list.py
This is the script that generated wells.txt. It is also the one you would use to regenerate it if you want to change the list of wells. Warning: if you decide to regenerate the list, you must also regenerate your ref directory, which is expensive and time-consuming.
So don't blow away the old one, if you still have it. Rename it or move it to protect it from being overwritten. (Of course, you can always get it out of source control.)
- usage:
- python parse-well-list.py username password (host)
- username and password for any Tomcat account (e.g. Example redacted ) that has sufficient permissions on the server.
- optional host (defaults to Host redacted ) is a URL of the form http://hostname:port. Realistically, you won't use this unless you're setting up for a run against QA or (fate forfend) Prod.
ref.zip
This is the archive containing the official baseline data for comparison to your new run. Do not edit this file. Ever. ...But if you do, decide deliberately whether to commit your changes (e.g., if the edit is intended only for your local copy, be very careful not to commit it).
Please remember that the ref data is server-specific.
If regenerating this data, here's your quick checklist:
- If you also plan to update wells.txt, do that first.
- Make sure that wells.txt specifies the same server for which you will be doing the ref regeneration.
- Think very carefully about the configuration entries you want in test-config.txt. You will be hating your life if you do a big long run without setting up the configuration correctly.
- Remember: a regeneration is an update of the baseline, so you should commit your changes unless something awful happens.
Your regeneration operation should go like this:
- Edit test-config.txt to ensure all agencies and services are uncommented, and that there is no uncommented number entry under MAX WELLS PER AGENCY. This ensures that get-well-data.py will not exclude any wells.
- Verify that you have the correct server selected in test-config.txt.
- Run get-well-data.py. Pick a non-confusing folder name, like newref.
- Go do something else. Have lunch, work on a different project, change the transmission in your car, something.
- If it completes successfully, rename your target folder to ref/. (And you did verify that the old ref content was backed up, right?) Say you started with newref as suggested above:
➜ ~ $ rm -r ref
➜ ~ $ mv newref ref
➜ ~ $
- If you want to create a new ref.zip, do this, or the moral equivalent of this:
➜ ~ $ zip -r ref.zip ref/
➜ ~ $
- If you intend for this new zipfile to replace its predecessor in source control, commit it right away, before you forget.
wells.txt
This is the list of wells eligible for regression testing. It's a simple text file, in which each line represents a single well. Each line is of the form
agency_id|feature_id
and yes, that's a pipe character delimiter. Pipes pretty much never break, unless they get sideways with a regex. You're welcome.
Never edit this file. Just regenerate it if you must; best is just to retrieve the old one from source control.
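Since the format is one plain pipe-delimited line per well, reading the file is nearly a one-liner. A sketch (the helper name is invented; it splits on the first pipe only, in case a feature_id ever contains one):

```python
def parse_wells(lines):
    """Yield (agency_id, feature_id) pairs from wells.txt-style lines,
    skipping blank lines."""
    for line in lines:
        line = line.strip()
        if line:
            agency_id, feature_id = line.split("|", 1)
            yield agency_id, feature_id
```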
quick-scan-new-results.py
This is a superficial check of an http.txt file, comparing the new regression test's HTTP responses against the matching HTTP responses from ref/. It will flag any well/service combination for which:
- The HTTP Status Code is different; or
- The HTTP "content-length" header shows a decrease in message body size in bytes.
This script writes its findings to stdout.
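Both checks reduce to a simple comparison per well/service pair. A standalone sketch of that logic (the real script reads the values out of http.txt files; here they are passed in directly, and the function name and message wording are invented):

```python
def quick_scan(ref, new):
    """Compare {(well, service): (status, content_length)} mappings and
    return messages for the two discrepancy conditions described above:
    a changed HTTP status code, or a decreased content-length."""
    findings = []
    for key in sorted(ref):
        ref_status, ref_len = ref[key]
        new_status, new_len = new[key]
        if new_status != ref_status:
            findings.append("{}: status {} -> {}".format(key, ref_status, new_status))
        if new_len < ref_len:
            findings.append(
                "{}: new content-length is {}; reference was {}".format(key, new_len, ref_len))
    return findings
```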
- usage:
- python quick-scan-new-results.py newfoldername
- newfoldername is the local name of the folder in which you have stashed your regression results.
If you want a written record of the findings, just redirect stdout to a file.
If you want to perform other HTTP-level checks, you can either write them into this script or (probably safer) make a copy of this script with a different name and hack on that one. If you do add your new file to source control, don't forget to document it.
test-config.txt
test-config.txt is the config file that get-well-data.py obeys.
The first rule of test-config.txt is that you do not modify the header names. If you do, you will break get-well-data.py, which is expecting those exact headers.
If you really really want to change a header, you should understand get-well-data.py well enough to make corresponding changes there. And then test it thoroughly.
test-config.txt provides the ability to configure the following characteristics of the get-well-data.py operation:
- HOST: the HTTP address of the server (dev, QA, prod) against which the regression test will be executed
- AGENCIES: the back-end data providers/owners against whom the regression tests will be run
- SERVICES: the services (currently available: 'qw', 'sos', 'wfs') against which the regression tests will be run
- MAX WELLS PER AGENCY: A limit on the number of wells, in any agency, whose data get-well-data.py will attempt to obtain.
Specific directions about how to configure the different characteristics are written into the config file itself. Please keep this updated.
Other things that are not in source control, but that may show up in the regression test directory:
- ref/: somebody unzipped ref.zip here. Or accidentally named a different subdirectory "ref" because they were deeply confused.
- other directories: probably the results of regression test runs.
- files with content like the following are saved results of a quick-scan-new-results.py run:
  - labrat/IL_EPA/P407875
    - sos
      - new data content-length is 760; reference data content-length was 766.
The folder structure of regression data
The internal structure of ref and of your generated test data is the same:
- test or reference folder
  - agency_id
    - feature_id
      - http.txt
      - qw.xml
      - sos.xml
      - wfs.xml
    - feature_id
  - agency_id
In other words, your test data is grouped by agency, then by individual well; and for each well there will be at least one file (http.txt, which records the status and headers of the response for each HTTP service call you attempted: win, lose, or draw) and possibly several more (for each successful response, an XML file holding the message body).
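Given that layout, collecting every saved XML response for inspection is a short directory walk. A sketch (the function name is invented; folder names follow the structure above):

```python
import os

def saved_responses(results_dir):
    """Map (agency_id, feature_id) -> sorted list of saved XML files,
    following the agency/well folder layout described above."""
    out = {}
    for agency in sorted(os.listdir(results_dir)):
        agency_path = os.path.join(results_dir, agency)
        if not os.path.isdir(agency_path):
            continue
        for well in sorted(os.listdir(agency_path)):
            well_path = os.path.join(agency_path, well)
            out[(agency, well)] = sorted(
                f for f in os.listdir(well_path) if f.endswith(".xml"))
    return out
```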
NGWMN Cache
How do I get the actual data stored in the Cache for a particular service for a particular well?
Use the Core Service APIs.
The Core Service APIs include the ways to call the Cache directly.
How do I force a data prefetch for a particular well?
There are two ways to do this:
- Manual: use the NGWMN Data Cache Dashboard.
- HTTP Service call (via command line or script): use the prefetch service.
How do I find out what the Cache sends to the Web App via AJAX for a particular well?
Use the Direct AJAX Support for UI service.