Chapter 6: ADaM Metadata and ADaM Define.xml

Metadata Spreadsheets

Variable Metadata in ADaM

Analysis Parameter Value-Level Metadata

Analysis Results Metadata

Building Define.xml

Define.xml Navigation and Rendering

Chapter Summary

As discussed in Chapter 1, one of the fundamental principles of ADaM analysis datasets is that they be accompanied by metadata. The ADaM 2.1 model document contains a full section on ADaM metadata, including illustrations of concepts such as analysis dataset metadata, analysis variable metadata, analysis parameter value-level metadata, and analysis results metadata. If you are familiar with SDTM metadata, then many of these ADaM metadata concepts will be familiar. They serve the same purpose of helping to describe the data or, in the case of analysis results metadata, describing the relationship between the analysis data and the analysis results.

In this chapter, we take what we learned in Chapter 2 about SDTM metadata and apply it to ADaM metadata and the creation of an ADaM define.xml file using a Base SAS implementation. The CDISC Define-XML version 2.0 release package contains a sample ADaM implementation with two ADaM datasets and a dummy reviewer’s guide. Building on that, an even more complete example is packaged as part of CDISC’s Analysis Results Metadata specification. The style sheet that comes with that package is used for the implementation demonstrated in this chapter. Both packages are available from the CDISC website, www.cdisc.org.

More Information

Appendix C – ADaM Metadata

The %make_define2 SAS macro and the ADaM metadata spreadsheets used for this book can be found on the authors’ pages at http://support.sas.com/publishing/authors/index.html.

See Also

Define-XML version 2.0 (http://www.cdisc.org/standards/foundational/define-xml)

Analysis Results Metadata version 1.0 for Define-XML v2.0 (http://www.cdisc.org/standards/foundational/adam)

Analysis Data Model version 2.1 (http://www.cdisc.org/standards/foundational/adam)

Metadata Spreadsheets

As stated in Chapter 2, good up-front SDTM metadata is essential for driving the entire SDTM implementation process. The same holds true for an ADaM creation process. Fortunately, many of the spreadsheets created for SDTM metadata can be used for ADaM metadata as well. We do not revisit those here, but if you are interested in seeing the specific ADaM metadata for our implementation, you can find it in Appendix C. However, two of the spreadsheets, VARIABLE_METADATA and VALUELEVEL_METADATA, deserve additional details for an ADaM implementation and are covered in the following sections. One set of metadata, analysis results metadata, is completely unique to ADaM and is covered for the first time in “Analysis Results Metadata.”

We should mention the existence of one additional column called DOCUMENTATION that is used in ADaM dataset metadata (which is captured in the TOC_METADATA spreadsheet) but not in the SDTM dataset metadata. It is here that you can provide further details of how the dataset was derived or additional references to protocols or analysis plans.

Variable Metadata in ADaM

Variable metadata in SDTM and ADaM are very similar. There are, however, fields that exist in one but not in the other. The columns for ROLE and ROLECODELIST, for example, have specific purposes for SDTM data but not for ADaM. They can therefore be ignored when you are using the spreadsheet for ADaM data. For your own implementation, you might choose to remove them altogether.

The updates in Define-XML version 2.0 are in many ways very friendly to the needs of ADaM metadata. Changes to the new style sheets contribute equally to efforts to standardize how ADaM metadata should be captured and represented. Although the style sheet is not a part of the standard, it is important to discuss the two together, since what people actually see when they view the metadata is determined by the style sheet.

One difference between the presentation of SDTM and ADaM variable metadata is with the Origin element. In SDTM, Origin is presented as its own column in data definition tables (DDTs). In ADaM, it appears under a single combined column, Source/Derivation/Comment. This is largely due to specifications in the ADaM version 2.1 model document. One possible reason for the use of “Source” rather than “Origin” in ADaM metadata is the ADaM fundamental principle that (emphasis added) “analysis datasets and their metadata should provide traceability between the analysis data and its source data.” For practical reasons, the Origin metadata element is used to capture the ADaM variable’s source, but what is represented in the Origin field in ADaM metadata should be the immediate predecessor of a given variable rather than its “genesis” origin. This is an example of how the Define-XML 2.0 standard has become friendlier to ADaM metadata: among the controlled terminology for the Origin element’s type is “Predecessor.”

In our ADaM metadata and our %make_define2 macro, any Origin value that does not start with one of the other values from the controlled terminology list (“CRF”, “Derived”, “Assigned”, “Protocol”, and “eDT”) is assumed to be a predecessor. A predecessor value that does not contain a period is assumed to name only the source dataset, with the variable name being the same in the current dataset and the source dataset. The macro therefore uses the existing variable name to complete the predecessor value. For example, AGE in ADEF can have an origin of just “ADSL,” and the macro will complete this to “ADSL.AGE” in the define file. This is intended to save typing and reduce errors. If, however, the predecessor has a different variable name, then the full origin (with a period between the source domain/dataset and the source variable) should be specified.
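The completion rule just described can be sketched as a short DATA step. This is only an illustration of the logic, not the code inside %make_define2. The dataset name VARIABLE_DEMO, the output columns ORIGIN_TYPE and ORIGIN_FULL, and every sample row other than AGE in ADEF are hypothetical.

```sas
/* Hypothetical sample rows: dataset, variable, and Origin as entered */
data variable_demo;
  length dataset $8 variable $8 origin $40;
  input dataset $ variable $ origin $;
  datalines;
ADEF AGE ADSL
ADEF SRCSEQ QS.QSSEQ
ADEF CHG Derived
ADEF PARAMCD Assigned
;
run;

data origin_completed;
  set variable_demo;
  length origin_type $11 origin_full $40;
  /* Origins that start with a controlled term are kept as entered */
  if upcase(origin) =: 'CRF'
     or upcase(origin) =: 'DERIVED'
     or upcase(origin) =: 'ASSIGNED'
     or upcase(origin) =: 'PROTOCOL'
     or upcase(origin) =: 'EDT' then do;
    origin_type = scan(origin, 1, ' ');
    origin_full = origin;
  end;
  /* Anything else is treated as a predecessor */
  else do;
    origin_type = 'Predecessor';
    /* No period: only the source dataset was given, so reuse the variable name */
    if index(origin, '.') = 0 then origin_full = catx('.', origin, variable);
    else origin_full = origin;
  end;
run;

proc print data=origin_completed noobs;
run;
```

In this sketch, the AGE row entered as “ADSL” is completed to “ADSL.AGE”, while the row that already contains a period is passed through unchanged.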

In addition to “Predecessor”, one of the most common origin types in ADaM metadata may be “Derived”. The new style sheet presents derived fields in the Source/Derivation/Comment DDT column as “Derived:” and appends the computational algorithm metadata after it. The colon provides good motivation for ensuring that your metadata are complete with such details, although admittedly, such “derivations” can sometimes be difficult to specify succinctly with pseudo-code or other methods. As we will see later in the validation chapters, such derivations are now expected and can generate validation errors if not provided.

If a particular variable requires parameter value-level metadata, there is no need to specify this explicitly in the variable metadata. All that is required is that the dataset and variable name be provided appropriately in the value-level metadata spreadsheet, which ensures that the information is properly linked between the two spreadsheets. In our simplified examples, both BDS-structured datasets (ADEF and ADTTE) have only one parameter, so parameter value-level metadata is not strictly necessary; it is applied anyway in order to demonstrate the functionality. To avoid confusion, information such as a codelist and computational algorithm is not provided in the variable metadata for variables such as AVAL in these BDS-structured datasets. Rather, just a comment to “Refer to parameter value-level metadata” is provided for these variables.

Analysis Parameter Value-Level Metadata

An analysis dataset that follows the ADaM Basic Data Structure (BDS) can contain many analysis parameters. The analysis variable in that structure, typically AVAL or AVALC, can have different attributes depending on the parameter. For example, as shown in Chapter 2, values for different lab tests will likely require different formatting. Some tests might be displayed only as whole numbers, and others might need three decimal places in order to capture meaningful differences between values.

The analysis parameter value-level metadata can go a bit beyond just describing attributes of analysis variables. With the possibility to add numerous derived columns to a BDS dataset, many of which can vary in their derivation or meaning depending on the given parameter, the supporting metadata can get rather intricate. To demonstrate this, consider an analysis dataset, ADEF, that contains efficacy results derived from the SDTM QS domain. In our simplified example data, we have only one questionnaire test, the pain question. This translates to an analysis parameter in our BDS-structured ADEF dataset. The analysis variable, AVAL, captures the response to the pain question; BASE captures the baseline value; and CHG captures the change from baseline. The primary endpoint in the study is a responder definition where a subject is considered a responder if he or she has an improvement in the pain scale of two or more points at 6 months. In the ADEF dataset, we use the variable CRIT1FL to indicate whether a subject met this responder definition at the given visit. CRIT1 is a text string to describe the result captured by CRIT1FL.
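As a rough sketch of how the responder flag could be derived, consider the following DATA step. Several things here are assumptions rather than details from our study: that a lower pain score is better (so an improvement of two or more points means CHG is -2 or less), that the flag is left blank when the change cannot be calculated, the AVISITN coding of 3 and 6 for the month 3 and month 6 visits, the wording of CRIT1, and all of the data values.

```sas
/* Hypothetical records for one parameter; values are invented for illustration */
data adef_demo;
  length usubjid $8 paramcd $8 crit1 $51 crit1fl $1;
  input usubjid $ paramcd $ avisitn base aval;
  chg = aval - base;
  /* Assumed wording for the criterion text */
  crit1 = 'Pain improvement from baseline of at least 2 points';
  /* Assumption: lower score is better, so improvement is a negative change */
  if missing(chg) then crit1fl = ' ';
  else if chg <= -2 then crit1fl = 'Y';
  else crit1fl = 'N';
  datalines;
S001 XPPAIN 3 7 6
S001 XPPAIN 6 7 4
S002 XPPAIN 3 6 5
S002 XPPAIN 6 6 5
;
run;

proc print data=adef_demo noobs;
run;
```

The flag is set on every visit record, so selecting the 6-month records for the primary analysis is left to the where clause of the analysis itself.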

In theory, each parameter could have a different primary set of criteria that should be flagged. For example, other pain scales might be less sensitive, and a 1-point improvement in the scale might be considered clinically noteworthy. The value of CRIT1 would then vary depending on the parameter, and the derivation of CRIT1FL would also differ from the derivation for the primary pain question. These differences would have to be documented at the parameter level. Even with one parameter, documenting these differences at the parameter level would be the proper way to do it. This documentation for our example is shown in Table 6.1.

Table 6.1: Selected Analysis Parameter Value-Level Metadata for ADEF

VARIABLE  VALUEVAR  VALUENAME  TYPE     LENGTH  COMPUTATIONMETHODOID  CODELISTNAME
AVAL      PARAMCD   XPPAIN     Integer  8                             PAINSCORE
CRIT1     PARAMCD   XPPAIN     Text     51                            ADEF.CRIT1
CRIT1FL   PARAMCD   XPPAIN     Text     1       RESPONDER             YN

In order to make the data entry easier and less error prone, only the dataset name and variable name need to be entered into the value-level metadata spreadsheet.  From those, the %make_define2 macro will create a VALUELISTOID that is used to uniquely identify each row in the define file.  This is done as a simple concatenation of the two fields, separated by a period.
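The concatenation can be pictured with a minimal DATA step. This is an illustrative sketch and not the macro's own code; the dataset name VALUELIST_DEMO is hypothetical.

```sas
/* Dataset and variable names as they would be entered in the spreadsheet */
data valuelist_demo;
  length dataset $8 variable $8 valuelistoid $17;
  input dataset $ variable $;
  /* Simple concatenation of the two fields, separated by a period */
  valuelistoid = catx('.', dataset, variable);
  datalines;
ADEF AVAL
ADEF CRIT1
ADEF CRIT1FL
;
run;

proc print data=valuelist_demo noobs;
run;
```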

Using the same metadata fields shown in Chapter 2 for the SDTM, we can describe details at the parameter level that could not be shown otherwise. In rows where VARIABLE=AVAL, details for each pain parameter can be provided. In rows where VARIABLE=CRIT1, the length for each value of CRIT1 and the codelist that contains each unique value can be shown. Finally, rows where VARIABLE=CRIT1FL can display the computational method that might be unique for the given parameter, as well as codelists that might be parameter-dependent. Providing analysis parameter value-level metadata for multiple variables is more robust than the standard approach for SDTM value-level metadata. In the standard approach, only details pertaining to --ORRES variables for a given value of --TEST or --TESTCD are provided. With ADaM analysis parameter value-level metadata, details pertaining to essentially any variable for a given value of PARAM or PARAMCD can be provided.

Not shown in Table 6.1 is the WHERECLAUSEOID field.  For simple value-level metadata, where the where clause condition is simply based on specific values of PARAMCD (as indicated by the VALUENAME values), manually entering the where clause (and coming up with unique where clause OIDs) would be cumbersome.  The %make_define2 macro was therefore designed to programmatically create these simple where clauses.  The field exists in the spreadsheet in case a more complicated where clause is needed.  Otherwise, if the field is left blank, the macro will construct the where clause for you.  If you are interested in seeing examples of more involved where clauses, see the analysis results metadata in Appendix C.
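To give a feel for what constructing a simple where clause involves, the following sketch fills in a blank WHERECLAUSEOID and the pieces of an “equals” condition on PARAMCD. The OID naming patterns (a WC. prefix for where clauses and an IT. prefix for items) and the output column names ITEMOID, COMPARATOR, and CHECKVALUE are assumptions made for this illustration; the macro may well name things differently.

```sas
/* Rows from Table 6.1 with WHERECLAUSEOID left blank */
data valuelevel_demo;
  length dataset $8 variable $8 valuevar $8 valuename $8 whereclauseoid $40;
  input dataset $ variable $ valuevar $ valuename $;
  whereclauseoid = ' ';
  datalines;
ADEF AVAL PARAMCD XPPAIN
ADEF CRIT1 PARAMCD XPPAIN
ADEF CRIT1FL PARAMCD XPPAIN
;
run;

data where_clauses;
  set valuelevel_demo;
  length itemoid $40 comparator $2 checkvalue $8;
  /* Only generate an OID when none was entered (hypothetical naming pattern) */
  if missing(whereclauseoid) then
    whereclauseoid = catx('.', 'WC', dataset, valuevar, 'EQ', valuename);
  /* The condition itself: <dataset>.<valuevar> EQ <valuename> */
  itemoid = catx('.', 'IT', dataset, valuevar);
  comparator = 'EQ';
  checkvalue = valuename;
  keep whereclauseoid itemoid comparator checkvalue;
run;

/* Rows that share a condition share one where clause definition */
proc sort data=where_clauses nodupkey;
  by whereclauseoid;
run;

proc print data=where_clauses noobs;
run;
```

Because all three rows depend on the same PARAMCD value, the sketch ends up with a single where clause that each of them can reference.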

In a later section, we will show how this information is represented in the define.xml file.

Analysis Results Metadata

Analysis results metadata is a component of ADaM metadata that has no SDTM counterpart. Analysis results metadata contain some information pertaining to the study results that you might find in the clinical study report (CSR). Most ADaM metadata provide the traceability that you need to understand the data’s lineage from SDTM to ADaM. But analysis results metadata provide the traceability that you might need to understand how the ADaM data are used to produce some of the key results that appear in a CSR. Although many might think of it as information that would be provided after analyses have been completed, it can be used, in conjunction with the statistical analysis plan (SAP), to provide SAS programmers with some details and specifications needed to carry out analyses.

The metadata captured in the ANALYSIS_RESULTS worksheet is described in Table 6.2.

Table 6.2: Description of Metadata Captured in the ANALYSIS_RESULTS Worksheet

Excel Column Description
DISPLAYID This is a unique identifier for the display (that is, a table, figure, or listing) that contains the analysis result. It is often populated with values that correspond to the number of the output table in the CSR. If you want to provide links from the define file to the CSR table, the necessary details must be provided in the EXTERNAL_LINKS spreadsheet, and the DISPLAYID here must match the LEAFID in that spreadsheet.
DISPLAYNAME A title for the display of the analysis results. Often a table or figure title.
RESULTNAME A text description of the analysis result. One analysis result can appear in multiple displays, and one display can contain multiple analysis results.
REASON The rationale for performing this analysis. It indicates when the analysis was planned. Extensible controlled terminology includes: SPECIFIED IN PROTOCOL, SPECIFIED IN SAP, DATA DRIVEN, and REQUESTED BY REGULATORY AGENCY.
PURPOSE The purpose of the analysis within the body of evidence (for example, a section in the clinical study report). Extensible controlled terminology includes: PRIMARY OUTCOME MEASURE, SECONDARY OUTCOME MEASURE, EXPLORATORY OUTCOME MEASURE.
PARAMCD The PARAMCD value to which the analysis applies, if applicable.
ANALYSISVARIABLES The analysis variables to be analyzed. Often AVAL, CHG, or both in BDS-structured datasets. Multiple analysis variables should be separated by a comma.
ANALYSISDATASET The dataset from which the analysis variables come.
WHERECLAUSEOID The OID for the where clause that selects proper records for analysis. The WHERECLAUSEOID should exist in the WHERE_CLAUSES spreadsheet.
DOCUMENTATION A short description of the analysis.
REFLEAFID The value of a LEAFID in the EXTERNAL_LINKS spreadsheet. Can, for example, point to a section of the SAP that describes the analysis.
CONTEXT Specifies the software used for the PROGRAMMINGCODE or PROGRAM, if provided.
PROGRAMMINGCODE In accordance with ADaM principles, ADaM data should be analysis-ready. This would be the place to demonstrate adherence to that principle by providing the code to a statistical procedure that could be used to replicate the results.
PROGRAMLEAFID The value of a LEAFID in the EXTERNAL_LINKS spreadsheet that points to a script file that can perform the analysis.
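To illustrate the kind of entry a PROGRAMMINGCODE cell might hold, here is a hedged sketch for a responder summary at 6 months. The treatment variable TRTP, the AVISITN coding, the choice of PROC FREQ, and all data values are assumptions for illustration only; they are not taken from our analysis results metadata.

```sas
/* Hypothetical analysis-ready records; TRTP and the values are invented */
data adef_demo;
  length usubjid $8 trtp $10 paramcd $8 crit1fl $1;
  input usubjid $ trtp $ paramcd $ avisitn crit1fl $;
  datalines;
S001 Active XPPAIN 6 Y
S002 Active XPPAIN 6 N
S003 Placebo XPPAIN 6 N
S004 Placebo XPPAIN 6 Y
S005 Active XPPAIN 6 Y
S006 Placebo XPPAIN 6 N
;
run;

/* A short call of the sort that could be stored as PROGRAMMINGCODE */
proc freq data=adef_demo;
  where paramcd = 'XPPAIN' and avisitn = 6;
  tables trtp * crit1fl / nocol nopercent;
run;
```

Because the records are already analysis-ready, the procedure call needs only a WHERE statement and a TABLES statement, which is the point that the PROGRAMMINGCODE column is meant to demonstrate.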

The following screen shows a sample of the analysis results metadata produced for our fictional example study data. For readability, the PARAMCD column (column F) has been hidden, and the PROGRAMLEAFID (column N) is not shown.

[Screenshot: sample analysis results metadata in the ANALYSIS_RESULTS worksheet]

The following screen shows a sample of the external link information needed for the ADaM metadata. Most of the rows are for LEAFIDs specified in the analysis results metadata, either as links to references for analyses (such as sections of the SAP) or to the output display itself, as it appears in, for example, the CSR.

[Screenshot: external link information in the EXTERNAL_LINKS spreadsheet]

Building Define.xml

With all of the metadata groups working together, including ones not shown here but covered in Chapter 2 for the SDTM data, our define.xml file will provide useful links within the file and external links to other documents and files such as data reviewer guides, SAS datasets, output results in the CSR, and original documentation in the SAP or protocol.

The code used to create the SDTM define file using Base SAS is also used for the ADaM define file. You can again refer to the authors’ pages (http://support.sas.com/publishing/authors/index.html) for the code and documentation that describe the details of what is being done. Here is an example call of the %make_define2 macro:

%make_define2(path=C:\Projects\StudyXYZ123\data\ADAM-metadata,metadata=ADAM_METADATA.xlsx);

As with the SDTM metadata, all that is needed is to provide the path where the metadata spreadsheet exists and the name of the spreadsheet file itself. The macro will then build the define.xml file in the same directory.

Define.xml Navigation and Rendering

The style sheet provided with the ADaM Analysis Results Metadata release package controls the display of the ADaM metadata used in our example. This particular style sheet has a navigation or bookmark pane on the left side and uses JavaScript to collapse and expand the primary sections into subsections. Such style sheets make navigation of the define file much easier compared to one that does not provide the bookmark pane. Be aware that some browsers or operating system security settings might initially block the JavaScript from running. Even if the JavaScript content is blocked, the style sheet still displays all metadata content, but without the interactive functionality such as folding and unfolding of bookmark pane menu items.

The following screen shows how the define file should look with our metadata and the define2-0-0.xsl style sheet using Firefox v43.0.

[Screenshot: the define file rendered with the define2-0-0.xsl style sheet in Firefox v43.0]

At the top of the define file, you see a table of contents of the analysis results metadata discussed in the previous section. Below that are the details of the analysis results metadata organized by the display ID, and then the analysis result ID within each display. Further down are the data definition tables that display details about each variable from each analysis dataset. The following screen displays these details for ADEF.

[Screenshot: data definition table for ADEF]

There are too many variables to display all of them in one screenshot.  If you could scroll farther down, you could see that the variables AVAL, BASE, CRIT1, and CRIT1FL are in blue and underlined, indicating that they have hyperlinks. These hyperlinks bring you to the analysis parameter value-level metadata. The following screen shows these details. Also shown are the computational methods, including the one used for CRIT1FL.

[Screenshot: analysis parameter value-level metadata and computational methods for ADEF]

When trying to render or view the define.xml, remember that the style sheet file must exist in the same directory as the define file and the datasets in order for everything to function properly.

Unfortunately, there are many factors that can create rendering problems when you try to view your define file in a browser. Certain combinations of the browser software, browser version, operating system, and security settings can all result in a blank or unreadable display. If you experience such problems in your computing environment, consider using a style sheet that does not use JavaScript, although even this is not guaranteed to resolve all issues.

Chapter Summary

Thanks to the many similarities between SDTM and ADaM metadata, many of the spreadsheets used within our Excel workbook to capture SDTM metadata can also be used to capture ADaM metadata. Likewise, much of the same code used to create the SDTM define file can also be used for the ADaM define file. However, there is one notable piece of metadata that is unique to ADaM: analysis results metadata. In this chapter, we looked closely at the analysis results metadata and also discussed aspects of value-level metadata that are more specific to ADaM.

As mentioned in Chapter 2, having your metadata in place before implementing a CDISC standard will help drive your process and ensure consistency. The same holds true for ADaM metadata, including analysis results metadata. Although certain features, such as links to CSR tables, will not be ready when you are first starting your implementation, other components will (or can) be ready. Table numbers and titles, documentation, and programming code can be useful metadata components that serve as specifications that your ADaM data might eventually be checked against.
