Chapter 7: Implementing ADaM with Base SAS

ADaM Tools

ISO 8601 Date and DateTime Conversions

Merging in Supplemental Qualifiers

ADSL – The Subject-Level Dataset

The ADaM Basic Data Structure (BDS)

ADAE – Adverse Event Analysis Datasets

ADTTE – The Time-to-Event Analysis Dataset

Chapter Summary

When your SDTM implementation has been mapped out, your SDTM datasets constructed, and your ADaM metadata is in place, you are ready to start producing your ADaM datasets. In this chapter, we demonstrate an ADaM implementation using an approach similar to the one we used to demonstrate the SDTM conversions in Chapter 3. We produce four ADaM datasets: ADSL, the subject-level dataset; ADEF, a BDS-structured dataset of the pain efficacy data; ADAE, an analysis dataset of the adverse event data; and ADTTE, an analysis dataset for time-to-event data. Keep in mind the two fundamental principles of ADaM, which were also discussed in Chapter 1, as we create these datasets:

1.   Keep the data “readily usable by commonly available software tools” (also known as being “one statistical procedure away” or, in SAS parlance, “one PROC away”).

2.   Make sure that the data provide “traceability between the analysis data and its source data” (ultimately SDTM).

Before diving into this, however, some tools that can be used for common SDTM-to-ADaM conversions will be introduced.

See Also

Analysis Data Model version 2.1 (http://www.cdisc.org/standards/foundational/adam)

ADaM Structure for Occurrence Data (OCCDS) version 1.0 (http://www.cdisc.org/standards/foundational/adam)

ADaM Basic Data Structure for Time-to-Event (TTE) Analyses version 1.0 (http://www.cdisc.org/standards/foundational/adam)

ADaM Tools

As you work on creating ADaM datasets, you will find the need to develop a few tools or macros to assist with automating repetitive tasks such as converting SDTM DTC dates to numeric dates and merging in supplemental qualifier information. These topics are discussed in the following sections.

ISO 8601 Date and DateTime Conversions

Numeric dates in ADaM are typically stored as SAS dates, which represent the number of days since January 1, 1960. Similarly, date/time combination variables are typically stored as SAS datetimes, which represent the number of seconds since midnight on January 1, 1960. SAS 9.2 introduced a class of formats and informats devoted to reading and writing ISO 8601 dates, times, datetimes, and durations. The informat E8601DAw. can be used for converting SDTM DTC date values to SAS dates, and the informat E8601DTw. is convenient for converting SDTM DTC date/time values to SAS datetimes. The log from the following code demonstrates how these informats can be used.

Log 7.1:  Use of ISO 8601 Informats

data a;

    adtc = '2008-09-15';

    adt  = input(adtc, e8601da10.);

    bdtc = adtc || "T15:53:00";

    bdtm = input(bdtc, e8601dt19.);

    put adt=date9. bdtm=datetime16.;

run;

adt=15SEP2008 bdtm=15SEP08:15:53:00

NOTE: The data set WORK.A has 1 observations and 4 variables.

   The date width is specified at a length of 10. To prevent errors during execution of your SAS code, you must ensure that the date is complete before trying to convert it to a SAS date. Note that for dates of this length, this informat is no different from YYMMDD10.

   As shown, note that the width of the datetime field needs to be 19. Because seconds usually are not captured in clinical trial databases, an SDTM DTC variable with hours and minutes collected would need :00 appended to the field in order to convert the value to a SAS datetime.

Because SDTM DTC values might contain incomplete dates and times, every conversion from a DTC value to a SAS date or datetime must first involve a check of the field’s length in order to determine what parts of the value can be converted. A repetitive task such as this, applied to a number of date values used for analysis, would be best handled by a macro. The DTC2DT macro can be used to convert a DTC variable to a SAS date or date/time for an ADaM dataset. The macro can also calculate relative day variables if a reference date is provided as a parameter. This macro is shown below:

%macro dtc2dt(dtcvar , prefix=a, refdt= );

  if length(&dtcvar)=10 and index(&dtcvar,'--')=0 then

    &prefix.dt = input(&dtcvar, yymmdd10.);

  else if length(&dtcvar)=16 and index(&dtcvar,'--')=0 and

    index(&dtcvar,'-:')=0 then

    do;

       &prefix.dtm = input(trim(&dtcvar) ||":00", e8601dt19.);

       &prefix.dt  = datepart(&prefix.dtm);

    end;     

  %if &refdt^= %then

  %do;

          if .<&prefix.dt<&refdt then

            &prefix.dy = &prefix.dt - &refdt;

          else if &prefix.dt>=&refdt then

            &prefix.dy = &prefix.dt - &refdt + 1;

  %end;

%mend dtc2dt;                                   

First, the macro checks the length of an SDTM DTC variable and then converts the date or datetime to a SAS date or datetime if the length of the field is sufficient. It assumes, by default, that the SAS date or datetime is for an ADaM BDS dataset and the creation of one or a combination of ADT, ADTM, and ADY. (Note, however, that the study day variable, ADY by default, is created only if the REFDT parameter is assigned.) Because the ADaM convention is that all numeric date variables end in DT and that all analysis study day variable names end in DY, these suffixes cannot be changed. However, the prefix can be changed from “A” to something else via the PREFIX parameter.

Let’s say you have a lab date in an LB SDTM dataset that you want to convert to ADT for an analysis dataset, ADLB. A sample call of the macro in a DATA step for such a conversion might look like this:

Data ADaM.ADLB;

  Merge SDTM.LB ADaM.ADSL (keep=usubjid trtsdt);

    By usubjid;

    .

    .

    .

    ** convert the lab date to a SAS date for variable ADT;

    ** and calculate the study day for variable ADY;

    %dtc2dt(LBDTC, refdt=TRTSDT );

Run;

Because the REFDT parameter was assigned, the ADY variable will be created in addition to ADT.
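To make the study day convention concrete, the sketch below applies the same length check and study day arithmetic as %DTC2DT to three made-up lab dates. The dataset name DEMO_ADY, the first-dose date, and the LBDTC values are hypothetical and serve only to illustrate the logic.

```sas
* Hypothetical data for one subject with first dose on 02APR2010;
data demo_ady;
  length lbdtc $19;
  format trtsdt adt date9.;
  trtsdt = '02APR2010'd;
  input lbdtc $;
  * same length check as DTC2DT so only complete dates are converted;
  if length(lbdtc)=10 and index(lbdtc,'--')=0 then
    adt = input(lbdtc, yymmdd10.);
  * there is no study day 0 so days before dosing are negative;
  * and the day of first dose is study day 1;
  if .<adt<trtsdt then
    ady = adt - trtsdt;
  else if adt>=trtsdt then
    ady = adt - trtsdt + 1;
  put lbdtc= adt= ady=;
  datalines;
2010-04-01
2010-04-02
2010-04
;
run;
```

The log should show ADY=-1 for the day before first dose, ADY=1 for the day of first dose, and missing values of ADT and ADY for the partial date 2010-04, which fails the length check.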

It is common practice to impute values for incomplete dates. What varies is how imputation is implemented from one organization to the next or from one study to the next. Imputation conventions can range from the simple, such as always imputing missing days to be 1 (for the first of the month), to the complex. Examples could include imputing one value for events that occur during the same month as the first dose of study medication; another value for events that are known to occur before the first dose; another value for events that are known to occur after the first dose; and then a completely different set of rules depending on whether the date is associated with an efficacy result or a safety result. Whatever your imputation method might be, you can consider incorporating it into the DTC2DT macro.
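As an illustration of the simplest convention mentioned above, the following sketch imputes a missing day as the first of the month and records the imputation in a date imputation flag. The rule itself, the dataset name DEMO_IMPUTE, and the sample values are assumptions made for illustration; ASTDTF follows the ADaM naming convention for date imputation flags, where 'D' indicates that the day was imputed.

```sas
* Hypothetical imputation rule where a missing day becomes the 1st;
data demo_impute;
  length aestdtc $19 astdtf $1;
  format astdt date9.;
  input aestdtc $;
  if length(aestdtc)=10 and index(aestdtc,'--')=0 then
    astdt = input(aestdtc, yymmdd10.);
  else if length(aestdtc)=7 and index(aestdtc,'--')=0 then
    do;
      * year and month are known so append day 01 and flag it;
      astdt  = input(trim(aestdtc) || '-01', yymmdd10.);
      astdtf = 'D';
    end;
  put aestdtc= astdt= astdtf=;
  datalines;
2010-04-15
2010-04
;
run;
```

A branch like the ELSE IF block here could be added to a local copy of the macro, with the flag variable name built from the PREFIX parameter, if your study conventions call for it.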

Merging in Supplemental Qualifiers

You will often need to merge supplemental qualifiers from a SUPP-- SDTM dataset into an ADaM dataset. For example, there might be important dates stored in SUPPDM that will be needed for the creation of ADSL, or a treatment-emergent flag stored in SUPPAE that will be needed for the creation of ADAE.

Consider the sample data in the SUPPDM dataset from Chapter 3. Data from the first four subjects are shown in Table 7.1. All subjects have their date of randomization saved as supplemental qualifiers; and subjects with an Other race have the specification of their race also saved as a supplemental qualifier. Both pieces of information might be needed for construction of the ADSL dataset. To do this, you would have to first transpose SUPPDM, converting records where QNAM=’RACEOTH’ and QNAM=’RANDDTC’ into a horizontal structure with RACEOTH and RANDDTC as variables, and then merge those variables in with the ADSL dataset.

Table 7.1:  Supplemental Qualifiers to DM

STUDYID | RDOMAIN | USUBJID | QNAM    | QLABEL                | QVAL       | QORIG
XYZ123  | DM      | UNI101  | RACEOTH | Race, Other           | LAOTIAN    | CRF Page 1
XYZ123  | DM      | UNI101  | RANDDTC | Date of Randomization | 2010-04-02 | CRF Page 1
XYZ123  | DM      | UNI102  | RANDDTC | Date of Randomization | 2010-02-13 | CRF Page 1
XYZ123  | DM      | UNI103  | RANDDTC | Date of Randomization | 2010-05-16 | CRF Page 1
XYZ123  | DM      | UNI104  | RANDDTC | Date of Randomization | 2010-01-02 | CRF Page 1
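For a single domain such as DM, where no IDVAR is involved, the transpose described above can be sketched with PROC TRANSPOSE. This is a simplified alternative for illustration, not the approach taken by the macro that follows; the dataset name SUPPDM_WIDE is an assumption, and only three of the rows from Table 7.1 are reproduced.

```sas
* Sample rows copied from Table 7.1 for the first two subjects;
data suppdm;
  infile datalines dsd dlm='|' truncover;
  length usubjid $10 qnam $8 qval $200;
  input usubjid $ qnam $ qval $;
  datalines;
UNI101|RACEOTH|LAOTIAN
UNI101|RANDDTC|2010-04-02
UNI102|RANDDTC|2010-02-13
;
run;

* One row per subject with one column per QNAM value;
proc transpose data=suppdm out=suppdm_wide (drop=_name_);
  by usubjid;
  id qnam;
  var qval;
run;
```

The result has one record per USUBJID with character variables RACEOTH and RANDDTC, ready to merge with DM by USUBJID. Domains whose supplemental qualifiers are keyed by an IDVAR (for example, a sequence number) need the extra handling that a more general macro provides.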

The %MERGSUPP macro was designed for this task. It can be used either to merge supplemental qualifiers from a specified list of domains or, if the DOMAINS parameter is left blank, to merge all supplemental qualifiers with all parent domains in a given source library. The macro is shown below, followed by an explanation of certain points along the way.

*------------------------------------------------------------;

* Merge supplemental qualifiers into the parent SDTM domain  ;

* This can either be for an entire library or for specified  ;

* domains                                                    ;

*------------------------------------------------------------;

%macro mergsupp(sourcelib=library, outlib=WORK, domains= , suppqual=0);

  %local domain;

  %** de-normalize suppqual and merge into the given domain;

  %macro domainx(domain= ,suppqual=0);

    %local suppdata idvar varlist nvars;

    %if &suppqual %then

      %let suppdata=suppqual;

    %else

      %let suppdata=supp&domain;

    ;

    %* count the number of supplemental qualifiers for the given
        domain;

     proc sort

      data = &sourcelib..&suppdata

      out = nvars

      nodupkey;  

        where rdomain=upcase("&domain");

        by qnam idvar;

    run;

    data _null_;

      set nvars end=eof;

        by qnam idvar;

         length varlist $200;

         retain varlist;

         if not first.qnam then

          put 'PROB' 'LEM: More than one IDVAR for the domain-- '
                   rdomain= qnam= idvar= ;

         else

            do;

                nvars + 1;

                varlist = trim(varlist) || " " || trim(qnam);

            end;

          if eof then

            do;

               call symput("nvars", put(nvars, 2.));

               call symput("varlist", trim(left(varlist)));

               call symput("idvar", trim(idvar));

            end;

    run;

    %put domain=&domain idvar=&idvar nvars=&nvars varlist=&varlist;

    proc sort

      data = &sourcelib..&suppdata

      out = supp&domain;

         where rdomain=upcase("&domain");

         by usubjid idvar idvarval;

    run;

    %*  determine whether IDVAR in the parent domain is character or
         numeric;

    %if &idvar^= %then

      %do;

        %let dsetnum=%sysfunc(open(&sourcelib..&domain));

        %let varnum=%sysfunc(varnum(&dsetnum,&idvar));

        %let idtype=%sysfunc(vartype(&dsetnum,&varnum));

        %let rc=%sysfunc(close(&dsetnum));

      %end;

    %else

        %let idtype= ;

    data supp&domain;

      set supp&domain;

        by usubjid idvar idvarval;

           drop qnam qval idvarval idvar i rdomain;

           length &varlist $200.;

           retain &varlist;

           array vars{*} &varlist;

           if first.idvarval then

             do i = 1 to dim(vars);

               vars{i} = '';

             end;

          do i = 1 to dim(vars);

            if upcase(qnam)=upcase(vname(vars{i})) then

              vars{i} = qval;

          end;

          %** convert to numeric if numeric in the parent domain;

          %if &idvar^= and &idtype=N %then

             &idvar = input(idvarval, best.);

          %else %if &idvar^= %then

               &idvar = idvarval;

              ;

              if last.idvarval;

    run;

    proc sort

      data = supp&domain;

        by usubjid &idvar;

 

    proc sort

      data = &sourcelib..&domain

      out = __tmp;

        by usubjid &idvar;

    data &outlib..&domain;

      merge __tmp supp&domain ;

        by usubjid &idvar;

    run;

    %mend domainx;

    %*---------------------------------------------------------;

    %* If DOMAINS parameter specified, then loop through those ;

    %* domains otherwise, dynamically identify the SUPPxx data ;

    %* sets and go through them all                            ;

    %*---------------------------------------------------------;

    %let _wrd=1;

    %if &DOMAINS^= %then

      %do %while(%scan(&domains,&_wrd)^= );           

          %let domain=%scan(&domains,&_wrd);

          %domainx(domain=&domain,suppqual=0);

          %let _wrd=%eval(&_wrd+1);

      %end;

    %else

      %do;           

           %** find all of the SUPPxx datasets and loop through each one;

           ods output members=members;

           proc contents

             data = &sourcelib.._all_ memtype=data nods ;

           run;

           data members;

              set members;

                if upcase(name)=:'SUPP' and upcase(name)^=:'SUPPQUAL' then

                          do;

                          rdomain = substr(name,5,2);

                          put name= rdomain= ;

                          output;

                        end;

                      else if upcase(name)=:'SUPPQUAL' then

                        call symput("suppqual","1");

       run;

       %** loop through each domain;

       proc sql noprint;

         select count(distinct rdomain)

            into :domn

            from %if &suppqual %then &sourcelib..suppqual; %else
                 members;

            ;

         select distinct rdomain

            into :domain1 - :domain%left(&domn)

            from %if &suppqual %then &sourcelib..suppqual; %else
                 members;

            ;

         %do _i=1 %to &domn;

           %domainx(domain=&&domain&_i,suppqual=&suppqual);

         %end;

      %end; %* if domains not specified explicitly...;

%mend mergsupp;

   In the macro call, the SOURCELIB parameter is used to specify the libref where all of the SDTM datasets exist. The OUTLIB parameter defaults to WORK, so that the resulting domain or domains with the supplemental qualifiers are created as WORK datasets. The DOMAINS parameter can be used to explicitly define domains for which you want to merge in any supplemental qualifiers.

   For each domain, the unique number of supplemental qualifiers, which will become variables, is determined.

   The domain, ID variable for the domain (which is used as a part of the unique key for the dataset), number of supplemental qualifiers, and the list of unique QNAM values are recorded in the SAS log.

   When merging with the parent domain, the variable defined by IDVAR must match in type with the parent domain. Otherwise, an error will occur. Because IDVARVAL can be stored only as a character value in the SUPP dataset, it must be converted to numeric when the data are transposed if the variable named by IDVAR is numeric in the parent domain.

   In this step, the supplemental qualifiers are finally merged with the parent domain.

   If the DOMAINS parameter is not blank, then this routine is performed for each domain in the DOMAINS list. If the DOMAINS parameter is blank, then the following conditions are checked: If SUPPQUAL=1, then the routine is run for each domain found in the SUPPQUAL dataset. Otherwise, the source library is scoured for datasets that begin with “SUPP”.

The following sections demonstrate how these tools can be used for the creation of ADaM datasets.

ADSL – The Subject-Level Dataset

One of two dataset structures described in the ADaM IG is ADSL, the subject-level analysis dataset. (ADSL is both the name of the structure and the expected name of the dataset that follows that structure.) The existence of an ADSL dataset is the minimum requirement for an ADaM submission. It is extremely important as a source for key information about every randomized or treated subject in a clinical trial. It contains key demographic data, treatment information, and population flags. It can often contain additional information such as final study disposition; safety information (for example, a death flag); and sometimes even key outcome data (for example, whether the subject was a responder).

As you can see, a lot of data are merged together when creating ADSL. As mentioned in the previous section, sometimes these data must come from supplemental qualifiers. The following code demonstrates some of the common conversions and variable declarations that go into an ADSL creation program.

*------------------------------------------------------------*;

* ADSL.sas creates the ADaM ADSL data set                    *;

* and saves it as a permanent SAS dataset to the ADaM libref.*;

*------------------------------------------------------------*;

**** CREATE EMPTY ADSL DATASET CALLED EMPTY_ADSL;

%let metadatafile=&path/data/adam-metadata/adam_metadata.xlsx;

%make_empty_dataset(metadatafile=&metadatafile, dataset=ADSL)

** merge supplemental qualifiers into DM;

%mergsupp(sourcelib=sdtm, domains=DM);

** find the change from baseline so that responders can be flagged;

** (2-point improvement in pain at 6 months);

%cfb(indata=sdtm.xp, outdata=responders, dayvar=xpdy, avalvar= xpstresn,

     keepvars=usubjid visitnum chg);

data ADSL;

    merge EMPTY_ADSL

              DM         (in = inDM)

              responders (in = inresp where=(visitnum=2))

              ;

      by usubjid;

        * convert RFXSTDTC to a numeric SAS date named TRTSDT;

        %dtc2dt(RFXSTDTC, prefix=TRTS );  

        * create BRTHDT, RANDDT, TRTEDT;

        %dtc2dt(BRTHDTC, prefix=BRTH);   

        %dtc2dt(RANDDTC, prefix=RAND);

        %dtc2dt(RFXENDTC, prefix=TRTE);

        * create flags for the ITT and safety-evaluable populations;

        ittfl = put(randdt, popfl.);

        saffl = put(trtsdt, popfl.);

        trt01p = ARM;

        trt01a = trt01p;

        trt01pn = input(put(trt01p, $trt01pn.), best.);

        trt01an = trt01pn;

        agegr1n = input(put(age, agegr1n.), best.);

        agegr1  = put(agegr1n, agegr1_.);

        RESPFL = put((.z < chg <= -2), _0n1y.);

run;

**** SORT ADSL ACCORDING TO METADATA AND SAVE PERMANENT DATASET;

%make_sort_order(metadatafile=&metadatafile, dataset=ADSL)  

proc sort

  data=adsl

  (keep = &ADSLKEEPSTRING)

  out=adam.adsl;

    by &ADSLSORTSTRING;

run;

   Similar to what was done for SDTM conversions, the %MAKE_EMPTY_DATASET macro is used to create a 0-observation dataset from the metadata.

   As demonstrated in the previous section, the %MERGSUPP macro can be used to merge in supplemental qualifiers from the SDTM data that might be needed either as columns in the ADaM data or for deriving columns in the ADaM data. In this case, the randomization date is used for both—as a column in ADSL and for calculating relative days in ADaM.  In this example, relative days are defined by using the randomization date as the anchor date rather than the date of first dose.

   Sometimes it is necessary to have efficacy results, such as a flag for study responders, in ADSL. However, this often creates circular references if the code to derive the efficacy result first relies on the existence of ADSL. A way around this, and to avoid having duplicated code, is to create a macro that does the derivation and is called by multiple programs—in this case, ADSL.SAS and the program devoted to efficacy results at all visits (ADEF.SAS, which is shown later). Both programs call the %CFB macro (shown later) to derive changes from baseline so that responders can be easily identified.

   As shown in the previous section, the %DTC2DT macro is used to convert SDTM --DTC dates to numeric (SAS) dates. In this particular spot, the macro is used to create TRTSDT, the date of first treatment.

   In this example, we are assuming that all subjects took the dose to which they were randomized, so TRT01A is set equal to TRT01P. However, in many real-life situations, this is not the case; and this assignment is not, therefore, always valid. Because actual treatment information is not known at the beginning of a trial when dataset specifications are developed, the convention applied here is to include TRT01A and TRT01AN regardless of whether they differ from the planned assignment. This also allows you to begin table programming without needing to know the actual treatment results (if that approach is desired).

   Here the CHG variable created within the %CFB macro is used to create the responder flag.

   As was done with the SDTM conversion programs, the %MAKE_SORT_ORDER macro is used to properly sort the data. The &ADSLKEEPSTRING macro variable, which was created by the %MAKE_EMPTY_DATASET macro, is used to keep only those variables in the metadata.
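The ADSL program relies on several user-defined formats whose definitions are not shown in this section. The sketch below gives one plausible definition for two of them, POPFL. and _0N1Y., inferred only from how they are used in the code above; the actual definitions used by the authors might differ. The treatment and age group formats are omitted because their values depend on study details not given here.

```sas
* Assumed definitions inferred from how the formats are used;
proc format;
  * any nonmissing date maps to Y and a missing date maps to N;
  value popfl
    low - high = 'Y'
    other      = 'N';
  * results of Boolean expressions where 0 is false and 1 is true;
  value _0n1y
    0 = 'N'
    1 = 'Y';
run;

* Quick check of both formats with made-up values;
data _null_;
  randdt = '02APR2010'd;
  nodate = .;
  chg    = -2;
  ittfl  = put(randdt, popfl.);
  noflag = put(nodate, popfl.);
  respfl = put((.z < chg <= -2), _0n1y.);
  put ittfl= noflag= respfl=;
run;
```

With these definitions, a subject with a nonmissing randomization date is flagged ITTFL='Y', a missing date yields 'N', and a change of -2 yields RESPFL='Y'.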

The following code shows the %CFB macro. As stated, this macro is also used for the efficacy analysis dataset and is discussed more in the next section. Its primary purpose is to calculate changes from baseline (variable CHG). By doing so, it also creates common analysis variables AVAL and PCHG (percent change from baseline), the baseline value (BASE), and a flag for the baseline record (ABLFL). For ADSL, the CHG variable is needed to derive the responder flag (RESPFL). In the next section, we will see it used for an analysis dataset with a different ADaM structure.

*-------------------------------------------------------------;

* Change From Baseline                                        ;

* Macro for deriving ABLFL, BASE, CHG, and PCHG for a BDS     ;

*   formatted ADaM data set                                   ;

* Assumes baseline is the last non-missing value on or before ;

*   study day 1 and that the INDATA is an SDTM data set with  ;

*   variables USUBJID and VISITNUM                            ;

*-------------------------------------------------------------;

%macro cfb(indata= ,outdata= ,avalvar= ,dayvar= ,keepvars= );

    proc sort

      data = &indata

      out = &outdata (rename = (&avalvar = aval));

        by usubjid visitnum;

    run;

    * Baseline is defined as the last non-missing value prior to the
       first dose on study day 1;

    * (note, values on Day 1 are assumed to occur before the first dose);

    data base1 (keep = usubjid visitnum) base2 (keep = usubjid base);

      set &outdata;

        where &dayvar<=1 and aval > .z;

        by usubjid visitnum;

   

        rename aval = base;

        if last.usubjid;

    run;   

    * Do one merge to identify the baseline record;

    data &outdata;

       merge &outdata base1 (in = inbase);

          by usubjid visitnum;

           if inbase then

             ablfl = 'Y';

    run;

    * Do another merge to get the baseline value;            

    data &outdata;

       merge &outdata base2;

          by usubjid;

           %if &keepvars^= %then

             keep  &keepvars;

           ;

           if base > .z then

             chg  = aval - base;

           if base>.z and base ne 0 then

             pchg = chg/base*100;

    run;

%mend cfb;

Note that the typical needs of a real-life study can be much more complex than what is being shown here. For the sake of simplicity, methods for dealing with missing values, visits that fall outside of a particular visit window, or unscheduled visit values are not dealt with here.

The ADaM Basic Data Structure (BDS)

The ADaM Basic Data Structure (BDS) is the other of the two data structures detailed in the ADaM IG. Unlike ADSL, which is specific to one intended dataset, the BDS is a general structure—flexible enough to use for a number of different analysis datasets. It can loosely be described as a “tall and skinny” structure where, for example, each row contains one result per test, per analysis visit, per subject.

In this section, we demonstrate an ADaM BDS dataset for the creation of an efficacy dataset (ADEF) using the XP (pain) SDTM data shown in earlier chapters. Consider the structure of this dataset shown in Table 7.2. It contains one record per visit, per subject. In this simplified case there is only one test or result per visit. In a more complex (and probably more typical) arrangement, more than one related efficacy result would be collected at each visit. If this were the case, the SDTM data would contain a new row with a new test and test code (that is, XPTEST and XPTESTCD).

Table 7.2:  Snippet of XP SDTM Data

USUBJID | XPSEQ | XPTESTCD | XPTEST     | XPORRES  | XPSTRESN | VISIT    | XPDTC      | XPDY
UNI101  | 1     | XPPAIN   | Pain Score | Severe   | 3        | Baseline | 2010-04-02 | 1
UNI101  | 2     | XPPAIN   | Pain Score | Moderate | 2        | 3 Months | 2010-07-03 | 93
UNI101  | 3     | XPPAIN   | Pain Score | Mild     | 1        | 6 Months | 2010-10-10 | 192

Fortunately, for implementers of ADaM, the BDS structure is similar to SDTM domains of the Findings class (although the BDS structure is by no means limited to findings). The conversion to ADaM in this simplified example therefore involves little more than keeping, dropping, renaming, and adding a few fields. No complicated transformations are needed. This is demonstrated in the following code. The primary fields being added are the BASE and CHG variables, which are needed for the efficacy analysis.

*------------------------------------------------------------*;

* ADEF.sas creates the ADaM BDS-structured data set          *;

* for efficacy data (ADEF), saved to the ADaM libref.        *;

*------------------------------------------------------------*;

**** CREATE EMPTY ADEF DATASET CALLED EMPTY_ADEF;

%let metadatafile=&path/data/adam-metadata/adam_metadata.xlsx;

%make_empty_dataset(metadatafile=&metadatafile,dataset=ADEF)

** derive AVAL, BASE, CHG, and PCHG;

%cfb(indata=sdtm.xp, outdata=adef, dayvar=xpdy, avalvar= xpstresn);

proc sort

  data = adam.adsl

  (keep = usubjid siteid country age agegr1 agegr1n sex race randdt trt01p trt01pn ittfl)

  out = adsl;

    by usubjid;

data adef;

  merge adef (in = inadef) adsl (in = inadsl);

    by usubjid ;

        if not(inadsl and inadef) then

          put 'PROB' 'LEM: Missing subject?-- ' usubjid= inadef= inadsl= ;

        rename trt01p    = trtp

               trt01pn   = trtpn

               xptest    = param

               xptestcd  = paramcd

               visit     = avisit

               visitnum  = avisitn

               xporres   = avalc

        ;          

        if inadsl and inadef;

        %dtc2dt(xpdtc, refdt=randdt);  

        retain crit1 "Pain improvement from baseline of at least 2 points";

        crit1fl = put((.z < chg <= -2), _0n1y.);

run;

** assign variable order and labels;

data adef;

  retain &ADEFKEEPSTRING;

  set EMPTY_ADEF adef;

run;

**** SORT ADEF ACCORDING TO METADATA AND SAVE PERMANENT DATASET;

%make_sort_order(metadatafile=&metadatafile, dataset=ADEF)

proc sort

  data=adef(keep = &ADEFKEEPSTRING)

  out=adam.adef;

    by &ADEFSORTSTRING;

run;   

   Here the %CFB macro used in ADSL is again used for the creation of ADEF. The difference now is that all visits from the resulting WORK.ADEF dataset are kept, as are the variables AVAL, BASE, CHG, and ABLFL (via the metadata, the %make_empty_dataset macro, and the &ADEFKEEPSTRING macro variable).

   As mentioned, when converting an SDTM findings domain to the BDS, the structure is essentially the same. But many variable names change, and the crucial analysis variables such as AVAL and CHG derived by the %CFB macro have been added. Although using the RENAME statement can potentially apply incorrect labels and lengths to the new variables, the consequences are small because the next DATA step uses the EMPTY_ADEF dataset to assign proper labels and lengths.

   Here the %DTC2DT macro is used to create both ADT and, by virtue of specifying a REFDT, the relative day.

   The code here to define responders is similar to that used in ADSL. The primary difference is that here we are defining responders at the visit level, whereas in ADSL we were defining study-wide responders (those who met the criteria at the primary time point or visit). The variable name, CRIT1FL, is different as well, in order to use BDS naming conventions. (We also could have used an AVALCATy variable such as AVALCAT1.)

As shown in the previous section, the %CFB macro assigns pain scores from the SDTM data to AVAL and calculates parameter-invariant fields BASE and CHG. (PCHG is also created but not used for this dataset.) Parameter invariant means that the derivation does not change from one parameter to the next. In this simplified example, there is only one PARAM in ADEF. Generally speaking, it would be best to derive all parameter-invariant columns added to a BDS dataset, such as BASE or CHG, in one DATA step for all parameters at once. The advantage of doing this is that it ensures that the same algorithm was applied for all parameters and that the code for that algorithm is not repeated.
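The idea of deriving parameter-invariant columns for all parameters in one DATA step can be sketched as follows. This is a simplified single-pass alternative to %CFB, not a replacement for it: it assumes that the first record within each parameter is the baseline. The dataset name BDS_DEMO, the AVISITN values, and the second parameter XPFUNC with its values are made up for illustration.

```sas
* Hypothetical BDS rows where XPPAIN is from this chapter and XPFUNC is made up;
data bds_demo;
  length usubjid $10 paramcd $8;
  input usubjid $ paramcd $ avisitn aval;
  datalines;
UNI101 XPPAIN 0 3
UNI101 XPPAIN 1 2
UNI101 XPPAIN 2 1
UNI101 XPFUNC 0 40
UNI101 XPFUNC 1 50
UNI101 XPFUNC 2 60
;
run;

proc sort data=bds_demo;
  by usubjid paramcd avisitn;
run;

* One DATA step derives BASE, CHG, and PCHG for every parameter;
data bds_demo;
  set bds_demo;
  by usubjid paramcd avisitn;
  retain base;
  * simplification where the first visit within each parameter is baseline;
  if first.paramcd then
    base = aval;
  if base > .z then
    chg = aval - base;
  if base > .z and base ne 0 then
    pchg = chg/base*100;
run;
```

Because PARAMCD is part of the BY statement, the same statements serve every parameter, so adding a third parameter to the input data requires no change to the derivation code.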

ADAE – Adverse Event Analysis Datasets

As a follow-up to the ADaM team’s ADAE document, an entirely new ADaM data structure has been established: the Occurrence Data Structure, or OCCDS. ADAE can now be considered an example of the OCCDS. The structure of ADAE is very similar to that of the SDTM AE dataset. The advantage of ADAE is that it includes other variables used for analysis, such as population flags, treatment variables, and imputations of severity for cases where the severity is missing. Other variables that are useful for AE analyses include those that flag events within a certain pool of events. Assuming that AEs have been coded to the MedDRA dictionary, pooled events can either be grouped based on standardized MedDRA Queries (SMQs) or on sponsor-defined customized queries (CQs).

The following code demonstrates the creation of an ADAE file using the AE SDTM dataset first created in Chapter 3. It uses the variable CQ01NAM to identify events relating to pain (which are important to know about in a pain study).

*------------------------------------------------------------*;

* ADAE.sas creates the ADaM ADAE-structured data set         *;

* for AE data (ADAE), saved to the ADaM libref.              *;

*------------------------------------------------------------*;

**** CREATE EMPTY ADAE DATASET CALLED EMPTY_ADAE;

%let metadatafile=&path/data/adam-metadata/adam_metadata.xlsx;

%make_empty_dataset(metadatafile=&metadatafile,dataset=ADAE)

proc sort

  data = adam.adsl

  (keep = usubjid siteid country age agegr1 agegr1n sex race trtsdt trt01a trt01an saffl)

  out = adsl;

    by usubjid;

data adae;

  merge sdtm.ae (in = inae) adsl (in = inadsl);

    by usubjid ;

        if inae and not inadsl then

          put 'PROB' 'LEM: Subject missing from ADSL?-- ' usubjid= inae=
                inadsl= ;

        length CQ01NAM $40.;

        rename trt01a  = trta

               trt01an = trtan

        ;          

        if inadsl and inae;

        %dtc2dt(aestdtc, prefix=ast, refdt=trtsdt);   

        %dtc2dt(aeendtc, prefix=aen, refdt=trtsdt);

         if index(upcase(AEDECOD), 'PAIN')>0 or upcase(AEDECOD)='HEADACHE' then

          CQ01NAM = 'PAIN EVENT';

        else

          CQ01NAM = ' ';

        aereln = input(put(aerel, $aereln.), best.);

        aesevn = input(put(aesev, $aesevn.), best.);

        relgr1n = (aereln>0); ** group related events (AERELN>0);

        relgr1  = put(relgr1n, relgr1n.);

   

        * Event considered treatment emergent if it started on or after;

        * the treatment start date.  Assume treatment emergent if start;

        * date is missing (and the end date is either also missing or >=;

        * the treatment start date)  ;

          trtemfl = put((astdt>=trtsdt or (astdt<=.z and

                     not(.z<aendt<trtsdt))), _0n1y.);

run;

** assign variable order and labels;

data adae;

  retain &adaeKEEPSTRING;

  set EMPTY_adae adae;

run;

**** SORT adae ACCORDING TO METADATA AND SAVE PERMANENT DATASET;

%make_sort_order(metadatafile=&metadatafile, dataset=ADAE)

proc sort

  data=adae(keep = &adaeKEEPSTRING)

  out=adam.adae;

    by &adaeSORTSTRING;

run;   

   Unlike efficacy analyses, safety analyses are usually conducted with treatment group assignments based on the drug subjects actually received, rather than on what they were randomized to, hence the use of TRT01A and TRT01AN from ADSL. Although there is usually no difference between what a subject actually took and what a subject was supposed to take, it is helpful for planning and for pre-programming purposes (that is, being able to program before a study is unblinded) to have variables that can differentiate between the two.

   Other differences from efficacy analyses are the population studied and the calculation of relative day. Safety analyses are usually based on a population represented by all subjects who received the study drug, rather than by all subjects randomized. Consequently, relative days are calculated as the number of days since first dose rather than the number of days since the subject was randomized. (Again, in most cases you would expect these two values to be the same.) The REFDT in the %DTC2DT macro is therefore TRTSDT rather than the RANDDT that was used for ADEF.
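To make the relative-day calculation concrete, the following is a minimal sketch of the usual CDISC convention, in which there is no day 0: dates on or after the reference date get a +1 adjustment, and earlier dates do not. It is assumed here, not confirmed by this section, that %DTC2DT applies this convention. The dataset name RELDAY_DEMO and the dates are hypothetical.

```sas
/* Sketch only: dataset name and dates are hypothetical, for illustration */
data relday_demo;
  format trtsdt astdt yymmdd10.;
  trtsdt = '15JAN2010'd;   /* assumed first-dose (reference) date */
  do astdt = '13JAN2010'd, '15JAN2010'd, '17JAN2010'd;
    /* no day 0: add 1 only when the date is on or after the reference date */
    astdy = astdt - trtsdt + (astdt >= trtsdt);
    output;
  end;
run;

proc print data=relday_demo;
run;
```

With these dates, an event two days before first dose has a relative day of -2, an event on the day of first dose has a relative day of 1, and an event two days after has a relative day of 3.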

   CQ01NAM is used to identify adverse events related to occurrences of pain. Rather than having a flag variable, these are simply signified by populating CQ01NAM with the name for this customized MedDRA query. A CQ would typically be a bit more thorough than simply including any term with the word “PAIN” or “HEADACHE” and would typically be defined before knowing the precise terms or MedDRA codes that appear in a study. But for the purposes of illustration, this example should suffice.

ADTTE – The Time-to-Event Analysis Dataset

Now that we have our ADEF and ADAE datasets created, we can illustrate the creation of an ADaM dataset for time-to-event analyses. Suppose the SAP includes an endpoint for time to first pain relief without pain worsening. Events are defined by records in ADEF where the change in pain is negative. Subjects with pain that gets worse before it improves are censored at the time that the pain gets worse. The censored records will therefore come from records in ADEF where the change in pain is positive. Suppose the SAP states that adverse event data are also considered for identifying subjects with pain that gets worse before it improves. This is where our customized query comes in handy. Subjects with a nonmissing CQ01NAM that occurs before the first pain relief or before the first pain worsening will be censored at the time of the pain-related AE.

The ADaM Basic Data Structure for Time-to-Event Analyses document (http://www.cdisc.org/standards/foundational/adam) demonstrates precisely how you should comply with the ADaM principle of traceability by reflecting in your data how the derivations for such an analysis were implemented. The EVNTDESC variable is used to describe what event occurred. For this example, we will define four possible values, one for an actual event of pain relief, and three for censoring events. A unique value for CNSR will be assigned for each of these events, as shown in Table 7.3.

Table 7.3:  CNSR and EVNTDESC Values

EVNTDESC                             CNSR   Comment
PAIN RELIEF                          0      Actual events of pain relief that occur prior to pain worsening
PAIN WORSENING PRIOR TO RELIEF       1      Pain worsening from ADEF
PAIN ADVERSE EVENT PRIOR TO RELIEF   2      Pain AE prior to pain relief
COMPLETED STUDY PRIOR TO RELIEF      3      Standard censoring at study end
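The ADTTE program derives EVNTDESC from CNSR with a numeric format named EVNTDESC, whose definition does not appear in this section. One way such a format could be defined is sketched below. The labels are taken from Table 7.3, but the definition itself is an assumption and is not necessarily how the format is actually created.

```sas
/* Sketch only: assumed definition of the EVNTDESC format used by PUT(cnsr, evntdesc.) */
proc format;
  value evntdesc
    0 = 'PAIN RELIEF'
    1 = 'PAIN WORSENING PRIOR TO RELIEF'
    2 = 'PAIN ADVERSE EVENT PRIOR TO RELIEF'
    3 = 'COMPLETED STUDY PRIOR TO RELIEF'
  ;
run;
```

Keeping the CNSR-to-EVNTDESC mapping in a single format means that the text is assigned identically in every branch of the derivation, so the two variables cannot drift apart.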

With this plan in mind, the following code can be used to create ADTTE from ADEF, ADAE, and ADSL.

*----------------------------------------------------------------*;

* ADTTE.sas creates the ADaM BDS-structured data set for a       *;

* time-to-event analysis (ADTTE), saved to the ADaM libref.      *;

*----------------------------------------------------------------*;

**** CREATE EMPTY ADTTE DATASET CALLED EMPTY_ADTTE;

options mprint ;

%let metadatafile=&path/data/adam-metadata/adam_metadata.xlsx;

%make_empty_dataset(metadatafile=&metadatafile,dataset=ADTTE)

proc sort

  data = adam.adsl

  (keep = studyid usubjid siteid country age agegr1 agegr1n sex race

randdt trt01p trt01pn ittfl trtedt)

  out = adtte;

    by usubjid;

proc sort

  data = adam.adef

  (keep = usubjid paramcd chg adt visitnum xpseq)

  out = adef;

    where paramcd='XPPAIN' and visitnum>0 and abs(chg)>0;

    by usubjid adt;

data adef;

  set adef;

    by usubjid adt;

        drop paramcd visitnum;

        if first.usubjid;

run;

proc sort

  data = adam.adae

  (keep = usubjid CQ01NAM astdt trtemfl aeseq)

  out = adae;

 

    where CQ01NAM ne '' and trtemfl='Y';

    by usubjid astdt;

run;

** keep only the first occurrence of a pain event;

data adae;

  set adae;

    by usubjid astdt;

        if first.usubjid;

run;   

data adtte;

  merge adtte (in = inadtte rename=(randdt=startdt))

        adef  (in = inadef)

        adae  (in = inadae)

        ;

    by usubjid ;

        retain param "Time to first pain relief (days)"

               paramcd "TTPNRELF"

        ;

        rename trt01p    = trtp

               trt01pn   = trtpn

        ;          

        length srcvar $10. srcdom $4.;

        if (.<chg<0) and (adt<astdt or not inadae) then  

          do;

            ** ACTUAL PAIN RELIEF BEFORE WORSENING;

            cnsr = 0;

            adt  = adt;

            evntdesc = put(cnsr, evntdesc.) ;

            srcdom = 'ADEF';

            srcvar = 'ADY';

            srcseq = xpseq;

          end;

        else if chg>0 and (adt<astdt or not inadae) then

          do;

            ** CENSOR: PAIN WORSENING BEFORE RELIEF;

            cnsr = 1;

            adt  = adt;

            evntdesc = put(cnsr, evntdesc.) ;

            srcdom = 'ADEF';

            srcvar = 'ADY';

            srcseq = xpseq;

          end;

        else if (.<astdt<adt) then

          do;

            ** CENSOR: PAIN AE BEFORE RELIEF;

            cnsr = 2;

            adt  = astdt;

            evntdesc = put(cnsr, evntdesc.) ;

            srcdom = 'ADAE';

 

            srcvar = 'ASTDY';

            srcseq = aeseq;

          end;

        else

          do;

            ** CENSOR: COMPLETED STUDY BEFORE PAIN RELIEF OR WORSENING;

            cnsr = 3;

            adt  = trtedt;

            evntdesc = put(cnsr, evntdesc.) ;

            srcdom = 'ADSL';

            srcvar = 'TRTEDT';

            srcseq = .;

          end;

        aval = adt - startdt + 1;

        format adt yymmdd10.;

run;

** assign variable order and labels;

data adtte;

  retain &adtteKEEPSTRING;

  set EMPTY_adtte adtte;

run;

**** SORT adtte ACCORDING TO METADATA AND SAVE PERMANENT DATASET;

%make_sort_order(metadatafile=&metadatafile, dataset=ADTTE)

proc sort

  data=adtte(keep = &adtteKEEPSTRING)

  out=adam.adtte;

    by &adtteSORTSTRING;

run;   

   From ADEF, the first worsening or pain improvement event is saved and merged with ADSL and ADAE.

   From ADAE, the first treatment-emergent, pain-related AE is kept and merged with ADSL and ADEF.

   Condition-based code exists for each censoring reason. Note the use of the SRC--- variables for traceability back to other ADaM datasets. The ADaM IG mentions that “traceability is built by clearly establishing the path between an element and its immediate predecessor.” As such, despite the reserved name of SRCDOM, which implies that the source is an SDTM domain, the referenced source is actually another ADaM dataset. Note also, however, that the SRCSEQ variable is still from the original SDTM source that was carried over to the ADaM dataset. For this example, this is acceptable because the SRCSEQ variables still uniquely identify subjects’ data records. This will not always be the case because one record from an SDTM source file can sometimes be used for multiple ADaM records (for example, when a Last Observation Carried Forward algorithm is implemented). In situations where the SDTM --SEQ variable does not uniquely identify an ADaM record, the variable ASEQ can be added to the ADaM data for traceability purposes.
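For datasets where an SDTM --SEQ value can repeat across ADaM records, ASEQ could be assigned with a sum statement after sorting. The following is a minimal sketch only: the output dataset name ADTTE_SEQ and the sort keys are assumptions, and in practice the sort order would come from your metadata. Because ADTTE in this example has one record per subject, ASEQ would simply be 1 on every record. The technique matters more for datasets with imputed or derived rows.

```sas
/* Sketch only: dataset name and sort keys are assumed, not taken from the metadata */
proc sort data=adtte out=adtte_seq;
  by usubjid paramcd adt;
run;

data adtte_seq;
  set adtte_seq;
  by usubjid;
  /* restart the counter for each subject, then number the records 1, 2, 3, ... */
  if first.usubjid then aseq = 0;
  aseq + 1;
run;
```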

 

Chapter Summary

Analysis datasets are a perfect complement to the SDTM. While the SDTM serves the purpose of providing all data as collected (at least those data that ultimately could be used, in some way, to evaluate a product’s safety or efficacy), ADaM provides analysis-ready data that can quickly be used for producing (or re-producing) important study results.
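As an illustration of what "one PROC away" can look like, a Kaplan-Meier analysis of the ADTTE dataset built in this chapter might be requested as sketched below. This is a hedged example rather than the analysis that a SAP would specify. It assumes that TRTPN is kept in the final dataset, and a real analysis would usually also subset on a population flag such as ITTFL.

```sas
/* Sketch only: assumes AVAL, CNSR, PARAMCD, and TRTPN are present in ADAM.ADTTE */
proc lifetest data=adam.adtte;
  where paramcd = 'TTPNRELF';
  /* CNSR values 1, 2, and 3 are censored; CNSR = 0 is the event */
  time aval * cnsr(1, 2, 3);
  strata trtpn;
run;
```

Because each censoring reason has its own CNSR value, a sensitivity analysis that treats a particular censoring reason differently needs only a change to the list in the TIME statement.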

In this chapter, we illustrated the implementation of ADaM using fairly straightforward, though somewhat common, examples. For some of the common ADaM creation tasks, such as converting SDTM --DTC dates to SAS dates and merging supplemental qualifiers into a parent domain, macros have been provided for standardizing the process. Many other macros introduced during the Base SAS SDTM implementation in Chapter 3 were also applied here, thus illustrating that the advantages of setting up your specifications and metadata ahead of time apply to the ADaM creation process as well.

The true complexities of creating ADaM data tend to lie in the study-specific derivations themselves. Derivations that are fairly common across studies could be added to your ADaM implementation toolbox, applying the rules generally used for such algorithms within your organization. Examples include Last Observation Carried Forward imputation of missing values (which, despite its many criticisms, is still widely applied) and the assignment of analysis visit values based on a windowing algorithm.
