Define.xml is a file required as part of a drug submission to the FDA. It describes the structure and contents of the data collected during a clinical trial.

Over the past 12 years, we have generated define.xml files for studies in various therapeutic areas. The main challenge in this process has been providing a faster turnaround without compromising quality. To expedite the process, from the finalization of SDTM to the generation of define.xml, Kreara has set up an automated process using the SAS macro facility.
Once a person has experience with this automated process, the turnaround time for define.xml can be drastically reduced. This approach has clear benefits with regard to efficiency, quality and process management.

Figure 1: Time comparison, traditional (Study 1) versus automated approach (Study 2)


In define.xml, the metadata is classified into five levels: dataset metadata, variable metadata, value-level metadata, controlled terminology metadata and computational algorithm metadata.

A specification document in Excel format, with a separate sheet for each type of metadata, forms the input to the SAS macro. The structure and layout of the specification document are designed so that the in-house SAS macro can generate the XML code corresponding to each metadata type.
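
The mapping from one specification sheet to XML can be illustrated with a short sketch. The process described here uses SAS macros; the Python below is only an illustration. The row layout, the column names, the OID pattern (IG.<name>) and the function build_item_group_defs are assumptions, not the actual macro design. The rows are hard-coded instead of being read from the workbook, and the output is a simplified subset of the ODM ItemGroupDef element, not a complete or validated define.xml.

```python
import xml.etree.ElementTree as ET

# Hypothetical rows, standing in for the dataset-metadata sheet of the
# specification workbook. Column names are assumptions for illustration.
DATASET_ROWS = [
    {"name": "DM", "label": "Demographics"},
    {"name": "AE", "label": "Adverse Events"},
]


def build_item_group_defs(rows):
    """Turn dataset-metadata rows into simplified ItemGroupDef elements."""
    parent = ET.Element("MetaDataVersion")
    for row in rows:
        group = ET.SubElement(
            parent,
            "ItemGroupDef",
            {
                "OID": f"IG.{row['name']}",  # assumed OID convention
                "Name": row["name"],
                "Domain": row["name"],
                "SASDatasetName": row["name"],
            },
        )
        # The dataset label is carried as a Description/TranslatedText child.
        description = ET.SubElement(group, "Description")
        ET.SubElement(description, "TranslatedText").text = row["label"]
    return parent


if __name__ == "__main__":
    root = build_item_group_defs(DATASET_ROWS)
    print(ET.tostring(root, encoding="unicode"))
```

Because the XML is derived entirely from the rows, correcting a label or adding a dataset in the specification and rerunning the step is enough to regenerate the matching elements.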

The working process of the macro is as follows.

Figure 2: Process flow chart


As mentioned above, this automated process was initiated to reduce development time without compromising quality. The approach has also allowed us to generate define.xml with a smaller team. In a nutshell, its advantages over the traditional method are as follows.

Table 1: Advantages

Traditional coding method | Automated process
Manually dragging and dropping each XML element is slow, hinders the change process, and is prone to error | Single-cell XML mapping is automated by deriving the XML map from a grid
Updates to the code in response to any data change are difficult | Once the metadata is updated, the changes are implemented automatically
Time-consuming and needs thorough manual validation | Takes less time; once the metadata is correct, the mapping is correct
Need to ensure that SDTM standards are implemented | Standards are built into the process (through the metadata)
Data managers have to manually review errors related to missing data | Data managers can focus on improving data quality instead of performing technical programming tasks


The other relevant documents to be included with define.xml are:

  • XPT files for each dataset
  • Annotated CRF
  • Supplemental documents

Since the aCRF and supplemental documents are single files, they can be used directly. Attachment of the XPT files is likewise automated, using the standard domain names.
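
Deriving the XPT file references from the domain names can be sketched as follows. This is an illustrative Python example, not the SAS macro itself. The helper names xpt_href and leaf_entries, the lowercase file-naming rule and the LF.<DOMAIN> identifier pattern are all assumptions made for the example.

```python
def xpt_href(domain):
    """Return the assumed transport file name for a domain, e.g. 'DM' -> 'dm.xpt'."""
    return f"{domain.strip().lower()}.xpt"


def leaf_entries(domains):
    """Build one file-reference entry per domain.

    'leaf_id' follows a hypothetical LF.<DOMAIN> pattern; 'href' is the
    relative path to the transport file that define.xml would link to.
    """
    entries = []
    for domain in domains:
        code = domain.strip().upper()
        entries.append({"leaf_id": f"LF.{code}", "href": xpt_href(code)})
    return entries


if __name__ == "__main__":
    for entry in leaf_entries(["DM", "AE", "VS"]):
        print(entry["leaf_id"], entry["href"])
```

Since the file name depends only on the domain code, no per-dataset manual entry is needed once the list of domains is known.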
