Dataset Conversions Best Practice
Converting datasets in a clinical trial can improve the accuracy, consistency, and utility of data. Dataset conversions may be configured to facilitate data sharing and analysis, and to produce submission-ready datasets.

In Ryze, we use the Dataset Mapper plugin to configure dataset conversions. See the docid\ dkmua9njlzyseskhiiz0d to get started with mapping your datasets and variables.

In this guide, we'll highlight some of the things to remember when working with the Dataset Mapper plugin and running dataset conversions.

Standardize your mappings

Different data sources used in a clinical trial may have different formats, which can make it difficult to combine and analyze data. Converting datasets into a standardized format makes the data easier to work with and helps ensure that it is accurate and consistent.

In Ryze, dataset conversion tasks are usually run between asset groups within a study. For example, the source dataset and the target dataset are each contained within their own asset group.

In the following example, we configure mappings to convert the non-clinical data (source datasets) to the VEDC (proprietary datasets) format, and then we configure another set of mappings to convert VEDC to SDTM format. This means that there are two conversion tasks:

1. Convert the source datasets to the proprietary format.
2. Convert the proprietary dataset to SDTM.

In practice, you might want to include more stages in your conversion process.

When working with the Dataset Mapper, it is important to remember that the mappings you configure at the study level can then be saved as a standard. This allows you to use your mapping configuration as a library in Ryze and subsequently pull this configuration into future studies.

Start small

Test conversions as you go by running smaller conversion tasks. This allows you to validate your mapping commands incrementally. Validating with test conversions means you can develop a conversion standard for larger-scale use.

Use descriptions

Add descriptions to mapped variables. These can be used to describe where the data is coming from and how you expect it to be programmed, e.g. "concatenate (project, siteid, and subjectid)". Descriptions also help with reviewing your mapping configuration, for example, if you need to share it with others in your organization.

Use the mapper visualization

You can use the mapping visualization to export your mapping configuration as a spreadsheet. This can be useful for sharing mapping specifications with others and double-checking your conversion process. It's also useful for documenting the programming steps for each conversion.

Map your datasets in multiple stages

Dataset Mapper can be used for a variety of purposes, including data conversion, data migration, and data normalization. You can take raw data from your EDC system and standardize it into a structure that is the same for every study. That way you'll have a mapping configuration that is shareable between studies and repeatable with other sources.

For example, you can set up multistage conversions where data flows in the following order:

1. Vital Signs A01 (EDC)
2. Vital Signs B01 (proprietary dataset)
3. Vital Signs C01 (SDTM)

This means that only the first step in the conversion process has to be configured per EDC or study, and the second conversion step can be standardized into a docid\ t1w53uucbfozehj4xhs26.

You can also choose to have more stages in the conversion process, such as an extra step in the process:

1. Vital Signs A01 (EDC)
2. Vital Signs B01 (proprietary dataset)
3. Vital Signs C01 (SDTM+)
4. Vital Signs D01 (SDTM)

Or, mapping more than one dataset into another:

1. Vital Signs A01 (EDC) and Vital Signs A02 (EDC)
2. Vital Signs B01 (proprietary dataset)
3. Vital Signs C01 (SDTM)

Using this naming convention makes it easier to track each step of the process. In the last example, the A01 and A02 datasets are combined in the first step of the process.
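
The staged flow described above (A01 and A02 into B01, then B01 into C01) can be pictured with a small, purely illustrative Python sketch. Nothing here is a Ryze or Dataset Mapper API: the function names, the source column names (PROJECT, SITE, SUBJ, TEST, RESULT), and the "-" separator are all assumptions made for the example. The target names USUBJID, VSTESTCD, and VSORRES are standard SDTM Vital Signs variables. The sketch only shows why keeping the study-specific step separate from the reusable step pays off, and how a mapping description such as "concatenate (project, siteid, and subjectid)" translates into a concrete rule.

```python
from typing import Dict, List

Record = Dict[str, str]


def edc_to_proprietary(a01: List[Record], a02: List[Record]) -> List[Record]:
    """Stage 1 (study-specific, hypothetical): combine the A01 and A02
    EDC datasets into one B01 dataset with a fixed, study-independent layout."""
    b01 = []
    for row in a01 + a02:
        b01.append({
            "project": row["PROJECT"],    # assumed EDC column name
            "siteid": row["SITE"],        # assumed EDC column name
            "subjectid": row["SUBJ"],     # assumed EDC column name
            "test": row["TEST"],
            "result": row["RESULT"],
        })
    return b01


def proprietary_to_sdtm(b01: List[Record]) -> List[Record]:
    """Stage 2 (reusable, hypothetical): B01 always has the same layout,
    so this step can be shared across studies."""
    c01 = []
    for row in b01:
        c01.append({
            # Description: concatenate (project, siteid, and subjectid).
            # The "-" separator is an assumption for this sketch.
            "USUBJID": "-".join([row["project"], row["siteid"], row["subjectid"]]),
            "VSTESTCD": row["test"],
            "VSORRES": row["result"],
        })
    return c01


# Start small: run the two stages on a couple of records first.
a01 = [{"PROJECT": "P1", "SITE": "001", "SUBJ": "0001", "TEST": "SYSBP", "RESULT": "120"}]
a02 = [{"PROJECT": "P1", "SITE": "001", "SUBJ": "0002", "TEST": "DIABP", "RESULT": "80"}]

b01 = edc_to_proprietary(a01, a02)
c01 = proprietary_to_sdtm(b01)
```

If a new study uses a different EDC layout, only the first function would need to change in this sketch; the second stage stays as it is, which mirrors the idea of saving the second conversion step as a standard.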