Installation

Overview

For checking the dataset from EDC in clinical trials. Notice, your dataset should have a postfix( _V1 ) or a prefix( V1_ ) in the names of variables. Column names should be unique.

  • laboratory - Does the investigator correctly estimate the laboratory analyzes?
  • dates - Do all dates correspond to the protocol’s timeline?
  • rename the dataset

Usage

laboratory

For laboratory check, you need to create the excel table like in the example.

  • AGELOW - number, >= number
  • AGEHIGH - if none, type Inf, <= number
  • SEX - for both sex, use |
  • LBTEST - What was the lab test name? (can be any convenient name for you)
  • LBORRES* - What was the result of the lab test?
  • LBNRIND* - How [did/do] the reported values compare within the [reference/normal/expected] range?
  • LBORNRLO - What was the lower limit of the reference range for this lab test, >=
  • LBORNRHI - What was the high limit of the reference range for this lab test, <=

*column names without prefix or postfix

lab reference ranges
AGELOW AGEHIGH SEX LBTEST LBORRES LBNRIND LBORNRLO LBORNRHI
18 45 f|m Glucose GLUC GLUC_IND 3.9 5.9
18 45 m Aspartate transaminase AST AST_IND 0 42
18 45 f Aspartate transaminase AST AST_IND 0 39
dataset
ID AGE SEX V1_GLUC V1_GLUC_IND V2_AST V2_AST_IND
01 19 f 5.5 norm 30 norm
02 20 m 4.1 NA 48 norm
03 22 m 9.7 norm 31 norm
# "norm" and "no" it is an example, necessary variable for the estimate, get from the dataset
# parameter is_post has value FALSE because a dataset has a prefix( V1_ ) in the names of variables
refs <- system.file("labs_refer.xlsx", package = "dmtools")
obj_lab <- lab(refs, ID, AGE, SEX, "norm", "no", is_post = FALSE)
obj_lab <- obj_lab %>% check(df)

# ok - analysis, which has a correct estimate of the result
obj_lab %>% choose_test("ok")
#>   ID AGE SEX                 LBTEST LBTESTCD VISIT LBORNRLO LBORNRHI LBORRES
#> 1 01  19   f                Glucose     GLUC   V1_      3.9      5.9     5.5
#> 2 01  19   f Aspartate transaminase      AST   V2_      0.0     39.0      30
#> 3 03  22   m Aspartate transaminase      AST   V2_      0.0     42.0      31
#>   LBNRIND RES_TYPE_NUM IND_EXPECTED
#> 1    norm          5.5         norm
#> 2    norm         30.0         norm
#> 3    norm         31.0         norm

# mis - analysis, which has an incorrect estimate of the result
obj_lab %>% choose_test("mis")
#>   ID AGE SEX                 LBTEST LBTESTCD VISIT LBORNRLO LBORNRHI LBORRES
#> 1 02  20   m Aspartate transaminase      AST   V2_      0.0     42.0      48
#> 2 03  22   m                Glucose     GLUC   V1_      3.9      5.9     9.7
#>   LBNRIND RES_TYPE_NUM IND_EXPECTED
#> 1    norm         48.0           no
#> 2    norm          9.7           no

# skip - analysis, which has an empty value of the estimate
obj_lab %>% choose_test("skip")
#>   ID AGE SEX  LBTEST LBTESTCD VISIT LBORNRLO LBORNRHI LBORRES LBNRIND
#> 1 02  20   m Glucose     GLUC   V1_      3.9      5.9     4.1    <NA>
#>   RES_TYPE_NUM IND_EXPECTED
#> 1          4.1         norm

# all analyzes 
obj_lab %>% get_result()
#>   ID AGE SEX                 LBTEST LBTESTCD VISIT LBORNRLO LBORNRHI LBORRES
#> 1 01  19   f                Glucose     GLUC   V1_      3.9      5.9     5.5
#> 2 01  19   f Aspartate transaminase      AST   V2_      0.0     39.0      30
#> 3 02  20   m                Glucose     GLUC   V1_      3.9      5.9     4.1
#> 4 02  20   m Aspartate transaminase      AST   V2_      0.0     42.0      48
#> 5 03  22   m                Glucose     GLUC   V1_      3.9      5.9     9.7
#> 6 03  22   m Aspartate transaminase      AST   V2_      0.0     42.0      31
#>   LBNRIND RES_TYPE_NUM IND_EXPECTED IS_RIGHT
#> 1    norm          5.5         norm     TRUE
#> 2    norm         30.0         norm     TRUE
#> 3    <NA>          4.1         norm       NA
#> 4    norm         48.0           no    FALSE
#> 5    norm          9.7           no    FALSE
#> 6    norm         31.0         norm     TRUE

dates

For dates check, you need to create the excel table like in the example.

  • MINUS, PLUS, VISITDY - parameter of a timeline
  • VISITNUM - clinical encounter number, parameter for function e.g. contains(num_visit)
  • VISIT - protocol-defined description of a clinical encounter (can be any convenient name)
  • STARTDAT - column name of start date, with postfix or prefix
  • STARTVISIT - can be any convenient name of start date for you
  • IS_EQUAL - Boolean data type(T/F) to check date equality within a visit
  • EQUALDAT - column name for check date’s equality, with postfix or prefix
timeline
VISITNUM VISIT MINUS PLUS VISITDY STARTDAT STARTVISIT IS_EQUAL EQUALDAT
E1 screening 0 3 0 screen_date_E1 date of screening F NA
E2 rand 0 0 0 rand_date_E2 date of randomization T rand_date_E2
E3 visit 2 1 1 5 rand_date_E2 date of randomization T ph_date_E3
dataset
id screen_date_E1 rand_date_E2 ph_date_E3 bio_date_E3
01 1991-03-13 1991-03-15 1991-03-21 1991-03-23
02 1991-03-07 1991-03-11 1991-03-16 1991-03-16
03 1991-03-08 1991-03-10 1991-03-16 1991-03-16
# use parameter str_date for search columns with dates, default:"DAT"
dates <- system.file("dates.xlsx", package = "dmtools")
obj_date <- date(dates, id, dplyr::contains, dplyr::matches)
obj_date <- obj_date %>% check(df)

# out - dates, which are out of the protocol's timeline
obj_date %>% choose_test("out")
#>   id            STARTVISIT   STARTDAT   VISIT        TERM     VISDAT
#> 1 01 date of randomization 1991-03-15 visit 2 bio_date_E3 1991-03-23
#>                          PLANDAT DAYS_OUT
#> 1 1991-03-19 UTC--1991-03-21 UTC        2

# uneq - dates, which are unequal
obj_date %>% choose_test("uneq")
#>   id   VISIT        TERM     VISDAT   EQUALDAT IS_TIMELINE
#> 1 01 visit 2 bio_date_E3 1991-03-23 1991-03-21       FALSE

# ok - correct dates
obj_date %>% choose_test("ok")
#>    id            STARTVISIT   STARTDAT     VISIT           TERM     VISDAT
#> 1  01     date of screening 1991-03-13 screening screen_date_E1 1991-03-13
#> 2  01 date of randomization 1991-03-15      rand   rand_date_E2 1991-03-15
#> 3  01 date of randomization 1991-03-15   visit 2     ph_date_E3 1991-03-21
#> 4  02     date of screening 1991-03-07 screening screen_date_E1 1991-03-07
#> 5  02 date of randomization 1991-03-11      rand   rand_date_E2 1991-03-11
#> 6  02 date of randomization 1991-03-11   visit 2     ph_date_E3 1991-03-16
#> 7  02 date of randomization 1991-03-11   visit 2    bio_date_E3 1991-03-16
#> 8  03     date of screening 1991-03-08 screening screen_date_E1 1991-03-08
#> 9  03 date of randomization 1991-03-10      rand   rand_date_E2 1991-03-10
#> 10 03 date of randomization 1991-03-10   visit 2     ph_date_E3 1991-03-16
#> 11 03 date of randomization 1991-03-10   visit 2    bio_date_E3 1991-03-16
#>                           PLANDAT   EQUALDAT
#> 1  1991-03-13 UTC--1991-03-16 UTC 1991-03-13
#> 2  1991-03-15 UTC--1991-03-15 UTC 1991-03-15
#> 3  1991-03-19 UTC--1991-03-21 UTC 1991-03-21
#> 4  1991-03-07 UTC--1991-03-10 UTC 1991-03-07
#> 5  1991-03-11 UTC--1991-03-11 UTC 1991-03-11
#> 6  1991-03-15 UTC--1991-03-17 UTC 1991-03-16
#> 7  1991-03-15 UTC--1991-03-17 UTC 1991-03-16
#> 8  1991-03-08 UTC--1991-03-11 UTC 1991-03-08
#> 9  1991-03-10 UTC--1991-03-10 UTC 1991-03-10
#> 10 1991-03-14 UTC--1991-03-16 UTC 1991-03-16
#> 11 1991-03-14 UTC--1991-03-16 UTC 1991-03-16

# all dates
obj_date %>% get_result()
#>    id            STARTVISIT   STARTDAT     VISIT           TERM     VISDAT
#> 1  01     date of screening 1991-03-13 screening screen_date_E1 1991-03-13
#> 2  01 date of randomization 1991-03-15      rand   rand_date_E2 1991-03-15
#> 3  01 date of randomization 1991-03-15   visit 2     ph_date_E3 1991-03-21
#> 4  01 date of randomization 1991-03-15   visit 2    bio_date_E3 1991-03-23
#> 5  02     date of screening 1991-03-07 screening screen_date_E1 1991-03-07
#> 6  02 date of randomization 1991-03-11      rand   rand_date_E2 1991-03-11
#> 7  02 date of randomization 1991-03-11   visit 2     ph_date_E3 1991-03-16
#> 8  02 date of randomization 1991-03-11   visit 2    bio_date_E3 1991-03-16
#> 9  03     date of screening 1991-03-08 screening screen_date_E1 1991-03-08
#> 10 03 date of randomization 1991-03-10      rand   rand_date_E2 1991-03-10
#> 11 03 date of randomization 1991-03-10   visit 2     ph_date_E3 1991-03-16
#> 12 03 date of randomization 1991-03-10   visit 2    bio_date_E3 1991-03-16
#>                           PLANDAT   EQUALDAT IS_TIMELINE IS_EQUAL DAYS_OUT
#> 1  1991-03-13 UTC--1991-03-16 UTC 1991-03-13        TRUE     TRUE        0
#> 2  1991-03-15 UTC--1991-03-15 UTC 1991-03-15        TRUE     TRUE        0
#> 3  1991-03-19 UTC--1991-03-21 UTC 1991-03-21        TRUE     TRUE        0
#> 4  1991-03-19 UTC--1991-03-21 UTC 1991-03-21       FALSE    FALSE        2
#> 5  1991-03-07 UTC--1991-03-10 UTC 1991-03-07        TRUE     TRUE        0
#> 6  1991-03-11 UTC--1991-03-11 UTC 1991-03-11        TRUE     TRUE        0
#> 7  1991-03-15 UTC--1991-03-17 UTC 1991-03-16        TRUE     TRUE        0
#> 8  1991-03-15 UTC--1991-03-17 UTC 1991-03-16        TRUE     TRUE        0
#> 9  1991-03-08 UTC--1991-03-11 UTC 1991-03-08        TRUE     TRUE        0
#> 10 1991-03-10 UTC--1991-03-10 UTC 1991-03-10        TRUE     TRUE        0
#> 11 1991-03-14 UTC--1991-03-16 UTC 1991-03-16        TRUE     TRUE        0
#> 12 1991-03-14 UTC--1991-03-16 UTC 1991-03-16        TRUE     TRUE        0

dplyr::contains - A function, which select necessary visit or event e.g. dplyr::start_with, dplyr::contains. It works like df %>% select(contains("E1")). You also can use dplyr::start_with, works like df %>% select(start_with("V1"))

dplyr::matches - A function, which select dates from necessary visit e.g. dplyr::matches, dplyr::contains. It works like visit_one %>% select(contains("DAT")), default: dplyr::contains()

rename

Function to rename the dataset, using crfs.

rename_dataset("./crfs", "old_name", "new_name", 2)
  • “./crfs” - path to crfs
  • “old_name” - variable for names in the dataset, without postfix or prefix
  • “new_name” - variable for necessary names, names should be unique
  • 2 - a position of a sheet in the excel document, where dmtools can find “old_name” and “new_name”