dmtools_intro • dmtools

Installation

# load the package (assumes dmtools is already installed)
library(dmtools)

Overview

dmtools checks datasets exported from an EDC system in clinical trials. Note that the variable names in your dataset must carry a visit postfix (e.g. _V1) or prefix (e.g. V1_), and column names must be unique.

laboratory - Did the investigator assess the laboratory test results correctly?
dates - Do all dates correspond to the protocol’s timeline?
rename - Rename the variables of the dataset.

Usage

laboratory

For the laboratory check, create an Excel table like the one in the example below.

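AGELOW - lower age limit (number); a record matches if AGE >= AGELOW
AGEHIGH - upper age limit (number); a record matches if AGE <= AGEHIGH; type Inf if there is no upper limit
SEX - sex the range applies to; to cover both sexes, separate the values with | (e.g. f|m)
LBTEST - What was the lab test name? (can be any name convenient for you)
LBORRES* - What was the result of the lab test?
LBNRIND* - How [did/do] the reported values compare within the [reference/normal/expected] range?
LBORNRLO - What was the lower limit of the reference range for this lab test? A result is in range if it is >= this value
LBORNRHI - What was the upper limit of the reference range for this lab test? A result is in range if it is <= this value

* For these two fields, enter the column names from the dataset without the prefix or postfix.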

lab reference ranges
AGELOW	AGEHIGH	SEX	LBTEST	LBORRES	LBNRIND	LBORNRLO	LBORNRHI
18	45	f|m	Glucose	GLUC	GLUC_IND	3.9	5.9
18	45	m	Aspartate transaminase	AST	AST_IND	0	42
18	45	f	Aspartate transaminase	AST	AST_IND	0	39

dataset
ID	AGE	SEX	V1_GLUC	V1_GLUC_IND	V2_AST	V2_AST_IND
01	19	f	5.5	norm	30	norm
02	20	m	4.1	NA	48	norm
03	22	m	9.7	norm	31	norm

# "norm" and "no" are example values; use the labels for normal and abnormal results found in your dataset
# is_post is FALSE because the variable names in the dataset have a prefix (V1_)
# df is the example dataset shown above
refs <- system.file("labs_refer.xlsx", package = "dmtools")
obj_lab <- lab(refs, ID, AGE, SEX, "norm", "no", is_post = FALSE)
obj_lab <- obj_lab %>% check(df)

# ok - tests whose result was assessed correctly
obj_lab %>% choose_test("ok")
#>   ID AGE SEX                 LBTEST LBTESTCD VISIT LBORNRLO LBORNRHI LBORRES
#> 1 01  19   f                Glucose     GLUC   V1_      3.9      5.9     5.5
#> 2 01  19   f Aspartate transaminase      AST   V2_      0.0     39.0      30
#> 3 03  22   m Aspartate transaminase      AST   V2_      0.0     42.0      31
#>   LBNRIND RES_TYPE_NUM IND_EXPECTED
#> 1    norm          5.5         norm
#> 2    norm         30.0         norm
#> 3    norm         31.0         norm

# mis - tests whose result was assessed incorrectly
obj_lab %>% choose_test("mis")
#>   ID AGE SEX                 LBTEST LBTESTCD VISIT LBORNRLO LBORNRHI LBORRES
#> 1 02  20   m Aspartate transaminase      AST   V2_      0.0     42.0      48
#> 2 03  22   m                Glucose     GLUC   V1_      3.9      5.9     9.7
#>   LBNRIND RES_TYPE_NUM IND_EXPECTED
#> 1    norm         48.0           no
#> 2    norm          9.7           no

# skip - tests whose assessment is missing
obj_lab %>% choose_test("skip")
#>   ID AGE SEX  LBTEST LBTESTCD VISIT LBORNRLO LBORNRHI LBORRES LBNRIND
#> 1 02  20   m Glucose     GLUC   V1_      3.9      5.9     4.1    <NA>
#>   RES_TYPE_NUM IND_EXPECTED
#> 1          4.1         norm

# all tests
obj_lab %>% get_result()
#>   ID AGE SEX                 LBTEST LBTESTCD VISIT LBORNRLO LBORNRHI LBORRES
#> 1 01  19   f                Glucose     GLUC   V1_      3.9      5.9     5.5
#> 2 01  19   f Aspartate transaminase      AST   V2_      0.0     39.0      30
#> 3 02  20   m                Glucose     GLUC   V1_      3.9      5.9     4.1
#> 4 02  20   m Aspartate transaminase      AST   V2_      0.0     42.0      48
#> 5 03  22   m                Glucose     GLUC   V1_      3.9      5.9     9.7
#> 6 03  22   m Aspartate transaminase      AST   V2_      0.0     42.0      31
#>   LBNRIND RES_TYPE_NUM IND_EXPECTED IS_RIGHT
#> 1    norm          5.5         norm     TRUE
#> 2    norm         30.0         norm     TRUE
#> 3    <NA>          4.1         norm       NA
#> 4    norm         48.0           no    FALSE
#> 5    norm          9.7           no    FALSE
#> 6    norm         31.0         norm     TRUE
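The IND_EXPECTED and IS_RIGHT columns above are consistent with a simple rule: find the reference row that matches the test, age and sex, then compare the result with the inclusive range. The base R sketch below illustrates that rule only. `expected_ind()` is a hypothetical helper written for this example, not a dmtools function, and the real implementation may differ.

```r
# Reference ranges copied from the example table (LBNRIND column omitted).
refs <- data.frame(
  AGELOW   = c(18, 18, 18),
  AGEHIGH  = c(45, 45, 45),
  SEX      = c("f|m", "m", "f"),
  LBORRES  = c("GLUC", "AST", "AST"),
  LBORNRLO = c(3.9, 0, 0),
  LBORNRHI = c(5.9, 42, 39),
  stringsAsFactors = FALSE
)

# Hypothetical helper (not part of dmtools): returns the expected label
# for one result, assuming exactly one reference row matches.
expected_ind <- function(refs, test, age, sex, result,
                         normal = "norm", abnormal = "no") {
  # "f|m" is split on "|" so that one row can cover both sexes.
  sex_ok <- vapply(strsplit(refs$SEX, "|", fixed = TRUE),
                   function(s) sex %in% s, logical(1))
  row <- refs[refs$LBORRES == test &
                age >= refs$AGELOW & age <= refs$AGEHIGH & sex_ok, ]
  stopifnot(nrow(row) == 1)
  # Both limits are inclusive: LBORNRLO <= result <= LBORNRHI.
  if (result >= row$LBORNRLO && result <= row$LBORNRHI) normal else abnormal
}

# Subject 03, glucose 9.7, reported as "norm".
expected <- expected_ind(refs, "GLUC", age = 22, sex = "m", result = 9.7)
reported <- "norm"
is_right <- reported == expected  # a missing (NA) reported value yields NA

print(expected)
print(is_right)
```

With the example data, subject 03's glucose value lies above the upper limit, so the expected label differs from the reported one, which is why that record appears under "mis".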

dates

For the dates check, create an Excel table like the one in the example below.

MINUS, PLUS, VISITDY - parameters of the timeline: VISITDY is the planned day of the visit counted from the start date, and MINUS and PLUS are the number of days allowed before and after it
VISITNUM - clinical encounter number; the value passed to the selection function, e.g. contains(num_visit)
VISIT - protocol-defined description of a clinical encounter (can be any convenient name)
STARTDAT - column name of the start date, with postfix or prefix
STARTVISIT - description of the start date (can be any convenient name)
IS_EQUAL - logical (T/F): whether to check that the dates within a visit are equal
EQUALDAT - column name of the date that the other dates of the visit must equal, with postfix or prefix

timeline
VISITNUM	VISIT	MINUS	PLUS	VISITDY	STARTDAT	STARTVISIT	IS_EQUAL	EQUALDAT
E1	screening	0	3	0	screen_date_E1	date of screening	F	NA
E2	rand	0	0	0	rand_date_E2	date of randomization	T	rand_date_E2
E3	visit 2	1	1	5	rand_date_E2	date of randomization	T	ph_date_E3

dataset
id	screen_date_E1	rand_date_E2	ph_date_E3	bio_date_E3
01	1991-03-13	1991-03-15	1991-03-21	1991-03-23
02	1991-03-07	1991-03-11	1991-03-16	1991-03-16
03	1991-03-08	1991-03-10	1991-03-16	1991-03-16

# use the str_date parameter to set the pattern used to find date columns, default: "DAT"
# df is the example dataset shown above
dates <- system.file("dates.xlsx", package = "dmtools")
obj_date <- date(dates, id, dplyr::contains, dplyr::matches)
obj_date <- obj_date %>% check(df)

# out - dates that fall outside the protocol's timeline
obj_date %>% choose_test("out")
#>   id            STARTVISIT   STARTDAT   VISIT        TERM     VISDAT
#> 1 01 date of randomization 1991-03-15 visit 2 bio_date_E3 1991-03-23
#>                          PLANDAT DAYS_OUT
#> 1 1991-03-19 UTC--1991-03-21 UTC        2

# uneq - dates that are not equal to the visit's EQUALDAT
obj_date %>% choose_test("uneq")
#>   id   VISIT        TERM     VISDAT   EQUALDAT IS_TIMELINE
#> 1 01 visit 2 bio_date_E3 1991-03-23 1991-03-21       FALSE

# ok - correct dates
obj_date %>% choose_test("ok")
#>    id            STARTVISIT   STARTDAT     VISIT           TERM     VISDAT
#> 1  01     date of screening 1991-03-13 screening screen_date_E1 1991-03-13
#> 2  01 date of randomization 1991-03-15      rand   rand_date_E2 1991-03-15
#> 3  01 date of randomization 1991-03-15   visit 2     ph_date_E3 1991-03-21
#> 4  02     date of screening 1991-03-07 screening screen_date_E1 1991-03-07
#> 5  02 date of randomization 1991-03-11      rand   rand_date_E2 1991-03-11
#> 6  02 date of randomization 1991-03-11   visit 2     ph_date_E3 1991-03-16
#> 7  02 date of randomization 1991-03-11   visit 2    bio_date_E3 1991-03-16
#> 8  03     date of screening 1991-03-08 screening screen_date_E1 1991-03-08
#> 9  03 date of randomization 1991-03-10      rand   rand_date_E2 1991-03-10
#> 10 03 date of randomization 1991-03-10   visit 2     ph_date_E3 1991-03-16
#> 11 03 date of randomization 1991-03-10   visit 2    bio_date_E3 1991-03-16
#>                           PLANDAT   EQUALDAT
#> 1  1991-03-13 UTC--1991-03-16 UTC 1991-03-13
#> 2  1991-03-15 UTC--1991-03-15 UTC 1991-03-15
#> 3  1991-03-19 UTC--1991-03-21 UTC 1991-03-21
#> 4  1991-03-07 UTC--1991-03-10 UTC 1991-03-07
#> 5  1991-03-11 UTC--1991-03-11 UTC 1991-03-11
#> 6  1991-03-15 UTC--1991-03-17 UTC 1991-03-16
#> 7  1991-03-15 UTC--1991-03-17 UTC 1991-03-16
#> 8  1991-03-08 UTC--1991-03-11 UTC 1991-03-08
#> 9  1991-03-10 UTC--1991-03-10 UTC 1991-03-10
#> 10 1991-03-14 UTC--1991-03-16 UTC 1991-03-16
#> 11 1991-03-14 UTC--1991-03-16 UTC 1991-03-16

# all dates
obj_date %>% get_result()
#>    id            STARTVISIT   STARTDAT     VISIT           TERM     VISDAT
#> 1  01     date of screening 1991-03-13 screening screen_date_E1 1991-03-13
#> 2  01 date of randomization 1991-03-15      rand   rand_date_E2 1991-03-15
#> 3  01 date of randomization 1991-03-15   visit 2     ph_date_E3 1991-03-21
#> 4  01 date of randomization 1991-03-15   visit 2    bio_date_E3 1991-03-23
#> 5  02     date of screening 1991-03-07 screening screen_date_E1 1991-03-07
#> 6  02 date of randomization 1991-03-11      rand   rand_date_E2 1991-03-11
#> 7  02 date of randomization 1991-03-11   visit 2     ph_date_E3 1991-03-16
#> 8  02 date of randomization 1991-03-11   visit 2    bio_date_E3 1991-03-16
#> 9  03     date of screening 1991-03-08 screening screen_date_E1 1991-03-08
#> 10 03 date of randomization 1991-03-10      rand   rand_date_E2 1991-03-10
#> 11 03 date of randomization 1991-03-10   visit 2     ph_date_E3 1991-03-16
#> 12 03 date of randomization 1991-03-10   visit 2    bio_date_E3 1991-03-16
#>                           PLANDAT   EQUALDAT IS_TIMELINE IS_EQUAL DAYS_OUT
#> 1  1991-03-13 UTC--1991-03-16 UTC 1991-03-13        TRUE     TRUE        0
#> 2  1991-03-15 UTC--1991-03-15 UTC 1991-03-15        TRUE     TRUE        0
#> 3  1991-03-19 UTC--1991-03-21 UTC 1991-03-21        TRUE     TRUE        0
#> 4  1991-03-19 UTC--1991-03-21 UTC 1991-03-21       FALSE    FALSE        2
#> 5  1991-03-07 UTC--1991-03-10 UTC 1991-03-07        TRUE     TRUE        0
#> 6  1991-03-11 UTC--1991-03-11 UTC 1991-03-11        TRUE     TRUE        0
#> 7  1991-03-15 UTC--1991-03-17 UTC 1991-03-16        TRUE     TRUE        0
#> 8  1991-03-15 UTC--1991-03-17 UTC 1991-03-16        TRUE     TRUE        0
#> 9  1991-03-08 UTC--1991-03-11 UTC 1991-03-08        TRUE     TRUE        0
#> 10 1991-03-10 UTC--1991-03-10 UTC 1991-03-10        TRUE     TRUE        0
#> 11 1991-03-14 UTC--1991-03-16 UTC 1991-03-16        TRUE     TRUE        0
#> 12 1991-03-14 UTC--1991-03-16 UTC 1991-03-16        TRUE     TRUE        0
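The PLANDAT intervals in the output above are consistent with a window that runs from STARTDAT + VISITDY - MINUS to STARTDAT + VISITDY + PLUS. The base R sketch below reproduces that arithmetic for visit 2 of subject 01. `plan_window()` and `days_out()` are hypothetical helpers written for this example, not dmtools functions, and the handling of visits that come too early is an assumption, since the example data only contains a late visit.

```r
# Hypothetical helper (not part of dmtools): allowed date window for a visit.
plan_window <- function(start, visitdy, minus, plus) {
  planned <- as.Date(start) + visitdy
  list(from = planned - minus, to = planned + plus)
}

# Hypothetical helper (not part of dmtools): number of days a visit date
# lies outside the window, 0 if it is inside.
# Assumption: early visits are counted the same way as late ones.
days_out <- function(visit, window) {
  visit <- as.Date(visit)
  if (visit < window$from) return(as.numeric(window$from - visit))
  if (visit > window$to) return(as.numeric(visit - window$to))
  0
}

# Visit 2 (E3): start date rand_date_E2, VISITDY = 5, MINUS = 1, PLUS = 1.
w <- plan_window("1991-03-15", visitdy = 5, minus = 1, plus = 1)
print(format(w$from))  # "1991-03-19"
print(format(w$to))    # "1991-03-21"

# bio_date_E3 of subject 01 is two days after the end of the window.
print(days_out("1991-03-23", w))  # 2
# ph_date_E3 of subject 01 is the last allowed day.
print(days_out("1991-03-21", w))  # 0
```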

dplyr::contains - the function that selects the columns of the required visit or event, e.g. dplyr::starts_with or dplyr::contains. It works like df %>% select(contains("E1")). You can also use dplyr::starts_with, which works like df %>% select(starts_with("V1")).

dplyr::matches - the function that selects the date columns within the required visit, e.g. dplyr::matches or dplyr::contains. It works like visit_one %>% select(contains("DAT")). Default: dplyr::contains.
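The two selection steps described above can be tried directly with dplyr. This is a minimal sketch on a cut-down copy of the example dataset; it only shows what the helpers select and does not call dmtools. The names `visit_three` and `date_cols` are chosen for this example.

```r
library(dplyr)

# Two subjects from the example dataset.
df <- data.frame(
  id             = c("01", "02"),
  screen_date_E1 = c("1991-03-13", "1991-03-07"),
  rand_date_E2   = c("1991-03-15", "1991-03-11"),
  ph_date_E3     = c("1991-03-21", "1991-03-16"),
  bio_date_E3    = c("1991-03-23", "1991-03-16"),
  stringsAsFactors = FALSE
)

# Step 1: pick the columns of one visit by its VISITNUM.
visit_three <- df %>% select(contains("E3"))
print(names(visit_three))

# Step 2: pick the date columns within that visit.
# matches() ignores case by default, so "DAT" also finds "date".
date_cols <- visit_three %>% select(matches("DAT"))
print(names(date_cols))
```

For datasets whose visit marker is a prefix, such as V1_GLUC, `starts_with("V1")` would play the role of `contains("E3")` in the first step.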

rename

rename_dataset() renames the variables of the dataset using the CRFs.

rename_dataset("./crfs", "old_name", "new_name", 2)

"./crfs" - path to the CRFs
"old_name" - column with the current variable names in the dataset, without postfix or prefix
"new_name" - column with the required names; the names must be unique
2 - position of the sheet in the Excel document where dmtools can find "old_name" and "new_name"