--- title: "Generating QC-Ready Result Data Frames (ARDs) from Tables" author: "Davide Garolini" date: "`r Sys.Date()`" output: rmarkdown::html_vignette vignette: > %\VignetteIndexEntry{Generating QC-Ready Result Data Frames (ARDs) from Tables} %\VignetteEncoding{UTF-8} %\VignetteEngine{knitr::rmarkdown} editor_options: chunk_output_type: console --- ```{r, include = FALSE} suggested_dependent_pkgs <- c("dplyr") knitr::opts_chunk$set( collapse = TRUE, comment = "#>", eval = all(vapply( suggested_dependent_pkgs, requireNamespace, logical(1), quietly = TRUE )) ) ``` ```{r, echo=FALSE} knitr::opts_chunk$set(comment = "#") ``` ```{css, echo=FALSE} .reveal .r code { white-space: pre; } ``` # Disclaimer This vignette is a work in progress and subject to change. ## Creating an Example Table In order to generate an ARD (Analysis Results Dataset), we first need to create a table from which all the necessary information will be retrieved. We will borrow a simple table from [this vignette](https://insightsengineering.github.io/rtables/latest-tag/articles/clinical_trials.html) about clinical trials. ```{r} library(rtables) ADSL <- ex_adsl # Example ADSL dataset # Very simple table lyt <- basic_table() %>% split_cols_by("ARM") %>% analyze(c("AGE", "SEX")) tbl <- build_table(lyt, ADSL) tbl ``` ## Converting the Table to a Result Data Frame (ARD) The `as_result_df()` function is used to convert a table to a result data frame. The result data frame is a data frame that contains the result of the summary table and is ready to be used for quality control purposes. This may be customized according to different standards. Let's see how we can produce different result `data.frame`s. The following outputs can be returned by setting different parameters in the `as_results_df()` function, and these results can be transformed back into a table using the `df_to_tt()` function. ```{r} as_result_df(tbl) as_result_df(tbl, data_format = "strings") as_result_df(tbl, simplify = TRUE) as_result_df(tbl, simplify = TRUE, keep_label_rows = TRUE) as_result_df(tbl, simplify = TRUE, keep_label_rows = TRUE, expand_colnames = TRUE) ``` Now let's generate our final ARD output, which is ready to be used for quality control purposes. ```{r} as_result_df(tbl, make_ard = TRUE) ``` ## Customizing the Output `as_result_df()` and ARDs depend on the content of the table, so it is possible to modify the table to customize the output. For example, we can add some user-defined statistics with custom names: ```{r} # rcell and in_rows are the core of any analysis function rc <- rcell(c(1, 2), stat_names = c("Rand1", "Rand2")) print(obj_stat_names(rc)) # c("Rand1", "Rand2") rc_row <- in_rows( .list = list(a = c(NA, 1), b = c(1, NA)), .formats = c("xx - xx", "xx.x - xx.x"), .format_na_strs = list(c("asda", "lkjklj")), .stat_names = list(c("A", "B"), c("B", "C")) ) # Only a getter for this object print(obj_stat_names(rc_row)) # list(a = c("A", "B"), b = c("B", "C")) # if c("A", "B"), one for each row # if single list, duplicated rc_row <- in_rows( .list = list(a = c(NA, 1), b = c(1, NA)), .formats = c("xx - xx", "xx.x - xx.x"), .format_na_strs = list(c("asda", "lkjklj")), .stat_names = c("A", "B") ) print(obj_stat_names(rc_row)) # c("A", "B") # one for each row print(lapply(rc_row, obj_stat_names)) # identical to above + row names rc_row <- in_rows( .list = list(a = c(NA, 1), b = c(1, NA)), .formats = c("xx - xx", "xx.x - xx.x"), .format_na_strs = list(c("asda", "lkjklj")), .stat_names = list(c("A", "B")) # It is duplicated, check it yourself! ) ``` Let's put it into practice: ```{r} mean_sd_custom <- function(x) { mean <- mean(x, na.rm = FALSE) sd <- sd(x, na.rm = FALSE) rcell(c(mean, sd), label = "Mean (SD)", format = "xx.x (xx.x)" # , # stat_names = c("Mean", "SD") ) } counts_percentage_custom <- function(x) { cnts <- table(x) out <- lapply(cnts, function(x) { perc <- x / sum(cnts) rcell(c(x, perc), format = "xx. (xx.%)") }) in_rows( .list = as.list(out), .labels = names(cnts), .stat_names = list(c("Count", "Percentage")) ) } lyt <- basic_table(show_colcounts = TRUE, colcount_format = "N=xx") %>% split_cols_by("ARM", split_fun = keep_split_levels(c("A: Drug X", "B: Placebo"))) %>% analyze(vars = "AGE", afun = mean_sd_custom) %>% analyze(vars = "SEX", afun = counts_percentage_custom) tbl <- build_table(lyt, ex_adsl) as_result_df(tbl, make_ard = TRUE) ``` # More Complex Outputs Let's add hierarchical row and column splits: ```{r} lyt <- basic_table() %>% split_rows_by("STRATA2") %>% summarize_row_groups() %>% split_cols_by("ARM") %>% split_cols_by("STRATA1") %>% analyze(c("AGE", "SEX")) tbl <- build_table(lyt, ex_adsl) as_result_df(tbl, make_ard = TRUE) ```