--- title: "Missing Values in Tern" date: "2022-04-12" output: rmarkdown::html_document: theme: "spacelab" highlight: "kate" toc: true toc_float: true vignette: > %\VignetteIndexEntry{Missing Values in Tern} %\VignetteEngine{knitr::rmarkdown} %\VignetteEncoding{UTF-8} editor_options: markdown: wrap: 72 --- ```{r, include = FALSE} knitr::opts_chunk$set( collapse = TRUE, comment = "#>" ) ``` The packages used in this vignette are: ```{r, message=FALSE} library(rtables) library(formatters) library(tern) library(dplyr) ``` ## Variable Class Conversion `rtables` requires that split variables to be factors. When you try and split a variable that isn't, a warning message will appear. Here we purposefully convert the SEX variable to character to demonstrate what happens when we try splitting the rows by this variable. To fix this, `df_explict_na` will convert this to a factor resulting in the table being generated. ```{r} adsl <- tern_ex_adsl adsl$SEX <- as.character(adsl$SEX) vars <- c("AGE", "SEX", "RACE", "BMRKR1") var_labels <- c( "Age (yr)", "Sex", "Race", "Continous Level Biomarker 1" ) result <- basic_table(show_colcounts = TRUE) %>% split_cols_by(var = "ARM") %>% add_overall_col("All Patients") %>% analyze_vars( vars = vars, var_labels = var_labels ) %>% build_table(adsl) result ``` ## Including Missing Values in `rtables` Here we purposefully convert all `M` values to `NA` in the `SEX` variable. After running `df_explicit_na` the `NA` values are encoded as `` but they are not included in the table. As well, the missing values are not included in the `n` count and they are not included in the denominator value for calculating the percent values. ```{r} adsl <- tern_ex_adsl adsl$SEX[adsl$SEX == "M"] <- NA adsl <- df_explicit_na(adsl) vars <- c("AGE", "SEX") var_labels <- c( "Age (yr)", "Sex" ) result <- basic_table(show_colcounts = TRUE) %>% split_cols_by(var = "ARM") %>% add_overall_col("All Patients") %>% analyze_vars( vars = vars, var_labels = var_labels ) %>% build_table(adsl) result ``` If you want the `Na` values to be displayed in the table and included in the `n` count and as the denominator for calculating percent values, use the `na_level` argument. ```{r} adsl <- tern_ex_adsl adsl$SEX[adsl$SEX == "M"] <- NA adsl <- df_explicit_na(adsl, na_level = "Missing Values") result <- basic_table(show_colcounts = TRUE) %>% split_cols_by(var = "ARM") %>% add_overall_col("All Patients") %>% analyze_vars( vars = vars, var_labels = var_labels ) %>% build_table(adsl) result ``` ## Missing Values in Numeric Variables Numeric variables that have missing values are not altered. This means that any `NA` value in a numeric variable will not be included in the summary statistics, nor will they be included in the denominator value for calculating the percent values. Here we make any value less than 30 missing in the `AGE` variable and only the valued greater than 30 are included in the table below. ```{r} adsl <- tern_ex_adsl adsl$AGE[adsl$AGE < 30] <- NA adsl <- df_explicit_na(adsl) vars <- c("AGE", "SEX") var_labels <- c( "Age (yr)", "Sex" ) result <- basic_table(show_colcounts = TRUE) %>% split_cols_by(var = "ARM") %>% add_overall_col("All Patients") %>% analyze_vars( vars = vars, var_labels = var_labels ) %>% build_table(adsl) result ```