7 Table by gtsummary
7.1 Packages for this Lesson
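A minimal sketch of the setup this lesson relies on. The package list is an assumption inferred from the code used later: `public.ctn0094data` for the data, `gtsummary` for the tables, and the tidyverse for `%>%`, `select()`, and `mutate()`.

```r
# Assumed setup; install any missing package once with install.packages().
library(public.ctn0094data)  # CTN-0094 data sets (demographics, fagerstrom)
library(tidyverse)           # provides %>%, select(), mutate()
library(gtsummary)           # tbl_summary(), tbl_regression(), add_p(), ...
```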
7.2 Introduction to ‘gtsummary’
The `gtsummary` package is mainly useful for creating publication-ready tables (e.g., demographic tables, simple summary tables, contingency tables, and regression tables). Its best feature is that it automatically detects whether each variable is continuous, dichotomous, or categorical and chooses which descriptive statistics to apply.
7.3 Data Source and Description
The `public.ctn0094data` package provides harmonized and normalized data sets from the CTN-0094 clinical trial. These data sets describe the experiences of care-seeking individuals suffering from opioid use disorder (OUD). The trial is part of the Clinical Trials Network (CTN) protocol number 0094, funded by the US National Institute on Drug Abuse (NIDA). NIDA uses the CTN to develop, validate, refine, and deliver new treatment options to patients.
In this lesson, I use the `demographics` and `fagerstrom` data sets from the `public.ctn0094data` package to demonstrate the `gtsummary` functions. The `demographics` data set contains demographic variables such as age, sex, race, and marital status. The `fagerstrom` data set contains data on smoking habits: smoker/non-smoker status, the Fagerstrom Test for Nicotine Dependence (FTND) score (ranging from 0 to 10), and the number of cigarettes smoked per day. The FTND is a questionnaire that assesses the physical dependence of adults on nicotine. The test uses yes/no questions scored from 0 to 1 and multiple-choice questions scored from 0 to 3, and the total score ranges from 0 to 10. The higher the score, the more intense the patient’s nicotine dependence. The score categories are: 8+ (high dependence), 5–7 (moderate dependence), 3–4 (low to moderate dependence), and 0–2 (low dependence).
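The score categories above can be sketched with base R's `cut()`. The scores below are hypothetical values chosen to hit every boundary, and the object names are my own, not part of the data package.

```r
# Hypothetical FTND total scores (0 to 10); not taken from the trial data.
ftnd_score <- c(0, 2, 3, 4, 5, 7, 8, 10)

# Right-closed intervals: (-Inf, 2], (2, 4], (4, 7], (7, Inf]
ftnd_category <- cut(
  ftnd_score,
  breaks = c(-Inf, 2, 4, 7, Inf),
  labels = c("Low", "Low to moderate", "Moderate", "High")
)

data.frame(ftnd_score, ftnd_category)
```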
# Searching suitable data sets: You can skip
data(package = "public.ctn0094data")
#data(demographics, package = "public.ctn0094data")
#names(demographics)
#data(fagerstrom, package = "public.ctn0094data")
#names(fagerstrom)
#table(fagerstrom$ftnd)
7.4 Creating Model Data Frames
The `demographics` and `fagerstrom` data sets within the `public.ctn0094data` package were joined by ID (the `who` variable) to create a new data frame, `smoking_df`.
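A sketch of this kind of join on toy data, using base R so it runs without extra packages. The toy data frames and their values are hypothetical, and keeping every demographic row (a left join) is an assumption about how `smoking_df` was built. With the real data, the dplyr equivalent would be something like `left_join(demographics, fagerstrom, by = "who")`.

```r
# Toy stand-ins for the two data sets (hypothetical values).
demographics_toy <- data.frame(
  who = c(1, 2, 3),
  age = c(34, 45, 27)
)
fagerstrom_toy <- data.frame(
  who = c(1, 3),
  is_smoker = factor(c("Yes", "No"), levels = c("No", "Yes"))
)

# Keep every row of the demographic data; IDs without a match get NA.
smoking_toy <- merge(demographics_toy, fagerstrom_toy, by = "who", all.x = TRUE)
smoking_toy
```

Participant 2 has no smoking record in the toy data, so `is_smoker` is `NA` for that row, which is how "Unknown" rows arise in the tables that follow.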
7.5 Demographic Table with tbl_summary Function
7.5.1 Creating Table 1: Demographic Characteristic
To create a basic demographic table, I first select the variables I want to show and then use the `tbl_summary` function to create the table. The variables included in the table are described below.
- `age`: an integer variable that indicates the age of the patient.
- `race`: a factor variable with levels ‘Black’, ‘Other’, ‘Refused/missing’, and ‘White’, which represents the self-reported race of the patient.
- `education`: a factor variable that denotes the education level at intake, with levels ‘HS/GED’ for high school graduate or equivalent, ‘Less than HS’ for less than high school education, ‘More than HS’ for some education beyond high school, and ‘Missing’ if the information is not provided.
- `is_male`: a factor variable with levels ‘No’ and ‘Yes’, describing the sex (not gender) of the patient, where ‘Yes’ indicates male.
- `marital`: a factor variable indicating the marital status at intake, with levels ‘Married or Partnered’, ‘Never married’, ‘Separated/Divorced/Widowed’, and ‘Not answered’ if the question was not asked during intake.
- `is_smoker`: a factor variable indicating whether the patient is a smoker, with levels ‘No’ (not a smoker) and ‘Yes’ (a smoker).
# Selecting variables in a new data frame `table_1df` for table 1
table_1df <- smoking_df %>%
  select(age, race, education, is_male, marital, is_smoker)
# Table 1
table_1 <- table_1df %>% tbl_summary()
table_1
Characteristic | N = 3,560¹ |
---|---|
age | 34 (27, 45) |
Unknown | 208 |
race | |
Black | 365 (10%) |
Other | 506 (14%) |
Refused/missing | 58 (1.6%) |
White | 2,631 (74%) |
education | |
HS/GED | 691 (39%) |
Less than HS | 352 (20%) |
More than HS | 724 (41%) |
Unknown | 1,793 |
is_male | 2,351 (66%) |
Unknown | 4 |
marital | |
Married or Partnered | 329 (19%) |
Never married | 1,028 (59%) |
Separated/Divorced/Widowed | 394 (23%) |
Unknown | 1,809 |
is_smoker | 2,631 (85%) |
Unknown | 460 |
¹ Median (IQR); n (%) |
7.5.2 Customizing Table 1: Changing the Label
I am using the `label` argument of `tbl_summary` to change the labels of all variables. Other customizations are shown with the contingency table in the next section.
# Changing the Label
table_1 <-
  table_1df %>%
  tbl_summary(
    label = list(
      age = "Age",
      race = "Race",
      education = "Education level",
      is_male = "Male",
      marital = "Marital status",
      is_smoker = "Smoker"
    )
  )
table_1
Characteristic | N = 3,560¹ |
---|---|
Age | 34 (27, 45) |
Unknown | 208 |
Race | |
Black | 365 (10%) |
Other | 506 (14%) |
Refused/missing | 58 (1.6%) |
White | 2,631 (74%) |
Education level | |
HS/GED | 691 (39%) |
Less than HS | 352 (20%) |
More than HS | 724 (41%) |
Unknown | 1,793 |
Male | 2,351 (66%) |
Unknown | 4 |
Marital status | |
Married or Partnered | 329 (19%) |
Never married | 1,028 (59%) |
Separated/Divorced/Widowed | 394 (23%) |
Unknown | 1,809 |
Smoker | 2,631 (85%) |
Unknown | 460 |
¹ Median (IQR); n (%) |
7.6 Contingency Table with tbl_summary Function
7.6.1 Creating Table 2: Demographic Variables by Smoking Status
I will now show the Table 1 demographic variables by smoking status (`is_smoker`: `Yes` = smoker, `No` = non-smoker).
# Contingency table
table_2 <- table_1df %>% tbl_summary(by = is_smoker)
table_2
Characteristic | No, N = 469¹ | Yes, N = 2,631¹ |
---|---|---|
age | 36 (28, 47) | 33 (26, 44) |
Unknown | 7 | 79 |
race | ||
Black | 46 (9.8%) | 259 (9.8%) |
Other | 68 (14%) | 376 (14%) |
Refused/missing | 1 (0.2%) | 18 (0.7%) |
White | 354 (75%) | 1,978 (75%) |
education | ||
HS/GED | 104 (34%) | 531 (41%) |
Less than HS | 23 (7.5%) | 290 (22%) |
More than HS | 179 (58%) | 488 (37%) |
Unknown | 163 | 1,322 |
is_male | 336 (72%) | 1,724 (66%) |
marital | ||
Married or Partnered | 75 (25%) | 236 (18%) |
Never married | 152 (50%) | 775 (59%) |
Separated/Divorced/Widowed | 77 (25%) | 293 (22%) |
Unknown | 165 | 1,327 |
¹ Median (IQR); n (%) |
7.6.2 Removing Missing Data
If I do not want to show the missing data in my table, I can set `missing = "no"`.
# Removing Missing Data
table_2nm <- table_1df %>%
  tbl_summary(by = is_smoker, missing = "no")
table_2nm
Characteristic | No, N = 469¹ | Yes, N = 2,631¹ |
---|---|---|
age | 36 (28, 47) | 33 (26, 44) |
race | ||
Black | 46 (9.8%) | 259 (9.8%) |
Other | 68 (14%) | 376 (14%) |
Refused/missing | 1 (0.2%) | 18 (0.7%) |
White | 354 (75%) | 1,978 (75%) |
education | ||
HS/GED | 104 (34%) | 531 (41%) |
Less than HS | 23 (7.5%) | 290 (22%) |
More than HS | 179 (58%) | 488 (37%) |
is_male | 336 (72%) | 1,724 (66%) |
marital | ||
Married or Partnered | 75 (25%) | 236 (18%) |
Never married | 152 (50%) | 775 (59%) |
Separated/Divorced/Widowed | 77 (25%) | 293 (22%) |
¹ Median (IQR); n (%) |
7.6.3 Applying Statistical Tests
I will use the `add_p` function to add p-values to the table. It detects whether each variable is continuous, dichotomous, or categorical and applies an appropriate statistical test accordingly.
# Adding p-value
table_2 <- table_1df %>%
  tbl_summary(by = is_smoker, missing = "no") %>%
  add_p()
table_2
Characteristic | No, N = 469¹ | Yes, N = 2,631¹ | p-value² |
---|---|---|---|
age | 36 (28, 47) | 33 (26, 44) | <0.001 |
race | | | 0.8 |
Black | 46 (9.8%) | 259 (9.8%) | |
Other | 68 (14%) | 376 (14%) | |
Refused/missing | 1 (0.2%) | 18 (0.7%) | |
White | 354 (75%) | 1,978 (75%) | |
education | | | <0.001 |
HS/GED | 104 (34%) | 531 (41%) | |
Less than HS | 23 (7.5%) | 290 (22%) | |
More than HS | 179 (58%) | 488 (37%) | |
is_male | 336 (72%) | 1,724 (66%) | 0.010 |
marital | | | 0.006 |
Married or Partnered | 75 (25%) | 236 (18%) | |
Never married | 152 (50%) | 775 (59%) | |
Separated/Divorced/Widowed | 77 (25%) | 293 (22%) | |
¹ Median (IQR); n (%) | | | |
² Wilcoxon rank sum test; Fisher’s exact test; Pearson’s Chi-squared test | | | |
Note: Footnote 2 lists all the statistical tests applied in this table. By default, `add_p` applies the Wilcoxon rank sum test to continuous variables (here, age), Pearson’s Chi-squared test to categorical variables, and Fisher’s exact test to categorical variables whose expected cell counts are small. It would be helpful to see a separate footnote for each test next to its p-value; the default output does not do this.
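One option that may help here is `separate_p_footnotes()`, which recent versions of `gtsummary` provide for attaching a footnote to each p-value. The sketch below uses the `trial` example data set that ships with `gtsummary` rather than the lesson data; whether the function is available depends on the installed version, so treat this as an assumption to verify.

```r
library(gtsummary)

# `trial` is an example data set shipped with gtsummary.
tbl_tests <- tbl_summary(trial, by = trt, include = c(age, grade))
tbl_tests <- add_p(tbl_tests)

# Assumption: separate_p_footnotes() exists in the installed gtsummary version.
tbl_sep <- separate_p_footnotes(tbl_tests)
tbl_sep
```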
7.6.4 Customizing Table 2(a)
I will now customize Table 2 to show the number of non-missing observations per variable (`add_n`), an overall column (`add_overall`), and a custom label for missing values (`missing_text`):
# Adding total and overall number
table_2 <- table_1df %>%
  tbl_summary(
    by = is_smoker,
    label = list(
      age = "Age",
      race = "Race",
      education = "Education level",
      is_male = "Male",
      marital = "Marital status"
    ),
    missing_text = "(Missing)"
  ) %>%
  add_p() %>%
  add_n() %>%
  add_overall()
table_2
Characteristic | N | Overall, N = 3,100¹ | No, N = 469¹ | Yes, N = 2,631¹ | p-value² |
---|---|---|---|---|---|
Age | 3,014 | 34 (27, 45) | 36 (28, 47) | 33 (26, 44) | <0.001 |
(Missing) | | 86 | 7 | 79 | |
Race | 3,100 | | | | 0.8 |
Black | | 305 (9.8%) | 46 (9.8%) | 259 (9.8%) | |
Other | | 444 (14%) | 68 (14%) | 376 (14%) | |
Refused/missing | | 19 (0.6%) | 1 (0.2%) | 18 (0.7%) | |
White | | 2,332 (75%) | 354 (75%) | 1,978 (75%) | |
Education level | 1,615 | | | | <0.001 |
HS/GED | | 635 (39%) | 104 (34%) | 531 (41%) | |
Less than HS | | 313 (19%) | 23 (7.5%) | 290 (22%) | |
More than HS | | 667 (41%) | 179 (58%) | 488 (37%) | |
(Missing) | | 1,485 | 163 | 1,322 | |
Male | 3,100 | 2,060 (66%) | 336 (72%) | 1,724 (66%) | 0.010 |
Marital status | 1,608 | | | | 0.006 |
Married or Partnered | | 311 (19%) | 75 (25%) | 236 (18%) | |
Never married | | 927 (58%) | 152 (50%) | 775 (59%) | |
Separated/Divorced/Widowed | | 370 (23%) | 77 (25%) | 293 (22%) | |
(Missing) | | 1,492 | 165 | 1,327 | |
¹ Median (IQR); n (%) | | | | | |
² Wilcoxon rank sum test; Fisher’s exact test; Pearson’s Chi-squared test | | | | | |
7.6.5 Customizing Table 2(b)
I will now add a caption, change the header, add a spanning header, and make the variable names bold in Table 2 by using the following functions:
# Adding title, caption and header
table_2 <- table_1df %>%
  tbl_summary(
    by = is_smoker,
    label = list(
      age = "Age",
      race = "Race",
      education = "Education level",
      is_male = "Male",
      marital = "Marital status"
    ),
    missing_text = "(Missing)"
  ) %>%
  add_p() %>%
  add_n() %>%
  add_overall() %>%
  bold_labels() %>%
  modify_caption("Table 2. Demographic characteristics according to smoking status") %>%
  modify_header(label ~ "**Demographic characteristics**") %>%
  modify_spanning_header(c("stat_1", "stat_2") ~ "**Smoking status**")
table_2
Demographic characteristics | N | Overall, N = 3,100¹ | Smoking status: No, N = 469¹ | Smoking status: Yes, N = 2,631¹ | p-value² |
---|---|---|---|---|---|
Age | 3,014 | 34 (27, 45) | 36 (28, 47) | 33 (26, 44) | <0.001 |
(Missing) | | 86 | 7 | 79 | |
Race | 3,100 | | | | 0.8 |
Black | | 305 (9.8%) | 46 (9.8%) | 259 (9.8%) | |
Other | | 444 (14%) | 68 (14%) | 376 (14%) | |
Refused/missing | | 19 (0.6%) | 1 (0.2%) | 18 (0.7%) | |
White | | 2,332 (75%) | 354 (75%) | 1,978 (75%) | |
Education level | 1,615 | | | | <0.001 |
HS/GED | | 635 (39%) | 104 (34%) | 531 (41%) | |
Less than HS | | 313 (19%) | 23 (7.5%) | 290 (22%) | |
More than HS | | 667 (41%) | 179 (58%) | 488 (37%) | |
(Missing) | | 1,485 | 163 | 1,322 | |
Male | 3,100 | 2,060 (66%) | 336 (72%) | 1,724 (66%) | 0.010 |
Marital status | 1,608 | | | | 0.006 |
Married or Partnered | | 311 (19%) | 75 (25%) | 236 (18%) | |
Never married | | 927 (58%) | 152 (50%) | 775 (59%) | |
Separated/Divorced/Widowed | | 370 (23%) | 77 (25%) | 293 (22%) | |
(Missing) | | 1,492 | 165 | 1,327 | |
¹ Median (IQR); n (%) | | | | | |
² Wilcoxon rank sum test; Fisher’s exact test; Pearson’s Chi-squared test | | | | | |
7.6.6 Customizing Table 2(c)
Here, I am keeping only the customizations that I prefer to have in my final Table 2.
# Final table
table_2 <- table_1df %>%
  tbl_summary(
    by = is_smoker,
    label = list(
      age = "Age",
      race = "Race",
      education = "Education level",
      is_male = "Male",
      marital = "Marital status"
    ),
    missing = "no"
  ) %>%
  add_p() %>%
  #add_n() %>%
  #add_overall() %>%
  bold_labels() %>%
  modify_caption("Table 2. Demographic characteristics according to smoking status") %>%
  modify_header(label ~ "**Characteristics**") %>%
  modify_spanning_header(c("stat_1", "stat_2") ~ "**Smoking Status**") %>%
  modify_footnote(all_stat_cols() ~ "Median (IQR) for Age; n (%) for all other variables")
table_2
Characteristics | Smoking Status: No, N = 469¹ | Smoking Status: Yes, N = 2,631¹ | p-value² |
---|---|---|---|
Age | 36 (28, 47) | 33 (26, 44) | <0.001 |
Race | | | 0.8 |
Black | 46 (9.8%) | 259 (9.8%) | |
Other | 68 (14%) | 376 (14%) | |
Refused/missing | 1 (0.2%) | 18 (0.7%) | |
White | 354 (75%) | 1,978 (75%) | |
Education level | | | <0.001 |
HS/GED | 104 (34%) | 531 (41%) | |
Less than HS | 23 (7.5%) | 290 (22%) | |
More than HS | 179 (58%) | 488 (37%) | |
Male | 336 (72%) | 1,724 (66%) | 0.010 |
Marital status | | | 0.006 |
Married or Partnered | 75 (25%) | 236 (18%) | |
Never married | 152 (50%) | 775 (59%) | |
Separated/Divorced/Widowed | 77 (25%) | 293 (22%) | |
¹ Median (IQR) for Age; n (%) for all other variables | | | |
² Wilcoxon rank sum test; Fisher’s exact test; Pearson’s Chi-squared test | | | |
7.6.7 Interpretation of Table 2
Interpreting the variable Education level:
Null Hypothesis (H₀): There is no association between education level and smoking status.
Alternative Hypothesis (H₁): There is an association between education level and smoking status.
Since the p-value is less than 0.001, we reject the null hypothesis. This indicates that there is a statistically significant association between education level and smoking status. However, to understand the nature of this association (whether education level affects smoking status or vice versa), further analysis would be needed.
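As a cross-check, the test behind this p-value can be reproduced by hand from the counts in Table 2. This is a sketch using base R's `chisq.test()` on the printed counts; the matrix name is my own.

```r
# Counts copied from Table 2 (rows: education level; columns: smoking status).
education_by_smoking <- matrix(
  c(104, 531,
     23, 290,
    179, 488),
  nrow = 3, byrow = TRUE,
  dimnames = list(
    education = c("HS/GED", "Less than HS", "More than HS"),
    is_smoker = c("No", "Yes")
  )
)

# Pearson's Chi-squared test of independence (2 degrees of freedom here).
chisq_result <- chisq.test(education_by_smoking)
chisq_result$p.value < 0.001  # TRUE
```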
7.6.8 Missing value distribution in Table 2
We often want to see how missing values are distributed across the demographic variables. For example, we may want to compare the demographic characteristics of participants whose smoking status is missing with those of smokers and non-smokers. To do this, we first need to recode `NA` into a new category of the `is_smoker` variable and then recreate the table.
7.6.8.1 Missing value data creation
# Recoding `is_smoker` variable into `is_smoker_new`
# ifelse() returns the factor's integer codes (1 = "No", 2 = "Yes"); NA becomes 99
table_1df <- table_1df %>%
  mutate(is_smoker_new = ifelse(is.na(is_smoker), 99, is_smoker))
# Convert into factor with an explicit "Missing" level
table_1df$is_smoker_new <- factor(table_1df$is_smoker_new,
                                  levels = c(1, 2, 99),
                                  labels = c("No", "Yes", "Missing"))
# New data frame
table_1df_new <- table_1df %>%
  select(age, race, education, is_male, marital, is_smoker_new)
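The recoding above depends on `ifelse()` returning the integer codes behind the factor. The sketch below shows that behavior on a toy factor (hypothetical values) and an alternative that works with the labels directly, which may be easier to read and does not depend on the order of the codes.

```r
# Toy factor with a missing value (hypothetical data).
is_smoker <- factor(c("No", "Yes", NA, "Yes"), levels = c("No", "Yes"))

# ifelse() on a factor yields the underlying codes (1 = No, 2 = Yes),
# which is why the recoding above uses levels = c(1, 2, 99).
ifelse(is.na(is_smoker), 99, is_smoker)

# Alternative: replace NA in the labels, then rebuild the factor.
smoker_chr <- as.character(is_smoker)
smoker_chr[is.na(smoker_chr)] <- "Missing"
is_smoker_new <- factor(smoker_chr, levels = c("No", "Yes", "Missing"))
table(is_smoker_new)
```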
7.6.8.2 Missing value table creation
# Final table
table_2miss <- table_1df_new %>%
  tbl_summary(
    by = is_smoker_new,
    label = list(
      age = "Age",
      race = "Race",
      education = "Education level",
      is_male = "Male",
      marital = "Marital status"
    ),
    missing = "no"
  ) %>%
  add_p() %>%
  #add_n() %>%
  #add_overall() %>%
  bold_labels() %>%
  modify_caption("Table 2. Demographic characteristics according to smoking status") %>%
  modify_header(label ~ "**Characteristics**") %>%
  modify_spanning_header(c("stat_1", "stat_2", "stat_3") ~ "**Smoking Status**") %>%
  modify_footnote(all_stat_cols() ~ "Median (IQR) for Age; n (%) for all other variables")
table_2miss
Characteristics | Smoking Status: No, N = 469¹ | Smoking Status: Yes, N = 2,631¹ | Smoking Status: Missing, N = 460¹ | p-value² |
---|---|---|---|---|
Age | 36 (28, 47) | 33 (26, 44) | 39 (29, 47) | <0.001 |
Race | | | | <0.001 |
Black | 46 (9.8%) | 259 (9.8%) | 60 (13%) | |
Other | 68 (14%) | 376 (14%) | 62 (13%) | |
Refused/missing | 1 (0.2%) | 18 (0.7%) | 39 (8.5%) | |
White | 354 (75%) | 1,978 (75%) | 299 (65%) | |
Education level | | | | <0.001 |
HS/GED | 104 (34%) | 531 (41%) | 56 (37%) | |
Less than HS | 23 (7.5%) | 290 (22%) | 39 (26%) | |
More than HS | 179 (58%) | 488 (37%) | 57 (38%) | |
Male | 336 (72%) | 1,724 (66%) | 291 (64%) | 0.019 |
Marital status | | | | <0.001 |
Married or Partnered | 75 (25%) | 236 (18%) | 18 (13%) | |
Never married | 152 (50%) | 775 (59%) | 101 (71%) | |
Separated/Divorced/Widowed | 77 (25%) | 293 (22%) | 24 (17%) | |
¹ Median (IQR) for Age; n (%) for all other variables | | | | |
² Kruskal-Wallis rank sum test; Pearson’s Chi-squared test | | | | |
7.7 Regression Table with tbl_regression() Function
7.7.1 Creating Regression Model
Here, we are creating a logistic regression model in which smoking status is the response variable, education is the explanatory variable, and age, race, and sex are treated as confounders.
# Building the Multivariable logistic model
m1 <- glm(is_smoker ~ education + age + race + is_male,
          data = table_1df,
          family = binomial)
# View raw model results
summary(m1)$coefficients
Estimate Std. Error z value Pr(>|z|)
(Intercept) 3.50434562 0.389242905 9.00297879 2.196748e-19
educationLess than HS 1.01143171 0.252724965 4.00210447 6.278157e-05
educationMore than HS -0.60886151 0.144410039 -4.21619932 2.484542e-05
age -0.04564764 0.006417912 -7.11253757 1.139285e-12
raceOther -0.24842217 0.315210858 -0.78811425 4.306299e-01
raceRefused/missing 0.39602359 1.124629178 0.35213704 7.247355e-01
raceWhite -0.01922531 0.251971208 -0.07629961 9.391807e-01
is_maleYes -0.39712021 0.147363550 -2.69483338 7.042384e-03
7.7.2 Creating Table 3: Regression Table
Here, I am using the `tbl_regression` function to present the regression results as a table. Setting `exponentiate = TRUE` exponentiates the beta coefficients so that the estimates are reported as odds ratios.
# Creating Regression Table
table_3 <- tbl_regression(m1, exponentiate = TRUE)
table_3
Characteristic | OR¹ | 95% CI¹ | p-value |
---|---|---|---|
education | |||
HS/GED | — | — | |
Less than HS | 2.75 | 1.71, 4.61 | <0.001 |
More than HS | 0.54 | 0.41, 0.72 | <0.001 |
age | 0.96 | 0.94, 0.97 | <0.001 |
race | |||
Black | — | — | |
Other | 0.78 | 0.42, 1.44 | 0.4 |
Refused/missing | 1.49 | 0.23, 29.4 | 0.7 |
White | 0.98 | 0.59, 1.59 | >0.9 |
is_male | |||
No | — | — | |
Yes | 0.67 | 0.50, 0.89 | 0.007 |
¹ OR = Odds Ratio, CI = Confidence Interval |
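The odds ratios in Table 3 can be checked by hand against the raw coefficients printed by `summary(m1)$coefficients`. A small sketch with the estimates copied from that output (the vector name `beta` is my own):

```r
# Coefficients (log odds ratios) copied from summary(m1)$coefficients.
beta <- c(
  "educationLess than HS" =  1.01143171,
  "educationMore than HS" = -0.60886151,
  "age"                   = -0.04564764,
  "is_maleYes"            = -0.39712021
)

# Exponentiating a log odds ratio gives the odds ratio shown in Table 3.
round(exp(beta), 2)
```

The rounded values match the OR column: 2.75, 0.54, 0.96, and 0.67.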
7.7.3 Customizing Table 3
Here, I have customized Table 3 using the labeling and formatting functions applied to the earlier tables, along with `bold_p()` and `italicize_levels()`.
# Customizing Regression Table
table_3 <- tbl_regression(
  m1,
  exponentiate = TRUE,
  label = list(
    age = "Age",
    race = "Race",
    education = "Education level",
    is_male = "Male"
  ),
  missing = "no"
) %>%
  bold_labels() %>%
  bold_p(t = 0.10) %>%
  italicize_levels() %>%
  modify_caption("Table 3. Logistic Regression for smoking status as response variable (n=3014)")
table_3
Characteristic | OR¹ | 95% CI¹ | p-value |
---|---|---|---|
Education level | |||
HS/GED | — | — | |
Less than HS | 2.75 | 1.71, 4.61 | <0.001 |
More than HS | 0.54 | 0.41, 0.72 | <0.001 |
Age | 0.96 | 0.94, 0.97 | <0.001 |
Race | |||
Black | — | — | |
Other | 0.78 | 0.42, 1.44 | 0.4 |
Refused/missing | 1.49 | 0.23, 29.4 | 0.7 |
White | 0.98 | 0.59, 1.59 | >0.9 |
Male | |||
No | — | — | |
Yes | 0.67 | 0.50, 0.89 | 0.007 |
¹ OR = Odds Ratio, CI = Confidence Interval |
7.7.4 Interpreting Table 3
Interpreting the variable Education level:
For individuals with less than a high school education, the odds of being a smoker are 2.75 times the odds for those with HS/GED, after adjusting for age, race, and sex.
Conversely, for individuals with more than a high school education, the odds of being a smoker are 0.54 times the odds for those with HS/GED (that is, 46% lower), after adjusting for age, race, and sex.
Interpreting the variable Age:
For each one-unit increase in age, the odds of being a smoker are multiplied by 0.96 (a decrease of about 4%), after adjusting for education, race, and sex.
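Because the odds ratio is multiplicative, the effect of a larger age difference is obtained by scaling the coefficient before exponentiating, not by multiplying the percentage. A brief sketch, assuming age is recorded in years:

```r
beta_age <- -0.04564764  # log odds ratio per one-unit increase in age (from m1)

or_1_year  <- exp(beta_age)       # odds ratio for a 1-year difference
or_10_year <- exp(10 * beta_age)  # odds ratio for a 10-year difference

round(c(one_year = or_1_year, ten_years = or_10_year), 2)
```

Under this model, a 10-year difference corresponds to an odds ratio of about 0.63, a 37% decrease rather than 10 × 4% = 40%.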
In R, the first level of a factor serves as the reference level for a categorical variable, and by default the levels are ordered alphabetically. Therefore, `HS/GED` (H) is the reference level, followed by `Less than HS` (L) and then `More than HS` (M).
7.7.5 Changing the Reference Level in Table 3
Often, we need to change the reference level to suit the analysis or the aim of the study. We can choose a specific reference level and rebuild Table 3. The first step is to check whether the variable is a factor; if it is not, we need to convert it into one. Next, we can use the following code to set the reference level and use it in Table 3.
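A small sketch of these steps on a toy factor (the values are hypothetical; `education_ref` is an illustrative name):

```r
# Toy education variable; factor() sorts the levels alphabetically by default.
education <- factor(c("More than HS", "HS/GED", "Less than HS", "HS/GED"))

is.factor(education)   # TRUE
levels(education)      # "HS/GED" "Less than HS" "More than HS"

# relevel() moves the chosen level to the front, making it the reference.
education_ref <- relevel(education, ref = "Less than HS")
levels(education_ref)  # "Less than HS" "HS/GED" "More than HS"
```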
7.7.5.1 New Model with New Reference Level
Here I am creating model 2 (`m2`) with `Less than HS` as the new reference level for the `education` variable.
# Check factor format
str(table_1df$education) # It shows that it is in factor format.
Factor w/ 3 levels "HS/GED","Less than HS",..: 3 3 3 3 NA 1 3 NA 1 3 ...
# Building the glm model with specific reference level for education = "Less than HS".
m2 <- glm(is_smoker ~ relevel(factor(education), ref = "Less than HS") + age + race + is_male,
          data = table_1df,
          family = binomial)
# View raw model results
summary(m2)$coefficients
Estimate
(Intercept) 4.51577733
relevel(factor(education), ref = "Less than HS")HS/GED -1.01143171
relevel(factor(education), ref = "Less than HS")More than HS -1.62029322
age -0.04564764
raceOther -0.24842217
raceRefused/missing 0.39602359
raceWhite -0.01922531
is_maleYes -0.39712021
Std. Error
(Intercept) 0.436459823
relevel(factor(education), ref = "Less than HS")HS/GED 0.252724965
relevel(factor(education), ref = "Less than HS")More than HS 0.244106320
age 0.006417912
raceOther 0.315210858
raceRefused/missing 1.124629178
raceWhite 0.251971208
is_maleYes 0.147363550
z value
(Intercept) 10.34637575
relevel(factor(education), ref = "Less than HS")HS/GED -4.00210447
relevel(factor(education), ref = "Less than HS")More than HS -6.63765370
age -7.11253757
raceOther -0.78811425
raceRefused/missing 0.35213704
raceWhite -0.07629961
is_maleYes -2.69483338
Pr(>|z|)
(Intercept) 4.346284e-25
relevel(factor(education), ref = "Less than HS")HS/GED 6.278157e-05
relevel(factor(education), ref = "Less than HS")More than HS 3.187156e-11
age 1.139285e-12
raceOther 4.306299e-01
raceRefused/missing 7.247355e-01
raceWhite 9.391807e-01
is_maleYes 7.042384e-03
7.7.5.2 Creating and Customizing New Table 3 with New Reference Level
Here, I have created the new table 3 for m2 model and customized it accordingly.
# Customizing Regression Table
table_3n <- tbl_regression(
  m2,
  exponentiate = TRUE, # Creating the table
  label = list(
    age = "Age",
    race = "Race",
    education = "Education level",
    is_male = "Male"
  ),
  missing = "no"
) %>%
  bold_labels() %>%
  bold_p(t = 0.10) %>%
  italicize_levels() %>%
  modify_caption("Table 3. Logistic Regression for smoking status as response variable (n=3014)")
table_3n
Characteristic | OR¹ | 95% CI¹ | p-value |
---|---|---|---|
relevel(factor(education), ref = "Less than HS") | |||
Less than HS | — | — | |
HS/GED | 0.36 | 0.22, 0.59 | <0.001 |
More than HS | 0.20 | 0.12, 0.31 | <0.001 |
Age | 0.96 | 0.94, 0.97 | <0.001 |
Race | |||
Black | — | — | |
Other | 0.78 | 0.42, 1.44 | 0.4 |
Refused/missing | 1.49 | 0.23, 29.4 | 0.7 |
White | 0.98 | 0.59, 1.59 | >0.9 |
Male | |||
No | — | — | |
Yes | 0.67 | 0.50, 0.89 | 0.007 |
¹ OR = Odds Ratio, CI = Confidence Interval |
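In this table the education row is labeled with the full `relevel(factor(education), ref = "Less than HS")` expression, because that expression is the term name in the model formula. One way to keep a clean term name is to relevel the column in the data frame before fitting. The sketch below illustrates this with simulated data (not the trial data) and base R only; `toy_df` and `toy_fit` are illustrative names.

```r
set.seed(2024)  # simulated data for illustration only

toy_df <- data.frame(
  education = factor(rep(c("HS/GED", "Less than HS", "More than HS"), times = 50)),
  is_smoker = factor(sample(c("No", "Yes"), size = 150, replace = TRUE))
)

# Change the reference level in the data, keeping the column name `education`.
toy_df$education <- relevel(toy_df$education, ref = "Less than HS")

toy_fit <- glm(is_smoker ~ education, data = toy_df, family = binomial)
names(coef(toy_fit))
```

The coefficient names now start with `education`, so a label such as `education = "Education level"` has a variable of that name to match.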
7.8 Conclusion (Take Home Message)
- We can use the `gtsummary` package for creating publication-ready tables.
- The `tbl_summary()` and `tbl_regression()` functions are the most frequently used functions in this package.
- Many other functions can be used to customize the tables to meet journal requirements.