7  Table by gtsummary

Author

Tarana Ferdous

Published

May 28, 2024

7.1 Packages for this Lesson

# Installing Required Packages
# install.packages("public.ctn0094data")
# install.packages("tidyverse")
# install.packages("gtsummary")

# Loading Required Packages
library(public.ctn0094data)
library(tidyverse)
library(gtsummary)
library(dplyr) # for re-coding

7.2 Introduction to ‘gtsummary’

The gtsummary package is mainly used to create publication-ready tables (e.g., demographic tables, simple summary tables, contingency tables, and regression tables). Its most useful feature is that it automatically detects whether each variable is continuous, dichotomous, or categorical and chooses the descriptive statistics to report accordingly.

7.3 Data Source and Description

The public.ctn0094data package provides harmonized and normalized data sets from the CTN-0094 clinical trial. These data sets describe the experiences of care-seeking individuals suffering from opioid use disorder (OUD). The trial is protocol number 0094 of the Clinical Trials Network (CTN), which is funded by the US National Institute on Drug Abuse (NIDA). NIDA uses the CTN to develop, validate, refine, and deliver new treatment options to patients.

In this lesson, I use the demographics and fagerstrom data sets from the public.ctn0094data package to demonstrate the gtsummary functions. The demographics data set contains demographic variables such as age, sex, race, and marital status. The fagerstrom data set contains data on smoking habits: smoking status (smoker/non-smoker), the Fagerstrom Test for Nicotine Dependence (FTND) score (ranging from 0 to 10), and the number of cigarettes smoked per day. The FTND is a questionnaire that assesses the physical dependence of adults on nicotine. The test uses yes/no questions scored 0 or 1 and multiple-choice questions scored from 0 to 3; the total score ranges from 0 to 10. The higher the score, the more intense the patient’s nicotine dependence. The score categories are: 8 or more, high dependence; 5–7, moderate dependence; 3–4, low to moderate dependence; and 0–2, low dependence.
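The FTND score categories described above can be sketched in base R with cut(). This is only an illustration on a made-up vector of scores (ftnd_scores is a hypothetical name, not a column taken from the package), and the cut points simply restate the categories listed in the text.

```r
# Hypothetical FTND total scores (0 to 10), including one missing value
ftnd_scores <- c(0, 2, 3, 4, 5, 7, 8, 10, NA)

# Intervals are right-closed: (-Inf, 2], (2, 4], (4, 7], (7, Inf)
ftnd_cat <- cut(
  ftnd_scores,
  breaks = c(-Inf, 2, 4, 7, Inf),
  labels = c("Low", "Low to moderate", "Moderate", "High")
)

# Cross-check each score against its assigned category
data.frame(ftnd_scores, ftnd_cat)
```

Missing scores stay NA after cut(), so they would still appear as Unknown in a gtsummary table.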

# Searching suitable data sets: You can skip 
data(package = "public.ctn0094data")
#data(demographics, package = "public.ctn0094data")
#names(demographics)
#data(fagerstrom, package = "public.ctn0094data")
#names(fagerstrom)
#table(fagerstrom$ftnd)

7.4 Creating Model Data Frames

The demographics and fagerstrom data sets within the public.ctn0094data package are joined by participant ID (the who variable) to create a new data frame, smoking_df.

# Joining data sets: 
smoking_df <- demographics %>% 
  left_join(fagerstrom, by = "who") 
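Before relying on a joined data frame, it can help to confirm that the key is unique in both inputs (so the join does not duplicate rows) and to see which IDs have no match. The sketch below uses two small toy tibbles, demo_toy and fager_toy, which are hypothetical stand-ins for the real data sets.

```r
library(dplyr)

# Toy stand-ins for demographics and fagerstrom (hypothetical data)
demo_toy  <- tibble(who = 1:4, age = c(30L, 41L, 25L, 52L))
fager_toy <- tibble(who = c(1L, 2L, 4L), is_smoker = c("Yes", "No", "Yes"))

# The key should be unique in both tables; otherwise left_join() adds rows
stopifnot(!anyDuplicated(demo_toy$who), !anyDuplicated(fager_toy$who))

joined <- left_join(demo_toy, fager_toy, by = "who")

# Same number of rows as the left table
nrow(joined) == nrow(demo_toy)

# IDs present in the left table but absent from the right table;
# these rows carry NA in the joined columns
anti_join(demo_toy, fager_toy, by = "who")
```

Rows without a match keep their demographic values and get NA for the joined columns, which is one way Unknown counts can arise in the summary tables.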

7.5 Demographic Table with tbl_summary Function

7.5.1 Creating Table 1: Demographic Characteristic

In order to create a basic demographic table, I will now select which variables I want to show in the table and then use the tbl_summary function to create the table. I am also adding the description of the variables I included in my table.

  1. age: an integer variable that indicates the age of the patient.
  2. race: a factor variable with levels ‘Black’, ‘Other’, ‘Refused/missing’, and ‘White’, which represents the self-reported race of the patient.
  3. education: a factor variable denoting the education level at intake, with levels ‘HS/GED’ for high school graduate or equivalent, ‘Less than HS’ for less than high school education, ‘More than HS’ for some education beyond high school, and ‘Missing’ if the information is not provided.
  4. is_male: a factor variable with levels ‘No’ and ‘Yes’, describing the sex (not gender) of the patient, where ‘Yes’ indicates male.
  5. marital: a factor variable indicating the marital status at intake, with levels ‘Married or Partnered’, ‘Never married’, ‘Separated/Divorced/Widowed’, and ‘Not answered’ if the question was not asked during intake.
  6. is_smoker: a factor variable indicating whether the patient is a smoker, with levels ‘No’ (not a smoker) and ‘Yes’ (a smoker).

# Selecting variables in a new data frame `table_1df` for table 1
table_1df <- smoking_df %>% 
  select(age, race, education, is_male, marital, is_smoker)

# Table 1
table_1 <- table_1df  %>% tbl_summary()

table_1
Characteristic                  N = 3,560¹
age                             34 (27, 45)
    Unknown                     208
race
    Black                       365 (10%)
    Other                       506 (14%)
    Refused/missing             58 (1.6%)
    White                       2,631 (74%)
education
    HS/GED                      691 (39%)
    Less than HS                352 (20%)
    More than HS                724 (41%)
    Unknown                     1,793
is_male                         2,351 (66%)
    Unknown                     4
marital
    Married or Partnered        329 (19%)
    Never married               1,028 (59%)
    Separated/Divorced/Widowed  394 (23%)
    Unknown                     1,809
is_smoker                       2,631 (85%)
    Unknown                     460
¹ Median (IQR); n (%)
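The default statistics (median and IQR for continuous variables, n and % for categorical ones) can be overridden through the statistic and digits arguments of tbl_summary(). The sketch below is a minimal illustration on the trial data set that ships with gtsummary, used here only so that the block runs on its own; the same arguments should carry over to table_1df.

```r
library(gtsummary)

# Report mean (SD) instead of median (IQR), and n / N (%) for categories
tbl_mean_sd <- tbl_summary(
  trial,
  include = c(age, grade, response),
  statistic = list(
    all_continuous()  ~ "{mean} ({sd})",
    all_categorical() ~ "{n} / {N} ({p}%)"
  ),
  digits = all_continuous() ~ 1,   # one decimal place for mean and SD
  missing_text = "(Missing)"       # label for the missing-value row
)

tbl_mean_sd
```

Whether mean (SD) or median (IQR) is more appropriate depends on the distribution of the variable and on the target journal's conventions.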

7.5.2 Customizing Table 1: Changing the Label

I am using the label argument of tbl_summary() to change the labels of all variables. Other customizations will be shown with the contingency table in the next section.

# Changing the Label

table_1 <-
  table_1df %>% 
  tbl_summary(
    label = list(
      age = "Age",
      race = "Race",
      education = "Education level",
      is_male = "Male",
      marital = "Marital status",
      is_smoker = "Smoker"
    )
  )

table_1
Characteristic                  N = 3,560¹
Age                             34 (27, 45)
    Unknown                     208
Race
    Black                       365 (10%)
    Other                       506 (14%)
    Refused/missing             58 (1.6%)
    White                       2,631 (74%)
Education level
    HS/GED                      691 (39%)
    Less than HS                352 (20%)
    More than HS                724 (41%)
    Unknown                     1,793
Male                            2,351 (66%)
    Unknown                     4
Marital status
    Married or Partnered        329 (19%)
    Never married               1,028 (59%)
    Separated/Divorced/Widowed  394 (23%)
    Unknown                     1,809
Smoker                          2,631 (85%)
    Unknown                     460
¹ Median (IQR); n (%)

7.6 Contingency Table with tbl_summary Function

7.6.1 Creating Table 2: Demographic Variables by Smoking Status

I will now show the Table 1 demographic variables by smoking status (is_smoker: Yes = smoker, No = non-smoker).

# Contingency table 
table_2 <- table_1df %>% tbl_summary(by = is_smoker) 

table_2
Characteristic                  No, N = 469¹    Yes, N = 2,631¹
age                             36 (28, 47)     33 (26, 44)
    Unknown                     7               79
race
    Black                       46 (9.8%)       259 (9.8%)
    Other                       68 (14%)        376 (14%)
    Refused/missing             1 (0.2%)        18 (0.7%)
    White                       354 (75%)       1,978 (75%)
education
    HS/GED                      104 (34%)       531 (41%)
    Less than HS                23 (7.5%)       290 (22%)
    More than HS                179 (58%)       488 (37%)
    Unknown                     163             1,322
is_male                         336 (72%)       1,724 (66%)
marital
    Married or Partnered        75 (25%)        236 (18%)
    Never married               152 (50%)       775 (59%)
    Separated/Divorced/Widowed  77 (25%)        293 (22%)
    Unknown                     165             1,327
¹ Median (IQR); n (%)

7.6.2 Removing Missing Data

If I do not want to show the missing data in my table, I will use missing = "no".

# Removing Missing Data
table_2nm <- table_1df %>% tbl_summary(by = is_smoker,
                                   missing = "no") 
table_2nm
Characteristic                  No, N = 469¹    Yes, N = 2,631¹
age                             36 (28, 47)     33 (26, 44)
race
    Black                       46 (9.8%)       259 (9.8%)
    Other                       68 (14%)        376 (14%)
    Refused/missing             1 (0.2%)        18 (0.7%)
    White                       354 (75%)       1,978 (75%)
education
    HS/GED                      104 (34%)       531 (41%)
    Less than HS                23 (7.5%)       290 (22%)
    More than HS                179 (58%)       488 (37%)
is_male                         336 (72%)       1,724 (66%)
marital
    Married or Partnered        75 (25%)        236 (18%)
    Never married               152 (50%)       775 (59%)
    Separated/Divorced/Widowed  77 (25%)        293 (22%)
¹ Median (IQR); n (%)

7.6.3 Applying Statistical Tests

I will use the add_p() function to add p-values to the table. It automatically detects whether each variable is continuous, dichotomous, or categorical and applies a default statistical test accordingly.

# Adding p-value
table_2 <- table_1df %>% tbl_summary(by = is_smoker,
                                   missing = "no") %>% 
  add_p()

table_2
Characteristic                  No, N = 469¹    Yes, N = 2,631¹   p-value²
age                             36 (28, 47)     33 (26, 44)       <0.001
race                                                              0.8
    Black                       46 (9.8%)       259 (9.8%)
    Other                       68 (14%)        376 (14%)
    Refused/missing             1 (0.2%)        18 (0.7%)
    White                       354 (75%)       1,978 (75%)
education                                                         <0.001
    HS/GED                      104 (34%)       531 (41%)
    Less than HS                23 (7.5%)       290 (22%)
    More than HS                179 (58%)       488 (37%)
is_male                         336 (72%)       1,724 (66%)       0.010
marital                                                           0.006
    Married or Partnered        75 (25%)        236 (18%)
    Never married               152 (50%)       775 (59%)
    Separated/Divorced/Widowed  77 (25%)        293 (22%)
¹ Median (IQR); n (%)
² Wilcoxon rank sum test; Fisher’s exact test; Pearson’s Chi-squared test

Note: Footnote 2 lists all the statistical tests applied in this table. By default, add_p() applies the Wilcoxon rank sum test to continuous variables compared across two groups (this is a default, not the result of a normality check), Pearson’s Chi-squared test to categorical variables, and Fisher’s exact test to categorical variables when expected cell counts are small (here, most likely race, where the Refused/missing row is sparse). It would be helpful to show a separate footnote for each test next to its p-value. I had not found a way to do that when writing this lesson, although gtsummary’s separate_p_footnotes() function appears to be designed for this purpose.
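As a hedged sketch of per-test footnotes and of overriding a default test, the block below uses the trial data set bundled with gtsummary so that it is self-contained. It assumes an installed gtsummary version that provides separate_p_footnotes(); the choice of a t-test for age is purely illustrative, not a recommendation for these data.

```r
library(gtsummary)

# Summary table by treatment arm, without the missing-value rows
tbl_tests <- tbl_summary(
  trial,
  by = trt,
  include = c(age, grade, response),
  missing = "no"
)

# Override the default test for one variable; the others keep their defaults
tbl_tests <- add_p(tbl_tests, test = list(age ~ "t.test"))

# Replace the single combined footnote with one footnote per p-value
tbl_tests <- separate_p_footnotes(tbl_tests)

tbl_tests
```

The same two calls could be appended to the table_2 pipeline after add_p().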

7.6.4 Customizing Table 2(a)

I will now customize Table 2 by adding a column with the number of non-missing observations (add_n()), adding an overall column (add_overall()), and relabeling the missing-value rows (missing_text):

# Adding total and overall number 
table_2 <- table_1df %>% tbl_summary(by = is_smoker,
                                   label = list(
                                     age = "Age",
                                     race = "Race",
                                     education = "Education level",
                                     is_male = "Male",
                                     marital = "Marital status"
                                   ),
                                   missing_text = "(Missing)"
                                  ) %>% 
  add_p() %>%
  add_n() %>%
  add_overall() 

table_2
Characteristic                  N       Overall, N = 3,100¹   No, N = 469¹    Yes, N = 2,631¹   p-value²
Age                             3,014   34 (27, 45)           36 (28, 47)     33 (26, 44)       <0.001
    (Missing)                           86                    7               79
Race                            3,100                                                           0.8
    Black                               305 (9.8%)            46 (9.8%)       259 (9.8%)
    Other                               444 (14%)             68 (14%)        376 (14%)
    Refused/missing                     19 (0.6%)             1 (0.2%)        18 (0.7%)
    White                               2,332 (75%)           354 (75%)       1,978 (75%)
Education level                 1,615                                                           <0.001
    HS/GED                              635 (39%)             104 (34%)       531 (41%)
    Less than HS                        313 (19%)             23 (7.5%)       290 (22%)
    More than HS                        667 (41%)             179 (58%)       488 (37%)
    (Missing)                           1,485                 163             1,322
Male                            3,100   2,060 (66%)           336 (72%)       1,724 (66%)       0.010
Marital status                  1,608                                                           0.006
    Married or Partnered                311 (19%)             75 (25%)        236 (18%)
    Never married                       927 (58%)             152 (50%)       775 (59%)
    Separated/Divorced/Widowed          370 (23%)             77 (25%)        293 (22%)
    (Missing)                           1,492                 165             1,327
¹ Median (IQR); n (%)
² Wilcoxon rank sum test; Fisher’s exact test; Pearson’s Chi-squared test

7.6.5 Customizing Table 2(b)

I will now add a caption, change the column header, add a spanning header, and make the variable labels of Table 2 bold by using the following functions:

# Adding title, caption and header 
table_2 <- table_1df %>% tbl_summary(by = is_smoker,
                                   label = list(
                                     age = "Age",
                                     race = "Race",
                                     education = "Education level",
                                     is_male = "Male",
                                     marital = "Marital status"
                                   ),
                                   missing_text = "(Missing)"
                                  ) %>% 
  add_p() %>%
  add_n() %>%
  add_overall() %>%
  bold_labels() %>%
  modify_caption("Table 2. Demographic characteristics according to smoking status") %>%
  modify_header(label ~ "**Demographic characteristics**") %>%
  modify_spanning_header(c("stat_1", "stat_2") ~ "**Smoking status**") 
  
table_2
Table 2. Demographic characteristics according to smoking status
                                                              Smoking status
Demographic characteristics     N       Overall, N = 3,100¹   No, N = 469¹    Yes, N = 2,631¹   p-value²
Age                             3,014   34 (27, 45)           36 (28, 47)     33 (26, 44)       <0.001
    (Missing)                           86                    7               79
Race                            3,100                                                           0.8
    Black                               305 (9.8%)            46 (9.8%)       259 (9.8%)
    Other                               444 (14%)             68 (14%)        376 (14%)
    Refused/missing                     19 (0.6%)             1 (0.2%)        18 (0.7%)
    White                               2,332 (75%)           354 (75%)       1,978 (75%)
Education level                 1,615                                                           <0.001
    HS/GED                              635 (39%)             104 (34%)       531 (41%)
    Less than HS                        313 (19%)             23 (7.5%)       290 (22%)
    More than HS                        667 (41%)             179 (58%)       488 (37%)
    (Missing)                           1,485                 163             1,322
Male                            3,100   2,060 (66%)           336 (72%)       1,724 (66%)       0.010
Marital status                  1,608                                                           0.006
    Married or Partnered                311 (19%)             75 (25%)        236 (18%)
    Never married                       927 (58%)             152 (50%)       775 (59%)
    Separated/Divorced/Widowed          370 (23%)             77 (25%)        293 (22%)
    (Missing)                           1,492                 165             1,327
¹ Median (IQR); n (%)
² Wilcoxon rank sum test; Fisher’s exact test; Pearson’s Chi-squared test

7.6.6 Customizing Table 2(c)

Here, I am keeping only the customizations that I prefer to have in my final Table 2.

# Final table
table_2 <- table_1df %>% tbl_summary(by = is_smoker,
                                   label = list(
                                     age = "Age",
                                     race = "Race",
                                     education = "Education level",
                                     is_male = "Male",
                                     marital = "Marital status"
                                   ),
                                   missing = "no"
                                  ) %>% 
  add_p() %>%
  #add_n() %>%
  #add_overall() %>%
  bold_labels() %>%
  modify_caption("Table 2. Demographic characteristics according to smoking status") %>%
  modify_header(label ~ "**Characteristics**") %>%
  modify_spanning_header(c("stat_1", "stat_2") ~ "**Smoking Status**") %>%
  modify_footnote(all_stat_cols() ~ "Median (IQR) for Age; n (%) for all other variables") 
  
table_2
Table 2. Demographic characteristics according to smoking status
                                        Smoking Status
Characteristics                 No, N = 469¹    Yes, N = 2,631¹   p-value²
Age                             36 (28, 47)     33 (26, 44)       <0.001
Race                                                              0.8
    Black                       46 (9.8%)       259 (9.8%)
    Other                       68 (14%)        376 (14%)
    Refused/missing             1 (0.2%)        18 (0.7%)
    White                       354 (75%)       1,978 (75%)
Education level                                                   <0.001
    HS/GED                      104 (34%)       531 (41%)
    Less than HS                23 (7.5%)       290 (22%)
    More than HS                179 (58%)       488 (37%)
Male                            336 (72%)       1,724 (66%)       0.010
Marital status                                                    0.006
    Married or Partnered        75 (25%)        236 (18%)
    Never married               152 (50%)       775 (59%)
    Separated/Divorced/Widowed  77 (25%)        293 (22%)
¹ Median (IQR) for Age; n (%) for all other variables
² Wilcoxon rank sum test; Fisher’s exact test; Pearson’s Chi-squared test

7.6.7 Interpretation of Table 2

Interpreting the variable Education level:

Null Hypothesis (H₀): There is no association between education level and smoking status.

Alternative Hypothesis (H₁): There is an association between education level and smoking status.

Since the p-value is less than 0.001, we reject the null hypothesis. This indicates that there is a statistically significant association between education level and smoking status. However, to understand the nature of this association (whether education level affects smoking status or vice versa), further analysis would be needed.
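To see where the education p-value comes from, the test can be reproduced by hand from the counts printed in Table 2. The sketch below re-enters those counts as a matrix (edu_by_smoke is a name introduced here for illustration) and runs base R's chisq.test(); it is a check on the reported result, not part of the gtsummary workflow.

```r
# Counts of education level by smoking status, copied from Table 2
edu_by_smoke <- matrix(
  c(104, 23, 179,    # non-smokers: HS/GED, Less than HS, More than HS
    531, 290, 488),  # smokers:     HS/GED, Less than HS, More than HS
  nrow = 3,
  dimnames = list(
    education = c("HS/GED", "Less than HS", "More than HS"),
    smoker    = c("No", "Yes")
  )
)

# Pearson's Chi-squared test of independence
res <- chisq.test(edu_by_smoke)

res$expected   # expected counts under the null hypothesis
res$p.value    # p-value of the test

# Column percentages, comparable with the percentages shown in Table 2
round(prop.table(edu_by_smoke, margin = 2) * 100, 1)
```

All expected counts are well above 5 here, which is consistent with a Chi-squared test rather than Fisher’s exact test being used for this variable.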

7.6.8 Missing value distribution in Table 2

We often want to see how the demographic variables are distributed among participants with missing values. For example, we may want to compare participants whose smoking status is missing with smokers and non-smokers. First, we need to re-code NA into a new category of the is_smoker variable and then recreate the table.

7.6.8.1 Missing value data creation

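# Recoding `is_smoker` variable into `is_smoker_new`
# Note: ifelse() drops the factor class and returns the underlying integer
# codes (1 = "No", 2 = "Yes"), so NA can be replaced by the numeric code 99.
table_1df <- table_1df %>% 
  mutate(is_smoker_new = ifelse(is.na(is_smoker), 99, is_smoker))  # converting all NA to 99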

# Convert into factor
table_1df$is_smoker_new <- factor(table_1df$is_smoker_new,
                                  levels = c(1, 2, 99),
                                  labels = c("No", "Yes", "Missing"))

# New data frame 
table_1df_new <- table_1df %>% 
  select(age, race, education, is_male, marital, is_smoker_new)
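An alternative that avoids the intermediate numeric codes is to add NA as an explicit factor level with base R's addNA() and then rename that level. The sketch below runs on a small made-up factor (smoker_toy is hypothetical) and also shows why the ifelse() approach yields the codes 1, 2, and 99.

```r
# Toy factor standing in for is_smoker (hypothetical values)
smoker_toy <- factor(c("No", "Yes", NA, "Yes"), levels = c("No", "Yes"))

# ifelse() returns the integer codes of the factor, not its labels
codes <- ifelse(is.na(smoker_toy), 99, smoker_toy)
codes

# Alternative: make NA an explicit level, then give it a readable name
smoker_new <- addNA(smoker_toy)
levels(smoker_new)[is.na(levels(smoker_new))] <- "Missing"

table(smoker_new)
```

Either route gives a three-level factor that tbl_summary(by = ...) can use as a grouping variable.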

7.6.8.2 Missing value table creation

# Final table
table_2miss <- table_1df_new %>% tbl_summary(by = is_smoker_new,
                                   label = list(
                                     age = "Age",
                                     race = "Race",
                                     education = "Education level",
                                     is_male = "Male",
                                     marital = "Marital status"
                                   ),
                                   missing = "no"
                                  ) %>% 
  add_p() %>%
  #add_n() %>%
  #add_overall() %>%
  bold_labels() %>%
  modify_caption("Table 2. Demographic characteristics according to smoking status") %>%
  modify_header(label ~ "**Characteristics**") %>%
  modify_spanning_header(c("stat_1", "stat_2", "stat_3") ~ "**Smoking Status**") %>%
  modify_footnote(all_stat_cols() ~ "Median (IQR) for Age; n (%) for all other variables") 
  
table_2miss
Table 2. Demographic characteristics according to smoking status
                                                  Smoking Status
Characteristics                 No, N = 469¹    Yes, N = 2,631¹   Missing, N = 460¹   p-value²
Age                             36 (28, 47)     33 (26, 44)       39 (29, 47)         <0.001
Race                                                                                  <0.001
    Black                       46 (9.8%)       259 (9.8%)        60 (13%)
    Other                       68 (14%)        376 (14%)         62 (13%)
    Refused/missing             1 (0.2%)        18 (0.7%)         39 (8.5%)
    White                       354 (75%)       1,978 (75%)       299 (65%)
Education level                                                                       <0.001
    HS/GED                      104 (34%)       531 (41%)         56 (37%)
    Less than HS                23 (7.5%)       290 (22%)         39 (26%)
    More than HS                179 (58%)       488 (37%)         57 (38%)
Male                            336 (72%)       1,724 (66%)       291 (64%)           0.019
Marital status                                                                        <0.001
    Married or Partnered        75 (25%)        236 (18%)         18 (13%)
    Never married               152 (50%)       775 (59%)         101 (71%)
    Separated/Divorced/Widowed  77 (25%)        293 (22%)         24 (17%)
¹ Median (IQR) for Age; n (%) for all other variables
² Kruskal-Wallis rank sum test; Pearson’s Chi-squared test

7.7 Regression Table with tbl_regression() Function

7.7.1 Creating Regression Model

Here, we are creating a logistic regression model in which smoking status is the response variable, education is the explanatory variable of interest, and age, race, and sex are included as potential confounders.

# Building the multivariable logistic model
m1 <- glm(is_smoker ~  education + age + race + is_male, 
          data = table_1df,   # name the argument: glm()'s second formal is `family`
          family = binomial)

# View raw model results
summary(m1)$coefficients
                         Estimate  Std. Error     z value     Pr(>|z|)
(Intercept)            3.50434562 0.389242905  9.00297879 2.196748e-19
educationLess than HS  1.01143171 0.252724965  4.00210447 6.278157e-05
educationMore than HS -0.60886151 0.144410039 -4.21619932 2.484542e-05
age                   -0.04564764 0.006417912 -7.11253757 1.139285e-12
raceOther             -0.24842217 0.315210858 -0.78811425 4.306299e-01
raceRefused/missing    0.39602359 1.124629178  0.35213704 7.247355e-01
raceWhite             -0.01922531 0.251971208 -0.07629961 9.391807e-01
is_maleYes            -0.39712021 0.147363550 -2.69483338 7.042384e-03

7.7.2 Creating Table 3: Regression Table

Here, I am using the tbl_regression() function to present the regression results as a table. Setting exponentiate = TRUE reports odds ratios (the exponentiated beta coefficients) instead of coefficients on the log-odds scale.

# Creating Regression Table 
table_3 <- tbl_regression(m1, exponentiate = TRUE)

table_3
Characteristic                  OR¹     95% CI¹       p-value
education
    HS/GED
    Less than HS                2.75    1.71, 4.61    <0.001
    More than HS                0.54    0.41, 0.72    <0.001
age                             0.96    0.94, 0.97    <0.001
race
    Black
    Other                       0.78    0.42, 1.44    0.4
    Refused/missing             1.49    0.23, 29.4    0.7
    White                       0.98    0.59, 1.59    >0.9
is_male
    No
    Yes                         0.67    0.50, 0.89    0.007
¹ OR = Odds Ratio, CI = Confidence Interval
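The link between the raw coefficients and the odds ratios in Table 3 can be checked by hand. The sketch below re-enters four of the estimates printed by summary(m1)$coefficients (the vector name log_odds is introduced here for illustration) and exponentiates them.

```r
# Log-odds estimates copied from the summary(m1) output above
log_odds <- c(
  "educationLess than HS" =  1.01143171,
  "educationMore than HS" = -0.60886151,
  "age"                   = -0.04564764,
  "is_maleYes"            = -0.39712021
)

# Odds ratios are the exponentiated coefficients
odds_ratios <- exp(log_odds)
round(odds_ratios, 2)

# Percent change in the odds relative to the reference (or per unit of age)
round((odds_ratios - 1) * 100, 1)
```

The confidence limits in the table are not reproduced here, because they come from the fitted model object rather than from the point estimates alone.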

7.7.3 Customizing Table 3

Here, I have customized Table 3 by reusing the label, bold-label, and caption customizations applied to the earlier tables, and by adding bold p-values and italicized levels.

# Customizing Regression Table 
table_3 <- tbl_regression(m1, exponentiate = TRUE,
                           label = list(
                             age = "Age",
                             race = "Race",
                             education = "Education level",
                             is_male = "Male"
                             )
                          ) %>% 
  bold_labels() %>%
  bold_p(t = 0.10) %>%  
  italicize_levels() %>%
  modify_caption("Table 3. Logistic Regression for smoking status as response variable (n=3014)")

table_3
Table 3. Logistic Regression for smoking status as response variable (n=3014)
Characteristic                  OR¹     95% CI¹       p-value
Education level
    HS/GED
    Less than HS                2.75    1.71, 4.61    <0.001
    More than HS                0.54    0.41, 0.72    <0.001
Age                             0.96    0.94, 0.97    <0.001
Race
    Black
    Other                       0.78    0.42, 1.44    0.4
    Refused/missing             1.49    0.23, 29.4    0.7
    White                       0.98    0.59, 1.59    >0.9
Male
    No
    Yes                         0.67    0.50, 0.89    0.007
¹ OR = Odds Ratio, CI = Confidence Interval
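Because glm() drops every row that has a missing value in any model variable, the sample size quoted in a regression caption is worth confirming with nobs() on the fitted model. The sketch below demonstrates the idea on simulated data (sim_df and sim_fit are hypothetical names); running nobs(m1) would give the corresponding number for the model in this lesson.

```r
set.seed(94)

# Simulated data: a binary outcome, a numeric and a categorical predictor
n <- 200
sim_df <- data.frame(
  x1 = rnorm(n),
  x2 = factor(sample(c("a", "b", "c"), n, replace = TRUE))
)
sim_df$y <- rbinom(n, size = 1, prob = plogis(0.5 * sim_df$x1))

# Make the categorical predictor missing for 60 distinct rows
sim_df$x2[sample(n, 60)] <- NA

sim_fit <- glm(y ~ x1 + x2, data = sim_df, family = binomial)

nrow(sim_df)                 # rows in the data frame
nobs(sim_fit)                # rows actually used by the model
sum(complete.cases(sim_df))  # complete cases, for comparison
```

With the amount of missing education data shown in Table 2, the number of rows used by m1 may be noticeably smaller than the number of rows in table_1df.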

7.7.4 Interpreting Table 3

Interpreting the variable Education level:

For individuals with less than a high school education, the odds of being a smoker are 2.75 times the odds for those with HS/GED (95% CI 1.71 to 4.61), after adjusting for age, race, and sex.

Conversely, for individuals with more than a high school education, the odds of being a smoker are 0.54 times the odds for those with HS/GED (that is, about 46% lower), after adjusting for age, race, and sex.

Interpreting the variable Age:

For each one-unit (one-year) increase in age, the odds of being a smoker are multiplied by 0.96 (a decrease of about 4%), after adjusting for education, race, and sex.

In R, the reference level of a categorical variable is the first level of the factor, and factor levels are ordered alphabetically by default. Therefore, HS/GED (H) is the reference level, followed by Less than HS (L) and then More than HS (M).
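The default level ordering and the effect of relevel() can be seen on a small example. The factor edu_toy below is a hypothetical vector that only reuses the three education labels.

```r
# Toy factor with the three education labels, entered in arbitrary order
edu_toy <- factor(c("More than HS", "HS/GED", "Less than HS", "HS/GED"))

# Levels are sorted alphabetically; the first one is the reference level
levels(edu_toy)

# relevel() moves the chosen level to the front and keeps the others in order
edu_toy_releveled <- relevel(edu_toy, ref = "Less than HS")
levels(edu_toy_releveled)
```

Only the level order changes; the observed values themselves stay the same.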

7.7.5 Changing the Reference Level in Table 3

Often, we need to change the reference level to suit the analysis or the aim of the study. We can select a specific reference level and re-create Table 3. The first step is to check whether the variable is a factor; if it is not, we need to convert it into one. Next, we can use the following code to set the reference level and use the new model in Table 3.

7.7.5.1 New Model with New Reference Level

Here, I am creating model 2 (m2) with Less than HS as the new reference level for the education variable.

# Check factor format
str(table_1df$education) # It shows that it is in factor format.
 Factor w/ 3 levels "HS/GED","Less than HS",..: 3 3 3 3 NA 1 3 NA 1 3 ...
# Building the glm model with specific reference level for education  = "Less than HS".
m2 <- glm(is_smoker ~  relevel(factor(education), ref = "Less than HS")  + age + race + is_male, 
          data = table_1df, 
          family = binomial)

# View raw model results
summary(m2)$coefficients
                                                                Estimate
(Intercept)                                                   4.51577733
relevel(factor(education), ref = "Less than HS")HS/GED       -1.01143171
relevel(factor(education), ref = "Less than HS")More than HS -1.62029322
age                                                          -0.04564764
raceOther                                                    -0.24842217
raceRefused/missing                                           0.39602359
raceWhite                                                    -0.01922531
is_maleYes                                                   -0.39712021
                                                              Std. Error
(Intercept)                                                  0.436459823
relevel(factor(education), ref = "Less than HS")HS/GED       0.252724965
relevel(factor(education), ref = "Less than HS")More than HS 0.244106320
age                                                          0.006417912
raceOther                                                    0.315210858
raceRefused/missing                                          1.124629178
raceWhite                                                    0.251971208
is_maleYes                                                   0.147363550
                                                                 z value
(Intercept)                                                  10.34637575
relevel(factor(education), ref = "Less than HS")HS/GED       -4.00210447
relevel(factor(education), ref = "Less than HS")More than HS -6.63765370
age                                                          -7.11253757
raceOther                                                    -0.78811425
raceRefused/missing                                           0.35213704
raceWhite                                                    -0.07629961
is_maleYes                                                   -2.69483338
                                                                 Pr(>|z|)
(Intercept)                                                  4.346284e-25
relevel(factor(education), ref = "Less than HS")HS/GED       6.278157e-05
relevel(factor(education), ref = "Less than HS")More than HS 3.187156e-11
age                                                          1.139285e-12
raceOther                                                    4.306299e-01
raceRefused/missing                                          7.247355e-01
raceWhite                                                    9.391807e-01
is_maleYes                                                   7.042384e-03

7.7.5.2 Creating and Customizing New Table 3 with New Reference Level

Here, I have created the new Table 3 for the m2 model and customized it accordingly. Because the releveling was done inside the model formula, the education term keeps the full relevel(...) expression as its name in the table.

# Customizing Regression Table 
table_3n <- tbl_regression(m2, exponentiate = TRUE,  # Creating the table
                           label = list(
                             age = "Age",
                             race = "Race",
                             education = "Education level",
                             is_male = "Male"
                             )
                          ) %>% 
  bold_labels() %>%
  bold_p(t = 0.10) %>%  
  italicize_levels() %>%
  modify_caption("Table 3. Logistic Regression for smoking status as response variable (n=3014)")

table_3n
Table 3. Logistic Regression for smoking status as response variable (n=3014)
Characteristic                                    OR¹     95% CI¹       p-value
relevel(factor(education), ref = "Less than HS")
    Less than HS
    HS/GED                                        0.36    0.22, 0.59    <0.001
    More than HS                                  0.20    0.12, 0.31    <0.001
Age                                               0.96    0.94, 0.97    <0.001
Race
    Black
    Other                                         0.78    0.42, 1.44    0.4
    Refused/missing                               1.49    0.23, 29.4    0.7
    White                                         0.98    0.59, 1.59    >0.9
Male
    No
    Yes                                           0.67    0.50, 0.89    0.007
¹ OR = Odds Ratio, CI = Confidence Interval
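One way to keep a clean variable name in the regression table is to change the reference level in the data before fitting, so that the model term keeps its original name and the label argument can match it. The sketch below illustrates this on the trial data set that ships with gtsummary (chosen only so the block is self-contained); applying relevel() to the education column of table_1df before fitting m2 would be the analogous step here.

```r
library(gtsummary)

# Copy the example data and set grade "III" as the reference level
trial_releveled <- trial
trial_releveled$grade <- relevel(trial_releveled$grade, ref = "III")

# The formula uses the plain variable name, so coefficient names stay short
fit_releveled <- glm(response ~ grade + age,
                     data = trial_releveled,
                     family = binomial)
names(coef(fit_releveled))

# The label list can now match the variable by name
tbl_releveled <- tbl_regression(
  fit_releveled,
  exponentiate = TRUE,
  label = list(grade = "Tumor grade", age = "Age")
)

tbl_releveled
```

The fitted odds ratios are the same as with relevel() inside the formula; only the term names and the resulting table labels differ.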

7.8 Conclusion (Take Home Message)

  1. We can use the gtsummary package to create publication-ready tables.
  2. tbl_summary() and tbl_regression() are frequently used functions in this package.
  3. Several other functions can be used to customize the tables to meet journal requirements.