An R Cookbook for Public Health - 7 Table by gtsummary

7.1 Packages for this Lesson

# Installing Required Packages
# install.packages("public.ctn0094data")
# install.packages("tidyverse")
# install.packages("gtsummary")

# Loading Required Packages
library(public.ctn0094data)
library(tidyverse)
library(gtsummary)
library(dplyr) # for re-coding

7.2 Introduction to ‘gtsummary’

The gtsummary package is mainly used to create publication-ready tables (e.g., demographic tables, simple summary tables, contingency tables, and regression tables). Its best feature is that it automatically detects whether each variable is continuous, dichotomous, or categorical and chooses the descriptive statistics to report accordingly.

7.3 Data Source and Description

The public.ctn0094data package provides harmonized and normalized data sets from the CTN-0094 clinical trial. These data sets describe the experiences of care-seeking individuals suffering from opioid use disorder (OUD). The trial is protocol number 0094 of the Clinical Trials Network (CTN), funded by the US National Institute on Drug Abuse (NIDA). NIDA uses the CTN to develop, validate, refine, and deliver new treatment options to patients.

In this lesson, I use the demographics and fagerstrom data sets from the public.ctn0094data package to demonstrate the gtsummary functions. The demographics data set contains demographic variables such as age, sex, race, and marital status. The fagerstrom data set contains data on smoking habits: smoking status (smoker/non-smoker), the Fagerstrom Test for Nicotine Dependence (FTND) score, and the number of cigarettes smoked per day. The FTND is a questionnaire that assesses the physical dependence of adults on nicotine. It uses yes/no questions scored 0 or 1 and multiple-choice questions scored from 0 to 3, so the total score ranges from 0 to 10. The higher the score, the more intense the patient’s nicotine dependence. The score categories are: 8+: High dependence, 5–7: Moderate dependence, 3–4: Low to moderate dependence, and 0–2: Low dependence.
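The FTND cut-offs listed above can be expressed with base R's cut(). This is only a sketch: the helper ftnd_category() and the vector ftnd_score are hypothetical names with made-up values, not objects from the public.ctn0094data package.

```r
# Hypothetical helper: map FTND total scores (0-10) to the categories above
ftnd_category <- function(score) {
  cut(
    score,
    breaks = c(-Inf, 2, 4, 7, Inf),  # intervals: 0-2, 3-4, 5-7, 8 and above
    labels = c("Low", "Low to moderate", "Moderate", "High")
  )
}

# Made-up example scores, not trial data
ftnd_score <- c(0, 2, 3, 4, 5, 7, 8, 10)
data.frame(ftnd_score, category = ftnd_category(ftnd_score))
```

Because cut() uses right-closed intervals by default, a score of 2 falls in "Low" and a score of 7 falls in "Moderate", which matches the categories in the text.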

# Searching suitable data sets: You can skip 
data(package = "public.ctn0094data")
#data(demographics, package = "public.ctn0094data")
#names(demographics)
#data(fagerstrom, package = "public.ctn0094data")
#names(fagerstrom)
#table(fagerstrom$ftnd)

7.4 Creating Model Data Frames

The demographics and fagerstrom data sets from the public.ctn0094data package are joined by patient ID (the who variable) to create a new data frame, smoking_df.

# Joining data sets: 
smoking_df <- demographics %>% 
  left_join(fagerstrom, by = "who")
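Before building tables on a joined data frame, it can help to confirm that the join neither dropped nor duplicated patients. The sketch below uses two made-up toy tibbles (demo_toy and fager_toy are hypothetical stand-ins, not the real data sets) to show what left_join() does with an unmatched ID.

```r
library(dplyr)

# Toy stand-ins for the two data sets (made-up values)
demo_toy  <- tibble(who = 1:3, age = c(30L, 41L, 25L))
fager_toy <- tibble(who = c(1L, 3L), is_smoker = c("Yes", "No"))

joined_toy <- demo_toy %>%
  left_join(fager_toy, by = "who")

# left_join() keeps every row of the left table; unmatched IDs get NA
nrow(joined_toy) == nrow(demo_toy)

# A duplicated key in the right table would add rows, so check for that too
anyDuplicated(fager_toy$who) == 0

joined_toy
```

The same two checks could be run on smoking_df, demographics, and fagerstrom; a patient without a fagerstrom record shows up with NA in the smoking variables, which is where the "Unknown" rows in the tables come from.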

7.5 Demographic Table with `tbl_summary` Function

7.5.1 Creating Table 1: Demographic Characteristic

In order to create a basic demographic table, I will now select which variables I want to show in the table and then use the tbl_summary function to create the table. I am also adding the description of the variables I included in my table.

age: an integer variable that indicates the Age of the patient.
race: a factor variable with levels ‘Black’, ‘Other’, ‘Refused/missing’, and ‘White’, which represents the Self-reported race of the patient.
education: a factor variable denotes the Education level at intake, with levels ‘HS/GED’ for high school graduate or equivalent, ‘Less than HS’ for less than high school education, ‘More than HS’ for some education beyond high school, and ‘Missing’ if the information is not provided.
is_male: a factor variable with levels ‘No’ and ‘Yes’, describing the Sex (not gender) of the patient, where ‘Yes’ indicates male.
marital: a factor variable indicating the Marital status at intake, with levels ‘Married or Partnered’, ‘Never married’, ‘Separated/Divorced/Widowed’, and ‘Not answered’ if the question was not asked during intake.
is_smoker: a factor indicating whether the patient is a smoker or not. Levels include “No” (not a smoker) and “Yes” (a smoker).

# Selecting variables in a new data frame `table_1df` for table 1
table_1df <- smoking_df %>% 
  select(age, race, education, is_male, marital, is_smoker)

# Table 1
table_1 <- table_1df  %>% tbl_summary()

table_1

Characteristic	N = 3,560¹
age	34 (27, 45)
Unknown	208
race
Black	365 (10%)
Other	506 (14%)
Refused/missing	58 (1.6%)
White	2,631 (74%)
education
HS/GED	691 (39%)
Less than HS	352 (20%)
More than HS	724 (41%)
Unknown	1,793
is_male	2,351 (66%)
Unknown	4
marital
Married or Partnered	329 (19%)
Never married	1,028 (59%)
Separated/Divorced/Widowed	394 (23%)
Unknown	1,809
is_smoker	2,631 (85%)
Unknown	460
¹ Median (IQR); n (%)
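The defaults shown in the footnote (median and IQR for continuous variables, n and % for categorical ones) can be overridden with the type, statistic, and digits arguments of tbl_summary(). The sketch below uses trial, the example data set shipped with gtsummary, rather than the CTN-0094 data, so the variable names are assumptions that do not appear elsewhere in this lesson.

```r
library(gtsummary)
library(dplyr)

tbl_mean_sd <- trial %>%
  select(age, grade, response) %>%
  tbl_summary(
    # treat the 0/1 variable as categorical so both levels are listed
    type = list(response ~ "categorical"),
    # report mean (SD) instead of median (IQR) for continuous variables
    statistic = list(
      all_continuous()  ~ "{mean} ({sd})",
      all_categorical() ~ "{n} / {N} ({p}%)"
    ),
    digits = list(age ~ 1),  # one decimal place for the age statistics
    missing = "ifany"        # show an "Unknown" row only when NAs exist
  )

tbl_mean_sd
```

Mean (SD) is usually preferred for roughly symmetric variables and median (IQR) for skewed ones, so the choice depends on the distribution of each variable rather than on the package default.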

7.5.2 Customizing Table 1: Changing the Label

I am using the label argument of tbl_summary() to change the labels of all variables. Other customizations are shown in the contingency table that follows.

# Changing the Label

table_1 <-
  table_1df %>% 
  tbl_summary(
    label = list(
      age = "Age",
      race = "Race",
      education = "Education level",
      is_male = "Male",
      marital = "Marital status",
      is_smoker = "Smoker"
    )
  )

table_1

Characteristic	N = 3,560¹
Age	34 (27, 45)
Unknown	208
Race
Black	365 (10%)
Other	506 (14%)
Refused/missing	58 (1.6%)
White	2,631 (74%)
Education level
HS/GED	691 (39%)
Less than HS	352 (20%)
More than HS	724 (41%)
Unknown	1,793
Male	2,351 (66%)
Unknown	4
Marital status
Married or Partnered	329 (19%)
Never married	1,028 (59%)
Separated/Divorced/Widowed	394 (23%)
Unknown	1,809
Smoker	2,631 (85%)
Unknown	460
¹ Median (IQR); n (%)

7.6 Contingency Table with `tbl_summary` Function

7.6.1 Creating Table 2: Demographic Variables by Smoking Status

I will now show the Table 1 demographic variables by smoking status (is_smoker: Yes = smoker, No = non-smoker).

# Contingency table 
table_2 <- table_1df %>% tbl_summary(by = is_smoker) 

table_2

Characteristic	No, N = 469¹	Yes, N = 2,631¹
age	36 (28, 47)	33 (26, 44)
Unknown	7	79
race
Black	46 (9.8%)	259 (9.8%)
Other	68 (14%)	376 (14%)
Refused/missing	1 (0.2%)	18 (0.7%)
White	354 (75%)	1,978 (75%)
education
HS/GED	104 (34%)	531 (41%)
Less than HS	23 (7.5%)	290 (22%)
More than HS	179 (58%)	488 (37%)
Unknown	163	1,322
is_male	336 (72%)	1,724 (66%)
marital
Married or Partnered	75 (25%)	236 (18%)
Never married	152 (50%)	775 (59%)
Separated/Divorced/Widowed	77 (25%)	293 (22%)
Unknown	165	1,327
¹ Median (IQR); n (%)

7.6.2 Removing Missing Data

If I do not want to show the missing data in my table, I will use missing = "no".

# Removing Missing Data
table_2nm <- table_1df %>% tbl_summary(by = is_smoker,
                                   missing = "no") 
table_2nm

Characteristic	No, N = 469¹	Yes, N = 2,631¹
age	36 (28, 47)	33 (26, 44)
race
Black	46 (9.8%)	259 (9.8%)
Other	68 (14%)	376 (14%)
Refused/missing	1 (0.2%)	18 (0.7%)
White	354 (75%)	1,978 (75%)
education
HS/GED	104 (34%)	531 (41%)
Less than HS	23 (7.5%)	290 (22%)
More than HS	179 (58%)	488 (37%)
is_male	336 (72%)	1,724 (66%)
marital
Married or Partnered	75 (25%)	236 (18%)
Never married	152 (50%)	775 (59%)
Separated/Divorced/Widowed	77 (25%)	293 (22%)
¹ Median (IQR); n (%)

7.6.3 Applying Statistical Tests

I will use the add_p() function to add p-values. It detects whether each variable is continuous, dichotomous, or categorical and chooses a statistical test accordingly.

# Adding p-value
table_2 <- table_1df %>% tbl_summary(by = is_smoker,
                                   missing = "no") %>% 
  add_p()

table_2

Characteristic	No, N = 469¹	Yes, N = 2,631¹	p-value²
age	36 (28, 47)	33 (26, 44)	<0.001
race			0.8
Black	46 (9.8%)	259 (9.8%)
Other	68 (14%)	376 (14%)
Refused/missing	1 (0.2%)	18 (0.7%)
White	354 (75%)	1,978 (75%)
education			<0.001
HS/GED	104 (34%)	531 (41%)
Less than HS	23 (7.5%)	290 (22%)
More than HS	179 (58%)	488 (37%)
is_male	336 (72%)	1,724 (66%)	0.010
marital			0.006
Married or Partnered	75 (25%)	236 (18%)
Never married	152 (50%)	775 (59%)
Separated/Divorced/Widowed	77 (25%)	293 (22%)
¹ Median (IQR); n (%)
² Wilcoxon rank sum test; Fisher’s exact test; Pearson’s Chi-squared test

Note: Footnote 2 lists all the statistical tests applied in this table. By default, add_p() uses the Wilcoxon rank sum test for continuous variables compared across two groups, Pearson’s Chi-squared test for categorical variables, and Fisher’s exact test for categorical variables when any expected cell count is below 5. To attach a separate footnote naming the test next to each p-value, pipe the table into separate_p_footnotes().
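As a minimal sketch of per-test footnotes, the example below calls gtsummary's separate_p_footnotes() after add_p(). It runs on the trial example data bundled with gtsummary (an assumption made to keep the example self-contained), not on the CTN-0094 data.

```r
library(gtsummary)
library(dplyr)

tbl_fn <- trial %>%
  select(trt, age, grade, response) %>%
  tbl_summary(by = trt, missing = "no") %>%
  add_p() %>%
  # replace the pooled footnote with one footnote per p-value
  separate_p_footnotes()

tbl_fn
```

In this lesson, the same call would be appended right after add_p() in the Table 2 pipeline.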

7.6.4 Customizing Table 2(a)

I will now customize Table 2 to add the number of non-missing observations (add_n()) and an overall column (add_overall()), and to show the missing values with a custom label:

# Adding total and overall number 
table_2 <- table_1df %>% tbl_summary(by = is_smoker,
                                   label = list(
                                     age = "Age",
                                     race = "Race",
                                     education = "Education level",
                                     is_male = "Male",
                                     marital = "Marital status"
                                   ),
                                   missing_text = "(Missing)"
                                  ) %>% 
  add_p() %>%
  add_n() %>%
  add_overall() 

table_2

Characteristic	N	Overall, N = 3,100¹	No, N = 469¹	Yes, N = 2,631¹	p-value²
Age	3,014	34 (27, 45)	36 (28, 47)	33 (26, 44)	<0.001
(Missing)		86	7	79
Race	3,100				0.8
Black		305 (9.8%)	46 (9.8%)	259 (9.8%)
Other		444 (14%)	68 (14%)	376 (14%)
Refused/missing		19 (0.6%)	1 (0.2%)	18 (0.7%)
White		2,332 (75%)	354 (75%)	1,978 (75%)
Education level	1,615				<0.001
HS/GED		635 (39%)	104 (34%)	531 (41%)
Less than HS		313 (19%)	23 (7.5%)	290 (22%)
More than HS		667 (41%)	179 (58%)	488 (37%)
(Missing)		1,485	163	1,322
Male	3,100	2,060 (66%)	336 (72%)	1,724 (66%)	0.010
Marital status	1,608				0.006
Married or Partnered		311 (19%)	75 (25%)	236 (18%)
Never married		927 (58%)	152 (50%)	775 (59%)
Separated/Divorced/Widowed		370 (23%)	77 (25%)	293 (22%)
(Missing)		1,492	165	1,327
¹ Median (IQR); n (%)
² Wilcoxon rank sum test; Fisher’s exact test; Pearson’s Chi-squared test

7.6.5 Customizing Table 2(b)

I will now add a caption, customize the headers, and make the variable labels bold in Table 2 using the following functions:

# Adding title, caption and header 
table_2 <- table_1df %>% tbl_summary(by = is_smoker,
                                   label = list(
                                     age = "Age",
                                     race = "Race",
                                     education = "Education level",
                                     is_male = "Male",
                                     marital = "Marital status"
                                   ),
                                   missing_text = "(Missing)"
                                  ) %>% 
  add_p() %>%
  add_n() %>%
  add_overall() %>%
  bold_labels() %>%
  modify_caption("Table 2. Demographic characteristics according to smoking status") %>%
  modify_header(label ~ "**Demographic characteristics**") %>%
  modify_spanning_header(c("stat_1", "stat_2") ~ "**Smoking status**") 
  
table_2

Table 2. Demographic characteristics according to smoking status
Demographic characteristics	N	Overall, N = 3,100¹	Smoking status		p-value²
Demographic characteristics	N	Overall, N = 3,100¹	No, N = 469¹	Yes, N = 2,631¹	p-value²
Age	3,014	34 (27, 45)	36 (28, 47)	33 (26, 44)	<0.001
(Missing)		86	7	79
Race	3,100				0.8
Black		305 (9.8%)	46 (9.8%)	259 (9.8%)
Other		444 (14%)	68 (14%)	376 (14%)
Refused/missing		19 (0.6%)	1 (0.2%)	18 (0.7%)
White		2,332 (75%)	354 (75%)	1,978 (75%)
Education level	1,615				<0.001
HS/GED		635 (39%)	104 (34%)	531 (41%)
Less than HS		313 (19%)	23 (7.5%)	290 (22%)
More than HS		667 (41%)	179 (58%)	488 (37%)
(Missing)		1,485	163	1,322
Male	3,100	2,060 (66%)	336 (72%)	1,724 (66%)	0.010
Marital status	1,608				0.006
Married or Partnered		311 (19%)	75 (25%)	236 (18%)
Never married		927 (58%)	152 (50%)	775 (59%)
Separated/Divorced/Widowed		370 (23%)	77 (25%)	293 (22%)
(Missing)		1,492	165	1,327
¹ Median (IQR); n (%)
² Wilcoxon rank sum test; Fisher’s exact test; Pearson’s Chi-squared test
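The internal column names used in modify_header() and modify_spanning_header(), such as label, stat_1, and stat_2, are not visible in the printed table. One way to look them up is gtsummary's show_header_names(). The sketch below again uses the bundled trial data as a stand-in for the lesson's data.

```r
library(gtsummary)
library(dplyr)

tbl_by <- trial %>%
  select(trt, age, grade) %>%
  tbl_summary(by = trt)

# Print the internal column names alongside their current headers
show_header_names(tbl_by)

# The same names are the columns of the underlying table body
stat_cols <- grep("^stat_", names(tbl_by$table_body), value = TRUE)
stat_cols
```

With a two-level by variable there is one stat_ column per group, which is why the spanning header in Table 2 targets "stat_1" and "stat_2".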

7.6.6 Customizing Table 2(c)

Here, I am keeping only the customizations that I prefer to have in my final Table 2.

# Final table
table_2 <- table_1df %>% tbl_summary(by = is_smoker,
                                   label = list(
                                     age = "Age",
                                     race = "Race",
                                     education = "Education level",
                                     is_male = "Male",
                                     marital = "Marital status"
                                   ),
                                   missing = "no"
                                  ) %>% 
  add_p() %>%
  #add_n() %>%
  #add_overall() %>%
  bold_labels() %>%
  modify_caption("Table 2. Demographic characteristics according to smoking status") %>%
  modify_header(label ~ "**Characteristics**") %>%
  modify_spanning_header(c("stat_1", "stat_2") ~ "**Smoking Status**") %>%
  modify_footnote(all_stat_cols() ~ "Median (IQR) for Age; n (%) for all other variables") 
  
table_2

Table 2. Demographic characteristics according to smoking status
Characteristics	Smoking Status		p-value²
Characteristics	No, N = 469¹	Yes, N = 2,631¹	p-value²
Age	36 (28, 47)	33 (26, 44)	<0.001
Race			0.8
Black	46 (9.8%)	259 (9.8%)
Other	68 (14%)	376 (14%)
Refused/missing	1 (0.2%)	18 (0.7%)
White	354 (75%)	1,978 (75%)
Education level			<0.001
HS/GED	104 (34%)	531 (41%)
Less than HS	23 (7.5%)	290 (22%)
More than HS	179 (58%)	488 (37%)
Male	336 (72%)	1,724 (66%)	0.010
Marital status			0.006
Married or Partnered	75 (25%)	236 (18%)
Never married	152 (50%)	775 (59%)
Separated/Divorced/Widowed	77 (25%)	293 (22%)
¹ Median (IQR) for Age; n (%) for all other variables
² Wilcoxon rank sum test; Fisher’s exact test; Pearson’s Chi-squared test

7.6.7 Interpretation of Table 2

Interpreting the variable Education level:

Null Hypothesis (H₀): There is no association between education level and smoking status.

Alternative Hypothesis (H₁): There is an association between education level and smoking status.

Since the p-value is less than 0.001, we reject the null hypothesis. This indicates that there is a statistically significant association between education level and smoking status. However, to understand the nature of this association (whether education level affects smoking status or vice versa), further analysis would be needed.
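To see where the education p-value comes from, the test can be reproduced by hand from the counts printed in Table 2. The sketch below types those counts into a matrix (edu_counts is a name introduced here for illustration) and runs base R's chisq.test(), which performs Pearson's Chi-squared test on a 3 x 2 table.

```r
# Education level by smoking status, counts copied from Table 2
edu_counts <- matrix(
  c(104, 531,
     23, 290,
    179, 488),
  nrow = 3, byrow = TRUE,
  dimnames = list(
    education = c("HS/GED", "Less than HS", "More than HS"),
    is_smoker = c("No", "Yes")
  )
)

edu_test <- chisq.test(edu_counts)

edu_test$expected  # expected counts under H0, all well above 5
edu_test$p.value   # far below 0.001
```

Comparing the observed and expected counts shows the direction of the association: fewer non-smokers than expected in the "Less than HS" group and more than expected in the "More than HS" group.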

7.6.8 Missing value distribution in Table 2

We often want to see how missing values are distributed across the demographic variables. For example, to see how the demographic variables differ for patients whose smoking status is missing, we first need to re-code NA as its own category of the is_smoker variable and then re-create the table.

7.6.8.1 Missing value data creation

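# Recoding `is_smoker` variable into `is_smoker_new`
# Note: ifelse() drops the factor class and returns the underlying integer
# codes (1 = "No", 2 = "Yes"), so NA can be replaced by the numeric code 99
table_1df <- table_1df %>% 
  mutate(is_smoker_new = ifelse(is.na(is_smoker), 99, is_smoker))  # converting all NA to 99

# Convert back into a factor, labelling code 99 as "Missing"
table_1df$is_smoker_new <- factor(table_1df$is_smoker_new,
                                  levels = c(1, 2, 99),
                                  labels = c("No", "Yes", "Missing"))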

# New data frame 
table_1df_new <- table_1df %>% 
  select(age, race, education, is_male, marital, is_smoker_new)
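An alternative to recoding through the integer codes is to turn NA into an explicit factor level directly. The sketch below uses base R's addNA() on a small made-up factor (smoker_toy is hypothetical); it is offered as one possible approach, not as the method used to produce the tables in this lesson.

```r
# Made-up factor with missing values, standing in for is_smoker
smoker_toy <- factor(c("No", "Yes", NA, "Yes", NA), levels = c("No", "Yes"))

# addNA() appends NA as an extra level; then give that level a readable name
smoker_new <- addNA(smoker_toy)
levels(smoker_new)[is.na(levels(smoker_new))] <- "Missing"

table(smoker_new)
```

This keeps the original level labels, so it does not rely on knowing which integer code corresponds to "No" or "Yes".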

7.6.8.2 Missing value table creation

# Final table
table_2miss <- table_1df_new %>% tbl_summary(by = is_smoker_new,
                                   label = list(
                                     age = "Age",
                                     race = "Race",
                                     education = "Education level",
                                     is_male = "Male",
                                     marital = "Marital status"
                                   ),
                                   missing = "no"
                                  ) %>% 
  add_p() %>%
  #add_n() %>%
  #add_overall() %>%
  bold_labels() %>%
  modify_caption("Table 2. Demographic characteristics according to smoking status") %>%
  modify_header(label ~ "**Characteristics**") %>%
  modify_spanning_header(c("stat_1", "stat_2", "stat_3") ~ "**Smoking Status**") %>%
  modify_footnote(all_stat_cols() ~ "Median (IQR) for Age; n (%) for all other variables") 
  
table_2miss

Table 2. Demographic characteristics according to smoking status
Characteristics	Smoking Status			p-value²
Characteristics	No, N = 469¹	Yes, N = 2,631¹	Missing, N = 460¹	p-value²
Age	36 (28, 47)	33 (26, 44)	39 (29, 47)	<0.001
Race				<0.001
Black	46 (9.8%)	259 (9.8%)	60 (13%)
Other	68 (14%)	376 (14%)	62 (13%)
Refused/missing	1 (0.2%)	18 (0.7%)	39 (8.5%)
White	354 (75%)	1,978 (75%)	299 (65%)
Education level				<0.001
HS/GED	104 (34%)	531 (41%)	56 (37%)
Less than HS	23 (7.5%)	290 (22%)	39 (26%)
More than HS	179 (58%)	488 (37%)	57 (38%)
Male	336 (72%)	1,724 (66%)	291 (64%)	0.019
Marital status				<0.001
Married or Partnered	75 (25%)	236 (18%)	18 (13%)
Never married	152 (50%)	775 (59%)	101 (71%)
Separated/Divorced/Widowed	77 (25%)	293 (22%)	24 (17%)
¹ Median (IQR) for Age; n (%) for all other variables
² Kruskal-Wallis rank sum test; Pearson’s Chi-squared test

7.7 Regression Table with `tbl_regression()` Function

7.7.1 Creating Regression Model

Here, we create a logistic regression model in which smoking status is the response variable, education is the explanatory variable, and age, race, and sex are treated as confounders.

# Building the Multivariable logistic model
m1 <- glm(is_smoker ~  education + age + race + is_male, 
          table_1df, 
          family = binomial)

# View raw model results
summary(m1)$coefficients

                         Estimate  Std. Error     z value     Pr(>|z|)
(Intercept)            3.50434562 0.389242905  9.00297879 2.196748e-19
educationLess than HS  1.01143171 0.252724965  4.00210447 6.278157e-05
educationMore than HS -0.60886151 0.144410039 -4.21619932 2.484542e-05
age                   -0.04564764 0.006417912 -7.11253757 1.139285e-12
raceOther             -0.24842217 0.315210858 -0.78811425 4.306299e-01
raceRefused/missing    0.39602359 1.124629178  0.35213704 7.247355e-01
raceWhite             -0.01922531 0.251971208 -0.07629961 9.391807e-01
is_maleYes            -0.39712021 0.147363550 -2.69483338 7.042384e-03

7.7.2 Creating Table 3: Regression Table

Here, I use the tbl_regression() function to display the regression results as a table. Setting exponentiate = TRUE exponentiates the beta coefficients so that they are reported as odds ratios.

# Creating Regression Table 
table_3 <- tbl_regression(m1, exponentiate = TRUE)

table_3

Characteristic	OR¹	95% CI¹	p-value
education
HS/GED	—	—
Less than HS	2.75	1.71, 4.61	<0.001
More than HS	0.54	0.41, 0.72	<0.001
age	0.96	0.94, 0.97	<0.001
race
Black	—	—
Other	0.78	0.42, 1.44	0.4
Refused/missing	1.49	0.23, 29.4	0.7
White	0.98	0.59, 1.59	>0.9
is_male
No	—	—
Yes	0.67	0.50, 0.89	0.007
¹ OR = Odds Ratio, CI = Confidence Interval
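The odds ratios in the table are the exponentiated estimates from the raw model output. The sketch below copies four estimates from the summary(m1)$coefficients output above into a named vector (beta is a name introduced here) and exponentiates them.

```r
# Estimates copied from the summary(m1)$coefficients output above
beta <- c(
  "Less than HS" =  1.01143171,
  "More than HS" = -0.60886151,
  "age"          = -0.04564764,
  "Male (Yes)"   = -0.39712021
)

# exp(beta) converts log-odds coefficients into odds ratios
odds_ratio <- round(exp(beta), 2)
odds_ratio
```

The rounded values match the OR column of the table: 2.75, 0.54, 0.96, and 0.67.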

7.7.3 Customizing Table 3

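Here, I have customized Table 3 with the functions used for the earlier tables, plus bold_p() to bold the p-values below a threshold and italicize_levels() to italicize the factor levels.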

# Customizing Regression Table 
table_3 <- tbl_regression(m1, exponentiate = TRUE,
                           label = list(
                             age = "Age",
                             race = "Race",
                             education = "Education level",
                             is_male = "Male"
                             )
                          ) %>% 
  bold_labels() %>%
  bold_p(t = 0.10) %>%  
  italicize_levels() %>%
  modify_caption("Table 3. Logistic Regression for smoking status as response variable (n=3014)")

table_3

Table 3. Logistic Regression for smoking status as response variable (n=3014)
Characteristic	OR¹	95% CI¹	p-value
Education level
HS/GED	—	—
Less than HS	2.75	1.71, 4.61	<0.001
More than HS	0.54	0.41, 0.72	<0.001
Age	0.96	0.94, 0.97	<0.001
Race
Black	—	—
Other	0.78	0.42, 1.44	0.4
Refused/missing	1.49	0.23, 29.4	0.7
White	0.98	0.59, 1.59	>0.9
Male
No	—	—
Yes	0.67	0.50, 0.89	0.007
¹ OR = Odds Ratio, CI = Confidence Interval
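The sample size quoted in a regression caption should be the number of rows the model actually used. glm() silently drops every row with a missing value in any model variable, and nobs() reports how many rows remain. The sketch below demonstrates this on simulated data (all names and values are made up), so it says nothing about the actual n of model m1.

```r
set.seed(1)

# Simulated data: 200 rows, binary outcome, one numeric and one factor predictor
n_rows <- 200
sim_df <- data.frame(
  x = rnorm(n_rows),
  g = factor(sample(c("a", "b"), n_rows, replace = TRUE))
)
sim_df$y <- rbinom(n_rows, 1, plogis(0.5 * sim_df$x))

# Introduce missing values in two predictors (rows 1-10 and 11-25)
sim_df$x[1:10]  <- NA
sim_df$g[11:25] <- NA

fit_sim <- glm(y ~ x + g, data = sim_df, family = binomial)

nobs(fit_sim)                 # rows used in the fit
sum(complete.cases(sim_df))   # rows with no missing values
```

Here both numbers are 175, because 25 of the 200 rows have a missing predictor. Running nobs(m1) on the lesson's model would give the value to quote in the Table 3 caption.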

7.7.4 Interpreting Table 3

Interpreting the variable Education level:

For individuals with less than a high school education, the odds of being a smoker are 2.75 times the odds for those with HS/GED, after adjusting for age, race, and sex.

Conversely, for individuals with more than a high school education, the odds of being a smoker are 0.54 times the odds for those with HS/GED (that is, 46% lower), after adjusting for age, race, and sex.

Interpreting the variable Age:

For each one-year increase in age, the odds of being a smoker are multiplied by 0.96 (a decrease of about 4%), after adjusting for education, race, and sex.
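A per-year odds ratio is often easier to communicate when rescaled to a larger age difference. As a worked example, the sketch below takes the age estimate from the raw model output and computes the percent change in odds per year and the odds ratio for a 10-year difference. The 10-year scale is chosen here purely for illustration.

```r
# Age estimate copied from the summary(m1)$coefficients output
beta_age <- -0.04564764

or_1yr  <- exp(beta_age)        # odds ratio per one-year increase
pct_1yr <- (or_1yr - 1) * 100   # percent change in odds per year
or_10yr <- exp(10 * beta_age)   # odds ratio per ten-year increase

round(c(or_1yr = or_1yr, pct_1yr = pct_1yr, or_10yr = or_10yr), 2)
```

The coefficient is multiplied by 10 before exponentiating, because effects add on the log-odds scale. This gives an odds ratio of about 0.63, so a patient 10 years older has roughly 37% lower odds of being a smoker, holding the other variables fixed.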

In R, the reference level of a categorical variable is the first level of the factor, which by default is the first in alphabetical order. Therefore, HS/GED (H) is the reference level, followed by Less than HS (L) and More than HS (M).

7.7.5 Changing the Reference Level in Table 3

Often, we need to change the reference level to suit the analysis or the aim of the study. We can choose a specific reference level and re-create Table 3. The first step is to check whether the variable is a factor; if it is not, we need to convert it into one. Next, we can use the following code to set the reference level and use it in Table 3.

7.7.5.1 New Model with New Reference Level

Here I am creating model 2 (m2) with Less than HS as the new reference level for the education variable.

# Check factor format
str(table_1df$education) # It shows that it is in factor format.

 Factor w/ 3 levels "HS/GED","Less than HS",..: 3 3 3 3 NA 1 3 NA 1 3 ...

# Building the glm model with specific reference level for education  = "Less than HS".
m2 <- glm(is_smoker ~  relevel(factor(education), ref = "Less than HS")  + age + race + is_male, 
          table_1df, 
          family = binomial)

# View raw model results
summary(m2)$coefficients

                                                                Estimate
(Intercept)                                                   4.51577733
relevel(factor(education), ref = "Less than HS")HS/GED       -1.01143171
relevel(factor(education), ref = "Less than HS")More than HS -1.62029322
age                                                          -0.04564764
raceOther                                                    -0.24842217
raceRefused/missing                                           0.39602359
raceWhite                                                    -0.01922531
is_maleYes                                                   -0.39712021
                                                              Std. Error
(Intercept)                                                  0.436459823
relevel(factor(education), ref = "Less than HS")HS/GED       0.252724965
relevel(factor(education), ref = "Less than HS")More than HS 0.244106320
age                                                          0.006417912
raceOther                                                    0.315210858
raceRefused/missing                                          1.124629178
raceWhite                                                    0.251971208
is_maleYes                                                   0.147363550
                                                                 z value
(Intercept)                                                  10.34637575
relevel(factor(education), ref = "Less than HS")HS/GED       -4.00210447
relevel(factor(education), ref = "Less than HS")More than HS -6.63765370
age                                                          -7.11253757
raceOther                                                    -0.78811425
raceRefused/missing                                           0.35213704
raceWhite                                                    -0.07629961
is_maleYes                                                   -2.69483338
                                                                 Pr(>|z|)
(Intercept)                                                  4.346284e-25
relevel(factor(education), ref = "Less than HS")HS/GED       6.278157e-05
relevel(factor(education), ref = "Less than HS")More than HS 3.187156e-11
age                                                          1.139285e-12
raceOther                                                    4.306299e-01
raceRefused/missing                                          7.247355e-01
raceWhite                                                    9.391807e-01
is_maleYes                                                   7.042384e-03

7.7.5.2 Creating and Customizing New Table 3 with New Reference Level

Here, I have created the new table 3 for m2 model and customized it accordingly.

# Customizing Regression Table 
table_3n <- tbl_regression(m2, exponentiate = TRUE,  # Creating the table
                           label = list(
                             age = "Age",
                             race = "Race",
                             education = "Education level",
                             is_male = "Male"
                             )
                          ) %>% 
  bold_labels() %>%
  bold_p(t = 0.10) %>%  
  italicize_levels() %>%
  modify_caption("Table 3. Logistic Regression for smoking status as response variable (n=3014)")

table_3n

Table 3. Logistic Regression for smoking status as response variable (n=3014)
Characteristic	OR¹	95% CI¹	p-value
relevel(factor(education), ref = "Less than HS")
Less than HS	—	—
HS/GED	0.36	0.22, 0.59	<0.001
More than HS	0.20	0.12, 0.31	<0.001
Age	0.96	0.94, 0.97	<0.001
Race
Black	—	—
Other	0.78	0.42, 1.44	0.4
Refused/missing	1.49	0.23, 29.4	0.7
White	0.98	0.59, 1.59	>0.9
Male
No	—	—
Yes	0.67	0.50, 0.89	0.007
¹ OR = Odds Ratio, CI = Confidence Interval
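In the table above, the education row is labelled with the raw expression relevel(factor(education), ref = "Less than HS"), because that expression is the name of the model term and the label keyed on education no longer matches it. One way to avoid this is to change the reference level in the data before fitting the model. The sketch below shows the idea on gtsummary's bundled trial data (trial, grade, and response are that data set's names, not variables from this lesson).

```r
library(gtsummary)

# `grade` is a factor with levels I, II, III; make "II" the reference level
trial_relevel <- trial
trial_relevel$grade <- relevel(trial_relevel$grade, ref = "II")

fit_relevel <- glm(response ~ grade + age,
                   data = trial_relevel,
                   family = binomial)

# Term names stay short because the formula uses the plain variable name
names(coef(fit_relevel))

tbl_relevel <- tbl_regression(
  fit_relevel,
  exponentiate = TRUE,
  label = list(grade ~ "Grade", age ~ "Age")
)

tbl_relevel
```

For this lesson, the equivalent step would be to relevel education inside table_1df, for example with mutate(education = relevel(education, ref = "Less than HS")), and then fit the model with the plain education term, which should let the "Education level" label apply again.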

7.8 Conclusion (Take Home Message)

We can use the gtsummary package to create publication-ready tables.
The tbl_summary() and tbl_regression() functions are frequently used functions in this package.
Multiple other functions can be used to customize the table and can address the journal requirements.
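Meeting journal requirements usually also means exporting the table to another format. As a small sketch, a gtsummary table can be converted to a plain tibble with as_tibble(); the example uses the bundled trial data as a stand-in, and the object names are introduced here for illustration.

```r
library(gtsummary)
library(dplyr)

tbl_export <- trial %>%
  select(trt, age, grade) %>%
  tbl_summary(by = trt, missing = "no")

# Convert the formatted table to a tibble, e.g. to save it with write.csv()
tbl_export_df <- as_tibble(tbl_export)
tbl_export_df
```

gtsummary also provides as_gt() and as_flex_table() for converting to gt and flextable objects, which require those packages to be installed; flextable objects can be saved to Word documents.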