% Generated by roxygen2: do not edit by hand
% Please edit documentation in R/balqual.R
\name{balqual}
\alias{balqual}
\title{Evaluate Matching Quality}
\usage{
balqual(
  matched_data = NULL,
  formula = NULL,
  type = c("smd", "r", "var_ratio"),
  statistic = c("mean", "max"),
  cutoffs = NULL,
  round = 3,
  print_out = TRUE
)
}
\arguments{
\item{matched_data}{An object of class \code{matched}, as returned by
\code{\link[=match_gps]{match_gps()}}. It contains the final (matched)
data.frame and the attributes required to compute the quality metrics.}

\item{formula}{The R formula used to compute the generalized propensity scores
in the first step of the vector matching algorithm, i.e. in
\code{\link[=estimate_gps]{estimate_gps()}}. It must be the same formula that was
passed to \code{estimate_gps()}.}

\item{type}{A character vector specifying the quality metrics to calculate.
Up to three values can be supplied, combined with \code{c()}. Possible
values are:
\itemize{
\item \code{smd} - Calculates the standardized mean difference (SMD) between
groups, defined as the difference in means divided by the standard deviation
of the treatment group (Rubin, 2001).
\item \code{r} - Computes Pearson's r coefficient from the Z statistic of the
Mann-Whitney U test.
\item \code{var_ratio} - Measures differences in dispersion between groups,
calculated as the ratio of the larger variance to the smaller one.
}}

\item{statistic}{A character vector specifying how the quality metrics are
summarized. Because the metrics are calculated for every pairwise comparison
between treatment levels, they must be aggregated into a single value for the
entire dataset.
\itemize{
\item \code{max}: Returns the maximum value of each metric defined in the
\code{type} argument (as suggested by Lopez and Gutman, 2017).
\item \code{mean}: Returns the corresponding averages.
}

To compute both, supply \code{c("mean", "max")}.}

\item{cutoffs}{A numeric vector with one value per metric specified in the
\code{type} argument, given in the same order. Each value is the cutoff for the
corresponding metric, below which the dataset is considered balanced. If
\code{NULL}, the default cutoffs are used: 0.1 for \code{smd} and \code{r}, and 2 for
\code{var_ratio}.}

\item{round}{An integer specifying the number of decimal places to round the
output to.}

\item{print_out}{Logical. If \code{TRUE} (default), a matching quality summary
will be printed to the console. Set to \code{FALSE} to suppress this output.}
}
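\section{Quality metrics}{
The metrics selected in \code{type} can be written as worked formulas. This is a
sketch based only on the descriptions of the \code{type} argument: the notation
below is introduced for illustration, and implementation details (for example,
which group supplies the standard deviation in each pairwise comparison, or
whether the absolute value of \eqn{Z} is used) are assumptions. For two
treatment groups \eqn{a} and \eqn{b} with sample means \eqn{\bar{x}_a, \bar{x}_b},
sample variances \eqn{s_a^2, s_b^2}, and \eqn{N} observations in total:
\deqn{\mathrm{SMD} = \frac{\bar{x}_a - \bar{x}_b}{s_a}}{SMD = (mean_a - mean_b) / sd_a}
where \eqn{a} denotes the treatment group whose standard deviation is used;
\deqn{r = \frac{Z}{\sqrt{N}}}{r = Z / sqrt(N)}
where \eqn{Z} is the standardized Mann-Whitney U statistic, assuming the common
conversion of a \eqn{Z} statistic into an effect size \eqn{r}; and
\deqn{\mathrm{VR} = \frac{\max(s_a^2, s_b^2)}{\min(s_a^2, s_b^2)}}{VR = max(var_a, var_b) / min(var_a, var_b)}
which, by construction, is never smaller than 1.

For example, with a hypothetical \eqn{Z = 1.5} and \eqn{N = 400}, the formula
gives \eqn{r = 1.5 / 20 = 0.075}, which falls below the default cutoff of 0.1.
}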
\value{
When the result is assigned to an object, \code{balqual()} returns a list of
class \code{quality} containing the following summary statistics:
\itemize{
\item \code{quality_mean} - A data frame with the mean values of the statistics
specified in the \code{type} argument for all balancing variables used in
\code{formula}.
\item \code{quality_max} - A data frame with the maximum values of the statistics
specified in the \code{type} argument for all balancing variables used in
\code{formula}.
\item \code{perc_matched} - A single numeric value indicating the percentage of
observations in the original dataset that were matched.
\item \code{statistic} - A single string indicating which statistic is displayed
in the console.
\item \code{summary_head} - A summary of the matching process. If \code{max} is included
in \code{statistic}, it contains the maximum observed values for each
variable; otherwise, it contains the mean values.
\item \code{n_before} - The number of observations in the dataset before matching.
\item \code{n_after} - The number of observations in the dataset after matching.
\item \code{count_table} - A contingency table showing the distribution of the
treatment variable before and after matching.
}

If \code{print_out = TRUE}, \code{balqual()} also prints a formatted table of
the selected summary statistics for each variable in \code{formula} to the
console.
}
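\section{Aggregation across treatment pairs}{
Each metric is computed for every pairwise comparison of treatment levels, and
\code{statistic} controls how these pairwise values are reduced to one number
per balancing variable. Assuming the reduction is a plain maximum or arithmetic
mean over the \eqn{K = G(G-1)/2} pairs of \eqn{G} treatment levels (notation
introduced here for illustration), a metric \eqn{Q} is summarized as:
\deqn{Q_{max} = \max_{(a,b)} Q_{ab}, \qquad Q_{mean} = \frac{1}{K} \sum_{(a,b)} Q_{ab}}{Q_max = max of Q_ab over all pairs (a, b);  Q_mean = average of Q_ab over the K pairs}

For example, with three treatment levels there are \eqn{K = 3} pairs. If a
covariate had hypothetical pairwise SMDs of 0.04, 0.08 and 0.12, then
\eqn{Q_{max} = 0.12} would exceed the default cutoff of 0.1, while
\eqn{Q_{mean} = 0.08} would not.
}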
\description{
The \code{balqual()} function evaluates the covariate balance of a dataset
after matching and compares it with the original, unmatched dataset. It
computes the selected balance metrics and summary statistics, and compares
them with user-specified (or default) cutoff values for easy interpretation.
}
\examples{
# We try to balance the treatment variable in the cancer dataset based on age
# and sex covariates
data(cancer)

# Firstly, we define the formula
formula_cancer <- formula(status ~ age * sex)

# Then we can estimate the generalized propensity scores
gps_cancer <- estimate_gps(formula_cancer,
  cancer,
  method = "multinom",
  reference = "control",
  verbose_output = TRUE
)

# ... and drop observations based on the common support region...
csr_cancer <- csregion(gps_cancer)

# ... to match the samples using `match_gps()`
matched_cancer <- match_gps(csr_cancer,
  reference = "control",
  caliper = 1,
  kmeans_cluster = 5,
  kmeans_args = list(n.iter = 100),
  verbose_output = TRUE
)

# At the end we can assess the quality of matching using `balqual()`
balqual(
  matched_data = matched_cancer,
  formula = formula_cancer,
  type = "smd",
  statistic = "max",
  round = 3,
  cutoffs = 0.2
)

}
\references{
Rubin, D.B. (2001). Using Propensity Scores to Help Design Observational
Studies: Application to the Tobacco Litigation. Health Services & Outcomes
Research Methodology, 2, 169–188.
https://doi.org/10.1023/A:1020363010465

Lopez, M.J. and Gutman, R. (2017). Estimation of Causal Effects with Multiple
Treatments: A Review and New Ideas. Statistical Science, 32(3), 432–454.
}
\seealso{
\code{\link[=match_gps]{match_gps()}} for matching the generalized propensity scores;
\code{\link[=estimate_gps]{estimate_gps()}} for the documentation of the \code{formula} argument.
}
