| Type: | Package |
| Author: | Dylan Huynh [aut, cre] |
| Maintainer: | Dylan Huynh <dylanhuynh@utexas.edu> |
| Title: | Nonparametric Bootstrap Test for Regression Monotonicity |
| Version: | 1.2 |
| Description: | Implements nonparametric bootstrap tests for detecting monotonicity in regression functions from Hall, P. and Heckman, N. (2000) <doi:10.1214/aos/1016120363> Includes tools for visualizing results using Nadaraya-Watson kernel regression and supports efficient computation with 'C++'. |
| License: | GPL-2 | GPL-3 [expanded from: GPL] |
| Encoding: | UTF-8 |
| RoxygenNote: | 7.3.2 |
| LinkingTo: | Rcpp, RcppEigen |
| Imports: | Rcpp (≥ 1.0.13-1), parallel, stats, graphics, ggplot2 (≥ 3.0.0), rlang |
| Suggests: | testthat (≥ 3.0.0) |
| Config/testthat/edition: | 3 |
| NeedsCompilation: | yes |
| Packaged: | 2025-04-24 16:02:23 UTC; dylanh |
| Depends: | R (≥ 3.5.0) |
| Repository: | CRAN |
| Date/Publication: | 2025-04-24 19:20:02 UTC |
Generate Kernel Plot
Description
Creates a scatter plot of the input vectors X and Y, and overlays
a Nadaraya-Watson kernel regression curve using the specified bandwidth.
Usage
create_kernel_plot(X, Y, bandwidth = bw.nrd(X) * (length(X)^-0.1), nrows = 4)
Arguments
X |
Vector of x values. |
Y |
Vector of y values. |
bandwidth |
Kernel bandwidth used for the Nadaraya-Watson estimator. Can
be a single numeric value or a vector of bandwidths.
Default is calculated as
|
nrows |
Number of rows in the facet grid if multiple bandwidths are provided.
Does not do anything if only a single bandwidth value is provided.
Default is |
Value
A ggplot object containing the scatter plot(s) with the kernel regression curve(s). If a vector of bandwidths is supplied, the plots are put into a grid using faceting.
References
Nadaraya, E. A. (1964). On estimating regression. Theory of Probability and Its Applications, 9(1), 141–142.
Watson, G. S. (1964). Smooth estimates of regression functions. Sankhyā: The Indian Journal of Statistics, Series A, 359-372.
Examples
# Example 1: Basic plot on quadratic function
seed <- 42
set.seed(seed)
X <- runif(500)
Y <- X ^ 2 + rnorm(500, sd = 0.1)
plot <- create_kernel_plot(X, Y, bandwidth = bw.nrd(X) * (length(X) ^ -0.1))
A Simulated Diabetes Dataset
Description
This dataset contains simulated medical measurements for Diabetes and is emulated after data from the Diabetes Prevention Program. Each column represents change in a key metabolic indicators after two years for the placebo group receiving no treatment.
Usage
data("diabetes", package="MonotonicityTest")
Format
A data frame with 1000 rows and 4 variables:
- CLDL
Change in low-density lipoprotein (LDL) cholesterol (mg/dL).
- GLUCOSE
Change in fasting plasma glucose levels (mg/dL).
- TRIG
Change in triglyceride levels (mg/dL).
- HBA1C
Change in hemoglobin A1c levels (%).
Examples
data("diabetes", package="MonotonicityTest")
names(diabetes)
Perform Monotonicity Test
Description
Performs a monotonicity test between the vectors X and Y
as described in Hall and Heckman (2000).
This function uses a bootstrap approach to test for monotonicity
in a nonparametric regression setting.
Usage
monotonicity_test(
X,
Y,
bandwidth = bw.nrd(X) * (length(X)^-0.1),
boot_num = 200,
m = floor(0.05 * length(X)),
ncores = 1,
negative = FALSE,
seed = NULL
)
Arguments
X |
Numeric vector of predictor variable values. Must not contain missing or infinite values. |
Y |
Numeric vector of response variable values. Must not contain missing or infinite values. |
bandwidth |
Numeric value for the kernel bandwidth used in the
Nadaraya-Watson estimator. Default is calculated as
|
boot_num |
Integer specifying the number of bootstrap samples.
Default is |
m |
Integer parameter used in the calculation of the test statistic.
Corresponds to the minimum window size to calculate the test
statistic over or a "smoothing" parameter. Lower values increase
the sensitivity of the test to local deviations from monotonicity.
Default is |
ncores |
Integer specifying the number of cores to use for parallel
processing. Default is |
negative |
Logical value indicating whether to test for a monotonic
decreasing (negative) relationship. Default is |
seed |
Optional integer for setting the random seed. If NULL (default), the global random state is used. |
Details
The test evaluates the following hypotheses:
H_0: The regression function is monotonic
-
Non-decreasing if
negative = FALSE -
Non-increasing if
negative = TRUE
H_A: The regression function is not monotonic
Value
A list with the following components:
pThe p-value of the test. A small p-value (e.g., < 0.05) suggests evidence against the null hypothesis of monotonicity.
distThe distribution of test statistic under the null from bootstrap samples. The length of
distis equal toboot_num.statThe test statistic
T_mcalculated from the original data.plotA ggplot object with a scatter plot where the points of the "critical interval" are highlighted. This critical interval is the interval where
T_mis greatest.intervalNumeric vector containing the indices of the "critical interval". The first index indicates where the interval starts, and the second indicates where it ends in the sorted
Xvector.
Note
For large datasets (e.g., n \geq 6500) this function may require
significant computation time due to having to compute the statistic
for every possible interval. Consider reducing boot_num, using
a subset of the data, or using parallel processing with ncores
to improve performance.
In addition to this, a minimum of 300 observations is recommended for kernel estimates to be reliable.
References
Hall, P., & Heckman, N. E. (2000). Testing for monotonicity of a regression mean by calibrating for linear functions. The Annals of Statistics, 28(1), 20–39.
Examples
# Example 1: Usage on monotonic increasing function
# Generate sample data
seed <- 42
set.seed(seed)
X <- runif(500)
Y <- 4 * X + rnorm(500, sd = 1)
result <- monotonicity_test(X, Y, boot_num = 25, seed = seed)
print(result)
# Example 2: Usage on non-monotonic function
seed <- 42
set.seed(seed)
X <- runif(500)
Y <- (X - 0.5) ^ 2 + rnorm(500, sd = 0.5)
result <- monotonicity_test(X, Y, boot_num = 25, seed = seed)
print(result)