| Type: | Package |
| Title: | Efficient Effect Size Computation |
| Version: | 0.8.1 |
| Date: | 2020-10-05 |
| Description: | A collection of functions to compute the standardized effect sizes for experiments (Cohen d, Hedges g, Cliff delta, Vargha-Delaney A). The computation algorithms have been optimized to allow efficient computation even with very large data sets. |
| URL: | https://github.com/mtorchiano/effsize/ |
| BugReports: | https://github.com/mtorchiano/effsize/issues |
| License: | GPL-2 |
| NeedsCompilation: | no |
| Repository: | CRAN |
| Suggests: | testthat |
| Packaged: | 2020-10-05 07:34:37 UTC; mtk |
| Author: | Marco Torchiano [aut, cre] |
| Maintainer: | Marco Torchiano <marco.torchiano@polito.it> |
| Date/Publication: | 2020-10-05 09:50:17 UTC |
Efficient Effect Size Computation
Description
This package contains functions to compute effect sizes based on mean differences (Cohen's d and Hedges' g), dominance matrices (Cliff's Delta), and stochastic superiority (Vargha-Delaney A).
The computation (especially for Cliff's Delta) is carried out with highly efficient algorithms.
Details
The main functions are: cohen.d, cliff.delta, and VD.A.
Change history
- 0.3.1
Fixed a bug in cohen.d when PAIRED=TRUE; the PAIRED parameter now has no effect and is kept just for compatibility. It may be removed in a future code clean-up.
- 0.4
Implemented a new algorithm with improved memory and time complexity. In particular, the new time complexity is T = O(n1*log(n2)) vs. the previous T = O(n1*n2), and the new memory complexity is M = O(n1 + n2) vs. the previous M = O(n1*n2). In practice, the computation now becomes feasible in a "reasonable" time.
- 0.4.1
Code clean-up and optimization using vectorized binary partitioning.
- 0.5
Added Vargha and Delaney A and fixed minor bugs with cohen.d.
- 0.5.1
Modified the Vargha and Delaney A computation to minimize accuracy errors.
- 0.5.2
Fixed bug in cliff.delta.
- 0.5.3
Fixed bug in cohen.d.formula.
- 0.5.4
Fixed minor issue detected by check.
- 0.5.5
Changed the effsize field magnitude to a factor value.
- 0.6.0
Implemented paired computation and CI computation with non-central t-distributions for cohen.d.
- 0.6.1
Added ability to specify factor vector and data vector for the cliff.delta function (thanks to Joses W. Ho).
- 0.6.2
na.rm in cohen.d removes all incomplete pairs when paired.
- 0.6.3
Fixed bug in cohen.d when na.rm=TRUE, minor changes in the documentation (thanks to P. Thomas).
- 0.6.4
Fixed a bug related to paired cohen.d with NAs. Minor documentation changes.
- 0.7.0
Refactored tests using the testthat package. Fixed a bug in cliff.delta returning inconsistent results when the dominance matrix is returned. Fixed issue concerning CI. Fixed bug in cohen.d when using noncentral parameter for negative effect sizes.
- 0.7.1
Fixed minor bugs in cliff.delta and cohen.d.
- 0.7.2
Fixed bugs in cohen.d: the order of factors is now observed and CIs are computed correctly.
- 0.7.3
Fixed bugs in cohen.d (possible endless loop), cleaned code.
- 0.7.4
Fixed bugs in cliff.delta when values are factors.
- 0.7.5
Fixed bugs in cohen.d for paired data.
- 0.7.6
Fixed bugs in cohen.d for CI of paired data.
- 0.7.7
Fixed bugs in cohen.d for non-pooled SD, plus a few pull requests on documentation.
- 0.7.8
Fixed bug in cohen.d: wrong correct type check.
- 0.7.9
Fixed tests to be compatible with the upcoming R 4.0, which sets stringsAsFactors to FALSE by default.
- 0.8.0
Added non-central CI estimation for single sample cohen.d, fixed a bug related to order of data, and added a subject parameter for paired cohen.d.
Author(s)
Marco Torchiano http://softeng.polito.it/torchiano/
Vargha and Delaney A measure
Description
Computes the Vargha and Delaney A effect size measure.
Usage
VD.A(d, ...)
## S3 method for class 'formula'
VD.A(formula,data=list(), ...)
## Default S3 method:
VD.A(d,f, ...)
Arguments
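| d | a numeric vector giving either the data values (if f is a factor) or the treatment group values (if f is a numeric vector) |
| f | either a factor with two levels or a numeric vector of values |
| formula | a formula of the form y ~ f, where y is a numeric variable giving the data values and f is a factor with two levels giving the corresponding groups |
| data | an optional matrix or data frame containing the variables in the formula |
| ... | further arguments to be passed to or from methods |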
Details
The function computes the Vargha and Delaney A effect size measure (Vargha and Delaney, 2000).
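As an illustration of what the A statistic measures, the following is a minimal base-R sketch, not the package's code. It assumes the rank-sum form A = (R1/n1 - (n1+1)/2)/n2, where R1 is the rank sum of the first sample, and compares it with the direct definition P(X > Y) + 0.5 P(X = Y). The helper name vd_a_sketch is hypothetical.

```r
# Hypothetical helper (not part of effsize): A statistic from rank sums.
vd_a_sketch <- function(x, y) {
  n1 <- length(x)
  n2 <- length(y)
  r <- rank(c(x, y))            # midranks, so ties count as one half
  r1 <- sum(r[seq_len(n1)])     # rank sum of the first sample
  (r1 / n1 - (n1 + 1) / 2) / n2
}

x <- c(10, 10, 20, 20, 20, 30, 30, 30, 40, 50)
y <- c(10, 20, 30, 40, 40, 50)

a_rank <- vd_a_sketch(x, y)
# Direct definition over all n1 * n2 pairs: P(X > Y) + 0.5 * P(X == Y)
a_direct <- mean(outer(x, y, ">") + 0.5 * outer(x, y, "=="))

print(c(rank_based = a_rank, direct = a_direct))
```

The rank-based form avoids building the full n1 x n2 comparison matrix, which is why it scales to larger samples.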
Value
A list of class effsize containing the following components:
| estimate | the A statistic estimate |
| magnitude | a qualitative assessment of the magnitude of effect size |
| method | the method used, i.e. Vargha and Delaney A |
Author(s)
Marco Torchiano http://softeng.polito.it/torchiano/
References
A. Vargha and H. D. Delaney. "A critique and improvement of the CL common language effect size statistics of McGraw and Wong." Journal of Educational and Behavioral Statistics, 25(2):101-132, 2000
See Also
cliff.delta, cohen.d, print.effsize
Examples
treatment = rnorm(100,mean=10)
control = rnorm(100,mean=12)
d = (c(treatment,control))
f = rep(c("Treatment","Control"),each=100)
## compute Vargha and Delaney A
## treatment and control
VD.A(treatment,control)
## data and factor
VD.A(d,f)
## formula interface
VD.A(d ~ f)
Cliff's Delta effect size for ordinal variables
Description
Computes the Cliff's Delta effect size for ordinal variables with the related confidence interval using efficient algorithms.
Usage
cliff.delta(d, ... )
## S3 method for class 'formula'
cliff.delta(formula, data=list() ,conf.level=.95,
use.unbiased=TRUE, use.normal=FALSE,
return.dm=FALSE, ...)
## Default S3 method:
cliff.delta(d, f, conf.level=.95,
use.unbiased=TRUE, use.normal=FALSE,
return.dm=FALSE, ...)
Arguments
| d | a numeric vector giving either the data values (if f is a factor) or the treatment group values (if f is a numeric vector) |
| f | either a factor with two levels or a numeric vector of values (see Details) |
| conf.level | confidence level of the confidence interval |
| use.unbiased | a logical indicating whether to compute the delta's variance using the "unbiased" estimate formula or the "consistent" estimate |
| use.normal | logical indicating whether to use the normal or Student-t distribution for the confidence interval estimation |
| return.dm | logical indicating whether to return the dominance matrix. Warning: the explicit computation of the dominance matrix uses an algorithm that is sub-optimal in terms of both memory and time |
| formula | a formula of the form y ~ f, where y is a numeric variable giving the data values and f is a factor with two levels giving the corresponding groups |
| data | an optional matrix or data frame containing the variables in the formula |
| ... | further arguments to be passed to or from methods |
Details
Uses the original formula reported in (Cliff 1996).
If the dominance matrix is required (i.e. return.dm=TRUE), the full matrix is computed, thus using the naive algorithm.
Otherwise, if treatment and control are factors, the optimized linear complexity algorithm is used; in all other cases the RLE algorithm (with complexity n log n) is used.
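To make the difference between the naive and the optimized approach concrete, here is a minimal base-R sketch. It is not the package's RLE implementation: cliff_naive and cliff_sorted are hypothetical helpers, and the sorted variant simply counts, for each value of the first sample, how many values of the second sample lie below and above it using findInterval.

```r
# Hypothetical helpers (not part of effsize).

# Naive: build the full n1 x n2 dominance matrix, O(n1 * n2) time and memory.
cliff_naive <- function(x, y) {
  dm <- sign(outer(x, y, "-"))   # +1 if x > y, -1 if x < y, 0 if tied
  mean(dm)
}

# Sorted: sort y once, then locate each x by binary search,
# O((n1 + n2) * log(n2)) time and O(n1 + n2) memory.
cliff_sorted <- function(x, y) {
  ys <- sort(y)
  n2 <- length(ys)
  below <- findInterval(x, ys, left.open = TRUE)  # number of y strictly below x
  at_or_below <- findInterval(x, ys)              # number of y less than or equal to x
  above <- n2 - at_or_below                       # number of y strictly above x
  sum(below - above) / (length(x) * n2)
}

# Example data from Hogarty and Kromrey (1999)
treatment <- c(10, 10, 20, 20, 20, 30, 30, 30, 40, 50)
control <- c(10, 20, 30, 40, 40, 50)

print(cliff_naive(treatment, control))   # -0.25
print(cliff_sorted(treatment, control))  # -0.25
```

Both functions return the same delta; only the sorted variant avoids materializing the dominance matrix.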
Value
A list of class effsize containing the following components:
| estimate | the Cliff's delta estimate |
| conf.int | the confidence interval of the delta |
| var | the estimated variance of the delta |
| conf.level | the confidence level used to compute the confidence interval |
| dm | the dominance matrix used for computation, only if return.dm is TRUE |
| magnitude | a qualitative assessment of the magnitude of effect size |
| method | the method used for computing the effect size, always Cliff's Delta |
| variance.estimation | the method used to compute the delta variance estimation, either the "unbiased" or the "consistent" estimate |
| CI.distribution | the distribution used to compute the confidence interval, either the normal or the Student-t distribution |
The magnitude is assessed using the thresholds provided in (Romano 2006), i.e. |d|<0.147 "negligible", |d|<0.33 "small", |d|<0.474 "medium", otherwise "large"
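The thresholds above can be expressed as a small lookup. This is a sketch only: cliff_magnitude is a hypothetical helper, and returning an ordered factor is an assumption rather than a statement about the package's return type.

```r
# Hypothetical helper (not part of effsize): map |delta| to a magnitude label
# using the thresholds from Romano et al. (2006).
cliff_magnitude <- function(delta) {
  cut(abs(delta),
      breaks = c(0, 0.147, 0.33, 0.474, Inf),
      labels = c("negligible", "small", "medium", "large"),
      right = FALSE,              # intervals are [lower, upper), matching |d| < threshold
      ordered_result = TRUE)
}

print(cliff_magnitude(c(-0.25, 0.1, 0.4, -0.9)))
```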
Author(s)
Marco Torchiano http://softeng.polito.it/torchiano/
References
Norman Cliff (1996). Ordinal methods for behavioral data analysis. Routledge.
J. Romano, J. D. Kromrey, J. Coraggio, J. Skowronek, Appropriate statistics for ordinal level data: Should we really be using t-test and cohen's d for evaluating group differences on the NSSE and other surveys?, in: Annual meeting of the Florida Association of Institutional Research, 2006.
K. Y. Hogarty and J. D. Kromrey (1999). Using SAS to Calculate Tests of Cliff's Delta. Proceedings of the Twenty-Fourth Annual SAS User Group International Conference, Miami Beach, Florida, p 238. Available at: https://support.sas.com/resources/papers/proceedings/proceedings/sugi24/Posters/p238-24.pdf
See Also
cohen.d, VD.A, print.effsize
Examples
## Example data from Hogarty and Kromrey (1999)
treatment <- c(10,10,20,20,20,30,30,30,40,50)
control <- c(10,20,30,40,40,50)
res = cliff.delta(treatment,control,return.dm=TRUE)
print(res)
print(res$dm)
Cohen's d and Hedges g effect size
Description
Computes the Cohen's d and Hedges' g effect size statistics.
Usage
cohen.d(d, ...)
## S3 method for class 'formula'
cohen.d(formula,data=list(),...)
## Default S3 method:
cohen.d(d,f,pooled=TRUE,paired=FALSE,
na.rm=FALSE, mu=0, hedges.correction=FALSE,
conf.level=0.95,noncentral=FALSE,
within=TRUE, subject=NA, ...)
Arguments
| d | a numeric vector giving either the data values (if f is a factor) or the treatment group values (if f is a numeric vector) |
| f | either a factor with two levels or a numeric vector of values; if NA, a single sample effect size is computed |
| formula | a formula of the form y ~ f, where y is a numeric variable giving the data values and f is a factor with two levels giving the corresponding groups. If using a paired computation (paired=TRUE), the subject ids can also be specified in the formula (see subject). A single sample effect size can be specified with the form ~x |
| data | an optional matrix or data frame containing the variables in the formula |
| pooled | a logical indicating whether to compute the pooled standard deviation (TRUE, the default) or the whole sample standard deviation (FALSE) |
| hedges.correction | logical indicating whether to apply the Hedges correction |
| conf.level | confidence level of the confidence interval |
| noncentral | logical indicating whether to use non-central t distributions for computing the confidence interval |
| paired | a logical indicating whether to consider the values as paired (a warning is issued under certain conditions) |
| within | indicates whether to compute the effect size using the within subject variation, taking into consideration the correlation between pre and post samples |
| subject | an array indicating the id of the subject for a paired computation; when the formula interface is used, it can be indicated in the formula |
| mu | numeric indicating the reference mean for single sample effect size |
| na.rm | logical indicating whether NA values should be removed before computation; when paired, all incomplete pairs are removed |
| ... | further arguments to be passed to or from methods |
Details
When f in the default version is a factor or a character, it must have two values and it identifies the two groups to be compared. Otherwise (e.g. f is numeric), it is considered as a sample to be compared to d.
In the formula version, f is expected to be a factor; if that is not the case, it is coerced to a factor and a warning is issued.
The function computes the value of Cohen's d statistics (Cohen 1988).
If required (hedges.correction==TRUE), the Hedges g statistic is computed instead (Hedges and Olkin, 1985).
When paired is set, the effect size is computed using the approach suggested in (Gibbons et al. 1993). In particular, a correction that takes into consideration the correlation of the two samples is applied (see Borenstein et al., 2009).
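The role of the correlation in the paired case can be sketched as follows. This is a minimal base-R illustration, assuming the within-subject standard deviation is obtained from the standard deviation of the differences as S_within = S_diff / sqrt(2 (1 - r)), with r the correlation between the two samples; paired_d_sketch is a hypothetical helper and the package's actual computation may differ in its details.

```r
# Hypothetical helper (not part of effsize): paired standardized mean difference
# with a correlation-based correction of the standard deviation.
paired_d_sketch <- function(x, y) {
  diffs <- x - y
  r <- cor(x, y)                              # correlation between the paired samples
  s_within <- sd(diffs) / sqrt(2 * (1 - r))   # assumed within-subject SD
  mean(diffs) / s_within
}

pre  <- c(1, 2, 3, 4, 5)
post <- c(3, 2, 5, 4, 6)
print(paired_d_sketch(pre, post))
```

When both samples have the same variance, S_within reduces to their common standard deviation, so the result is on the same scale as an unpaired Cohen's d.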
It is possible to perform a single sample effect size estimation either using a formula ~x or passing f=NA.
The computation of the CI requires the use of non-central Student-t distributions that are used when noncentral==TRUE; otherwise a central distribution is used.
A quantification of the effect size magnitude is also performed using the thresholds defined in Cohen (1992), i.e. |d|<0.2 "negligible", |d|<0.5 "small", |d|<0.8 "medium", otherwise "large".
The variance of the d is computed using the conversion formula reported at page 238 of Cooper et al. (2009):
S^2_d = \left( \frac{n_1+n_2}{n_1 n_2} + \frac{d^2}{2 df}\right) \left( \frac{n_1+n_2}{df} \right)
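The quantities described in this section can be reproduced by hand for the unpaired, pooled case. The following base-R sketch is not the package's code: cohen_d_sketch is a hypothetical helper, df is assumed to be n1 + n2 - 2, the Hedges correction is assumed to be the common approximation 1 - 3/(4 df - 1), and applying the variance formula to the corrected value is also an assumption.

```r
# Hypothetical helper (not part of effsize): Cohen's d with pooled SD,
# optional Hedges correction, and the variance formula given above.
cohen_d_sketch <- function(x, y, hedges = FALSE) {
  n1 <- length(x)
  n2 <- length(y)
  df <- n1 + n2 - 2                                   # assumed degrees of freedom
  s_pooled <- sqrt(((n1 - 1) * var(x) + (n2 - 1) * var(y)) / df)
  d <- (mean(x) - mean(y)) / s_pooled
  if (hedges) {
    d <- d * (1 - 3 / (4 * df - 1))                   # assumed small-sample correction
  }
  var_d <- ((n1 + n2) / (n1 * n2) + d^2 / (2 * df)) * ((n1 + n2) / df)
  list(estimate = d, sd = s_pooled, var = var_d)
}

x <- c(2, 4, 6, 8)
y <- c(1, 3, 5, 7)
res_d <- cohen_d_sketch(x, y)
res_g <- cohen_d_sketch(x, y, hedges = TRUE)
print(c(d = res_d$estimate, g = res_g$estimate, var_d = res_d$var))
```

With small samples the corrected value is noticeably closer to zero than d; the two converge as df grows.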
Value
A list of class effsize containing the following components:
| estimate | the statistic estimate |
| conf.int | the confidence interval of the statistic |
| sd | the within-groups standard deviation |
| conf.level | the confidence level used to compute the confidence interval |
| magnitude | a qualitative assessment of the magnitude of effect size |
| method | the method used for computing the effect size, either Cohen's d or Hedges' g |
Author(s)
Marco Torchiano http://softeng.polito.it/torchiano/
References
Cohen, J. (1988). Statistical power analysis for the behavioral sciences (2nd ed.). New York: Academic Press.
Hedges, L. V. & Olkin, I. (1985). Statistical methods for meta-analysis. Orlando, FL: Academic Press.
Cohen, J. (1992). A power primer. Psychological Bulletin, 112, 155-159.
Cooper, Hedges, and Valentine (2009). The Handbook of Research Synthesis and Meta-Analysis.
David C. Howell (2011). Confidence Intervals on Effect Size. Available at: https://www.uvm.edu/~statdhtx/methods8/Supplements/MISC/Confidence%20Intervals%20on%20Effect%20Size.pdf
Cumming, G.; Finch, S. (2001). A primer on the understanding, use, and calculation of confidence intervals that are based on central and noncentral distributions. Educational and Psychological Measurement, 61, 633-649.
Gibbons, R. D., Hedeker, D. R., & Davis, J. M. (1993). Estimation of effect size from a series of experiments involving paired comparisons. Journal of Educational Statistics, 18, 271-279.
M. Borenstein, L. V. Hedges, J. P. T. Higgins and H. R. Rothstein (2009). Introduction to Meta-Analysis. John Wiley & Sons.
See Also
cliff.delta, VD.A, print.effsize
Examples
treatment = rnorm(100,mean=10)
control = rnorm(100,mean=12)
d = (c(treatment,control))
f = rep(c("Treatment","Control"),each=100)
## compute Cohen's d
## treatment and control
cohen.d(treatment,control)
## data and factor
cohen.d(d,f)
## formula interface
cohen.d(d ~ f)
## compute Hedges' g
cohen.d(d,f,hedges.correction=TRUE)
Prints effect size
Description
Prints the results of an effect size computation
Usage
## S3 method for class 'effsize'
print(x, ...)
Arguments
| x | the effect size result |
| ... | further parameters are currently ignored |
Details
Shows the estimate value and, when available, the confidence interval.
Note
This is still a work in progress.
Author(s)
Marco Torchiano http://softeng.polito.it/torchiano/
References
See the main function cliff.delta.