| Type: | Package |
| Title: | Turn a Regression Model Inside Out |
| Version: | 1.1.1 |
| Maintainer: | David Melamed <dmmelamed@gmail.com> |
| Description: | Turns regression models inside out. Functions decompose variances and coefficients for various regression model types. Functions also visualize regression model objects using techniques developed in Schoon, Melamed, and Breiger (2024) <doi:10.1017/9781108887205>. |
| VignetteBuilder: | knitr |
| Depends: | R (≥ 3.5.0), ggplot2, methods |
| Suggests: | dplyr, knitr, rmarkdown, ggrepel, MASS |
| License: | GPL-2 | GPL-3 |
| Encoding: | UTF-8 |
| LazyData: | true |
| NeedsCompilation: | no |
| Packaged: | 2024-06-13 15:18:34 UTC; melamed.9 |
| Author: | David Melamed |
| Repository: | CRAN |
| Date/Publication: | 2024-06-14 09:50:01 UTC |
Replication data for Beckfield (2006) as re-analyzed by Schoon, Melamed, and Breiger (2024)
Description
Beckfield (2006) analyzed these data using fixed and random effects regression models. He showed that regional economic and political integration is associated with increased economic inequality. Schoon, Melamed, and Breiger (2024) turned these models inside out and decomposed the model coefficients.
Usage
data("Beckfield06")
Format
A data frame with 48 observations on the following 9 variables.
year: a numeric vector
polint: a numeric vector
ecoint: a numeric vector
ecoints: a numeric vector
gdp: a numeric vector
trans: a numeric vector
outflo: a numeric vector
gini: a numeric vector
countryid: a character vector
References
Beckfield, Jason. 2006. "European integration and income inequality." American Sociological Review 71(6): 964-985.
Schoon, Eric W., David Melamed, and Ronald L. Breiger. 2024. Regression Inside Out. NY: Cambridge University Press.
Examples
data(Beckfield06)
head(Beckfield06)
Subset of data from the General Social Survey from 2016. Data were analyzed in Schoon, Melamed, and Breiger (2024).
Description
Subset of data from the General Social Survey from 2016. Data were analyzed in Schoon, Melamed, and Breiger (2024). Full details on the variable selection and source information are available therein.
Usage
data("GSS.2016")
Format
A data frame with 2867 observations on the following 27 variables.
sclass: a numeric vector
fulltime: a numeric vector
retired: a numeric vector
hrsworked: a numeric vector
occprestige: a numeric vector
occprestige_partner: a numeric vector
occprestige_mother: a numeric vector
occprestige_father: a numeric vector
children: a numeric vector
age: a numeric vector
educ: a numeric vector
paeduc: a numeric vector
maeduc: a numeric vector
speduc: a numeric vector
babs: a numeric vector
female: a numeric vector
white: a numeric vector
black: a numeric vector
other: a numeric vector
income: a numeric vector
republican: a numeric vector
conservative: a numeric vector
environment: a numeric vector
helpblackpeople: a numeric vector
science: a numeric vector
govequalwealth: a numeric vector
pclass: a numeric vector
References
Schoon, Eric W., David Melamed, and Ronald L. Breiger. 2024. Regression Inside Out. NY: Cambridge University Press.
Examples
data(GSS.2016)
head(GSS.2016)
Subset of the General Social Survey analyzed by Schoon, Melamed, and Breiger (2024)
Description
Subset of the General Social Survey analyzed by Schoon, Melamed, and Breiger (2024). Full details on the variable selection and source information are available therein.
Usage
data("GSS2018")
Format
A data frame with 558 observations on the following 7 variables.
dog: a numeric vector
race: a numeric vector
sex: a numeric vector
children: a numeric vector
married: a numeric vector
age: a numeric vector
income: a numeric vector
References
Schoon, Eric W., David Melamed, and Ronald L. Breiger. 2024. Regression Inside Out. NY: Cambridge University Press.
Examples
data(GSS2018)
head(GSS2018)
Replication data for regression models with a count dependent variable.
Description
Data analyzed by Hilbe (2011), and used here to illustrate model visualization and coefficient decomposition for count models.
Usage
data("Hilbe")
Format
A data frame with 601 observations on the following 9 variables.
naffairs: a numeric vector
avgmarr: a numeric vector
hapavg: a numeric vector
vryhap: a numeric vector
smerel: a numeric vector
vryrel: a numeric vector
yrsmarr4: a numeric vector
yrsmarr5: a numeric vector
yrsmarr6: a numeric vector
Source
Hilbe, Joseph M. 2011. Negative Binomial Regression. NY: Cambridge University Press.
Examples
data(Hilbe)
head(Hilbe)
Data to replicate OLS regression models reported in Kenworthy (1999).
Description
Data to replicate OLS regression models reported in Kenworthy (1999). Data were analyzed in Schoon, Melamed, and Breiger (2024). Full details on the variable selection and source information are available therein.
Usage
data("Kenworthy99")
Format
A data frame with 15 observations on the following 6 variables.
dv: a numeric vector
gdp: a numeric vector
pov: a numeric vector
tran: a numeric vector
ISO3: a character vector
nation.long: a character vector
References
Kenworthy, Lane. 1999. "Do social-welfare policies reduce poverty? A cross-national assessment." Social Forces 77(3): 1119-1139.
Schoon, Eric W., David Melamed, and Ronald L. Breiger. 2024. Regression Inside Out. NY: Cambridge University Press.
Examples
data(Kenworthy99)
head(Kenworthy99)
Subset of replication data from Ragin and Fiss (2017).
Description
Subset of replication data from Ragin and Fiss (2017). Data were analyzed in Schoon, Melamed, and Breiger (2024). Full details on the variable selection and source information are available therein.
Usage
data("RaginData")
Format
A data frame with 4185 observations on the following 10 variables.
incrat: a numeric
pinc: a numeric
ped: a numeric
resp_ed: a numeric
afqt: a numeric
kids: a numeric
married: a numeric
black: a numeric
male: a numeric
povd: a numeric
References
Ragin, Charles C. and Peer C. Fiss. 2017. Intersectional inequality: Race, class, test scores, and poverty. Chicago, IL: University of Chicago Press.
Schoon, Eric W., David Melamed, and Ronald L. Breiger. 2024. Regression Inside Out. NY: Cambridge University Press.
Examples
data(RaginData)
head(RaginData)
Subset of replication data from Schneider and Makszin (2014).
Description
Subset of replication data from Schneider and Makszin (2014). Data were analyzed in Schoon, Melamed, and Breiger (2024). Full details on the variable selection and source information are available therein.
Usage
data("SchneiderAndMakszin06")
Format
A data frame with 30 observations on the following 36 variables.
id: a character vector
country: a character vector
year: a numeric vector
fde: a numeric vector
fde_cilb: a numeric vector
fde_ciub: a numeric vector
wcoord: a numeric vector
govint: a numeric vector
ud: a numeric vector
epl: a numeric vector
socexp: a numeric vector
eduexp: a numeric vector
vet_un: a numeric vector
lmexp: a numeric vector
wagecov: a numeric vector
vet_isced3: a numeric vector
eduexp_pri: a numeric vector
edu_terenr: a numeric vector
vt_reg: a numeric vector
vt_vap: a numeric vector
compvote: a numeric vector
fde2: a numeric vector
low_fde_l: a numeric vector
high_fde_l: a numeric vector
high_wc_l: a numeric vector
high_int_l: a numeric vector
high_ud_l: a numeric vector
high_epl_l: a numeric vector
high_socx_l: a numeric vector
high_edux_l: a numeric vector
high_lmx_l: a numeric vector
high_vet_l: a numeric vector
p1_y: a numeric vector
p2_y: a numeric vector
p3_y: a numeric vector
sol_y: a numeric vector
References
Schneider, Carsten Q., and Kristin Makszin. 2014. "Forms of welfare capitalism and education-based participatory inequality." Socio-Economic Review 12(2): 437-462.
Schoon, Eric W., David Melamed, and Ronald L. Breiger. 2024. Regression Inside Out. NY: Cambridge University Press.
Examples
data(SchneiderAndMakszin06)
head(SchneiderAndMakszin06)
Subset of replication data from Wimmer, Cederman, and Min (2009).
Description
Subset of replication data from Wimmer, Cederman, and Min (2009). Data were analyzed in Schoon, Melamed, and Breiger (2024). Full details on the variable selection and source information are available therein.
Usage
data("Wimmer_et_al_EPR")
Format
A data frame with 7908 observations on the following 80 variables.
yearc: a numeric
year: a numeric
cowcode: a numeric
country: a character
gdpcap: a numeric
gdpcapl: a numeric
oilpc: a numeric
oilpcl: a numeric
popavg: a numeric
lpopl: a numeric
ethfrac: a numeric
western: a numeric
eeurop: a numeric
lamerica: a numeric
ssafrica: a numeric
asia: a numeric
nafrme: a numeric
lmtnest: a numeric
polity2: a numeric
polity: a numeric
anoc: a numeric
anocl: a numeric
democ: a numeric
democl: a numeric
regchg3: a numeric
pimppast: a numeric
groups: a numeric
egipgrps: a numeric
exclgrps: a numeric
exclpop: a numeric
lrexclpop: a numeric
ttlpop: a numeric
discpop: a numeric
pwrlpop: a numeric
olppop: a numeric
olpspop: a numeric
jppop: a numeric
sppop: a numeric
dompop: a numeric
monpop: a numeric
maxexclpop: a numeric
maxegippop: a numeric
maxpop: a numeric
newonset: a numeric
newethonset: a numeric
newhionset: a numeric
newethhionset: a numeric
onsetstatus: a numeric
onsetstatus2: a numeric
actoraim: a numeric
actoraim2: a numeric
ongoingwarl: a numeric
ongoinghiwarl: a numeric
newonset2: a numeric
newhionset2: a numeric
newethonset2: a numeric
warlfl: a numeric
onsetfl: a numeric
ethonsetfl: a numeric
onsetfl2: a numeric
ethonsetfl2: a numeric
warstns2: a numeric
warstns1: a numeric
atwarnsl: a numeric
npeaceyears: a numeric
nspline1: a numeric
nspline2: a numeric
nspline3: a numeric
hpeaceyears: a numeric
hspline1: a numeric
hspline2: a numeric
hspline3: a numeric
fpeaceyears: a numeric
fspline1: a numeric
fspline2: a numeric
fspline3: a numeric
speaceyears: a numeric
sspline1: a numeric
sspline2: a numeric
sspline3: a numeric
References
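Wimmer, Andreas, Lars-Erik Cederman, and Brian Min. 2009. "Ethnic politics and armed conflict: A configurational analysis of a new global data set." American Sociological Review 74(2): 316-337.
Schoon, Eric W., David Melamed, and Ronald L. Breiger. 2024. Regression Inside Out. NY: Cambridge University Press.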
Examples
data(Wimmer_et_al_EPR)
head(Wimmer_et_al_EPR)
Compute the cosine similarity between two points.
Description
Given two points, the function computes the cosine similarity between them.
Usage
cosine(x,y)
Arguments
| x | Point 1 |
| y | Point 2 |
Value
The cosine similarity, ranging between -1 and +1.
Author(s)
Ronald L. Breiger, David Melamed and Eric Schoon
References
Schoon, Eric W., David Melamed, and Ronald L. Breiger. 2024. Regression Inside Out. NY: Cambridge University Press.
Examples
data(Kenworthy99)
m1 <- lm(scale(dv) ~ scale(gdp) + scale(pov) + scale(tran) -1,data=Kenworthy99)
rp1 <- rio.plot(m1,include.int="no",r1=1:15)
cosine(rp1$row.dimensions[15,],rp1$row.dimensions[8,])
# cosine similarity between USA and Ireland
cosine(rp1$row.dimensions[15,],rp1$row.dimensions[14,])
# cosine similarity between USA and United Kingdom
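For reference, cosine similarity is the dot product of two vectors divided by the product of their Euclidean lengths. The sketch below illustrates that formula in base R. The helper name cosine_sketch is hypothetical, and the code is not taken from the package's own cosine() function.

```r
# Hypothetical helper (not part of the package): cosine similarity of two
# numeric vectors, computed as their dot product over the product of norms.
cosine_sketch <- function(x, y) {
  x <- as.numeric(x)
  y <- as.numeric(y)
  sum(x * y) / (sqrt(sum(x^2)) * sqrt(sum(y^2)))
}

a <- c(1, 0)
b <- c(0, 2)
cosine_sketch(a, a)    # same direction: 1
cosine_sketch(a, b)    # perpendicular: 0
cosine_sketch(a, -a)   # opposite direction: -1
```

Because the lengths are divided out, only the angle between the two points (as seen from the origin) affects the result.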
Decompose the Results of a Regression Model by Cases
Description
This function takes a regression model object and a vector assigning cases to groups (a case can be in a group of its own) and computes each case's contribution to the overall regression coefficients.
Usage
decompose.model(m1,group.by=group.by,include.int="yes",model.type="OLS")
Arguments
| m1 | A regression model object. OLS, logistic, Poisson, and negative binomial regression are supported. |
| group.by | A vector denoting group membership, with one entry per case. |
| include.int | Whether the regression model included an intercept. Default is "yes". |
| model.type | Type of model to be decomposed. OLS via lm ("OLS"), logistic via glm ("logit"), Poisson via glm ("poisson"), and negative binomial via MASS ("nb") are supported. |
Value
| decomp.coef | Each case's or subset of cases' contribution to the estimated slope or regression coefficient. |
| decomp.var | Each case's or subset of cases' contribution to the variance of the estimated slope or regression coefficient. |
Author(s)
David Melamed, Ronald L. Breiger, and Eric Schoon
References
Schoon, Eric W., David Melamed, and Ronald L. Breiger. 2024. Regression Inside Out. NY: Cambridge University Press.
Examples
data(Kenworthy99)
m1 <- lm(scale(dv) ~ scale(gdp) + scale(pov) + scale(tran) -1,data=Kenworthy99)
decompose.model(m1,group.by=c("Liberal","Corp","Liberal",
"SocDem","SocDem","Corp","Corp","Corp","Corp","Corp","SocDem",
"SocDem","Liberal","Liberal","Liberal"),include.int="no")
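To illustrate what a case-wise decomposition of coefficients can look like, the sketch below uses the least-squares identity b = (X'X)^{-1} X'y, which can be written as a sum over cases of (X'X)^{-1} x_i y_i. It uses the built-in mtcars data as a stand-in and groups cases by cylinder count. Both choices are assumptions made for illustration, and whether decompose.model computes decomp.coef in exactly this way is also an assumption. The variance decomposition is not shown.

```r
# Assumption: mtcars (shipped with R) stands in for the package data, and
# grouping by number of cylinders is arbitrary.
X <- scale(as.matrix(mtcars[, c("wt", "hp")]))
y <- as.numeric(scale(mtcars$mpg))
m <- lm(y ~ X - 1)                       # standardized model, no intercept

# One column per case: (X'X)^{-1} x_i y_i.
# X * y multiplies row i of X by y_i; t() turns cases into columns.
case_contrib <- solve(crossprod(X), t(X * y))

# Case contributions add up to the fitted coefficients.
rowSums(case_contrib)
coef(m)

# Aggregate over groups by summing (one row per group, one column per term).
group_contrib <- rowsum(t(case_contrib), group = mtcars$cyl)
group_contrib
colSums(group_contrib)                   # again equals coef(m)
```

Summing within groups keeps the additive property, so group contributions can be read as shares of each coefficient.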
Project point 1 onto the line (at 90 degrees) running through point 2 and the origin (0,0).
Description
Given two points, p1 and p2, this function identifies the point at which p1 is projected onto the line connecting p2 and the origin (0,0). The projection occurs at a right angle.
Usage
project.point(p1,p2)
Arguments
| p1 | First point, the one that is to be projected onto the line through point 2 and the origin. |
| p2 | Second point, the one that defines the line through the origin. This is the outcome or dependent variable in our book. See reference below. |
Details
The output is a single point. In many of the graphs, it is the point to which projection lines are drawn.
Value
Two values which correspond to the x and y co-ordinates in the graph.
Author(s)
David Melamed, Ronald L. Breiger, and Eric Schoon
References
Schoon, Eric W., David Melamed, and Ronald L. Breiger. 2024. Regression Inside Out. NY: Cambridge University Press.
Examples
data(Kenworthy99)
m1 <- lm(scale(dv) ~ scale(gdp) + scale(pov) + scale(tran) -1,data=Kenworthy99)
rp1 <- rio.plot(m1,include.int="no",r1=1:15)
project.point(as.numeric(rp1$col.dimensions[1,]),as.numeric(rp1$row.dimensions[1,]))
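The geometry described above is the standard orthogonal projection of one vector onto the line spanned by another: the projected point is p2 scaled by (p1 . p2) / (p2 . p2). The sketch below works through that formula in base R. The helper name project_sketch is hypothetical and the code is not the package's implementation of project.point().

```r
# Hypothetical helper (not the package's code): orthogonal projection of p1
# onto the line through the origin and p2.
project_sketch <- function(p1, p2) {
  p1 <- as.numeric(p1)
  p2 <- as.numeric(p2)
  scale_factor <- sum(p1 * p2) / sum(p2 * p2)
  scale_factor * p2
}

p1 <- c(2, 3)
p2 <- c(4, 0)
foot <- project_sketch(p1, p2)
foot                      # c(2, 0): p1 dropped straight onto the x axis
sum((p1 - foot) * p2)     # 0: the connecting segment is at a right angle
```

The zero dot product in the last line is what "projected at a right angle" means: the segment from p1 to its projection is perpendicular to the line through p2.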
Regression Inside Out: Plotting Regression Models
Description
rio.plot is used to generate a reduced rank image of a regression model. The function computes row dimensions for the cases and column dimensions for the variables, and generates an image of the model based on those scores.
Usage
rio.plot(m1,exclude.vars="no",r1="none",case.names="",col.names="no",
h.just=-.2,v.just=0,case.col="blue",var.name.col="black",
include.int="yes",group.cases=1,model.type="OLS")
Arguments
| m1 | a regression model object. Supported models include OLS, logistic, Poisson, and negative binomial regression. |
| exclude.vars | an optional numerical vector indicating variables from the model to exclude from the plot of the model. |
| r1 | an optional numerical vector indicating cases to include in the plot. By default, all cases are excluded from the plot. |
| case.names | a character vector of names to label the cases. Should be the same length as 'r1'. |
| col.names | whether to include the variable names in the plot. Default is "no". |
| h.just | horizontal justification in the plot. Default is -.2. |
| v.just | vertical justification in the plot. Default is 0. |
| case.col | if cases are added to the plot, this is their color. Default is "blue". |
| var.name.col | color of the names of variables in the plot. Default is "black". |
| include.int | whether the underlying model included a model intercept. Default is "yes". |
| group.cases | whether to aggregate cases into clusters or subsets. To aggregate, provide a vector of memberships, one per case; cases are aggregated within each group by summing. |
| model.type | the type of regression model. OLS is supported via the lm function, logistic and Poisson regression via the glm function, and negative binomial regression via the MASS package. Use "OLS" (the default) for OLS, "logit" for logistic regression, "poisson" for Poisson regression, and "nb" for negative binomial regression. |
Details
The function takes a regression model object (OLS, logistic, Poisson, or negative binomial) and computes the corresponding row (case) and column (variable) scores. The scores are part of the output, as is a ggplot object of the model.
Value
rio.plot returns several objects.
| p1 | a ggplot object of the model space, given the terms in the function. The examples below access this element as gg.obj. |
| row.dimensions | the scores assigned to each case, or each subset of cases if they were aggregated using the 'group.cases' option. These are the co-ordinates in the plot. |
| col.dimensions | the scores assigned to each variable. These are the co-ordinates in the plot. |
| case.variances | each case's contribution (or each subset's contribution) to the variance of the estimated regression coefficient. |
| U | The orthogonalized column space matrix from the Singular Value Decomposition of the predictor matrix and fitted values. |
| UUt | The orthogonalized column space matrix from the Singular Value Decomposition of the predictor matrix and fitted values, post-multiplied by its transpose. |
Author(s)
David Melamed, Ronald L. Breiger, and Eric Schoon
References
Schoon, Eric W., David Melamed, and Ronald L. Breiger. 2024. Regression Inside Out. NY: Cambridge University Press.
Examples
data(Kenworthy99)
m1 <- lm(scale(dv) ~ scale(gdp) + scale(pov) + scale(tran) -1,data=Kenworthy99)
rp1 <- rio.plot(m1,include.int="no")
names(rp1)
rp1$gg.obj
# rp1$gg.obj + ggplot2::scale_x_continuous(limits=c(-.55,1)) # useful option
rp2 <- rio.plot(m1,r1=1:15,case.names=paste(1:15),include.int="no")
rp2$gg.obj
Kenworthy99 <- data.frame(Kenworthy99,type=c("Liberal","Corp","Liberal",
"SocDem","SocDem","Corp","Corp","Corp","Corp","Corp","SocDem","SocDem",
"Liberal","Liberal","Liberal"))
rp3 <- rio.plot(m1,r1=1:15,group.cases=Kenworthy99$type,include.int="no")
rp3$gg.obj
# rp3$gg.obj + ggplot2::scale_x_continuous(limits=c(-1,20))
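The U and UUt values described above come from a Singular Value Decomposition of the predictor matrix joined with the fitted values. The sketch below shows one way such a decomposition can be set up in base R, using the built-in mtcars data as a stand-in. The rank truncation, the scaling of the column scores, and all object names are assumptions made for illustration; they are not taken from the rio.plot source.

```r
# Assumption: mtcars stands in for the package data; names are illustrative.
X <- scale(as.matrix(mtcars[, c("wt", "hp", "disp")]))
y <- as.numeric(scale(mtcars$mpg))
m <- lm(y ~ X - 1)

M <- cbind(X, fitted = fitted(m))   # predictors plus fitted values
s <- svd(M)

# The fitted values are a linear combination of the predictors, so M has
# rank ncol(X); keep only those dimensions.
r <- ncol(X)
U <- s$u[, 1:r]
UUt <- U %*% t(U)

# U %*% t(U) projects onto the column space of X, i.e. it matches the
# hat matrix of the regression.
H <- X %*% solve(crossprod(X)) %*% t(X)
max(abs(UUt - H))                   # numerically zero

# A two-dimensional picture keeps the first two dimensions.
row_scores <- U[, 1:2]                        # one row per case
col_scores <- s$v[, 1:2] %*% diag(s$d[1:2])   # one row per column of M
```

Under these assumptions, row_scores plays the role of case co-ordinates and col_scores the role of variable co-ordinates (with the fitted values as the last row).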