Version: | 4.0-4 |
Date: | 2025-10-09 |
Title: | Ecological Inference in 2x2 Tables |
Maintainer: | Kosuke Imai <imai@Harvard.Edu> |
Depends: | R (≥ 2.0), MASS, utils |
Suggests: | testthat |
Description: | Implements the Bayesian and likelihood methods proposed in Imai, Lu, and Strauss (2008 <doi:10.1093/pan/mpm017>) and (2011 <doi:10.18637/jss.v042.i05>) for ecological inference in 2 by 2 tables as well as the method of bounds introduced by Duncan and Davis (1953). The package fits both parametric and nonparametric models using either the Expectation-Maximization algorithms (for likelihood models) or the Markov chain Monte Carlo algorithms (for Bayesian models). For all models, the individual-level data can be directly incorporated into the estimation whenever such data are available. Along with in-sample and out-of-sample predictions, the package also provides a functionality which allows one to quantify the effect of data aggregation on parameter estimation and hypothesis testing under the parametric likelihood models. |
LazyLoad: | yes |
LazyData: | yes |
License: | GPL-2 | GPL-3 [expanded from: GPL (≥ 2)] |
URL: | https://github.com/kosukeimai/eco |
BugReports: | https://github.com/kosukeimai/eco/issues |
RoxygenNote: | 7.3.3 |
NeedsCompilation: | yes |
Packaged: | 2025-10-16 00:43:21 UTC; kosukeimai |
Author: | Kosuke Imai [aut, cre], Ying Lu [aut], Aaron Strauss [aut], Hubert Jin [ctb] |
Repository: | CRAN |
Date/Publication: | 2025-10-21 11:10:02 UTC |
Fitting the Parametric Bayesian Model of Ecological Inference in 2x2 Tables
Description
Qfun
returns the complete log-likelihood that is used to calculate
the fraction of missing information.
Usage
Qfun(theta, suff.stat, n)
Arguments
theta |
A vector that contains the MLE |
suff.stat |
A vector of sufficient statistics of |
n |
An integer representing the sample size. |
Value
A single numeric value: the complete-data log-likelihood.
References
Imai, Kosuke, Ying Lu and Aaron Strauss. (2011). “eco: R Package for Ecological Inference in 2x2 Tables” Journal of Statistical Software, Vol. 42, No. 5, pp. 1-23.
Imai, Kosuke, Ying Lu and Aaron Strauss. (2008). “Bayesian and Likelihood Inference for 2 x 2 Ecological Tables: An Incomplete Data Approach” Political Analysis, Vol. 16, No. 1 (Winter), pp. 41-69.
See Also
ecoML
Black Illiteracy Rates in 1910 US Census
Description
This data set contains the proportion of the residents who are black, the proportion of those who can read, the total population, and the actual black and white literacy rates for 1040 counties in the US. The dataset was originally analyzed by Robinson (1950) at the state level. King (1997) recoded the 1910 census at the county level. The data set only includes those who are older than 10 years of age.
Format
A data frame containing 5 variables and 1040 observations
X | numeric | the proportion of Black residents in each county |
Y | numeric | the overall literacy rates in each county |
N | numeric | the total number of residents in each county |
W1 | numeric | the actual Black literacy rate |
W2 | numeric | the actual White literacy rate |
References
Robinson, W.S. (1950). “Ecological Correlations and the Behavior of Individuals.” American Sociological Review, vol. 15, pp. 351-357.
King, G. (1997). “A Solution to the Ecological Inference Problem: Reconstructing Individual Behavior from Aggregate Data.” Princeton University Press, Princeton, NJ.
Fitting the Parametric Bayesian Model of Ecological Inference in 2x2 Tables
Description
eco
is used to fit the parametric Bayesian model (based on a
Normal/Inverse-Wishart prior) for ecological inference in 2 \times 2
tables via Markov chain Monte Carlo. It gives the in-sample predictions as
well as the estimates of the model parameters. The model and algorithm are
described in Imai, Lu and Strauss (2008, 2011).
Usage
eco(
formula,
data = parent.frame(),
N = NULL,
supplement = NULL,
context = FALSE,
mu0 = 0,
tau0 = 2,
nu0 = 4,
S0 = 10,
mu.start = 0,
Sigma.start = 10,
parameter = TRUE,
grid = FALSE,
n.draws = 5000,
burnin = 0,
thin = 0,
verbose = FALSE
)
Arguments
formula |
A symbolic description of the model to be fit, specifying the
column and row margins of |
data |
An optional data frame in which to interpret the variables in
|
N |
An optional variable representing the size of the unit; e.g., the
total number of voters. |
supplement |
An optional matrix of supplemental data. The matrix has
two columns, which contain additional individual-level data such as survey
data for |
context |
Logical. If |
mu0 |
A scalar or a numeric vector that specifies the prior mean for
the mean parameter |
tau0 |
A positive integer representing the scale parameter of the
Normal-Inverse Wishart prior for the mean and variance parameter |
nu0 |
A positive integer representing the prior degrees of freedom of
the Normal-Inverse Wishart prior for the mean and variance parameter
|
S0 |
A positive scalar or a positive definite matrix that specifies the
prior scale matrix of the Normal-Inverse Wishart prior for the mean and
variance parameter |
mu.start |
A scalar or a numeric vector that specifies the starting
values of the mean parameter |
Sigma.start |
A scalar or a positive definite matrix that specifies the
starting value of the variance matrix |
parameter |
Logical. If |
grid |
Logical. If |
n.draws |
A positive integer. The number of MCMC draws. The default is
|
burnin |
A positive integer. The burnin interval for the Markov chain;
i.e. the number of initial draws that should not be stored. The default is
|
thin |
A positive integer. The thinning interval for the Markov chain;
i.e. the number of Gibbs draws between the recorded values that are skipped.
The default is |
verbose |
Logical. If |
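As a rough illustration of how n.draws, burnin, and thin interact, the sketch below counts the draws that would be retained if the first burnin draws are discarded and every (thin + 1)-th draw is kept afterwards. The helper stored_draws is hypothetical, and this bookkeeping is an assumption inferred from the argument descriptions above rather than taken from the package source.

```r
# Hypothetical helper (not part of eco): number of retained MCMC draws,
# assuming burnin draws are dropped and thin draws are skipped between
# consecutive stored values.
stored_draws <- function(n.draws, burnin = 0, thin = 0) {
  stopifnot(n.draws > burnin, burnin >= 0, thin >= 0)
  floor((n.draws - burnin) / (thin + 1))
}

stored_draws(5000)                           # defaults: all 5000 draws kept
stored_draws(5000, burnin = 1000, thin = 4)  # 4000 post-burnin draws, every 5th kept
```

Under this assumption, a long burnin combined with heavy thinning can leave few draws for summary statistics, so n.draws may need to be raised accordingly.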
Details
An example of a 2 \times 2 ecological table for racial voting is given below:
 | black voters | white voters | |
vote | W_{1i} | W_{2i} | Y_i |
not vote | 1-W_{1i} | 1-W_{2i} | 1-Y_i |
 | X_i | 1-X_i | |
where Y_i and X_i represent the observed margins, and W_{1i} and W_{2i} are unknown variables. In this example, Y_i is the turnout rate in the ith precinct and X_i is the proportion of African Americans in the ith precinct. The unknowns W_{1i} and W_{2i} are the black and white turnout rates, respectively. All variables are proportions and hence bounded between 0 and 1. For each i, the following deterministic relationship holds: Y_i = X_i W_{1i} + (1-X_i) W_{2i}.
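The deterministic relationship can be checked numerically. The sketch below uses made-up values for three precincts (the numbers are assumptions for illustration, not data from the package) and shows that, once X_i and Y_i are observed, fixing W_{1i} determines W_{2i}.

```r
# Hypothetical precinct-level values (illustration only)
X  <- c(0.2, 0.5, 0.8)   # proportion of black voters
W1 <- c(0.3, 0.6, 0.4)   # black turnout (unknown in practice)
W2 <- c(0.7, 0.5, 0.9)   # white turnout (unknown in practice)

# Observed turnout implied by the accounting identity
Y <- X * W1 + (1 - X) * W2

# Given the margins and W1, the identity pins down W2
W2_implied <- (Y - X * W1) / (1 - X)
all.equal(W2_implied, W2)
```

Because only X and Y are observed, every pair (W1, W2) on this line within the unit square is consistent with the data, which is why additional modeling assumptions are needed.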
Value
An object of class eco
containing the following elements:
call |
The matched call. |
X |
The row margin, |
Y |
The column margin, |
N |
The size of each table, |
burnin |
The number of initial burnin draws. |
thin |
The thinning interval. |
nu0 |
The prior degrees of freedom. |
tau0 |
The prior scale parameter. |
mu0 |
The prior mean. |
S0 |
The prior scale matrix. |
W |
A three dimensional array storing the posterior in-sample predictions of |
Wmin |
A numeric matrix storing the lower bounds of |
Wmax |
A numeric matrix storing the upper bounds of |
The
following additional elements are included in the output when
parameter = TRUE
.
mu |
The posterior draws of the population mean parameter, |
Sigma |
The posterior draws of the population variance matrix, |
References
Imai, Kosuke, Ying Lu and Aaron Strauss. (2011). “eco: R Package for Ecological Inference in 2x2 Tables” Journal of Statistical Software, Vol. 42, No. 5, pp. 1-23.
Imai, Kosuke, Ying Lu and Aaron Strauss. (2008). “Bayesian and Likelihood Inference for 2 x 2 Ecological Tables: An Incomplete Data Approach” Political Analysis, Vol. 16, No. 1 (Winter), pp. 41-69.
See Also
ecoML
, ecoNP
, predict.eco
, summary.eco
Examples
## load the registration data
data(reg)
## NOTE: convergence has not been properly assessed for the following
## examples. See Imai, Lu and Strauss (2008, 2011) for more
## complete analyses.
## fit the parametric model with the default prior specification
res <- eco(Y ~ X, data = reg, verbose = TRUE)
## summarize the results
summary(res)
## obtain out-of-sample prediction
out <- predict(res, verbose = TRUE)
## summarize the results
summary(out)
## load Robinson's census data
data(census)
## fit the parametric model with contextual effects and N
## using the default prior specification
res1 <- eco(Y ~ X, N = N, context = TRUE, data = census, verbose = TRUE)
## summarize the results
summary(res1)
## obtain out-of-sample prediction
out1 <- predict(res1, verbose = TRUE)
## summarize the results
summary(out1)
Calculating the Bounds for Ecological Inference in RxC Tables
Description
ecoBD is used to calculate the bounds for the missing internal cells of an R \times C ecological table. The data can be entered either in the form of counts or proportions.
Usage
ecoBD(formula, data = parent.frame(), N = NULL)
Arguments
formula |
A symbolic description of ecological table to be used,
specifying the column and row margins of |
data |
An optional data frame in which to interpret the variables in
|
N |
An optional variable representing the size of the unit; e.g., the
total number of voters. If |
Details
The data may be entered either in the form of counts or proportions. If
proportions are used, formula
may omit the last row and/or column of
tables, which can be calculated from the remaining margins. For example,
Y ~ X
specifies Y
as the first column margin and X
as
the first row margin in 2 \times 2
tables. If counts are used,
formula
may omit the last row and/or column margin of the table only
if N
is supplied. In this example, the columns will be labeled as
X
and not X
, and the rows will be labeled as Y
and
not Y
.
For larger tables, one can use cbind()
and +
. For example,
cbind(Y1, Y2, Y3) ~ X1 + X2 + X3 + X4)
specifies 3 \times 4
tables.
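To see why the cbind() formula corresponds to a 3 \times 4 table, the sketch below inspects the formula with base R only. It does not call ecoBD, and how ecoBD parses the formula internally is not shown here; the variable names are placeholders.

```r
# A formula with three left-hand-side margins and four right-hand-side margins
f <- cbind(Y1, Y2, Y3) ~ X1 + X2 + X3 + X4

lhs_vars <- all.vars(f[[2]])   # variables inside cbind(...)
rhs_vars <- all.vars(f[[3]])   # variables joined by +

length(lhs_vars)   # 3
length(rhs_vars)   # 4
```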
An R \times C ecological table in the form of counts:
n_{i11} | n_{i12} | ... | n_{i1C} | n_{i1.} |
n_{i21} | n_{i22} | ... | n_{i2C} | n_{i2.} |
... | ... | ... | ... | ... |
n_{iR1} | n_{iR2} | ... | n_{iRC} | n_{iR.} |
n_{i.1} | n_{i.2} | ... | n_{i.C} | N_i |
where n_{ir.} and n_{i.c} represent the observed margins, N_i represents the size of the table, and n_{irc} are unknown variables. Note that for each i, the following deterministic relationships hold: n_{ir.} = \sum_{c=1}^C n_{irc} for r=1,\dots,R, and n_{i.c}=\sum_{r=1}^R n_{irc} for c=1,\dots,C. Then, each of the unknown inner cells can be bounded in the following manner,
\max(0, n_{ir.}+n_{i.c}-N_i) \le n_{irc} \le \min(n_{ir.}, n_{i.c}).
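The count bounds can be computed directly from the margins of a single table. The following is a minimal base R sketch; the function count_bounds and the example margins are hypothetical and are not part of the package.

```r
# Hypothetical helper: bounds on inner cell counts of one R x C table,
# given its row totals and column totals.
count_bounds <- function(row_tot, col_tot) {
  N <- sum(row_tot)
  stopifnot(sum(col_tot) == N)   # margins must describe the same table
  lower <- outer(row_tot, col_tot, function(n_r, n_c) pmax(0, n_r + n_c - N))
  upper <- outer(row_tot, col_tot, pmin)
  list(lower = lower, upper = upper)
}

# A 2 x 2 table with row totals (60, 40) and column totals (70, 30)
b <- count_bounds(c(60, 40), c(70, 30))
b$lower   # rows: (30, 0) and (10, 0)
b$upper   # rows: (60, 30) and (40, 30)
```

In this example the first cell must lie between 30 and 60, so the margins alone already rule out much of the range from 0 to 100.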
If the size of tables, N
, is
provided,
An R \times C ecological table in the form of proportions:
W_{i11} | W_{i12} | ... | W_{i1C} | Y_{i1} |
W_{i21} | W_{i22} | ... | W_{i2C} | Y_{i2} |
... | ... | ... | ... | ... |
W_{iR1} | W_{iR2} | ... | W_{iRC} | Y_{iR} |
X_{i1} | X_{i2} | ... | X_{iC} | |
where Y_{ir} and X_{ic} represent the observed margins, and W_{irc} are unknown variables. Note that for each i, the following deterministic relationships hold: Y_{ir} = \sum_{c=1}^C X_{ic} W_{irc} for r=1,\dots,R, and \sum_{r=1}^R W_{irc}=1 for c=1,\dots,C. Then, each of the inner cells of the table can be bounded in the following manner,
\max(0, (X_{ic} + Y_{ir}-1)/X_{ic}) \le W_{irc} \le \min(1, Y_{ir}/X_{ic}).
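The proportion bounds follow from the constraint Y_{ir} = \sum_c X_{ic} W_{irc}: since each W lies in [0, 1], the term X_{ic} W_{irc} can be at most Y_{ir} and at least Y_{ir} - (1 - X_{ic}). A minimal base R sketch for one table is given below; prop_bounds and the example margins are hypothetical.

```r
# Hypothetical helper: bounds on W[r, c] for one table in proportion form.
# Y holds the R margins Y_{ir}; X holds the C margins X_{ic}.
prop_bounds <- function(Y, X) {
  lower <- outer(Y, X, function(y, x) pmax(0, (x + y - 1) / x))
  upper <- outer(Y, X, function(y, x) pmin(1, y / x))
  list(lower = lower, upper = upper)
}

b <- prop_bounds(Y = c(0.6, 0.4), X = c(0.7, 0.3))
round(b$lower, 3)
round(b$upper, 3)
```

Within each column the lower bound of one cell and the upper bound of the other sum to one in the 2 \times 2 case, reflecting the constraint \sum_r W_{irc} = 1.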
Value
An object of class ecoBD
containing the following elements
(When three dimensional arrays are used, the first dimension indexes the
observations, the second dimension indexes the row numbers, and the third
dimension indexes the column numbers):
call |
The matched call. |
X |
A matrix of the observed row margin, |
Y |
A matrix of the observed column margin, |
N |
A vector of the size of ecological tables, |
aggWmin |
A three dimensional array of aggregate lower bounds for proportions. |
aggWmax |
A three dimensional array of aggregate upper bounds for proportions. |
Wmin |
A three dimensional array of lower bounds for proportions. |
Wmax |
A three dimensional array of upper bounds for proportions. |
Nmin |
A three dimensional array of lower bounds for counts. |
Nmax |
A three dimensional array of upper bounds for counts. |
The object
can be printed through print.ecoBD
.
References
Imai, Kosuke, Ying Lu and Aaron Strauss. (2011). “eco: R Package for Ecological Inference in 2x2 Tables” Journal of Statistical Software, Vol. 42, No. 5, pp. 1-23.
Imai, Kosuke, Ying Lu and Aaron Strauss. (2008). “Bayesian and Likelihood Inference for 2 x 2 Ecological Tables: An Incomplete Data Approach” Political Analysis, Vol. 16, No. 1 (Winter), pp. 41-69.
See Also
eco
, ecoNP
Examples
## load the registration data
data(reg)
## calculate the bounds
res <- ecoBD(Y ~ X, N = N, data = reg)
## print the results
print(res)
Fitting Parametric Models and Quantifying Missing Information for Ecological Inference in 2x2 Tables
Description
ecoML is used to fit parametric models for ecological inference in 2 \times 2 tables via Expectation Maximization (EM) algorithms. The data are specified in proportions. In its most basic setting, the algorithm assumes that the individual-level proportions (i.e., W_1 and W_2) are distributed bivariate normally (after logit transformations). The function calculates point estimates of the parameters for models based on different assumptions. The standard errors of the point estimates are also computed via Supplemented EM algorithms. Moreover, ecoML quantifies the amount of missing information associated with each parameter and allows researchers to examine the impact of missing information on parameter estimation in ecological inference. The models and algorithms are described in Imai, Lu and Strauss (2008, 2011).
Usage
ecoML(
formula,
data = parent.frame(),
N = NULL,
supplement = NULL,
theta.start = c(0, 0, 1, 1, 0),
fix.rho = FALSE,
context = FALSE,
sem = TRUE,
epsilon = 10^(-6),
maxit = 1000,
loglik = TRUE,
hyptest = FALSE,
verbose = FALSE
)
Arguments
formula |
A symbolic description of the model to be fit, specifying the
column and row margins of |
data |
An optional data frame in which to interpret the variables in
|
N |
An optional variable representing the size of the unit; e.g., the
total number of voters. |
supplement |
An optional matrix of supplemental data. The matrix has
two columns, which contain additional individual-level data such as survey
data for |
theta.start |
A numeric vector that specifies the starting values for
the mean, variance, and covariance. When |
fix.rho |
Logical. If |
context |
Logical. If |
sem |
Logical. If |
epsilon |
A positive number that specifies the convergence criterion
for EM algorithm. The square root of |
maxit |
A positive integer that specifies the maximum number of iterations
before the convergence criterion is met. The default is |
loglik |
Logical. If |
hyptest |
Logical. If |
verbose |
Logical. If |
Details
When sem is TRUE, ecoML computes the observed-data information matrix for the parameters of interest based on the Supplemented-EM algorithm. The inverse of the observed-data information matrix can be used to estimate the variance-covariance matrix for the parameters estimated from EM algorithms. In addition, it also computes the expected complete-data information matrix. Based on these two measures, one can further calculate the fraction of missing information associated with each parameter. See Imai, Lu and Strauss (2006) for more details about the fraction of missing information.
Moreover, when hyptest = TRUE, ecoML estimates the parametric model under the null hypothesis that \mu_1 = \mu_2. One can then construct the likelihood ratio test to assess the hypothesis of equal means. The associated fraction of missing information for the test statistic can also be calculated. For details, see Imai, Lu and Strauss (2006).
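The relationship between the two information matrices can be sketched with toy numbers. The matrices below are made up for illustration, and the two summaries shown (a diagonal ratio and the largest eigenvalue of the missing-to-complete information ratio) are common ways to express the fraction of missing information; whether ecoML uses exactly these formulas is an assumption not confirmed by the text above.

```r
# Hypothetical 2 x 2 information matrices (made-up numbers for illustration)
Icom <- matrix(c(10, 2, 2, 8), 2, 2)   # expected complete-data information
Iobs <- matrix(c(6, 1, 1, 4), 2, 2)    # observed-data information

Imiss <- Icom - Iobs                   # information lost because W is unobserved
Vobs  <- solve(Iobs)                   # variance estimate from the observed information

# Parameter-wise summary: share of complete-data information that is missing
fmis_diag <- diag(Imiss) / diag(Icom)  # 0.4 and 0.5 for these numbers

# Matrix-level summary: largest eigenvalue of Imiss %*% Icom^{-1}
fmis_max <- max(Re(eigen(Imiss %*% solve(Icom))$values))
```

Values close to one indicate that most of the information about a parameter would come from the unobserved inner cells, so the aggregate data say little about it.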
Value
An object of class ecoML
containing the following elements:
call |
The matched call. |
X |
The row margin, |
Y |
The column margin, |
N |
The size of each table, |
context |
The assumption under which the model is estimated. If
|
sem |
Whether the SEM algorithm is used to estimate the standard errors and observed information matrix for the parameter estimates. |
fix.rho |
Whether the correlation or the partial
correlation between |
r12 |
If |
epsilon |
The precision criterion for EM convergence.
|
theta.sem |
The ML estimates of |
W |
In-sample estimation of |
suff.stat |
The sufficient statistics for |
iters.em |
Number of EM iterations before convergence is achieved. |
iters.sem |
Number of SEM iterations before convergence is achieved. |
loglik |
The log-likelihood of the model when convergence is achieved. |
loglik.log.em |
A vector saving the value of the log-likelihood function at each iteration of the EM algorithm. |
mu.log.em |
A matrix saving the unweighted mean estimation of the
logit-transformed individual-level proportions (i.e., |
Sigma.log.em |
A matrix saving the
log of the variance estimation of the logit-transformed individual-level
proportions (i.e., |
rho.fisher.em |
A matrix saving the Fisher
transformation of the estimation of the correlations between the
logit-transformed individual-level proportions (i.e., |
Moreover, when sem = TRUE, ecoML also outputs the following values:
DM |
The matrix characterizing the rates of convergence of the EM algorithms. Such information is also used to calculate the observed-data information matrix |
Icom |
The (expected) complete data information
matrix estimated via SEM algorithm. When |
Iobs |
The observed information matrix. The dimension of |
Imiss |
The difference between |
Vobs |
The (symmetrized) variance-covariance matrix of the ML parameter
estimates. The dimension of |
Iobs |
The (expected) complete-data variance-covariance matrix. The
dimension of |
Vobs.original |
The estimated variance-covariance matrix of the ML parameter
estimates. The dimension of |
Fmis |
The fraction of missing information associated with each parameter estimation. |
VFmis |
The proportion of increased variance associated with each parameter estimation due to observed data. |
Ieigen |
The largest eigenvalue of |
Icom.trans |
The complete-data information matrix for the Fisher-transformed parameters. |
Iobs.trans |
The observed-data information matrix for the Fisher-transformed parameters. |
Fmis.trans |
The fractions of missing information associated with the Fisher-transformed parameters. |
References
Imai, Kosuke, Ying Lu and Aaron Strauss. (2011). “eco: R Package for Ecological Inference in 2x2 Tables” Journal of Statistical Software, Vol. 42, No. 5, pp. 1-23.
Imai, Kosuke, Ying Lu and Aaron Strauss. (2008). “Bayesian and Likelihood Inference for 2 x 2 Ecological Tables: An Incomplete Data Approach” Political Analysis, Vol. 16, No. 1 (Winter), pp. 41-69.
See Also
eco
, ecoNP
, summary.ecoML
Examples
## load the census data
data(census)
## NOTE: convergence has not been properly assessed for the following
## examples. See Imai, Lu and Strauss (2006) for more complete analyses.
## In the first example below, in the interest of time, only part of the
## data set is analyzed and the convergence requirement is less stringent
## than the default setting.
## In the second example, the program is arbitrarily halted 100 iterations
## into the simulation, before convergence.
## fit the parametric model with the default model specifications
res <- ecoML(Y ~ X, data = census[1:100,], N=census[1:100,3],
epsilon=10^(-6), verbose = TRUE)
## summarize the results
summary(res)
## fit the parametric model with some individual
## level data using the default prior specification
surv <- 1:600
res1 <- ecoML(Y ~ X, context = TRUE, data = census[-surv,],
supplement = census[surv,c(4:5,1)], maxit=100, verbose = TRUE)
## summarize the results
summary(res1)
Fitting the Nonparametric Bayesian Models of Ecological Inference in 2x2 Tables
Description
ecoNP
is used to fit the nonparametric Bayesian model (based on a
Dirichlet process prior) for ecological inference in 2 \times 2
tables
via Markov chain Monte Carlo. It gives the in-sample predictions as well as
out-of-sample predictions for population inference. The models and
algorithms are described in Imai, Lu and Strauss (2008, 2011).
Usage
ecoNP(
formula,
data = parent.frame(),
N = NULL,
supplement = NULL,
context = FALSE,
mu0 = 0,
tau0 = 2,
nu0 = 4,
S0 = 10,
alpha = NULL,
a0 = 1,
b0 = 0.1,
parameter = FALSE,
grid = FALSE,
n.draws = 5000,
burnin = 0,
thin = 0,
verbose = FALSE
)
Arguments
formula |
A symbolic description of the model to be fit, specifying the
column and row margins of |
data |
An optional data frame in which to interpret the variables in
|
N |
An optional variable representing the size of the unit; e.g., the
total number of voters. |
supplement |
An optional matrix of supplemental data. The matrix has
two columns, which contain additional individual-level data such as survey
data for |
context |
Logical. If |
mu0 |
A scalar or a numeric vector that specifies the prior mean for
the mean parameter |
tau0 |
A positive integer representing the scale parameter of the
Normal-Inverse Wishart prior for the mean and variance parameter
|
nu0 |
A positive integer representing the prior degrees of freedom of
the variance matrix |
S0 |
A positive scalar or a positive definite matrix that specifies the
prior scale matrix for the variance matrix |
alpha |
A positive scalar representing a user-specified fixed value of
the concentration parameter, |
a0 |
A positive integer representing the value of the shape parameter of
the gamma prior distribution for |
b0 |
A positive scalar representing the value of the scale parameter
of the gamma prior distribution for |
parameter |
Logical. If |
grid |
Logical. If |
n.draws |
A positive integer. The number of MCMC draws. The default is
|
burnin |
A positive integer. The burnin interval for the Markov chain;
i.e. the number of initial draws that should not be stored. The default is
|
thin |
A positive integer. The thinning interval for the Markov chain;
i.e. the number of Gibbs draws between the recorded values that are skipped.
The default is |
verbose |
Logical. If |
Value
An object of class ecoNP
containing the following elements:
call |
The matched call. |
X |
The row margin, |
Y |
The column margin, |
burnin |
The number of initial burnin draws. |
thin |
The thinning interval. |
nu0 |
The prior degrees of freedom. |
tau0 |
The prior scale parameter. |
mu0 |
The prior mean. |
S0 |
The prior scale matrix. |
a0 |
The prior shape parameter. |
b0 |
The prior scale parameter. |
W |
A three dimensional array storing the posterior in-sample predictions
of |
Wmin |
A numeric matrix storing the lower bounds of |
Wmax |
A numeric matrix storing the upper bounds of |
The following additional elements are included in the output when
parameter = TRUE
.
mu |
A three dimensional array storing the
posterior draws of the population mean parameter, |
Sigma |
A three dimensional array storing the posterior draws of the
population variance matrix, |
alpha |
The posterior draws of |
nstar |
The number of clusters at each Gibbs draw. |
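To give a feel for how the concentration parameter alpha relates to the number of clusters, the sketch below uses the standard Dirichlet process result that the prior expected number of clusters among n observations is \sum_{i=1}^{n} \alpha / (\alpha + i - 1). The helper expected_clusters is hypothetical, n = 100 is an arbitrary choice, and the calculation concerns the prior only; it says nothing about what ecoNP computes internally.

```r
# Hypothetical helper: prior expected number of clusters under a
# Dirichlet process with concentration alpha and n observations.
expected_clusters <- function(alpha, n) {
  stopifnot(alpha > 0, n >= 1)
  sum(alpha / (alpha + seq_len(n) - 1))
}

expected_clusters(alpha = 1, n = 3)   # 1 + 1/2 + 1/3

# Larger alpha implies more clusters a priori (n = 100 chosen arbitrarily)
prior_clusters <- sapply(c(0.1, 1, 10), expected_clusters, n = 100)
prior_clusters
```

Comparing such prior expectations with the posterior draws stored in nstar can help judge how strongly the data, rather than the prior on alpha, drive the clustering.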
References
Imai, Kosuke, Ying Lu and Aaron Strauss. (2011). “eco: R Package for Ecological Inference in 2x2 Tables” Journal of Statistical Software, Vol. 42, No. 5, pp. 1-23.
Imai, Kosuke, Ying Lu and Aaron Strauss. (2008). “Bayesian and Likelihood Inference for 2 x 2 Ecological Tables: An Incomplete Data Approach” Political Analysis, Vol. 16, No. 1 (Winter), pp. 41-69.
See Also
eco
, ecoML
, predict.eco
, summary.ecoNP
Examples
## load the registration data
data(reg)
## NOTE: We set the number of MCMC draws to be a very small number in
## the following examples; i.e., convergence has not been properly
## assessed. See Imai, Lu and Strauss (2006) for more complete examples.
## fit the nonparametric model to give in-sample predictions
## store the parameters to make population inference later
res <- ecoNP(Y ~ X, data = reg, n.draws = 50, parameter = TRUE, verbose = TRUE)
## summarize the results
summary(res)
## obtain out-of-sample prediction
out <- predict(res, verbose = TRUE)
## summarize the results
summary(out)
## density plots of the out-of-sample predictions
oldpar <- par(mfrow=c(2,1))
plot(density(out[,1]), main = "W1")
plot(density(out[,2]), main = "W2")
## load Robinson's census data
data(census)
## fit the nonparametric model with contextual effects and N
## using the default prior specification
res1 <- ecoNP(Y ~ X, N = N, context = TRUE, parameter = TRUE, data = census,
n.draws = 25, verbose = TRUE)
## summarize the results
summary(res1)
## out-of sample prediction
pres1 <- predict(res1)
summary(pres1)
par(oldpar)
Foreign-born literacy in 1930
Description
This data set contains, on a state level, the proportion of white residents ten years and older who are foreign born, and the proportion of those residents who are illiterate. Data come from the 1930 census and were first analyzed by Robinson (1950).
Format
A data frame containing 5 variables and 48 observations
X | numeric | proportion of the white population at least 10 years of age that is foreign born |
Y | numeric | proportion of the white population at least 10 years of age that is illiterate |
W1 | numeric | proportion of the foreign-born white population at least 10 years of age that is illiterate |
W2 | numeric | proportion of the native-born white population at least 10 years of age that is illiterate |
ICPSR | numeric | the ICPSR state code |
References
Robinson, W.S. (1950). “Ecological Correlations and the Behavior of Individuals.” American Sociological Review, vol. 15, pp. 351-357.
Foreign-born literacy in 1930, County Level
Description
This data set contains, on a county level, the proportion of white residents ten years and older who are foreign born, and the proportion of those residents who are illiterate. Data come from the 1930 census and were first analyzed by Robinson (1950). Counties with fewer than 100 foreign-born residents are dropped.
Format
A data frame containing 6 variables and 1976 observations
X | numeric | proportion of the white population at least 10 years of age that is foreign born |
Y | numeric | proportion of the white population at least 10 years of age that is illiterate |
W1 | numeric | proportion of the foreign-born white population at least 10 years of age that is illiterate |
W2 | numeric | proportion of the native-born white population at least 10 years of age that is illiterate |
state | numeric | the ICPSR state code |
county | numeric | the ICPSR (within state) county code |
References
Robinson, W.S. (1950). “Ecological Correlations and the Behavior of Individuals.” American Sociological Review, vol. 15, pp. 351-357.
Electoral Results for the House and Presidential Races in 1988
Description
This data set contains, on a House district level, the percentage of the vote for the Democratic House candidate, the percentage of the vote for the Democratic presidential candidate (Dukakis), the number of voters who voted for a major party candidate in the presidential race, and the ratio of voters in the House race versus the number who cast a ballot for President. Eleven (11) uncontested races are not included. The dataset was compiled and analyzed by Burden and Kimball (1998). The complete dataset and documentation are available as ICPSR study number 1140.
Format
A data frame containing 5 variables and 424 observations
X | numeric | proportion voting for the Democrat in the presidential race |
Y | numeric | proportion voting for the Democrat in the House race |
N | numeric | number of major party voters in the presidential contest |
HPCT | numeric | House election turnout divided by presidential election turnout (set to 1 if House turnout exceeds presidential turnout) |
DIST | numeric | 4-digit ICPSR state and district code: first 2 digits for the state code, last two digits for the district number (e.g., 2106=IL 6th) |
References
Burden, Barry C. and David C. Kimball (1998). “A New Approach to the Study of Ticket Splitting.” The American Political Science Review, vol. 92, no. 3, pp. 533-544.
Out-of-Sample Posterior Prediction under the Parametric Bayesian Model for Ecological Inference in 2x2 Tables
Description
Obtains out-of-sample posterior predictions under the fitted parametric
Bayesian model for ecological inference. predict
method for class
eco
and ecoX
.
Usage
## S3 method for class 'eco'
predict(object, newdraw = NULL, subset = NULL, verbose = FALSE, ...)
Arguments
object |
An output object from |
newdraw |
An optional list containing two matrices (or three
dimensional arrays for the nonparametric model) of MCMC draws of |
subset |
A scalar or numerical vector specifying the row number(s) of
|
verbose |
logical. If |
... |
further arguments passed to or from other methods. |
Details
The posterior predictive values are computed using the Monte Carlo sample
stored in the eco
output (or other sample if newdraw
is
specified). Given each Monte Carlo sample of the parameters, we sample the
vector-valued latent variable from the appropriate multivariate Normal
distribution. Then, we apply the inverse logit transformation to obtain the
predictive values of proportions, W
. The computation may be slow
(especially for the nonparametric model) if a large Monte Carlo sample of
the model parameters is used. In either case, setting verbose = TRUE
may be helpful in monitoring the progress of the code.
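The two steps described above (draw the latent vector from a bivariate Normal, then apply the inverse logit) can be sketched in base R. The posterior draws of the mean and variance below are simulated placeholders, not output from eco, and predict_W is a hypothetical helper rather than the package's implementation.

```r
set.seed(1)
n_draws <- 1000

# Placeholder "posterior draws" of the mean vector and variance matrix
mu_draws <- cbind(rnorm(n_draws, -0.5, 0.1), rnorm(n_draws, 0.8, 0.1))
Sigma_draws <- replicate(n_draws, matrix(c(1, 0.3, 0.3, 1.5), 2, 2),
                         simplify = FALSE)

# Hypothetical helper: one predictive draw of (W1, W2) given mu and Sigma
predict_W <- function(mu, Sigma) {
  L <- t(chol(Sigma))                            # lower factor, L %*% t(L) = Sigma
  z <- mu + as.vector(L %*% rnorm(length(mu)))   # latent draw from N(mu, Sigma)
  plogis(z)                                      # inverse logit maps to (0, 1)
}

W_pred <- t(vapply(seq_len(n_draws),
                   function(k) predict_W(mu_draws[k, ], Sigma_draws[[k]]),
                   numeric(2)))
colnames(W_pred) <- c("W1", "W2")
colMeans(W_pred)
```

Each row of W_pred is one draw from the predictive distribution, so summaries such as means or quantiles can be taken column by column.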
Value
predict.eco
yields a matrix of class predict.eco
containing the Monte Carlo sample from the posterior predictive distribution
of inner cells of ecological tables. summary.predict.eco
will
summarize the output, and print.summary.predict.eco
will print the
summary.
See Also
eco
, predict.ecoNP
Out-of-Sample Posterior Prediction under the Nonparametric Bayesian Model for Ecological Inference in 2x2 Tables
Description
Obtains out-of-sample posterior predictions under the fitted nonparametric
Bayesian model for ecological inference. predict
method for class
ecoNP
and ecoNPX
.
Usage
## S3 method for class 'ecoNP'
predict(
object,
newdraw = NULL,
subset = NULL,
obs = NULL,
verbose = FALSE,
...
)
Arguments
object |
An output object from |
newdraw |
An optional list containing two matrices (or three
dimensional arrays for the nonparametric model) of MCMC draws of |
subset |
A scalar or numerical vector specifying the row number(s) of
|
obs |
An integer or vector of integers specifying the observation
number(s) whose posterior draws will be used for predictions. The default is
|
verbose |
logical. If |
... |
further arguments passed to or from other methods. |
Details
The posterior predictive values are computed using the Monte Carlo sample
stored in the eco
or ecoNP
output (or other sample if
newdraw
is specified). Given each Monte Carlo sample of the
parameters, we sample the vector-valued latent variable from the appropriate
multivariate Normal distribution. Then, we apply the inverse logit
transformation to obtain the predictive values of proportions, W
. The
computation may be slow (especially for the nonparametric model) if a large
Monte Carlo sample of the model parameters is used. In either case, setting
verbose = TRUE
may be helpful in monitoring the progress of the code.
Value
predict.eco
yields a matrix of class predict.eco
containing the Monte Carlo sample from the posterior predictive distribution
of inner cells of ecological tables. summary.predict.eco
will
summarize the output, and print.summary.predict.eco
will print the
summary.
See Also
eco
, ecoNP
, summary.eco
, summary.ecoNP
Out-of-Sample Posterior Prediction under the Nonparametric Bayesian Model for Ecological Inference in 2x2 Tables
Description
Obtains out-of-sample posterior predictions under the fitted nonparametric
Bayesian model for ecological inference. predict
method for class
ecoNP
and ecoNPX
.
Usage
## S3 method for class 'ecoNPX'
predict(
object,
newdraw = NULL,
subset = NULL,
obs = NULL,
cond = FALSE,
verbose = FALSE,
...
)
Arguments
object |
An output object from |
newdraw |
An optional list containing two matrices (or three
dimensional arrays for the nonparametric model) of MCMC draws of |
subset |
A scalar or numerical vector specifying the row number(s) of
|
obs |
An integer or vector of integers specifying the observation
number(s) whose posterior draws will be used for predictions. The default is
|
cond |
logical. If |
verbose |
logical. If |
... |
further arguments passed to or from other methods. |
Details
The posterior predictive values are computed using the Monte Carlo sample
stored in the eco or ecoNP output (or another sample if newdraw is
specified). Given each Monte Carlo sample of the parameters, we sample the
vector-valued latent variable from the appropriate multivariate Normal
distribution. Then, we apply the inverse logit transformation to obtain the
predictive values of proportions, W. The computation may be slow (especially
for the nonparametric model) if a large Monte Carlo sample of the model
parameters is used. In either case, setting verbose = TRUE may be helpful in
monitoring the progress of the code.
Value
predict.eco yields a matrix of class predict.eco containing the Monte Carlo
sample from the posterior predictive distribution of inner cells of
ecological tables. summary.predict.eco will summarize the output, and
print.summary.predict.eco will print the summary.
See Also
eco, ecoNP, summary.eco, summary.ecoNP
Out-of-Sample Posterior Prediction under the Parametric Bayesian Model for Ecological Inference in 2x2 Tables
Description
Obtains out-of-sample posterior predictions under the fitted parametric
Bayesian model for ecological inference. This is the predict method for
classes eco and ecoX.
Usage
## S3 method for class 'ecoX'
predict(
object,
newdraw = NULL,
subset = NULL,
newdata = NULL,
cond = FALSE,
verbose = FALSE,
...
)
Arguments
object |
An output object from |
newdraw |
An optional list containing two matrices (or three
dimensional arrays for the nonparametric model) of MCMC draws of |
subset |
A scalar or numerical vector specifying the row number(s) of
|
newdata |
An optional data frame containing a new data set for which posterior predictions will be made. The new data set must have the same variable names as those in the original data. |
cond |
logical. If |
verbose |
logical. If |
... |
further arguments passed to or from other methods. |
Details
The posterior predictive values are computed using the Monte Carlo sample
stored in the eco output (or another sample if newdraw is specified). Given
each Monte Carlo sample of the parameters, we sample the vector-valued latent
variable from the appropriate multivariate Normal distribution. Then, we
apply the inverse logit transformation to obtain the predictive values of
proportions, W. The computation may be slow (especially for the nonparametric
model) if a large Monte Carlo sample of the model parameters is used. In
either case, setting verbose = TRUE may be helpful in monitoring the progress
of the code.
Value
predict.eco yields a matrix of class predict.eco containing the Monte Carlo
sample from the posterior predictive distribution of inner cells of
ecological tables. summary.predict.eco will summarize the output, and
print.summary.predict.eco will print the summary.
See Also
eco, predict.ecoNP
Print the Summary of the Results for the Bayesian Parametric Model for Ecological Inference in 2x2 Tables
Description
print method for class summary.eco.
Usage
## S3 method for class 'summary.eco'
print(x, digits = max(3, getOption("digits") - 3), ...)
Arguments
x |
An object of class |
digits |
the number of significant digits to use when printing. |
... |
further arguments passed to or from other methods. |
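The default for digits is an expression evaluated against the session options, so its value depends on options("digits"). A short sketch, using a made-up table (the object tab and its labels are hypothetical and are not output of the package):

```r
# Value of the default expression under the current session options.
default_digits <- max(3, getOption("digits") - 3)
default_digits

# Hypothetical summary table, printed at the default and at a coarser setting.
tab <- matrix(c(0.123456789, 0.0456789, 0.87654321, 0.0312345),
              nrow = 2, byrow = TRUE,
              dimnames = list(c("W1", "W2"), c("mean", "std.dev")))
print(tab, digits = default_digits)
print(tab, digits = 3)
```

Because of the max(3, ...) guard, the default never drops below three significant digits, even if options("digits") is set very low.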
Value
summary.eco yields an object of class summary.eco containing the following
elements:
call |
The call from |
n.obs |
The number of units. |
n.draws |
The number of Monte Carlo samples. |
agg.table |
Aggregate posterior estimates of the marginal
means of |
If param = TRUE, the following elements are also included:
param.table |
Posterior estimates of model parameters: population mean
estimates of |
If units = TRUE, the following elements are also included:
W1.table |
Unit-level posterior estimates for |
W2.table |
Unit-level posterior estimates for |
This object can be printed by print.summary.eco.
See Also
eco, predict.eco
Print the Summary of the Results for the Maximum Likelihood Parametric Model for Ecological Inference in 2x2 Tables
Description
print method for class summary.ecoML.
Usage
## S3 method for class 'summary.ecoML'
print(x, digits = max(3, getOption("digits") - 3), ...)
Arguments
x |
An object of class |
digits |
the number of significant digits to use when printing. |
... |
further arguments passed to or from other methods. |
Value
summary.ecoML yields an object of class summary.ecoML containing the
following elements:
call |
The call from |
sem |
Whether the SEM algorithm was executed, as specified by the user
upon calling |
fix.rho |
Whether the correlation parameter was fixed or allowed to vary,
as specified by the user upon calling |
epsilon |
The convergence threshold specified by the
user upon calling |
n.obs |
The number of units. |
iters.em |
The number of iterations the EM algorithm cycled through before convergence or reaching the maximum number of iterations allowed. |
iters.sem |
The number of iterations the SEM algorithm cycled through before convergence or reaching the maximum number of iterations allowed. |
loglik |
The final observed log-likelihood. |
rho |
A matrix of |
param.table |
Final estimates of the parameter values for the model.
Excludes parameters fixed by the user upon calling |
agg.table |
Aggregate estimates of the marginal means of |
agg.wtable |
Aggregate estimates of the marginal means of |
If units = TRUE, the following elements are also included:
W.table |
Unit-level estimates for |
This object can be printed by print.summary.ecoML.
See Also
ecoML
Print the Summary of the Results for the Bayesian Nonparametric Model for Ecological Inference in 2x2 Tables
Description
print method for class summary.ecoNP.
Usage
## S3 method for class 'summary.ecoNP'
print(x, digits = max(3, getOption("digits") - 3), ...)
Arguments
x |
An object of class |
digits |
the number of significant digits to use when printing. |
... |
further arguments passed to or from other methods. |
Value
summary.ecoNP yields an object of class summary.ecoNP containing the
following elements:
call |
The call from |
n.obs |
The number of units. |
n.draws |
The number of Monte Carlo samples. |
agg.table |
Aggregate posterior estimates of the marginal
means of |
If param = TRUE, the following elements are also included:
param.table |
Posterior estimates of model parameters: population mean
estimates of |
If units = TRUE, the following elements are also included:
W1.table |
Unit-level posterior estimates for |
W2.table |
Unit-level posterior estimates for |
This object can be printed by print.summary.ecoNP.
See Also
ecoNP, predict.eco
Voter Registration in US Southern States
Description
This data set contains the racial composition, the registration rate, the number of eligible voters as well as the actual observed racial registration rates for every county in four US southern states: Florida, Louisiana, North Carolina, and South Carolina.
Format
A data frame containing 5 variables and 275 observations
X | numeric | the fraction of Black voters |
Y | numeric | the fraction of voters who registered themselves |
N | numeric | the total number of voters in each county |
W1 | numeric | the actual fraction of Black voters who registered themselves |
W2 | numeric | the actual fraction of White voters who registered themselves |
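The variables are tied together by the standard accounting identity of 2x2 ecological inference, Y = X * W1 + (1 - X) * W2. The sketch below checks that identity and computes group-weighted aggregate rates on three made-up counties; the data frame toy and its values are hypothetical and are not rows of this data set.

```r
# Three hypothetical counties (made-up values, not rows of the real data set).
toy <- data.frame(
  X  = c(0.20, 0.50, 0.35),   # fraction of Black voters
  N  = c(1000, 4000, 2500),   # number of eligible voters
  W1 = c(0.40, 0.55, 0.30),   # registration rate among Black voters
  W2 = c(0.70, 0.65, 0.80)    # registration rate among White voters
)

# Accounting identity linking the inner cells to the observed margin.
toy$Y <- toy$X * toy$W1 + (1 - toy$X) * toy$W2

# Aggregate rates, weighting each county by the size of the relevant group.
agg_W1 <- weighted.mean(toy$W1, w = toy$N * toy$X)
agg_W2 <- weighted.mean(toy$W2, w = toy$N * (1 - toy$X))
c(agg_W1 = agg_W1, agg_W2 = agg_W2)
```

Because W1 and W2 are observed here, a data set of this shape can serve as a benchmark: estimates obtained from X, Y, and N alone can be compared with the known inner cells.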
References
King, G. (1997). “A Solution to the Ecological Inference Problem: Reconstructing Individual Behavior from Aggregate Data”. Princeton University Press, Princeton, NJ.
Summarizing the Results for the Bayesian Parametric Model for Ecological Inference in 2x2 Tables
Description
summary method for class eco.
Usage
## S3 method for class 'eco'
summary(
object,
CI = c(2.5, 97.5),
param = TRUE,
units = FALSE,
subset = NULL,
...
)
Arguments
object |
An output object from |
CI |
A vector of lower and upper bounds for the Bayesian credible intervals used to summarize the results. The default is the equal tail 95 percent credible interval. |
param |
Logical. If |
units |
Logical. If |
subset |
A numeric vector indicating the subset of the units whose
in-sample predictions are to be provided when |
... |
further arguments passed to or from other methods. |
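The CI argument is given in percentiles, so the default c(2.5, 97.5) corresponds to an equal-tail 95 percent interval. A minimal sketch of such an interval computed from Monte Carlo draws follows; the vector draws is simulated here and hypothetical, and the way the package builds its own tables is not shown in this manual.

```r
set.seed(2)

# Hypothetical Monte Carlo draws for one quantity of interest.
draws <- rbeta(2000, shape1 = 8, shape2 = 4)

CI <- c(2.5, 97.5)  # percentiles, on the same scale as the CI argument

# Equal-tail interval: convert percentiles to probabilities for quantile().
interval <- quantile(draws, probs = CI / 100)
post_summary <- c(mean = mean(draws), sd = sd(draws), interval)
round(post_summary, 3)
```

Passing a different pair, for example c(5, 95), would give the equal-tail 90 percent interval by the same computation.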
Value
summary.eco yields an object of class summary.eco containing the following
elements:
call |
The call from |
n.obs |
The number of units. |
n.draws |
The number of Monte Carlo samples. |
agg.table |
Aggregate posterior estimates of the marginal
means of |
If param = TRUE, the following elements are also included:
param.table |
Posterior estimates of model parameters: population mean
estimates of |
If units = TRUE, the following elements are also included:
W1.table |
Unit-level posterior estimates for |
W2.table |
Unit-level posterior estimates for |
This object can be printed by print.summary.eco.
See Also
eco, predict.eco
Summarizing the Results for the Maximum Likelihood Parametric Model for Ecological Inference in 2x2 Tables
Description
summary method for class ecoML.
Usage
## S3 method for class 'ecoML'
summary(
object,
CI = c(2.5, 97.5),
param = TRUE,
units = FALSE,
subset = NULL,
...
)
Arguments
object |
An output object from |
CI |
A vector of lower and upper bounds for the Bayesian credible intervals used to summarize the results. The default is the equal tail 95 percent credible interval. |
param |
Ignored. |
units |
Logical. If |
subset |
A numeric vector indicating the subset of the units whose
in-sample predictions are to be provided when |
... |
further arguments passed to or from other methods. |
Value
summary.ecoML yields an object of class summary.ecoML containing the
following elements:
call |
The call from |
sem |
Whether the SEM algorithm was executed, as specified by the user
upon calling |
fix.rho |
Whether the correlation parameter was fixed or allowed to
vary, as specified by the user upon calling |
epsilon |
The convergence threshold specified by the user upon
calling |
n.obs |
The number of units. |
iters.em |
The number of iterations the EM algorithm cycled through before convergence or reaching the maximum number of iterations allowed. |
iters.sem |
The number of iterations the SEM algorithm cycled through before convergence or reaching the maximum number of iterations allowed. |
loglik |
The final observed log-likelihood. |
rho |
A matrix of |
param.table |
Final estimates of the parameter values for the model.
Excludes parameters fixed by the user upon calling |
agg.table |
Aggregate estimates of the marginal means of
|
agg.wtable |
Aggregate estimates of the marginal means
of |
If units = TRUE, the following elements are also included:
W.table |
Unit-level estimates for |
This object can be printed by print.summary.ecoML.
See Also
ecoML
Summarizing the Results for the Bayesian Nonparametric Model for Ecological Inference in 2x2 Tables
Description
summary method for class ecoNP.
Usage
## S3 method for class 'ecoNP'
summary(
object,
CI = c(2.5, 97.5),
param = FALSE,
units = FALSE,
subset = NULL,
...
)
Arguments
object |
An output object from |
CI |
A vector of lower and upper bounds for the Bayesian credible intervals used to summarize the results. The default is the equal tail 95 percent credible interval. |
param |
Logical. If |
units |
Logical. If |
subset |
A numeric vector indicating the subset of the units whose
in-sample predictions are to be provided when |
... |
further arguments passed to or from other methods. |
Value
summary.ecoNP yields an object of class summary.ecoNP containing the
following elements:
call |
The call from |
n.obs |
The number of units. |
n.draws |
The number of Monte Carlo samples. |
agg.table |
Aggregate posterior estimates of the marginal
means of |
If param = TRUE, the following elements are also included:
param.table |
Posterior estimates of model parameters: population mean
estimates of |
If units = TRUE, the following elements are also included:
W1.table |
Unit-level posterior estimates for |
W2.table |
Unit-level posterior estimates for |
This object can be printed by print.summary.ecoNP.
See Also
ecoNP, predict.eco
Calculate the variance or covariance of the object
Description
varcov returns the variance or covariance of the object.
Usage
varcov(object, ...)
Arguments
object |
An object |
... |
The rest of the input parameters if any |
Value
a variance-covariance matrix
Black voting rates for Wallace for President, 1968
Description
This data set contains, on a county level, the proportion of county residents who are Black and the proportion of presidential votes cast for Wallace. Demographic data is based on the 1960 census. Presidential returns are from ICPSR study 13. County data from 10 southern states (Alabama, Arkansas, Georgia, Florida, Louisiana, Mississippi, North Carolina, South Carolina, Tennessee, Texas) are included. (Virginia is excluded due to the difficulty of matching counties between the datasets.) This data is analyzed in Wasserman and Segal (1973).
Format
A data frame containing 3 variables and 1009 observations
X | numeric | proportion of the population that is Black |
Y | numeric | proportion of presidential votes cast for Wallace |
FIPS | numeric | the FIPS county code |
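Since only the margins X and Y are observed here, the inner cells are known only up to deterministic bounds. The sketch below computes bounds of the Duncan and Davis (1953) type directly from the identity Y = X * W1 + (1 - X) * W2, where W1 and W2 denote the unobserved Wallace vote shares among Black and non-Black residents. The values are made up, the variable names are hypothetical, and this is not the package's own implementation.

```r
# Hypothetical county-level margins (made-up values, not rows of the real data).
X <- c(0.10, 0.45, 0.80)  # proportion of the population that is Black
Y <- c(0.30, 0.60, 0.15)  # proportion of votes cast for Wallace

# Deterministic bounds implied by Y = X * W1 + (1 - X) * W2
# with W1 and W2 both restricted to [0, 1].
W1_lower <- pmax(0, (Y - (1 - X)) / X)
W1_upper <- pmin(1, Y / X)
W2_lower <- pmax(0, (Y - X) / (1 - X))
W2_upper <- pmin(1, Y / (1 - X))

bounds <- cbind(X, Y, W1_lower, W1_upper, W2_lower, W2_upper)
round(bounds, 3)
```

When X is small, the bounds on W1 tend to be wide and those on W2 narrow, and the reverse holds when X is large; this is why the margins alone rarely pin down both inner cells.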
References
Wasserman, Ira M. and David R. Segal (1973). “Aggregation Effects in the Ecological Study of Presidential Voting.” American Journal of Political Science. vol. 17, pp. 177-81.