| Title: | Packages and Functions for 'CourseKata' Courses |
| Version: | 0.19.0 |
| Date: | 2025-07-16 |
| Description: | Easily install and load all packages and functions used in 'CourseKata' courses. Aid teaching with helper functions and augment generic functions to provide cohesion between the network of packages. Learn more about 'CourseKata' at https://www.coursekata.org. |
| License: | AGPL (≥ 3) |
| URL: | https://github.com/coursekata/coursekata-r |
| BugReports: | https://github.com/coursekata/coursekata-r/issues |
| Depends: | R (≥ 3.6) |
| Imports: | cli (≥ 3.2.0), dslabs (≥ 0.7.4), ggformula (≥ 0.10.1), ggplot2 (≥ 3.5.0), glue (≥ 1.6.2), lsr (≥ 0.5.2), Metrics, mosaic (≥ 1.8.3), palmerpenguins, purrr (≥ 0.3.4), remotes, rlang (≥ 1.0.2), supernova (≥ 2.5.1), vctrs (≥ 0.4.1), viridisLite |
| Suggests: | fivethirtyeight (≥ 0.6.2), lubridate (≥ 1.8.0), MASS, mockery (≥ 0.4.3), mockr (≥ 0.1), readr (≥ 2.1.2), readxl (≥ 1.4.0), usethis (≥ 2.1.6), simstudy (≥ 0.5.0), testthat (≥ 3.1.2), tibble (≥ 3.1.7), tidyr (≥ 1.2.0), vdiffr (≥ 1.0.2), withr (≥ 2.5.0) |
| Config/testthat/edition: | 3 |
| Config/testthat/parallel: | true |
| Language: | en-US |
| Encoding: | UTF-8 |
| LazyData: | true |
| RoxygenNote: | 7.3.2 |
| NeedsCompilation: | no |
| Packaged: | 2025-07-16 19:15:30 UTC; adamblake |
| Author: | Adam Blake |
| Maintainer: | Adam Blake <adam@coursekata.org> |
| Repository: | CRAN |
| Date/Publication: | 2025-07-16 19:30:02 UTC |
coursekata: Packages and Functions for 'CourseKata' Courses
Description
Easily install and load all packages and functions used in 'CourseKata' courses. Aid teaching with helper functions and augment generic functions to provide cohesion between the network of packages. Learn more about 'CourseKata' at https://www.coursekata.org.
Author(s)
Maintainer: Adam Blake adam@coursekata.org (ORCID)
Authors:
Ji Son ji@coursekata.org (ORCID)
Jim Stigler jim@coursekata.org (ORCID)
See Also
Useful links:
Report bugs at https://github.com/coursekata/coursekata-r/issues
Ames, Iowa housing data
Description
Data describing all residential home sales in Ames, Iowa from the years 2006–2010 as reported by the Ames City Assessor's Office and compiled by De Cock (2011). Ames is located about 30 miles north of Des Moines (the state capital) and is home to Iowa State University (the largest university in the state). Each row represents the latest sale of a home (one row per home in the dataset). Columns represent home features and sale prices (outcome). The original dataset includes a uniquely detailed (81 features per home) and comprehensive look at the housing market. The data included here are only a subset used for examples in CourseKata course material. See the references and data source for the full dataset.
Pedagogical Modifications
To simplify the dataset for instructional purposes, the data were filtered to include only single family homes, residential zoning, 1-2 story homes, homes with brick, cinder block, or concrete foundations, and average to excellent kitchen qualities. Further, the descriptive variables were reduced to the subset described in the format section.
Usage
Ames
Format
A data frame with 2930 observations on the following 80 variables:
YearBuilt: Year home was built (YYYY).
YearSold: Year of home sale (YYYY). Note: all home sales in this dataset occurred between 2006 - 2010. If a home was sold more than once between 2006 - 2010, only its latest sale is included in dataset.
Neighborhood: One of two neighborhoods in Ames county:
  College Creek (CollegeCreek): a neighborhood located adjacent to Iowa State University (the largest University in the state).
  Old Town (OldTown): a nationally designated historic district in Ames. The old neighborhood is located just north of the central business district.
HomeSizeR: Raw above-ground area of home, measured in square feet.
HomeSizeK: Above-ground area of home, measured in thousands of square feet.
LotSizeR: Raw total property lot size, measured in square feet.
LotSizeK: Total property lot size, in thousands of square feet.
Floors: Number of above-ground floors (1 story or 2 story).
BuildQuality: Assessor's rating of overall material and finish of the house.
  10: Very Excellent
  9: Excellent
  8: Very Good
  7: Good
  6: Above Average
  5: Average
  4: Below Average
  3: Fair
  2: Poor
  1: Very Poor
Foundation: Type of foundation (ground material underneath the house).
  Brick&Tile: Brick and Tile
  CinderBlock: Cinder Blocks
  PouredConcrete: Poured Concrete
HasCentralAir: Indicator if home contains central air conditioning (0 = No, 1 = Yes).
Bathrooms: Number of full above-ground bathrooms.
Bedrooms: Number of full above-ground bedrooms.
TotalRooms: Number of above-ground rooms in home, excluding bathrooms.
KitchenQuality: Assessor's rating of kitchen material quality.
  Excellent
  Good
  Average
HasFireplace: Indicator if home contains at least one fireplace (0 = No, 1 = Yes).
GarageType: Type of garage.
  Attached: includes attached, built-in, basement, and dual-type garages
  Detached: includes detached and carport garages
  None: home does not have a garage or carport
GarageCars: Number of cars that can fit in garage.
PriceR: Sale price of home, in raw USD ($)
PriceK: Sale price of home, in thousands of USD ($)
TinySet: (Ignore) Whether or not this row is in ames_tiny.csv
Source
https://www.kaggle.com/competitions/house-prices-advanced-regression-techniques/data
References
De Cock, Dean, (2011). Ames, Iowa: Alternative to the Boston Housing Data as an end of semester regression project, Journal of Statistics Education, 19(3). doi:10.1080/10691898.2011.11889627
Data from introductory statistics students at a university.
Description
Students at a university taking an introductory statistics course were asked to complete this survey as part of their homework.
Usage
Fingers
Format
A data frame with 157 observations on the following 16 variables:
Gender: Gender of participant.
RaceEthnic: Racial or ethnic background.
FamilyMembers: Members of immediate family (excluding self).
SSLast: Last digit of social security number (NA if no SSN).
Year: Year in school: 1=First, 2=Second, 3=Third, 4=Fourth, 5=Other
Job: Current employment status: 1=Not Working, 2=Part-time Job, 3=Full-time Job
MathAnxious: Agreement with the statement "In general I tend to feel very anxious about mathematics": 1=Strongly Disagree, 2=Disagree, 3=Neither Agree nor Disagree, 4=Agree, 5=Strongly Agree
Interest: Interest in statistics and the course: 1=No Interest, 2=Somewhat Interested, 3=Very Interested
GradePredict: Numeric prediction for final grade in the course. The value is converted from the student's letter grade prediction. 4.0=A, 3.7=A-, 3.3=B+, 3.0=B, 2.7=B-, 2.3=C+, 2.0=C, 1.7=C-, 1.3=Below C-
Thumb: Length in mm from tip of thumb to the crease between the thumb and palm.
Index: Length in mm from tip of index finger to the crease between the index finger and palm.
Middle: Length in mm from tip of middle finger to the crease between the middle finger and palm.
Ring: Length in mm from tip of ring finger to the crease between the ring finger and palm.
Pinkie: Length in mm from tip of pinkie finger to the crease between the pinkie finger and palm.
Height: Height in inches.
Weight: Weight in pounds.
Sex: Sex of participant.
Raw data from introductory statistics students at a university.
Description
This is the Fingers dataset before it was cleaned. In the cleaning process, we converted the values from numbers to appropriate types (where applicable), removed outliers that suggested data had been entered incorrectly, and removed incomplete cases. The survey is the same one described for the Fingers data: students at a university taking an introductory statistics course were asked to complete it as part of their homework.
Usage
FingersMessy
Format
A data frame with 157 observations on the following 16 variables:
Gender: Gender of participant.
RaceEthnic: Racial or ethnic background.
FamilyMembers: Members of immediate family (excluding self).
SSLast: Last digit of social security number (NA if no SSN).
Year: Year in school: 1=First, 2=Second, 3=Third, 4=Fourth, 5=Other
Job: Current employment status: 1=Not Working, 2=Part-time Job, 3=Full-time Job
MathAnxious: Agreement with the statement "In general I tend to feel very anxious about mathematics": 1=Strongly Disagree, 2=Disagree, 3=Neither Agree nor Disagree, 4=Agree, 5=Strongly Agree
Interest: Interest in statistics and the course: 1=No Interest, 2=Somewhat Interested, 3=Very Interested
GradePredict: Numeric prediction for final grade in the course. The value is converted from the student's letter grade prediction. 4.0=A, 3.7=A-, 3.3=B+, 3.0=B, 2.7=B-, 2.3=C+, 2.0=C, 1.7=C-, 1.3=Below C-
Thumb: Length in mm from tip of thumb to the crease between the thumb and palm.
Index: Length in mm from tip of index finger to the crease between the index finger and palm.
Middle: Length in mm from tip of middle finger to the crease between the middle finger and palm.
Ring: Length in mm from tip of ring finger to the crease between the ring finger and palm.
Pinkie: Length in mm from tip of pinkie finger to the crease between the pinkie finger and palm.
Height: Height in inches.
Weight: Weight in pounds.
Sex: Sex of participant.
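The type conversion mentioned in the description can be sketched in base R. This is a minimal illustration, not the cleaning script actually used to produce Fingers: the vector `year_codes` is made-up input, and the labels are taken from the Year coding documented in the Format section.

```r
# Hypothetical raw codes, as they might appear in a numeric Year column
year_codes <- c(1, 3, 5, 2, NA)

# Map the numeric codes onto the labels documented for Year
year_labels <- c("First", "Second", "Third", "Fourth", "Other")
year_factor <- factor(year_codes, levels = 1:5, labels = year_labels)

# Missing codes stay NA; unused levels (here "Fourth") are kept as levels
print(table(year_factor, useNA = "ifany"))
```

The same pattern applies to the other coded columns (Job, MathAnxious, Interest), with the level and label vectors swapped for the codings listed for each variable.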
Simulated housing data
Description
These data are simulated to be similar to the Ames housing data, but with far fewer variables and much smaller effect sizes.
Usage
Smallville
Format
A data frame with 32 observations on the following 4 variables:
PriceK: Price the home sold for (in thousands of dollars)
Neighborhood: The neighborhood the home is in (Eastside, Downtown)
HomeSizeK: The size of the home (in thousands of square feet)
HasFireplace: Whether the home has a fireplace (0 = no, 1 = yes)
Students at a university were asked to enter a random number between 1-20 into a survey.
Description
Students at a university taking an introductory statistics course were asked to complete this survey as part of their homework.
Usage
Survey
Format
A data frame with 211 observations on the following 1 variable:
Any1_20: The random number between 1 and 20 that a student thought of.
Tables data
Description
Data about tips collected from an experiment with 44 tables at a restaurant.
Usage
Tables
Format
A data frame with 44 observations on the following 2 variables.
TableID: A number assigned to each table.
Tip: How much the tip was.
Data from an experiment about smiley faces and tips
Description
Tables were randomly assigned to receive checks that either included or did not include a drawing of a smiley face. Data was collected from 44 tables in an effort to examine whether the added smiley face would cause more generous tipping.
Usage
TipExperiment
Format
A data frame with 44 observations on the following 5 variables.
TableID: A number assigned to each table.
Tip: How much the tip was.
Condition: Which experimental condition the table was randomly assigned to.
Check: (Simulated) The amount of money the table paid for their meal.
FoodQuality: (Simulated) The perceived quality of the food.
Data on countries from the Happy Planet Index project.
Description
These data have been updated with some historical height data (from Our World in Data), drinking data (collected by the World Health Organization featured in fivethirtyeight), population and land characteristics, and vaccination data (from March 2023).
Usage
World
Format
A data frame with 130 observations on the following 14 variables:
Country: Name of country
Region: One of 5 UN defined regions: Africa, Americas, Asia, Europe, Oceania
Code: Three-letter country code defined by the International Organization for Standardization (ISO) to represent countries in a way that avoids errors, since a country’s name changes depending on the language being used.
LifeExpectancy: Average life expectancy (in years)
GirlsH1900: The average height of 18-year-old girls in 1900 (in cm)
GirlsH1980: The average height of 18-year-old girls in 1980 (in cm)
Happiness: Score on a 0-10 scale for average level of happiness (10 being happiest)
GDPperCapita: Gross Domestic Product (per capita)
FertRate: The average number of children that will be born to a woman over her lifetime
PeopleVacc: Total number of people vaccinated in the country
PeopleVacc_per100: Total number of people vaccinated in the country (in percent)
Population2010: Population (in millions) in 2010
Population2020: Population (in millions) in 2020
WineServ: Average wine consumption per capita for those age 15 and over per week (collected by WHO)
Generated "class data" for exploring pairwise tests
Description
These data were generated as outcomes for "students" of three different "instructors" named A, B, and C. The outcomes have means such that C > B > A, but the difference is only clearly significant for C > A, and borderline for the others.
Usage
class_data
Format
An object of class tbl_df (inherits from tbl, data.frame) with 105 rows and 2 columns.
Details
outcome: A hypothetical, numerical outcome of an intervention.
teacher: Either "A", "B", or "C", associating the outcome to a teacher.
Attach the CourseKata course packages
Description
Attach the CourseKata course packages
Usage
coursekata_attach(do_not_ask = FALSE, quietly = FALSE)
Arguments
| do_not_ask | Prevent asking the user to install missing packages (they are skipped). |
| quietly | Whether to suppress messages. |
Value
A named logical vector indicating which packages were attached.
Examples
coursekata_attach()
Install or update all CourseKata packages.
Description
Install or update all CourseKata packages.
Usage
coursekata_install(...)
coursekata_update(...)
Arguments
| ... | Arguments passed on to |
Value
The state of all the packages after any updates have been performed.
Utility function for loading all themes.
Description
This function is called at package start-up and should rarely be needed by the user. The
exception is when the user has called coursekata_unload_theme() and wants to go back to the
CourseKata look and feel. When run, this function sets the CourseKata color palettes
coursekata_palette(), sets the default theme to theme_coursekata(), and tweaks some
default settings for specific plots. To restore the original ggplot2 settings, run
coursekata_unload_theme().
Usage
coursekata_load_theme()
Value
No return value, called to adjust the global state of ggplot2.
See Also
coursekata_palette theme_coursekata scale_discrete_coursekata coursekata_unload_theme
List all CourseKata course packages
Description
List all CourseKata course packages
Usage
coursekata_packages(check_remote_version = FALSE)
Arguments
| check_remote_version | Should the remote version number be checked? Requires internet, and will take longer. |
Value
A data frame with three variables: the name of the package, the version, and
whether it is currently attached.
Examples
coursekata_packages()
The color palettes used in our theme system
Description
The color palettes used in our theme system
Usage
coursekata_palette(indices = integer(0))
Arguments
| indices | The indices of the colors to pull (or all colors if no indices are given). |
Value
A named list of the requested colors in the palette.
Create a function that provides a colorblind palette.
Description
Create a function that provides a colorblind palette.
Usage
coursekata_palette_provider()
Value
A function that accepts one argument n, which is the number of colors you want to use
in the plot. This function is used by scales like scale_color_discrete to provide colorblind-
safe palettes. Where possible, the function will use the hand-picked colors from
coursekata_palette(), and when more colors are needed than are available, it will use the
viridisLite::viridis() palette.
See Also
scale_discrete_coursekata
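The fallback behavior described under Value can be sketched as a closure. This is an illustration under stated assumptions rather than the package's code: `palette_provider_sketch` and the three hex colors are hypothetical stand-ins for coursekata_palette(), and base R's hcl.colors() with the "viridis" palette stands in for viridisLite::viridis() so that the block runs without extra packages.

```r
# Hypothetical sketch: returns a function of n, as the real provider does
palette_provider_sketch <- function(hand_picked) {
  function(n) {
    if (n <= length(hand_picked)) {
      # Enough hand-picked colors: use the first n of them
      hand_picked[seq_len(n)]
    } else {
      # Too many colors requested: fall back to a generated viridis-style palette
      hcl.colors(n, palette = "viridis")
    }
  }
}

# Placeholder colors, NOT the actual CourseKata palette
provider <- palette_provider_sketch(c("#1B9E77", "#D95F02", "#7570B3"))
print(provider(2))          # first two hand-picked colors
print(length(provider(5)))  # five generated colors
```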
Get repositories for the packages.
Description
Ensures a default CRAN is set if one is not already set, and adds the repository for fivethirtyeightdata.
Usage
coursekata_repos(repos = getOption("repos"))
Arguments
| repos | Optionally set a repository character vector to augment. |
Value
A set of repositories that can be used to install or update the CourseKata packages.
Examples
coursekata_repos()
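As a rough picture of what "ensuring a default CRAN" involves, the sketch below fills in a CRAN mirror when none is configured and appends an extra repository. It is not the package's implementation: `repos_sketch` is a hypothetical helper, the `extra` URL is a placeholder rather than the real fivethirtyeightdata repository, and the choice of https://cloud.r-project.org as the default mirror is an assumption.

```r
# Hypothetical helper illustrating the idea; not the package's implementation
repos_sketch <- function(repos = getOption("repos"),
                         extra = c(example = "https://example.org/drat")) {
  if (is.null(repos)) repos <- character()

  # "@CRAN@" is the placeholder R uses when no mirror has been chosen
  cran_unset <- !("CRAN" %in% names(repos)) ||
    is.na(repos[["CRAN"]]) ||
    identical(unname(repos[["CRAN"]]), "@CRAN@")
  if (cran_unset) repos["CRAN"] <- "https://cloud.r-project.org"

  # Append only the extra repositories that are not already present
  c(repos, extra[!(extra %in% repos)])
}

print(repos_sketch(c(CRAN = "@CRAN@")))
```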
Restore ggplot2 default settings
Description
This function will restore all of the tweaks to themes and plotting to the original ggplot2
defaults. If you want to go back to the CourseKata look and feel, run
coursekata_load_theme().
Usage
coursekata_unload_theme()
Value
No return value, called to restore the global state of ggplot2.
See Also
coursekata_load_theme
Emergency room canine therapy
Description
Data from: Controlled clinical trial of canine therapy versus usual care to reduce patient anxiety in the emergency department.
Abstract
Objective
Test if therapy dogs can reduce anxiety in emergency department (ED) patients.
Methods
In this controlled clinical trial (NCT03471429), medically stable, adult patients were approached if the physician believed that the patient had “moderate or greater anxiety.” Patients were allocated on a 1:1 ratio to either 15 min exposure to a certified therapy dog and handler (dog), or usual care (control). Patient reported anxiety, pain and depression were assessed using a 0-10 scale (10=worst). Primary outcome was change in anxiety from baseline (T0) to 30 min and 90 min after exposure to dog or control (T1 and T2 respectively); secondary outcomes were pain, depression and frequency of pain medication.
Results
Among 98 patients willing to participate in research, 7 had aversions to dogs, leaving 91 (93%) were willing to see a dog; 40 patients were allocated to each group (dog or control). No data were normally distributed. Median baseline anxiety, pain and depression were similar between groups. With dog exposure, anxiety decreased significantly from T0 to T1: 6 (IQR 4-9.75) to T1: 2 (0-6) compared with 6 (4-8) to 6 (2.5-8) in controls (P<0.001, for T1, Mann-Whitney U). Dog exposure was associated with significantly lower anxiety at T2 and a significant overall treatment effect on two-way repeated measures ANOVA for anxiety, pain and depression. After exposure, 1/40 in the dog group needed pain medication, versus 7/40 in controls (P=0.056, Fisher’s).
Conclusions
Exposure to therapy dogs plus handlers significantly reduced anxiety in ED patients.
Usage
er
Format
A data frame with 84 observations on the following 53 variables:
id: Subject ID
condition: Whether the subject saw a Dog or was in the Control group
age: Subject's age in years
gender: Subject's self-identified gender
race: Subject's self-identified race
veteran: Is the subject a veteran?
disabled: Is the subject disabled?
dog_name: The name of the therapy dog
base_pain: Subject's self reported pain before the intervention (T0)
base_depression: Subject's self reported depression before the intervention (T0)
base_anxiety: Subject's self reported anxiety before the intervention (T0)
base_total: The sum of the subject's base_* scores
later_pain: Subject's self reported pain after the intervention (T1)
later_depression: Subject's self reported depression after the intervention (T1)
later_anxiety: Subject's self reported anxiety after the intervention (T1)
later_total: The sum of the subject's later_* scores
last_pain: Subject's self reported pain after the intervention (T2)
last_depression: Subject's self reported depression after the intervention (T2)
last_anxiety: Subject's self reported anxiety after the intervention (T2)
last_total: The sum of the subject's last_* scores
change_pain: The change in subject's pain from before the intervention to after
change_depression: The change in subject's depression from before the intervention to after
change_anxiety: The change in subject's anxiety from before the intervention to after
change_total: The sum of the subject's change_* scores
provider_male: Was the health care provider male?
provider: The health care provider's status: either an Advanced Practitioner, Resident physician, or Attending physician
heart_rate: The subject's heart rate at baseline (T0)
resp_rate: The subject's respiratory rate at baseline (T0)
sp_o2: The subject's SpO2 at baseline (T0)
bp_syst: The subject's systolic blood pressure at baseline (T0)
bp_diast: The subject's diastolic blood pressure at baseline (T0)
med_given: Was the subject given medication prior to the study? (T0)
mh_none: None of the other medical history items were indicated
mh_asthma: Medical history: asthma
mh_smoker: Medical history: smoker
mh_cad: Medical history: coronary artery disease
mh_diabetes: Medical history: diabetes mellitus
mh_hypertension: Medical history: hypertension
mh_stroke: Medical history: prior stroke
mh_chronic_kidney: Medical history: chronic kidney disease
mh_copd: Medical history: chronic obstructive pulmonary disease
mh_hyperlipidemia: Medical history: hyperlipidemia
mh_hiv: Medical history: HIV
mh_other: Medical history: other (write-in)
ph_adhd: Psychiatric history: attention-deficit/hyperactivity disorder
ph_anxiety: Psychiatric history: anxiety
ph_bipolar: Psychiatric history: bipolar
ph_borderline: Psychiatric history: borderline personality disorder
ph_depression: Psychiatric history: depression
ph_schizophrenia: Psychiatric history: schizophrenia
ph_ptsd: Psychiatric history: PTSD
ph_none: None of the other psychiatric history items were indicated
ph_other: Psychiatric history: other (write-in)
References
Kline, J. A., Fisher, M. A., Pettit, K. L., Linville, C. T., & Beck, A. M. (2019). Controlled clinical trial of canine therapy versus usual care to reduce patient anxiety in the emergency department. PloS One, 14(1), e0209232. doi:10.1371/journal.pone.0209232
Extract estimates/statistics from a model
Description
This collection of functions is useful for extracting estimates and statistics from a fitted
model. They are particularly useful when estimating many models, like when bootstrapping
confidence intervals. Each function can be used with an already fitted model as an lm object,
or a formula and associated data can be passed to it. All of these assume the comparison is the
empty model.
Usage
b0(object, data = NULL)
b1(object, data = NULL)
b(object, data = NULL, all = FALSE, predictor = character())
f(object, data = NULL, all = FALSE, predictor = character(), type = 3)
pre(object, data = NULL, all = FALSE, predictor = character(), type = 3)
p(object, data = NULL, all = FALSE, predictor = character(), type = 3)
fVal(object, data = NULL, all = FALSE, predictor = character(), type = 3)
PRE(object, data = NULL, all = FALSE, predictor = character(), type = 3)
Arguments
| object | |
| data | If |
| all | If |
| predictor | Filter the output down to just the statistics for these terms (e.g. "hp" to just get the statistics for that term in the model). This argument is flexible: you can pass a character vector of terms ( |
| type | The type of sums of squares to calculate (see |
Details
- b0: The intercept from the full model.
- b1: The slope b1 from the full model.
- b: The coefficients from the full model.
- f: The F value from the full model.
- pre: The Proportional Reduction in Error for the full model.
- p: The p-value from the full model.
- sse: The SS Error (SS Residual) from the model.
- ssm: The SS Model (SS Regression) for the full model.
- ssr: Alias for SSM.
Value
The value of the estimate as a single number.
References
Judd, C. M., McClelland, G. H., & Ryan, C. S. (2017). Data Analysis: A Model Comparison Approach to Regression, ANOVA, and Beyond (3rd ed.). New York: Routledge. ISBN:978-1138819832
Examples
supernova(lm(mpg ~ disp, data = mtcars))
change_p_decimals <- supernova(lm(mpg ~ disp, data = mtcars))
print(change_p_decimals, pcut = 8)
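The sample code above demonstrates supernova() rather than the extractor functions themselves. As a rough guide to what the extractors report, the sketch below computes the corresponding quantities for a one-predictor model using base R only, comparing the full model against the empty model. It does not call b0(), b1(), f(), pre(), or p(); the variable names are illustrative.

```r
# Full model and the empty (intercept-only) model it is compared against
full  <- lm(mpg ~ disp, data = mtcars)
empty <- lm(mpg ~ 1, data = mtcars)

# Parameter estimates (what b0 and b1 are described as returning)
b0_val <- coef(full)[["(Intercept)"]]
b1_val <- coef(full)[["disp"]]

# Sums of squared residuals under each model
sse_full  <- sum(resid(full)^2)
sse_empty <- sum(resid(empty)^2)  # equals SS Total

# Proportional Reduction in Error relative to the empty model
pre_val <- (sse_empty - sse_full) / sse_empty

# F statistic and p-value for the full model versus the empty model
f_val <- summary(full)$fstatistic[["value"]]
p_val <- anova(empty, full)[["Pr(>F)"]][2]

print(c(b0 = b0_val, b1 = b1_val, PRE = pre_val, F = f_val, p = p_val))
```

With a single predictor, PRE coincides with the model's R-squared, which gives a quick consistency check on the calculation.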
Forced Expiratory Volume (FEV) Data
Description
Data from: Fundamentals of Biostatistics. Notes from: Kahn, M.
Abstract
Sample of 654 youths, aged 3 to 19, in the area of East Boston during the mid-to-late 1970s. Interest concerns the relationship between smoking and FEV. Since the study is necessarily observational, statistical adjustment via regression models clarifies the relationship.
Pedagogical Notes:
This is a versatile dataset that can be used throughout an introductory statistics course as well as an introductory modeling course. It includes many issues from statistical adjustment in observational studies, to subgroup analysis, quadratic regression and analysis of covariance.
Usage
fevdata
Format
A data frame with 654 observations on the following 5 variables:
AGE: Age, in years
FEV: Forced expiratory volume, in liters
HEIGHT: Height, in inches
SEX: 0 = Female, 1 = Male
SMOKE: 0 = Non-smoker, 1 = Smoker
References
Kahn, M. (2003). Data Sleuth, STATS, 37, 24. https://jse.amstat.org/datasets/fev.txt
Rosner, B. (1999). Fundamentals of Biostatistics, Pacific Grove, CA: Duxbury.
Test the fit of a model on a train and test set.
Description
Test the fit of a model on a train and test set.
Usage
fit_stats(model, df_train, df_test)
fitstats(model, df_train, df_test)
Arguments
| model | An |
| df_train | A data frame with the training data. |
| df_test | A data frame with the test data. |
Value
A data frame with the fit statistics.
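The documentation does not say which statistics are reported, so the following is only a sketch of the general idea: score one fitted model on both the training data and the held-out test data. `fit_stats_sketch` is a hypothetical stand-in, and the choice of RMSE and R-squared is an assumption, not a description of what fit_stats() returns.

```r
# Hypothetical stand-in for fit_stats(); the chosen statistics are an assumption
fit_stats_sketch <- function(model, df_train, df_test) {
  outcome <- all.vars(formula(model))[1]  # name of the response variable

  stats_for <- function(df) {
    observed  <- df[[outcome]]
    predicted <- predict(model, newdata = df)
    sse <- sum((observed - predicted)^2)       # error around the model
    sst <- sum((observed - mean(observed))^2)  # error around the set's own mean
    c(RMSE = sqrt(sse / length(observed)), R2 = 1 - sse / sst)
  }

  data.frame(
    set = c("train", "test"),
    rbind(stats_for(df_train), stats_for(df_test))
  )
}

# Deterministic split of a built-in data set, for illustration only
train <- mtcars[1:22, ]
test  <- mtcars[23:32, ]
model <- lm(mpg ~ wt, data = train)
stats <- fit_stats_sketch(model, train, test)
print(stats)
```

Comparing the two rows shows how much the fit degrades on data the model has not seen, which is the point of keeping a test set.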
Simulated math game data.
Description
The simulated results of a small study comparing the effectiveness of three different computer-based math games in a sample of 105 fifth-grade students. All three games focused on the same topic and had identical learning goals, and none of the students had any prior knowledge of the topic.
Usage
game_data
Format
A data frame with 105 observations on the following 2 variables:
game: The game the student was randomly assigned to, coded as "A", "B", or "C".
outcome: Each student's score on the outcome test.
Add a model to a plot
Description
When teaching about regression it can be useful to visualize the data as a point plot with the
outcome on the y-axis and the explanatory variable on the x-axis. For regression models, the
model is most easily added by calling ggformula::gf_lm(); for empty models, by calling
ggformula::gf_hline() using the mean; and for group models, by a more complicated call to
ggformula::gf_segment(). This function simplifies this
by making a guess about what kind of model you are plotting (empty/null, regression, group) and
then making the appropriate plot layer for it.
Usage
gf_model(object, model, ...)
Arguments
| object | A plot created with the |
| model | |
| ... | Additional arguments. Typically these are (a) ggplot2 aesthetics to be set with |
Details
This function only works with models that have a continuous outcome measure.
Value
a gg object (a plot layer) that can be added to a plot.
Add Residual Lines to a Plot
Description
This function adds vertical lines representing residuals from a linear model to a ggformula plot. The residuals are drawn from the observed data points to the predicted values from the model.
Usage
gf_resid(plot, model, linewidth = 0.2, ...)
Arguments
| plot | A ggformula plot object, typically created with |
| model | A fitted linear model object created using |
| linewidth | A numeric value specifying the width of the residual lines. Default is |
| ... | Additional aesthetics passed to |
Value
A ggplot object with residual lines added.
Examples
Height_model <- lm(Thumb ~ Height, data = Fingers)
gf_point(Thumb ~ Height, data = Fingers) %>%
  gf_model(Height_model) %>%
  gf_resid(Height_model, color = "red", alpha = 0.5)
Add Squared Residual Visualization to a Plot
Description
This function adds squared residual representations to a ggformula plot, illustrating squared error as a polygon. The function dynamically adjusts the aspect ratio to ensure proper scaling of squares.
Usage
gf_squaresid(plot, model, aspect = 4/6, alpha = 0.1, ...)
Arguments
| plot | A ggformula plot object, typically created with |
| model | A fitted linear model object created using |
| aspect | A numeric value controlling the square's aspect ratio. Default is |
| alpha | A numeric value specifying the transparency of the square's fill. Default is |
| ... | Additional aesthetics passed to |
Value
A ggplot object with squared residuals added.
Examples
Height_model <- lm(Thumb ~ Height, data = Fingers)
gf_point(Thumb ~ Height, data = Fingers) %>%
  gf_model(Height_model) %>%
  gf_squaresid(Height_model, color = "blue", alpha = 0.5)
Find a percentage of a distribution
Description
Given a distribution, find which values lie in the upper, lower, or middle proportion of the
distribution. Useful when you want to do something like shade in the middle 95% of a plot. This
is a greedy operation, meaning that if the cutoff point is between two whole numbers the
specified region will suck up the extra space. For example, requesting the upper 30% of
[1 2 3 4] will return [FALSE FALSE TRUE TRUE] because the 30% was greedy.
Usage
middle(x, prop = 0.95, greedy = TRUE)
tails(x, prop = 0.95, greedy = TRUE)
lower(x, prop = 0.025, greedy = TRUE)
upper(x, prop = 0.025, greedy = TRUE)
Arguments
| x | The distribution of values to check. |
| prop | The proportion of values to find. |
| greedy | Whether the function should be greedy, as per the description above. |
Details
Note that NA values are ignored, i.e. they will always return FALSE.
Value
A logical vector indicating which values are in the specified region.
Examples
upper(1:10, .1)
lower(1:10, .2)
middle(1:10, .5)
tails(1:10, .5)
sampling_distribution <- do(1000) * mean(rnorm(100, 5, 10))
sampling_distribution %>%
  gf_histogram(~mean, data = sampling_distribution, fill = ~ middle(mean, .68)) %>%
  gf_refine(scale_fill_manual(values = c("blue", "coral")))
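The greedy rule from the description can be made concrete with a small base R sketch. `upper_sketch` is a hypothetical re-creation written for illustration; it assumes the rule amounts to rounding the number of selected values up (greedy) or down (not greedy), and it ignores ties. The package's own implementation may differ in those details.

```r
# Hypothetical re-creation of the greedy rule for the upper tail
upper_sketch <- function(x, prop = 0.025, greedy = TRUE) {
  n <- sum(!is.na(x))
  # Greedy: round the count of selected values up; otherwise round it down
  k <- if (greedy) ceiling(prop * n) else floor(prop * n)

  in_region <- rep(FALSE, length(x))
  # order() places NA values last, so they are never among the top k
  top <- order(x, decreasing = TRUE)[seq_len(k)]
  in_region[top] <- TRUE
  in_region
}

print(upper_sketch(1:4, 0.3))                  # greedy: 30% of 4 rounds up to 2 values
print(upper_sketch(1:4, 0.3, greedy = FALSE))  # not greedy: rounds down to 1 value
```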
A modified form of the palmerpenguins::penguins data set.
Description
The modifications are to select only a subset of the variables, and convert some of the units.
Usage
penguins
Format
A data frame with 333 observations on the following 7 variables:
species: The species of penguin, coded as "Adelie", "Chinstrap", or "Gentoo".
gentoo: Whether the penguin is a Gentoo penguin (1) or not (0).
body_mass_kg: The mass of the penguin's body, in kilograms.
flipper_length_m: The length of the penguin's flipper, in m.
bill_length_cm: The length of the penguin's bill, in cm.
female: Whether the penguin is female (1) or not (0).
island: The island where the penguin was observed, coded as "Biscoe", "Dream", or "Torgersen".
A discrete color scale constructor with colorblind-safe palettes.
Description
See coursekata_palette() for more information.
Usage
scale_discrete_coursekata(...)
Arguments
| ... | Additional parameters passed on to the scale type. |
Value
A discrete color scale.
See Also
coursekata_palette
Split data into train and test sets.
Description
Split data into train and test sets.
Usage
split_data(data, prop = 0.7)
Arguments
| data | A data frame. |
| prop | The proportion of rows to assign to the training set. |
Value
A list with two data frames, train and test.
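A random train/test split of this kind can be sketched in a few lines of base R. `split_data_sketch` is a hypothetical illustration; it assumes rows are sampled without replacement and that the training size is rounded down, neither of which is confirmed by the documentation.

```r
# Hypothetical illustration of a random train/test split
split_data_sketch <- function(data, prop = 0.7) {
  n_train <- floor(prop * nrow(data))        # assumed rounding rule
  train_rows <- sample(nrow(data), n_train)  # sample row indices without replacement
  test_rows <- setdiff(seq_len(nrow(data)), train_rows)
  list(
    train = data[train_rows, , drop = FALSE],
    test  = data[test_rows, , drop = FALSE]
  )
}

set.seed(1)  # make the random split reproducible
parts <- split_data_sketch(mtcars, prop = 0.7)
print(sapply(parts, nrow))
```

Every row lands in exactly one of the two data frames, so the row counts of train and test add up to the size of the input.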
A simple theme built on top of ggplot2::theme_bw
Description
The coursekata package automatically loads this theme when the package is loaded. This is in
addition to a number of other plot tweaks and option settings. To just restore the theme to the
default, you can run set_theme(theme_grey). If you want to restore all plot related settings
and/or prevent them when loading the package, see coursekata_unload_theme.
Usage
theme_coursekata()
Value
A gg theme object
Examples
gf_boxplot(Thumb ~ RaceEthnic, data = Fingers, fill = ~RaceEthnic)
Simulated data for an experiment about smiley faces and tips
Description
These are simulated data that are similar to the TipExperiment data. Hypothetical tables
were randomly assigned to receive checks that either included or did not include a drawing
of a smiley face, either from a male or a female server.
Usage
tip_exp
Format
A data frame with 44 observations on the following 3 variables.
gender: Whether the server was female or male
condition: Whether the check had a smiley face or not (control)
tip_percent: The size of the tip as a percentage of the price of the meal