sset_col() would crash R.
Fixed a bug where recycle sometimes modified lists
in-place.
Fixed a rare issue that could in theory occur where -2147483648 would be recognised as a representable integer in R.
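For context, -2147483648 is not a valid R integer because R reserves that 32-bit pattern for NA_integer_, so the usable range is -2147483647 to 2147483647. The base R sketch below shows one way such a range check can be written. fits_in_int is a hypothetical helper made up for illustration; it is not cheapr's internal check.

```r
# Hypothetical helper (not part of cheapr): TRUE when a whole-number double
# can be stored as an R integer without becoming NA.
fits_in_int <- function(x) {
  !is.na(x) & x == trunc(x) & abs(x) <= .Machine$integer.max
}

int_min <- -2147483648            # a double equal to -(2^31)
fits_in_int(int_min)              # FALSE: one below the smallest R integer
fits_in_int(int_min + 1)          # TRUE: -2147483647 is representable

# Coercion agrees: the value is outside the integer range and becomes NA.
is.na(suppressWarnings(as.integer(int_min)))  # TRUE
```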
attrs_add and attrs_rm have been
renamed to attrs_modify and attrs_clear
respectively to convey their intent more clearly.
as_df now always returns a plain
data.frame with only 3 attributes, ‘names’, ‘row.names’ and
‘class’.
df_modify as a fast way to modify and add columns to a
data frame.
reconstruct has been renamed to
rebuild.
Fast function counts to generate counts of unique
values.
str_coalesce as a vectorised way of finding the first
non-empty string.
New sset method for fast subsetting integer64 vectors.
This version of cheapr sees many speed improvements and new features.
list_as_df for fast conversion of lists into data
frames.
attrs_add and attrs_rm to allow for
adding and removing attributes, both normally and in-place.
shallow_copy, semi_copy and
deep_copy for shallow and full copies of R
objects.
address to retrieve the memory address of an R
object.
cheapr_rep, cheapr_rep_len and
cheapr_rep_each to repeat out vectors efficiently.
cheapr_c to concatenate vectors and data frame rows
together fast and safely.
reconstruct as a method to allow users to write
methods to restore objects using a template. Currently only data frames
are being reconstructed.
col_c as a way to combine data frame columns and
vectors into a data frame.
list_combine as a way to combine elements from
multiple lists into a single list.
list_assign for fast assignment of multiple elements to
a list.
list_drop_null to quickly drop NULL
elements.
More exported C functions.
new_df has been mostly re-written in C and now recycles
and repairs names by default. It also doesn’t deparse expressions into
strings where names don’t exist and now simply replaces those object
names with col_i where i is the ith column containing that
object.
.args has been added in many places as an alternative
to the dots argument (...). It allows you to supply a list of
objects instead of supplying them in the usual way. This can be useful
when you already have a list and want to pass its elements as arguments
very efficiently. For example, cheapr_c(.args = list(x, y)) is
equivalent to do.call(cheapr_c, list(x, y)) and
cheapr_c(x, y).
The keep_attrs argument of sset_df has been
removed. Use the internal cpp_reconstruct or a custom
solution to keep all attributes of x.
Fixed CRAN notes on deprecated C functions.
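The three equivalent calling styles described for .args can be sketched as follows. The first check uses base R's c() as an analogue. The cheapr calls are the ones quoted in the entry above and run only if an installed cheapr exports cheapr_c; the variable names are illustrative.

```r
x <- 1:3
y <- 4:6
args <- list(x, y)

# Base R analogue of the same idea: passing via dots vs. via do.call().
stopifnot(identical(c(x, y), do.call(c, args)))

# Only run the cheapr part when a version exporting cheapr_c is installed.
has_cheapr_c <- requireNamespace("cheapr", quietly = TRUE) &&
  "cheapr_c" %in% getNamespaceExports("cheapr")

if (has_cheapr_c) {
  via_dots    <- cheapr::cheapr_c(x, y)            # usual way
  via_args    <- cheapr::cheapr_c(.args = args)    # list supplied directly
  via_do_call <- do.call(cheapr::cheapr_c, args)   # base R equivalent
  stopifnot(identical(via_dots, via_args),
            identical(via_dots, via_do_call))
}
```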
Data frame subsetting is now faster and internally simpler. Data frame negative subsetting is now cheaper.
New functions sset_df, sset_row and
sset_col for row slicing and col selecting with minimal
overhead (~ 1 microsecond).
Some C functions have been exported and can be used in native C/C++ code.
R version 4.0.0 is now required, as it is also required by cpp11.
New function int_sign for integer-based
signs.
overview now prints slightly differently.
Specifically ‘class’ is not printed. Time series overviews now return
and print the growth rate.
cheapr’s internal cpp_list_as_df now always returns
a data frame with names, even when supplied an empty list.
New functions as_df, fast_df and
cheapr_table.
count_val, unused_levels and
used_levels have been removed and replaced by
val_count, levels_unused and
levels_used respectively.
Deprecated cut_numeric, enframe_ and
deframe_.
Fixed additional issues flagged by R checks.
Capture of ... in case and
val_match has been improved.
val_match safety checks are slightly
improved.
get_breaks has been re-written in C and the
algorithm has been improved simultaneously to reduce floating point
error.
The result of get_breaks now matches the breaks
generated by cut for vectors with zero-range.
val_rm and na_rm have been
sped-up.
New functions cheapr_if_else, case and
val_match to make vectorised if-else operations much
cheaper.
New function with_local_seed to run expressions
reproducibly with a local seed. This removes the need to set a seed
globally and is especially helpful for small expressions and
comparisons, as the global RNG state is left unaffected.
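The idea behind a local seed can be sketched in base R by saving the global RNG state, seeding, evaluating, and restoring the state on exit. local_seed_sketch is a hypothetical helper written for this illustration. It is not with_local_seed itself, and its arguments should not be read as that function's signature.

```r
# Hypothetical helper (not cheapr's implementation): evaluate `expr` under
# `seed`, then restore whatever RNG state existed beforehand.
local_seed_sketch <- function(expr, seed) {
  genv <- globalenv()
  had_seed <- exists(".Random.seed", envir = genv, inherits = FALSE)
  old_seed <- if (had_seed) {
    get(".Random.seed", envir = genv, inherits = FALSE)
  } else {
    NULL
  }
  on.exit({
    if (had_seed) {
      assign(".Random.seed", old_seed, envir = genv)
    } else if (exists(".Random.seed", envir = genv, inherits = FALSE)) {
      rm(".Random.seed", envir = genv)
    }
  }, add = TRUE)
  set.seed(seed)
  force(expr)  # the promise is evaluated here, under the local seed
}

# Same seed, same draws.
a <- local_seed_sketch(rnorm(3), seed = 42)
b <- local_seed_sketch(rnorm(3), seed = 42)
identical(a, b)  # TRUE

# The global stream is not advanced by the local call.
set.seed(1); u1 <- runif(1)
set.seed(1); invisible(local_seed_sketch(runif(5), seed = 42)); u2 <- runif(1)
identical(u1, u2)  # TRUE
```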
Various internal bug fixes related to the scalar functions.
Fixed a regression where NULL elements were not
being correctly dropped in new_df().
New factor functions levels_rename,
levels_add, levels_rm,
levels_lump and levels_count.
overview cols are abbreviated to save visual space
and histograms are printed by default.
levels_drop was not working correctly and has been
fixed.
New functions cheapr_var and
cheapr_rev.
get_breaks has been improved and a few small bugs
have been fixed.
as_discrete gains a new argument
inf_label.
Safety improvements to as_discrete.
Removed internal C++ functions as package installation was failing on some machines.
New scalar functions have been added and some renamed. Most are
now prefixed with ‘val_’ or ‘na_’ in the case of NA
specific scalar functions.
New cheap functions for binning continuous data into discrete
bins. These include get_breaks, as_discrete
and bin. get_breaks finds ‘pretty’
break-points of numeric data very quickly. as_discrete
converts numeric data to discrete categories as a factor.
bin is a low-level function for binning numeric data into
the correct bins. It can also efficiently return the corresponding break
values instead of the break indices through
codes = FALSE.
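The difference between break indices and break values can be illustrated with base R's findInterval. This is only an analogue of the behaviour described above, not a call to bin, and the data are made up.

```r
x      <- c(1.5, 3.2, 7.9)
breaks <- c(0, 2, 4, 6, 8)

# Index of the bin each value falls into (left-closed intervals).
idx <- findInterval(x, breaks)   # 1 2 4

# The break value at the start of each bin, instead of its index.
starts <- breaks[idx]            # 0 2 6
```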
New function na_insert to randomly insert
NA values into a vector.
New function vector_length as a hybrid between
length and nrow.
gcd and scm now make use of 64-bit
integers internally and can accept ‘integer64’ objects. scm
used to return NA once the 32-bit integer limit of 2^31 - 1
was reached if the input was an integer vector. The limit has now been
raised to the 64-bit integer limit, which is approximately
9.223372e+18, and scm now errors if that limit is exceeded.
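To see how quickly a smallest common multiple passes the 32-bit limit, the base R sketch below computes it for 1 to 25 using doubles, which are exact for whole numbers of this size. gcd2 and lcm2 are hypothetical helpers written for this illustration, not cheapr's gcd and scm.

```r
# Hypothetical helpers (not cheapr's gcd/scm): Euclid's algorithm on doubles.
gcd2 <- function(a, b) {
  while (b != 0) {
    tmp <- b
    b <- a %% b
    a <- tmp
  }
  a
}
# Divide before multiplying to keep intermediate values small.
lcm2 <- function(a, b) a / gcd2(a, b) * b

lcm_1_to_25 <- Reduce(lcm2, as.numeric(1:25))
lcm_1_to_25 == 26771144400           # TRUE
lcm_1_to_25 > .Machine$integer.max   # TRUE: too large for a 32-bit integer
```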
‘integer64’ objects are now lightly supported. They are not supported in any sequence functions or in the ‘set_math’ functions.
New functions new_df and
named_list.
All factor levels utilities now begin with the prefix ‘levels_’.
New cheap factor functions as_factor,
levels_add_na, levels_drop_na,
levels_drop and levels_reorder.
lag_ now uses memmove where
possible.
Fixed an issue where lag_(x) was materialising x
twice if x was an ALTREP integer sequence.
Range based subsetting, e.g. sset(x, 1:10) should
now be faster as memmove is used where possible.
New functions val_count and which_val
for common scalar operations.
Some functions gain a ‘names’ argument.
Replaced calls to STRING_PTR with
STRING_PTR_RO to satisfy R package check results.
lag_ should now be somewhat faster.
Fixed a small bug in lag2_ that would produce
incorrect results when supplying a vector of lags and an order
vector.
A signed integer overflow bug in lag2_ has been
fixed. This occurred when supplying NA lags.
lag2_ no longer fills the names of named vectors
when the fill value is supplied.
New function recycle to help recycle R objects to a
common size.
The set functions that update by reference are now
ALTREP aware and take a copy when the input is an ALTREP
object.
New function lag2_ as a generalised solution for
complex lags. It supports dynamic lag vectors, lags using an order
vector, and custom run lengths. It doesn’t support updating by reference
or long vectors.
New function lag_ for very fast lags and leads on
vectors and data frames. It includes a set argument
allowing users to create a lagged vector by reference without
copies.
set_round has been amended to improve floating point
accuracy.
New ‘set’ Math operations inspired by ‘data.table’ and ‘collapse’ that transform data by reference.
Fixed an inconsistency in when sequence_() would
error if supplied with a zero-length size argument.
Fixed a protection stack imbalance in count_val(x)
when x is NULL.
sset has been optimised for wide data frames with
many variables. It is also faster when applied to a data frame with
dates, date-times and factors.
In sset, when i is a logical vector it
must match the length of x.
sset can now handle ‘ALTREP’ compact real sequences
as well.
sset is now parallelised when i is an
‘ALTREP’ compact integer sequence,
e.g. sset(x, 1:10).
sset now has an internal range-based subset method
for ‘ALTREP’ integer sequences made using : for
example.
New function count_val as a cheaper alternative to
e.g. sum(x == val).
Negative indexing in sset has been improved. It is
also now partially parallelised.
Setting recursive to false should now be
faster.
‘overview’ objects gain an additional list element “print_digits” which is passed to the print method in order to correctly round the summary statistics without affecting the ‘cheapr.digits’ option globally.
factor_ and na_rm now handle data
frames.
A bug in sset.data.table that caused further set
calculations to produce warnings has been fixed.
is_na.POSIXlt and sset.POSIXlt have
been rewritten to handle unbalanced ‘POSIXlt’ objects.
New function sset to consistently subset data frame
rows and vectors in general.
overview now always returns an object of class
“overview”. It also returns the number of observations instead of rows
so that it makes sense for vector summaries as well as data frame
summaries.
sequence_ has been optimised and rewritten in C++.
It now only checks for integer overflow when both from and
by are integer vectors.
The internal function list_as_df has been rewritten
in C++.
New function overview as a cheaper alternative to
summary.
All of the NA handling functions now fall back to
using is.na if an appropriate method cannot be
found.
More support has been added for all objects with an
is.na method.
is_na has been added as an S3 generic function which
is parallelised and internally falls back on is.na if there
are no suitable methods.
Additional list utility functions have been added.
Limited support for vctrs_rcrd objects has been
added again.
num_na and similar functions no longer treat empty
data frame rows as single observations but instead return the total
number of NA values in the data frame.
Fixed a bug in row_na_counts and
col_na_counts that would cause the session to crash when a
column variable was a list.
For the time being, vctrs ‘vctrs_rcrd’ objects are no longer supported, though this support may be re-added in the future.