cheapr 1.5.0

cheapr 1.5.0 supersedes cheapr 1.4.0

The C/C++ API is currently under development, and a stable release is expected with cheapr 2.0.0.

Note: cheapr 2.0.0 will require C++20.

General news

Thanks @ChampLeeTX for spotting an issue where the package wasn’t installing for older versions of R. This is now fixed.
A rich new set of parallelised math functions, such as abs_, round_ and others.
New functions for setting and getting the number of threads being used.
New multi-threaded vector initialisers like new_integer and new_double.
Plain list vectors are never regarded as NA, even if they contain NA elements.
Parallelisation is now used internally more often for operations such as creating new vectors, combining vectors, filling, copying and replacing data. The default number of threads has been changed to 2. To use a different number of threads, use set_threads().
scm now better detects when integer overflow (32-bit or 64-bit) occurs and switches to doubles internally, returning a double value.
if_else_ can now handle data frames and is fully SIMD parallelised.
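The scm overflow handling described above can be illustrated with a small C sketch. This is not cheapr's implementation: scm_sketch, gcd_i64 and scm_result are hypothetical names, the sketch handles only two non-negative values, and it assumes a 64-bit accumulator that falls back to double precision when the result would exceed INT64_MAX.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical result type: an exact 64-bit value, or a double on overflow. */
typedef struct {
    bool is_double; /* true when the 64-bit product would have overflowed */
    int64_t i;      /* exact result, valid when is_double is false */
    double d;       /* approximate result, valid when is_double is true */
} scm_result;

/* Euclid's algorithm for non-negative 64-bit integers. */
static int64_t gcd_i64(int64_t a, int64_t b) {
    while (b != 0) {
        int64_t r = a % b;
        a = b;
        b = r;
    }
    return a;
}

/* Hypothetical sketch: smallest common multiple of two non-negative
   integers, computed as (a / gcd(a, b)) * b. The product is checked
   against INT64_MAX before it is formed; if it would not fit, the
   result is computed in double precision instead. */
static scm_result scm_sketch(int64_t a, int64_t b) {
    scm_result res = {false, 0, 0.0};
    if (a == 0 || b == 0) return res;    /* the scm with 0 is 0 by convention */
    int64_t q = a / gcd_i64(a, b);       /* divide first to keep values small */
    if (q > INT64_MAX / b) {             /* q * b would exceed INT64_MAX */
        res.is_double = true;
        res.d = (double)q * (double)b;
    } else {
        res.i = q * b;                   /* exact 64-bit result */
    }
    return res;
}
```

Dividing by the gcd before multiplying keeps the intermediate value as small as possible, so the fallback only triggers when the true result genuinely does not fit in 64 bits.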

New features

The C/C++ API has been re-written to use pure R C API code internally.
New function unique_ as a cheaper alternative to unique.
New function replace_, a cheaper alternative to [<- for fast value replacement.
New functions cast, cast_common, archetype, archetype_common, r_type and r_type_common to help with fast type-stable coercion.
New function paste_ as a fast alternative to base::paste.
New function na_init to help with fast initialisation of vectors based on a template vector.
seq_ gains a new argument size to allow for specifying sequences of exact sizes. Any combination of from, to, by and size can be specified, and all of them are vectorised.
Added argument as_list to seq_ and sequence_ to allow the result to be returned as a list of sequences instead of a vector of combined sequences.
New vectorised functions seq_start, seq_end and seq_increment to help with sequence generation.
New function is_whole_number to check that a numeric vector consists only of whole numbers.
New function switch_args to give more flexibility to developers creating functions that accept the ... argument.
Exported cpp_rebuild, a low-level convenience function for rebuilding attributes from a template.
New rebuild methods for tbl_df and sf objects.
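The relation between the four seq_ arguments can be sketched as follows, assuming the usual arithmetic-sequence identity to = from + (size - 1) * by. The helper names are hypothetical and this is not how cheapr's seq_ is implemented; it only shows how one missing parameter can be derived from the others.

```c
/* Hypothetical helpers based on the identity to = from + (size - 1) * by. */

/* Last element of a sequence with the given start, step and length (size >= 1). */
static double seq_to(double from, double by, long size) {
    return from + (double)(size - 1) * by;
}

/* Step that spreads `size` elements evenly from `from` to `to`.
   A single element has no step, so 0 is returned by convention here. */
static double seq_by(double from, double to, long size) {
    return size > 1 ? (to - from) / (double)(size - 1) : 0.0;
}

/* Number of elements from `from`, stepping by `by`, without passing `to`.
   Assumes by != 0 and that by moves from `from` towards `to`; the small
   tolerance absorbs floating-point error in the division. */
static long seq_size(double from, double to, double by) {
    return (long)((to - from) / by + 1e-10) + 1;
}

/* Fill out[0..size-1] with from, from + by, from + 2 * by, ... */
static void seq_fill(double from, double by, long size, double *out) {
    for (long i = 0; i < size; i++) {
        out[i] = from + (double)i * by;
    }
}
```

For example, from = 1, to = 10 and by = 3 give a size of 4 (1, 4, 7, 10), while by = 4 gives a size of 3 (1, 5, 9).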

Bug fixes

Fixed a bug in case() that returned incorrect results when the length of the RHS was greater than 1.
Removed all non-API C entry points.

Changes and Improvements

sset is no longer an S3 generic and now internally dispatches on the correct method. One can still define a subset method for custom objects via [ which sset falls back on when it can’t find an appropriate method.
Combining vectors via c_ is now internally simpler and faster, and uses type-stable common casting.
if_else_ has been re-written mostly in C/C++ and should be faster.
as_df has been re-written in C/C++ and now has much lower overhead.
The .args argument now accepts any list, even classed ones.
rebuild.data.frame and rebuild.data.table will now return a data frame with class 'data.frame' and c('data.table', 'data.frame') respectively instead of returning a data frame with the class of the template data frame.
Added aliases for all functions prefixed with ‘cheapr_’ to follow the fn_() convention, where fn() is the common function being replaced or improved. The existing functions beginning with ‘cheapr_’ will likely not be deprecated for a long while.
sequence_ now recycles the size argument in addition to from and by.
Subsetting sf objects with sset now internally calls [.

Breaking changes

fastplyr versions 0.9.91 and later must depend on cheapr version 1.4.0 and later. This means, for example, that you can't install both fastplyr 0.9.9 and cheapr 1.3.2.
Matrices and arrays are now explicitly converted to vectors when used as columns in new_df and other data frame constructors. Since a matrix is internally a vector with a dimension attribute, it is much more efficient to treat it as a vector. For working with rectangular data in cheapr, data frames are recommended.
enframe_(), deframe_() and cut_numeric() have been removed, having been deprecated for a while.

cheapr 1.3.2

Miscellaneous

Subsetting sees substantial speed improvements.

Bug fixes

Fixed an issue where negative-subscripting a data frame with sset_col() would crash R.

cheapr 1.3.1

Bug fixes

Fixed a bug where recycle sometimes modified lists in-place.
Fixed a rare, theoretical issue where -2147483648 could be recognised as a representable integer in R. It is not representable, because R reserves this value for NA_integer_.
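R stores NA_integer_ as INT_MIN, so the usable range of R integers is [-2147483647, 2147483647]. A minimal C check, assuming a 32-bit int, might look like the following; is_r_int_representable is a hypothetical helper, not part of cheapr's API.

```c
#include <limits.h>
#include <stdbool.h>

/* Hypothetical helper: can the double x be stored as an R integer?
   Assumes a 32-bit int. R uses INT_MIN (-2147483648) to encode NA_integer_,
   so the valid range is (INT_MIN, INT_MAX]: the lower bound is strict. */
static bool is_r_int_representable(double x) {
    /* The negated comparison also rejects NaN, since comparisons with NaN are false. */
    if (!(x > (double)INT_MIN && x <= (double)INT_MAX)) return false;
    /* Reject fractional values; the cast is safe after the range check. */
    return x == (double)(int)x;
}
```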

Changes

attrs_add and attrs_rm have been renamed to attrs_modify and attrs_clear respectively to convey their intent more clearly.
as_df now always returns a plain data.frame with only 3 attributes, ‘names’, ‘row.names’ and ‘class’.

New features

New function df_modify as a fast way to modify and add columns to a data frame.

cheapr 1.3.0

Bug fixes

Fixed errors generated by rchk.

Breaking changes

reconstruct has been renamed to rebuild.

New features

New function counts for fast counting of unique values.
New function str_coalesce as a vectorised way of finding the first non-empty string.
New sset method for fast subsetting of integer64 vectors.

cheapr 1.2.0

This version of cheapr sees many speed improvements and new features.

New functions

list_as_df for fast conversion of lists into data frames.
attrs_add and attrs_rm to allow for adding and removing attributes, both normally and in-place.
shallow_copy, semi_copy and deep_copy for shallow and full copies of R objects.
address to retrieve the memory address of an R object.
cheapr_rep, cheapr_rep_len and cheapr_rep_each to repeat vectors efficiently.
cheapr_c to concatenate vectors and data frame rows together fast and safely.
reconstruct, which allows users to write methods that restore objects using a template. Currently, only data frames are reconstructed.
col_c as a way to combine data frame columns and vectors into a data frame.
list_combine as a way to combine elements from multiple lists into a single list.
list_assign for fast assignment of multiple elements to a list.
list_drop_null to quickly drop NULL elements.
More exported C functions.

Changes

new_df has been mostly rewritten in C and now recycles and repairs names by default. It also no longer deparses expressions into strings when names are missing; instead, unnamed columns are named col_i, where i is the position of that column.

New features

.args has been added in many places as an alternative to the dots argument (...). It allows you to supply a list of objects instead of supplying them individually, which is useful when you already have a list and want to pass its elements as arguments efficiently. For example, cheapr_c(.args = list(x, y)) is equivalent to do.call(cheapr_c, list(x, y)) and cheapr_c(x, y).

Deprecations

The keep_attrs arg of sset_df has been removed. Use the internal cpp_reconstruct or a custom solution to keep all attributes of x.

cheapr 1.1.0

Fixed CRAN notes on deprecated C functions.
Data frame subsetting is now faster and internally simpler. Data frame negative subsetting is now cheaper.
New functions sset_df, sset_row and sset_col for row slicing and col selecting with minimal overhead (~ 1 microsecond).
Some C functions have been exported and can be used in native C/C++ code.

cheapr 1.0.1

R version 4.0.0 or later is now required, as it is also required by cpp11.
New function int_sign for integer-based signs.
overview now prints slightly differently. Specifically, ‘class’ is no longer printed. Time series overviews now return and print the growth rate.
cheapr’s internal cpp_list_as_df now always returns a data frame with names, even when supplied an empty list.

cheapr 1.0.0

New functions as_df, fast_df and cheapr_table.
count_val, unused_levels and used_levels have been removed and replaced by val_count, levels_unused and levels_used respectively.
Deprecated cut_numeric, enframe_ and deframe_.

cheapr 0.9.92

Fixed additional issues flagged by R checks.
Capture of ... in case and val_match has been improved.
val_match safety checks are slightly improved.
get_breaks has been rewritten in C, and its algorithm has been improved to reduce floating-point error.
The result of get_breaks now matches the breaks generated by cut for vectors with zero-range.
val_rm and na_rm have been sped-up.

cheapr 0.9.91

New functions cheapr_if_else, case and val_match to make vectorised if-else operations much cheaper.
New function with_local_seed to run expressions reproducibly with a local seed, removing the need to set a seed globally. This is especially helpful for small expressions and comparisons, as the global RNG state is left unaffected.
Various internal bug fixes related to the scalar functions.
Fixed a regression where NULL elements were not being correctly dropped in new_df().
New factor functions levels_rename, levels_add, levels_rm, levels_lump and levels_count.
overview cols are abbreviated to save visual space and histograms are printed by default.
levels_drop was not working correctly and has been fixed.

cheapr 0.9.9

New functions cheapr_var and cheapr_rev.
get_breaks has been improved and a few small bugs have been fixed.
as_discrete gains a new argument inf_label.
Safety improvements to as_discrete.
Removed internal C++ functions as package installation was failing for some machines.

cheapr 0.9.8 (02-Oct-2024)

New scalar functions have been added and some renamed. Most are now prefixed with ‘val_’ or ‘na_’ in the case of NA specific scalar functions.
New cheap functions for binning continuous data into discrete bins. These include get_breaks, as_discrete and bin. get_breaks finds ‘pretty’ break-points of numeric data very quickly. as_discrete converts numeric data to discrete categories as a factor. bin is a low-level function for binning numeric data into the correct bins. It can also efficiently return the corresponding break values instead of the break indices through codes = FALSE.
New function na_insert to randomly insert NA values into a vector.
New function vector_length as a hybrid between length and nrow.
gcd and scm now use 64-bit integers internally and can accept ‘integer64’ objects. scm used to return NA once the 32-bit integer limit of 2^31 - 1 was reached if the input was an integer vector. This limit has been raised to the 64-bit integer limit of 2^63 - 1 (approximately 9.223372e+18), and scm now errors if that limit is exceeded.
‘integer64’ objects are now lightly supported. They are not supported by any of the sequence functions or the ‘set_math’ functions.
New functions new_df and named_list.
All factor levels utilities now begin with the prefix ‘levels_’.
New cheap factor functions as_factor, levels_add_na, levels_drop_na, levels_drop and levels_reorder.
lag_ now uses memmove where possible.
Fixed an issue where lag_(x) was materialising x twice if x was an ALTREP integer sequence.
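A lag by k positions is a single overlapping shift of the underlying buffer, which is why memmove suits it: memcpy has undefined behaviour when source and destination overlap. The hypothetical in-place C sketch below works on double vectors and does not reflect cheapr's actual lag_ code; lag_in_place and its fill parameter are illustrative names only.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical sketch: lag x (length n) by k positions in place.
   Elements shift towards the end; the first k slots receive `fill`.
   memmove is required because source and destination overlap. */
static void lag_in_place(double *x, size_t n, size_t k, double fill) {
    if (k > n) k = n;                            /* lagging past the end fills everything */
    memmove(x + k, x, (n - k) * sizeof *x);      /* overlapping shift towards the end */
    for (size_t i = 0; i < k; i++) x[i] = fill;  /* pad the vacated front */
}
```

Lagging {1, 2, 3, 4, 5} by 2 with a fill of -1 gives {-1, -1, 1, 2, 3}.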

cheapr 0.9.3 (29-Jul-2024)

Range-based subsetting, e.g. sset(x, 1:10), should now be faster, as memmove is used where possible.
New functions val_count and which_val for common scalar operations.
Some functions gain a ‘names’ argument.
Replaced calls to STRING_PTR with STRING_PTR_RO to satisfy R package check results.
lag_ should now be somewhat faster.
Fixed a small bug in lag2_ that would produce incorrect results when supplying a vector of lags and an order vector.
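When the index is a contiguous increasing range such as 1:10, subsetting reduces to one bulk copy instead of a per-element lookup. The hypothetical C sketch below handles integer vectors. It assumes 1-based inclusive bounds that lie within the vector and returns NULL otherwise, whereas R would return NA for out-of-range positions. memcpy suffices here because the destination is a freshly allocated buffer that cannot overlap the source.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical sketch: return a new array holding x[from..to] (1-based,
   inclusive), i.e. what R's x[from:to] selects when the range is in bounds.
   One memcpy replaces a per-element index loop. Returns NULL (with
   *out_n = 0) if the range is decreasing or out of bounds, or if
   allocation fails. The caller must free the result. */
static int *subset_range(const int *x, size_t n, size_t from, size_t to, size_t *out_n) {
    *out_n = 0;
    if (from < 1 || from > to || to > n) return NULL;
    size_t len = to - from + 1;
    int *out = malloc(len * sizeof *out);
    if (out == NULL) return NULL;
    memcpy(out, x + (from - 1), len * sizeof *out);  /* one bulk copy */
    *out_n = len;
    return out;
}
```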

cheapr 0.9.2 (11-May-2024)

A signed integer overflow bug in lag2_ has been fixed. This occurred when supplying NA lags.
lag2_ no longer fills the names of named vectors when the fill value is supplied.

cheapr 0.9.1 (05-May-2024)

New function recycle to help recycle R objects to a common size.
The set functions that update by reference are now ALTREP aware and take a copy when the input is an ALTREP object.
New function lag2_ as a generalised solution for complex lags. It supports dynamic lag vectors, lags using an order vector, and custom run lengths. It doesn’t support updating by reference or long vectors.

cheapr 0.9.0 (22-Apr-2024)

New function lag_ for very fast lags and leads on vectors and data frames. It includes a set argument allowing users to create a lagged vector by reference without copies.
set_round has been amended to improve floating point accuracy.

cheapr 0.8.0 (12-Apr-2024)

New ‘set’ Math operations inspired by ‘data.table’ and ‘collapse’ that transform data by reference.
Fixed an inconsistency in when sequence_() would error when supplied with a zero-length size argument.
Fixed a protection stack imbalance in count_val(x) when x is NULL.
sset has been optimised for wide data frames with many variables. It is also faster when applied to a data frame with dates, date-times and factors.
In sset, when i is a logical vector it must match the length of x.
sset can now handle ‘ALTREP’ compact real sequences as well.

cheapr 0.5.0 (5-Apr-2024)

sset is now parallelised when i is an ‘ALTREP’ compact integer sequence, e.g. sset(x, 1:10).
sset now has an internal range-based subset method for ‘ALTREP’ integer sequences made using : for example.
New function count_val as a cheaper alternative to e.g. sum(x == val).
Negative indexing in sset has been improved. It is also now partially parallelised.
Setting recursive to false should now be faster.
‘overview’ objects gain an additional list element “print_digits” which is passed to the print method in order to correctly round the summary statistics without affecting the ‘cheapr.digits’ option globally.
factor_ and na_rm now handle data frames.
A bug in sset.data.table that caused further set calculations to produce warnings has been fixed.
is_na.POSIXlt and sset.POSIXlt have been rewritten to handle unbalanced ‘POSIXlt’ objects.

cheapr 0.4.0 (25-Mar-2024)

New function sset to consistently subset data frame rows and vectors in general.
overview now always returns an object of class “overview”. It also returns the number of observations instead of rows so that it makes sense for vector summaries as well as data frame summaries.
sequence_ has been optimised and rewritten in C++. It now only checks for integer overflow when both from and by are integer vectors.
The internal function list_as_df has been rewritten in C++.
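sequence_ concatenates one arithmetic sequence per element of size, with the other arguments recycled to a common length. The C sketch below shows that pattern together with an overflow check for integer inputs, done in 64-bit arithmetic so the check itself cannot overflow. The function names are hypothetical, the sketch assumes all three inputs have length of at least 1, and it rejects values outside [-INT_MAX, INT_MAX] because R reserves INT_MIN for NA_integer_.

```c
#include <limits.h>
#include <stdint.h>

/* Hypothetical helper: number of sequences after recycling to the longest input. */
static int recycled_len(int n_size, int n_from, int n_by) {
    int n = n_size;
    if (n_from > n) n = n_from;
    if (n_by > n) n = n_by;
    return n;
}

/* Returns the total output length, or -1 if any size is negative or any
   sequence's first or last element falls outside [-INT_MAX, INT_MAX].
   Each sequence is monotone, so checking both ends covers every element. */
static int64_t sequence_length(const int *size, int n_size, const int *from, int n_from,
                               const int *by, int n_by) {
    int n = recycled_len(n_size, n_from, n_by);
    int64_t total = 0;
    for (int j = 0; j < n; j++) {
        int64_t s = size[j % n_size];
        int64_t f = from[j % n_from];
        int64_t b = by[j % n_by];
        if (s < 0) return -1;
        if (s == 0) continue;                      /* empty sequence, nothing to check */
        int64_t last = f + (s - 1) * b;            /* cannot overflow in 64 bits */
        if (f < -INT_MAX || last < -INT_MAX || last > INT_MAX) return -1;
        total += s;
    }
    return total;
}

/* Writes the concatenated sequences into out, which must hold the number of
   elements returned by sequence_length. Call only after that check succeeds. */
static void sequence_fill(const int *size, int n_size, const int *from, int n_from,
                          const int *by, int n_by, int *out) {
    int n = recycled_len(n_size, n_from, n_by);
    int64_t k = 0;
    for (int j = 0; j < n; j++) {
        int s = size[j % n_size];
        int64_t f = from[j % n_from];
        int64_t b = by[j % n_by];
        for (int i = 0; i < s; i++) {
            out[k++] = (int)(f + (int64_t)i * b);
        }
    }
}
```

With size = {3, 2}, from = {1} and by = {1, 10}, the output is {1, 2, 3, 1, 11}.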

cheapr 0.3.0 (18-Mar-2024)

New function overview as a cheaper alternative to summary.
All of the NA handling functions now fall back to using is.na if an appropriate method cannot be found.
More support has been added for all objects with an is.na method.

cheapr 0.2.0 (06-Mar-2024)

is_na has been added as an S3 generic function which is parallelised and internally falls back on is.na if there are no suitable methods.
Additional list utility functions have been added.
Limited support for vctrs_rcrd objects has been added again.
num_na and similar functions no longer treat empty data frame rows as single observations but instead return the total number of NA values in the data frame.
Fixed a bug in row_na_counts and col_na_counts that would cause the session to crash when a column variable was a list.
For the time being, vctrs ‘vctrs_rcrd’ objects are no longer supported though this support may be re-added in the future.

cheapr 0.1.0 (05-Mar-2024)

CRAN submission accepted.