links() - Incorrect results in some situations. Resolved.
links_af_probabilistic() - Failed in some situations. Resolved.
New option ("semi") for the batched argument in links(). All matches are
compared against the record-set in the next iteration. Therefore, the
number of record-pairs increases exponentially as new matches are found.
This means fewer record-pairs (and memory usage) but a longer run time
compared to the "no" option. Conversely, it leads to more record-pairs
(and memory usage) but a shorter run time compared to the "yes" option.
New argument (batched) in episodes().
New argument (split) in episodes(). Split the analysis in N-splits of
strata. This leads to fewer record-pairs (and memory usage) but a longer
run time.
New argument (decode) in as.data.frame.pid(), as.data.frame.epid() and
as.data.frame.pane().
episodes_af_shift(). A more vectorised approach to episodes() based on
epidm::group_time().
links_wf_episodes(). Implementation of episodes() using links().
episodes() and links(). Each iteration now uses less time and memory.
link_id slot in pid objects is now a list.
links() - records with missing values in a sub_criteria are now skipped
at the corresponding iteration.
links() - recursive.
This now takes any of three options
[c("linked", "unlinked", "none")].
[c("linked", "unlinked")] collectively were previously
[TRUE], while ["none"] was previously
[FALSE].
as.epids() now calls make_episodes().
window argument in partitions() is now NULL.
as.data.frame() and as.data.list() now only create elements/fields from
non-empty fields.
id and gid slots in number_line objects are now integer(0) by default.
episode_group(), record_group() and range_match_legacy() have been
removed.
["recursive"] episodes from episodes() are now presented as ["rolling"]
episodes with reference_event = "all_records", i.e.
Old syntax ~ episodes(..., episode_type = "recursive")
New syntax ~ episodes(..., episode_type = "rolling", reference_event = "all_records")
When recursive was TRUE, links() ended prematurely and therefore missed
some matches. Resolved.
recurrence_sub_criteria in episodes() was
not implemented correctly and led to incorrect linkage results in some
instances. Resolved.
overlap_method() - logical tests recycled incorrectly. Resolved.
check_links argument - Option "g" implemented as option "l". Resolved.
make_pairs_wf_source(). Created incorrect pairs. Resolved.
case_sub_criteria and recurrence_sub_criteria in episodes() led to
incorrect results. Resolved.
merge_ids() - shrink and expand.
plot.format.
true(). Predefined logical test for use with sub_criteria().
false(). Predefined logical test for use with sub_criteria().
links() - batched. Specify if all record pairs are created or compared
at once ("no") or in batches ("yes").
links() - repeats_allowed. Specify if record-pairs with duplicate
elements should be created.
links() - permutations_allowed. Specify if permutations of the same
record-pair should be created.
links() - ignore_same_source. Specify if record-pairs from different
datasets should be created.
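The pair-creation options above can be illustrated with a minimal base R sketch. make_index_pairs() and its data_source argument are hypothetical names used only for illustration; this is not how links() builds record-pairs internally, and the exact semantics of the real arguments may differ.

```r
# Hypothetical helper for illustration only; not diyar's implementation.
# Enumerates index pairs (x_pos, y_pos) for n records.
make_index_pairs <- function(n, repeats_allowed = TRUE,
                             permutations_allowed = TRUE,
                             data_source = NULL) {
  pairs <- expand.grid(x_pos = seq_len(n), y_pos = seq_len(n))
  if (!repeats_allowed) {
    # Drop pairs where a record is paired with itself.
    pairs <- pairs[pairs$x_pos != pairs$y_pos, ]
  }
  if (!permutations_allowed) {
    # Keep one ordering of each pair, e.g. (1, 2) but not (2, 1).
    pairs <- pairs[pairs$x_pos <= pairs$y_pos, ]
  }
  if (!is.null(data_source)) {
    # Assumption: when sources are given, keep cross-source pairs only.
    pairs <- pairs[data_source[pairs$x_pos] != data_source[pairs$y_pos], ]
  }
  rownames(pairs) <- NULL
  pairs
}

nrow(make_index_pairs(3))
#> [1] 9
nrow(make_index_pairs(3, repeats_allowed = FALSE))
#> [1] 6
nrow(make_index_pairs(3, repeats_allowed = FALSE, permutations_allowed = FALSE))
#> [1] 3
nrow(make_index_pairs(3, data_source = c("A", "A", "B")))
#> [1] 4
```

Each restriction shrinks the set of pairs to compare, which is why these options affect memory usage and run time.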
eval_sub_criteria() - depth. First order of recursion.
sets() and make_sets(). Create permutations of record-sets.
links() - When shrink is TRUE, records in a record-group must meet every
listed match criteria and sub_criteria. For example, if pid_cri is 3,
then the record must have matched another on the first three match
criteria.
links() - pid@iteration now tracks when a record was dealt with instead
of when it was assigned to a record-group. For example, a record can be
closed (matched or not matched) at iteration 1 but assigned to a
record-group at iteration 5.
make_pairs() - x.* and y.* values in the output are now swapped.
sub_criteria can now export any data created by match_func. To do this,
match_func must export a list, where the first element is a logical
object. See an example below.
library(diyar)
val <- rep(month.abb[1:5], 2); val
#> [1] "Jan" "Feb" "Mar" "Apr" "May" "Jan" "Feb" "Mar" "Apr" "May"
match_and_export <- function(x, y){
  output <- list(x == y,
                 data.frame(x_val = x, y_val = y, is_match = x == y))
  return(output)
}
sub.cri.1 <- sub_criteria(
  val, match_funcs = list(match.export = match_and_export)
)
format(sub.cri.1, show_levels = TRUE)
#> logical_test-{
#> Lv.0.1-match.export(Jan,Feb,Mar ...)
#> }
eval_sub_criteria(sub.cri.1)
#> $logical_test
#> [1] 1 0 0 0 0 1 0 0 0 0
#>
#> $mf.0.1
#> x_val y_val is_match
#> 1 Jan Jan TRUE
#> 2 Feb Jan FALSE
#> 3 Mar Jan FALSE
#> 4 Apr Jan FALSE
#> 5 May Jan FALSE
#> 6 Jan Jan TRUE
#> 7 Feb Jan FALSE
#> 8 Mar Jan FALSE
#> 9 Apr Jan FALSE
#> 10 May Jan FALSE
links() can now export any data created within a
sub_criteria. To do this, the sub_criteria
must be created as described above. See an example below.
val <- 1:5
diff_one_and_export <- function(x, y){
  diff <- x - y
  is_match <- diff <= 1
  output <- list(is_match,
                 data.frame(x_val = x, y_val = y, diff = diff, is_match = is_match))
  return(output)
}
sub.cri.2 <- sub_criteria(
  val, match_funcs = list(diff.export = diff_one_and_export)
)
links(
  criteria = "place_holder",
  sub_criteria = list("cr1" = sub.cri.2))
#> $pid
#> [1] "P.1 (CRI 001)" "P.1 (CRI 001)" "P.3 (CRI 001)" "P.3 (CRI 001)"
#> [5] "P.5 (No hits)"
#>
#> $export
#> $export$cri.1
#> $export$cri.1$iteration.1
#> $export$cri.1$iteration.1$mf.0.1
#> x_val y_val diff is_match
#> 1 1 1 0 TRUE
#> 2 2 1 1 TRUE
#> 3 3 1 2 FALSE
#> 4 4 1 3 FALSE
#> 5 5 1 4 FALSE
#>
#>
#> $export$cri.1$iteration.2
#> $export$cri.1$iteration.2$mf.0.1
#> x_val y_val diff is_match
#> 1 3 3 0 TRUE
#> 2 4 3 1 TRUE
#> 3 5 3 2 FALSE
summary.epid() - Incorrect count for ‘by episode type’. Resolved.
episodes() - Incorrect results in some instances with skip_order.
Resolved.
make_ids() - Did not capture all records that should be in a
record-group when matches are recursive. Resolved.
make_pairs() - Incorrect record-pairs in some instances. Resolved.
eval_sub_criteria() - When the output of match_func is length one, it
was not recycled. Resolved.
reverse_number_line() - Incorrect results in some instances. Resolved.
links() - Incorrect iteration (pids slot) for non-matches. Resolved.
links() and episodes() - Timing for each iteration was incorrect.
Resolved.
overlap_method_names(). Overlap methods for the corresponding overlap
method codes.
*with_report options for
display.
"chain" overlap method split into "x_chain_y" and "y_chain_x".
"chain" will continue to be supported as a keyword for
"x_chain_y" OR "y_chain_x" methods.
"across" overlap method split into "x_across_y" and "y_across_x".
"across" will continue to be supported as a keyword for
"x_across_y" OR "y_across_x" methods.
"inbetween" overlap method split into "x_inbetween_y" and
"y_inbetween_x". "inbetween" will continue to be supported as a keyword
for "x_inbetween_y" OR "y_inbetween_x" methods.
overlaps().
overlap_method_names().
make_batch_pairs() (internal) created invalid record pairs. Resolved.
reframe(). Modify the attributes of a
sub_criteria object.
link_records(). Record linkage by creating all record pairs as opposed
to batches as with links().
make_pairs(). Create every combination of record-pairs for a given
dataset.
make_pairs_wf_source(). Create record-pairs from different sources only.
make_ids(). Convert an edge list to a group identifier.
merge_ids(). Merge two group identifiers.
attrs(). Pass a set of attributes to one instance of match_funcs or
equal_funcs.
episodes_wf_splits()
episodes() and links(). Reduced processing times.
display argument. "progress_with_report", "stats_with_report" and
"none_with_report". Creates a d_report; a status of the analysis over
its run time.
eval_sub_criteria(). Record-pairs are no longer created
in the function. Therefore, index_record and sn arguments have been
replaced with x_pos and y_pos.
link_records() and links_wf_probabilistic(). The cmp_threshold argument
has been renamed to attr_threshold.
show_labels argument in schema(). Two new options - "wind_nm" and
"length" to replace "length_label".
wind_id list in episodes(..., data_link = "XX") in . Resolved.
link_id in links(..., recursive = TRUE). Resolved.
iteration not recorded in some situations with episodes(). Resolved.
skip_order ends an open episode. Resolved.
NA in dist_wind_index and dist_epid_index when sn is supplied. Resolved.
overlap_method_codes() - overlap method codes not recycled properly.
Resolved.
delink(). Unlink identifiers.
episodes_wf_splits(). Wrapper function
of episodes(). Better optimised for handling datasets with
many duplicate records.
combi(). Numeric codes for unique combinations of vectors.
attr_eval(). Recursive evaluation of a function on each attribute of a
sub_criteria.
case_nm values - Case_CR and Recurrence_CR which are Case and Recurrence
without a sub-criteria match.
schema.epid.
eval_sub_criteria with 1 result.
links_wf_probabilistic(). Probabilistic record linkage.
partitions(). Split events into sections in time.
schema(). Plot schema diagrams for pid, epid, pane and number_line
objects.
encode() and decode(). Encode and decode slot values to minimise memory
usage.
episodes() -
case_sub_criteria and recurrence_sub_criteria.
Additional matching conditions for temporal links.
episodes() - case_length_total and recurrence_length_total. Number of
temporal links required for a window/episode.
links() - recursive. Control if matches can spawn new matches.
links() - check_duplicates. Control the checking of logical tests on
duplicate values. If FALSE, results are recycled for the duplicates.
as.data.frame and as.list S3 methods for the pid, number_line, epid and
pane objects.
episode_type in episodes() - “recursive”. For recursive episodes where
every linked event can be used as a subsequent index event.
recurrence_from_last renamed to
reference_event and given two new options.
episodes() and links(). Speed improvements.
epid_interval or pane_interval with POSIXct objects is now “GMT”.
number_line_sequence() - splits number_line objects. Also available as
a seq method.
epid_total, pid_total and pane_total slots are populated by default. No
need to use group_stats to get these.
to_df() - Removed. Use as.data.frame() instead.
to_s4() - Now an internal function. It’s no longer exported.
compress_number_line() - Now an internal function. It’s no longer
exported. Use episodes() instead.
sub_criteria() - produces a sub_criteria object. Nested “AND” and “OR”
conditions are now possible.
case_overlap_methods, recurrence_overlap_methods and overlap_methods now
take integer codes for different combinations of overlap methods. See
overlap_methods$options for the full list. character inputs are still
supported.
"Single-record" was wrong in links summary
output. Resolved.
Inf in number_line objects.
case_length or recurrence_length for the same event.
overlap_methods for the corresponding case_length and recurrence_length.
links() to replace record_group().
sub_criteria(). The new way of supplying a sub_criteria in links().
exact_match(), range_match() and range_match_legacy(). Predefined
logical tests for use with sub_criteria(). User-defined tests can also
be used. See ?sub_criteria.
custom_sort() for nested sorting.
epid_lengths() to show the required
case_length or recurrence_length for an
analysis. Useful in confirming the required case_length or
recurrence_length for episode tracking.
epid_windows(). Shows the period a date will overlap with given a
particular case_length or recurrence_length. Useful in confirming the
required case_length or recurrence_length for episode tracking.
strata in links(). Useful for stratified data linkage. As in stratified
episode tracking, a record with a missing strata (NA_character_) is
skipped from data linkage.
data_links in links(). Unlink record groups that do not include records
from certain data sources.
listr(). Format atomic vectors as a
written list.
combns(). An extension of combn to generate permutations not ordinarily
captured by combn.
iteration slot for pid and epid objects.
overlap_method - reverse().
number_line() - l and r must have the same length or be 1.
episodes() - case_nm differentiates between duplicates of "Case"
("Duplicate_C") and "Recurrent" events ("Duplicate_R").
episodes().
"Case").
episode_type - simultaneously track both "fixed" and "rolling" episodes.
skip_if_b4_lengths - simultaneously track episodes
where events before a cut-off range are both skipped and not
skipped.
episode_unit - simultaneously track episodes by different units of time.
case_for_recurrence - simultaneously track "rolling" episodes with and
without an additional case window for recurrent events.
recurrence_from_last - simultaneously track "rolling" episodes with
reference windows calculated from the first and last event of the
previous window.
strata. Options must be the same in each strata.
from_last - simultaneously track episodes in both directions of time -
past to present and present to past.
episodes_max - simultaneously track different numbers of episodes within
the dataset.
include_overlap_method - "overlap" and "none" will not be combined with
other methods.
"overlap" - mutually inclusive with the other methods, so their
inclusion is not necessary.
"none" - mutually exclusive and prioritised over the other methods
(including "none"), so their inclusion is not necessary.
NA_real_)
or periods (number_line(NA_real_, NA_real_))
case_length and recurrence_length. This ensures that the event does not
become an index case; however, it can still be part of a different
episode. For reference, an event with a missing strata (NA_character_)
ensures that the event does not become an index case nor part of any
episode.
fixed_episodes, rolling_episodes and episode_group -
include_index_period didn’t work in certain situations. Corrected.
fixed_episodes, rolling_episodes and episode_group - dist_from_wind was
wrong in certain situations. Corrected.
record_group() - strata. Perform record linkage separately within
subsets of a dataset.
overlap(),
compress_number_line(), fixed_episodes(),
rolling_episodes() and episode_group() -
overlap_methods and methods. Replaces
overlap_method and method respectively. Use
different sets of methods within the same dataset when grouping episodes
or collapsing number_line objects.
overlap_method and method only permit 1
method per dataset.
epid objects - win_nm. Shows the type of window each event belongs to,
i.e. case or recurrence window.
epid objects - win_id. Unique ID for each window. The ID is the sn of
the reference event for each window.
epid objects updated to reflect this.
epid objects - dist_from_wind.
Shows the duration of each event from its window’s reference event.
epid objects - dist_from_epid. Shows the duration of each event from its
episode’s reference event.
episode_group() and rolling_episodes() - recurrence_from_last. Determine
if reference events should be the first or last event from the previous
window.
episode_group() and rolling_episodes() - case_for_recurrence. Determine
if recurrent events should have their own case windows or not.
episode_group(),
fixed_episodes() and rolling_episodes() -
data_links. Unlink episodes that do not include records
from certain data_source(s).
episode_group(), fixed_episodes() and rolling_episodes() - case_length
and recurrence_length arguments. You can now use a range (number_line
object).
episode_group(), fixed_episodes() and rolling_episodes() -
include_index_period. If TRUE, overlaps with the index event or period
are grouped together even if they are outside the cut-off range
(case_length or recurrence_length).
pid objects - link_id. Shows the record (sn slot) to which every record
in the dataset has matched.
invert_number_line(). Invert the left and/or right points to the
opposite end of the number line.
left_point(x)<-, right_point(x)<-, start_point(x)<- and end_point(x)<-
overlap() renamed to overlaps(). overlap() is now a convenience
overlap_method to capture ANY kind of overlap.
"none" is another convenience overlap_method for NO kind of overlap.
expand_number_line() - new options for
point; "left" and "right".
compress_number_line() - compressed number_line object inherits the
direction of the widest number_line among an overlapping group of
number_line objects.
overlap_methods - have been changed such that each pair of number_line
objects can only overlap in one way. E.g.
"chain" and "aligns_end" used to be possible but this is now considered
a "chain" overlap only.
"aligns_start" and "aligns_end" used to be possible but this is now
considered an "exact" overlap.
number_line_sequence() - Output is now a list.
number_line_sequence() - now works across multiple number_line objects.
to_df() - can now change number_line objects to data.frames.
to_s4() can do the reverse.
epid objects are the default outputs for
fixed_episodes(), rolling_episodes() and
episode_group().
pid objects are the default outputs for record_group().
case_nm for events that were skipped due to rolls_max or episodes_max is
now "Skipped".
episode_group() and record_group(), sn can be negative numbers but must
still be unique.
episode_group() and record_group(). Runs just a little bit faster …
x and y to have the same lengths in overlap functions.
episode_group - case_length and
recurrence_length arguments. Now accept negative numbers.
end_point() of the first
period.
number_line_width(), both will be collapsed if the second
one is within some days (or any other episode_unit) before
the start_point() of the first period.
case_nm wasn’t right for rolling episodes. Resolved.
episode_group(), fixed_episodes() and rolling_episodes() - optimized to
take less time when working with large datasets.
episode_group(), fixed_episodes() and rolling_episodes() - date argument
now supports numeric values.
compress_number_line() - the output (gid slot) is now a group identifier
just like in epid objects (epid_interval).
pid S4 object class for results of
record_group(). This will replace the current default
(data.frame) in the next major release.
epid S4 object class for results of episode_group(), fixed_episodes()
and rolling_episodes(). This will replace the current default
(data.frame) in the next release.
to_s4() and to_s4 argument in record_group(), episode_group(),
fixed_episodes() and rolling_episodes(). Changes their output from a
data.frame (current default) to epid or pid objects.
to_df() changes epid or pid objects to a data.frame.
deduplicate argument from fixed_episodes() and rolling_episodes() added
to episode_group().
fixed_episodes() and rolling_episodes()
are now wrapper functions of episode_group(). Functionality
remains the same but now includes all arguments available to
episode_group().
fixed_episodes() and
rolling_episodes() from number_line to
data.frame, pending the change to epid
objects.
pid_cri column returned in record_group is
now numeric. 0 indicates no match.
criteria multiple times
record_group()
number_line objects can now be used as a
criteria in record_group().
episode_unit in episode_group()
bi_direction in episode_group()
fixed_episodes() and rolling_episodes() -
Group records into fixed or rolling episodes of events or period of
events.
episode_group() - A more comprehensive implementation
of fixed_episodes() and rolling_episodes(),
with additional features such as user defined case assignment.
record_group() - Multistage deterministic linkage that
addresses missing data.
number_line S4 object.
record_group()
fixed_episodes(), rolling_episodes() and
episode_group()
fixed_episodes() and
rolling_episodes()
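The difference between the fixed and rolling episodes named in the entries above can be illustrated with a minimal base R sketch. track_episodes() is a hypothetical helper written only for illustration. It is not the package's implementation, it uses a single case_length for both the case and recurrence windows, and it assumes event times are numeric days sorted in ascending order.

```r
# Hypothetical helper for illustration only; not diyar's implementation.
# Groups sorted event times (in days) into episodes.
track_episodes <- function(days, case_length, rolling = FALSE) {
  stopifnot(all(diff(days) >= 0))  # events must be sorted
  epid <- integer(length(days))
  current <- 0L
  ref <- NA_real_                  # reference event of the open window
  for (i in seq_along(days)) {
    if (is.na(ref) || days[i] - ref > case_length) {
      # Event falls outside the open window: start a new episode.
      current <- current + 1L
      ref <- days[i]
    } else if (rolling) {
      # Rolling episodes: the window restarts from the latest event.
      ref <- days[i]
    }
    epid[i] <- current
  }
  epid
}

days <- c(0, 5, 10, 15, 40)
track_episodes(days, case_length = 7)
#> [1] 1 1 2 2 3
track_episodes(days, case_length = 7, rolling = TRUE)
#> [1] 1 1 1 1 2
```

In the fixed case the window is always measured from the first event of the episode, so the event on day 10 starts a new episode. In the rolling case each linked event extends the window, so the first four events stay in one episode.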