- donttest
- mock_(class name), e.g., mock_data_list and mock_ext_solutions_df
- auto_plot output data frame doesn’t duplicate cluster column
- ext_solutions_df no longer loses sim_mats_list attribute
- donttest rather than commented out
- observations(), summary_features(), features(), uids() marked as internal
- rbind for classes solutions_df and ext_solutions_df not preserving the class type of the contained weights_matrix
- solutions_df or ext_solutions_df restricts output to 10 line max by default
- solution column in mc_manhattan_plot() when extended solutions data frame has no MC labels
- weights matrix
- merge.data_list()
- as.list() for dist_fns_list, clust_fns_list, and data_list objects
- generate_settings_matrix needed paste0
- print.solutions_df() misprinted the number of observations in the solutions data frame
- merge_dls() is superseded by merge.data_lists()
- ext_solutions_df manipulation won’t drop summary_features and features attributes
- estimate_nclust_given_graph has more resiliency to floating point errors through tryCatch statement during eigengap quality assignment
- estimate_nclust_given_graph has more resiliency to floating point errors through tryCatch loop updating eigenvalue scaling
- dplyr_row_slice() functions for classes solutions_df and ext_solutions_df
- extend_solutions()
- extend_solutions was not assigning feature types properly during p-value calculations
- rbind.ext_solutions_df now takes ... parameter before reset_indices parameter to avoid error during calls with unnamed parameters.
- rbind.solutions_df now takes ... parameter before reset_indices parameter to avoid error during call without named parameters.
- snf_config object made weights matrix lose its
class
- (class list) -> (class data_list, list)
- (class data.frame) -> solutions data frame (class solutions_df, data.frame)
- (class data.frame) -> extended solutions data frame (class ext_solutions_df, data.frame)
- (class data.frame) -> (class ext_solutions_df, data.frame)
- (class list) -> distance functions list (class dist_fns_list, list)
- (class list) -> clustering functions list (class clust_fns_list, list)
- (class matrix, array) -> (class weights_matrix, matrix, array)
- generate_data_list() -> data_list()
- get_cluster_df(), get_clusters(), get_cluster_solutions()) now all superseded by custom transposition of solutions_df class objects (i.e., simply call t())
- generate_settings_matrix(), generate_distance_metrics_list(), generate_weights_matrix(), generate_clust_algs_list()) now all superseded by single function snf_config() and the snf_config class object it produces
- split_vector, either by adjusted_rand_index_heatmap() or shiny_annotator(), solutions_df and ext_solutions_df class objects can be annotated with their meta cluster labels using the function label_meta_clusters(). This is necessary prior to usage of get_representative_solutions().
- as.data.frame()
- batch_snf no longer changes the output structure from a solutions data frame to a list of a solutions data frame and a similarity matrix list. Instead, the similarity matrix list is added to the solutions data frame as an attribute and can be extracted using the function sim_mats_list().
- calculate_coclustering() function
- print() functions have been defined for all major metasnf objects.

Last update before CRAN submission.
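
The rbind.solutions_df and rbind.ext_solutions_df entries above depend on how R matches arguments: a formal that comes before ... can be filled positionally, while a formal that comes after ... can only be set by name. The base R sketch below illustrates that rule with two hypothetical helpers (dots_first and dots_last). They are not metasnf functions and only report how their arguments were matched.

```r
# Hypothetical helpers for illustration only; not part of metasnf.
# `...` first: unnamed arguments are all collected as objects to bind.
dots_first <- function(..., reset_indices = FALSE) {
    list(n_objects = length(list(...)), reset_indices = reset_indices)
}

# `...` last: the first unnamed argument is matched to `reset_indices`.
dots_last <- function(reset_indices = FALSE, ...) {
    list(n_objects = length(list(...)), reset_indices = reset_indices)
}

a <- data.frame(x = 1:2)
b <- data.frame(x = 3:4)

good <- dots_first(a, b)
bad <- dots_last(a, b)

good$n_objects                    # both data frames reach `...`
is.logical(good$reset_indices)    # the flag keeps its default value
bad$n_objects                     # only `b` reaches `...`
is.data.frame(bad$reset_indices)  # `a` was consumed as the flag
```

With the ... parameter first, a call such as rbind(a, b) cannot accidentally pass a data frame where a logical flag is expected.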
- set.seed prior to generate_settings_matrix instead.
- estimate_nclust_given_graph() occasionally yielded incorrect number of cluster estimates as a result of improper scaling in metasnf v0.7.0. The scaling should be corrected now.
- mc_manhattan_plot() with a data list containing duplicate feature names
- mc_manhattan_plot() parameter rep_solution replaced with more accurate name extended_solutions_matrix (solutions matrix with _pval columns)
- SNFtool::estimateNumberOfClustersGivenGraph() could occasionally error out on the basis of calculating eigenvectors (eigengap heuristic) for a Laplacian with floating point values that were too small. Adapted function estimate_nclust_given_graph() slightly scales up Laplacian to reduce the risk of encountering this error (presumably without any change to resulting cluster number estimate)
- get_matrix_order has arguments allowing users to control which distance metric and agglomerative hierarchical clustering methods are used to sort matrices
- get_complete_uids quickly pulls UIDs of observations with complete data from a list of dataframes
- extend_solutions doesn’t crash on multi-feature target lists
- generate_data_list()
- remove_missing parameter for generate_data_list allowing subjects with incomplete data to remain in the data list
- lp_solutions_matrix error message when
training set is not subset of full data list
- generate_data_list list elements now are named after their components
- merge_data_lists functionality to horizontally merge data lists
- extend_solutions() will no longer crash when a data_list has the UID column in non-first position.
- generate_data_list() enforces the UID column to be in first position of each dataframe.
- auto_plot() will automatically generate bar and/or jitter plots showing how features in a data_list/target_list are distributed across a single cluster solution
- shiny_annotator() function can be used to identify indices of meta clusters within an adjusted_rand_index_heatmap
- adjusted_rand_index_heatmap() now has a split_vector parameter that will slice a heatmap into meta clusters
- rename_dl() can be used to rename features in a data_list
- manhattan_plot has been split into var_manhattan_plot (key variable - all variables), esm_manhattan_plot (cluster solutions in an extended solutions matrix to all variables), and mc_manhattan_plot (like esm_manhattan_plot, but at the meta-cluster level)
- get_representative_solutions extracts max-ARI solutions from an extended solutions matrix based on a split_vector containing meta cluster boundaries
- batch_nmi calculates NMI scores (see https://branchlab.github.io/metasnf/articles/nmi_scores.html)
- extend_solutions will only calculate p-value summary measures (min/max/mean) for data_list passed in as a target_list parameter, but will also accept and calculate p-values for a data_list passed in through the data_list parameter
- adjusted_rand_index_heatmap and assoc_pval_heatmap have updated parameters to improve ease of use and flexibility (including easier colour control)
- get_clustered_subs has been removed (does the same thing as get_cluster_df)
- get_cluster_pval deprecated for calc_assoc_pval
- generate_data_list()
and its corresponding functions
- remove_signal has been renamed to linear_adjust to better reflect its function
- summarize_distance_metrics_list has been shortened to summarize_dml
- correlation_pval_heatmap has been renamed to assoc_pval_heatmap
- calc_om_aris has been renamed to calc_aris
- extend_solutions p-value calculation warnings are now suppressed
- _pval instead of a mix of p_val, pval, and p.
- pval_select, p_val_select, top_oms_per_cluster, check_subj_orders_for_lp, get_p, chi_sq_pval,
- pval_summaries, which would calculate min/max/mean p-values, has been replaced with summarize_pvals
- train_test_assign now provides results as a named list of subject vectors instead of a data.frame. keep_split function has been removed accordingly.
- sort_subjects parameter added to generate_data_list to allow for sorting of subjects in the data_list
- extend_solutions can now also be parallelized (see ?extend_solutions)
- remove_signal function has sig_digs parameter that can be used to restrict how many significant figures are returned in the resulting residuals
- calc_om_aris is now MUCH faster after removing excessive calls to as.numeric and enabling parallel processing with future.apply. Thanks for the idea, Alper.
- extend_solutions to better handle extreme p-values (e.g. infinity)
- p_val_select with pval_select which can also return negative-log p-values
- generate_data_list correctly errors when components are only partially named (resolves https://github.com/BRANCHlab/metasnf/issues/10)
- lp_row function has been replaced by lp_solutions_matrix. The new function is order agnostic: full data lists can be constructed without any restriction on how training and testing set subjects are sorted. Subjects present in the provided solutions matrix to propagate are assumed to be the training subjects.
- calc_om_aris now has a progress parameter. When set to TRUE and used in conjunction with progressr::with_progress(), a progress bar is shown for the calculations. Learn more with ?calc_om_aris.
- grepl instead of grep used in extend_solutions to reduce errors when no chi-squared warning occurs
- keep_split will preserve observations that were assigned a split but were not present in the dataframe being split. Instead of being removed, those observations will have NA values.
- fraction_clustered_together crashing when a cluster was assigned to only a single observation
- fraction_clustered_together not running due to bracket typo when evaluating length of the data_list
- correlation_pval_heatmap function can have significance
stars disabled with significance_stars parameter
- estimateNumberOfClustersGivenGraph has been used up to this point without specifying a parameter for NUMC. Consequently, final similarity matrices clustered with the default methods (spectral clustering based on eigen-gap or rotation cost heuristics) were not capable of resulting in more than 5 clusters. The default functions have been updated to span 2 clusters to 10 clusters. Users will likely see different clustering results as a result of this change. To replicate the behaviour of default spectral clustering prior to v0.3.0, users should copy the following code prior to the batch_snf command:

    clust_algs_list <- generate_clust_algs_list(
        "spectral_eigen" = spectral_eigen_classic,
        "spectral_rot" = spectral_rot_classic
    )
    # Adapt below as necessary
    solutions_matrix <- batch_snf(
        data_list,
        settings_matrix,
        clust_algs_list = clust_algs_list
    )
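
The effect of the NUMC range described above can be illustrated with a simplified eigengap heuristic in base R. This is a sketch only: eigengap_nclust is a hypothetical helper, it uses an unnormalized graph Laplacian on a toy similarity matrix, and it is not the SNFtool or metasnf implementation.

```r
# Hypothetical, simplified eigengap heuristic (not the SNFtool code).
# Picks the candidate k in NUMC with the largest gap between the k-th and
# (k+1)-th smallest Laplacian eigenvalues.
eigengap_nclust <- function(W, NUMC = 2:5) {
    W <- (W + t(W)) / 2
    diag(W) <- 0
    L <- diag(rowSums(W)) - W  # unnormalized graph Laplacian
    ev <- sort(eigen(L, symmetric = TRUE, only.values = TRUE)$values)
    gaps <- diff(ev)           # gaps[k] = ev[k + 1] - ev[k]
    NUMC[which.max(gaps[NUMC])]
}

# Toy similarity matrix: 7 well-separated groups of 4 observations each
groups <- rep(1:7, each = 4)
W <- outer(groups, groups, "==") * 1 + 0.001

narrow <- eigengap_nclust(W, NUMC = 2:5)   # cannot exceed 5 by construction
wide <- eigengap_nclust(W, NUMC = 2:10)    # can recover the 7 groups
```

Because the estimate is always chosen from the candidate range, a range of 2 to 5 can never report more than 5 clusters, whatever structure the similarity matrix has.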
- fisher_exact_pval function to avoid “FEXACT” error (like here https://github.com/Lagkouvardos/Rhea/issues/17). Impact on results is expected to be negligible.
- remove_signal() enables correcting a data_list linearly for confounders / unwanted signal. Vignette is available: https://branchlab.github.io/metasnf/articles/confounders.html.
- batch_snf() has new parameter automatic_standard_normalize to switch out the default numeric distance measures (euclidean) with standard normalized variants.
- NEWS.md file to track changes to the package.
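
The linear correction performed by remove_signal() can be pictured as replacing a feature with its residuals from a linear model fitted on the confounder. The base R sketch below assumes simulated data and hypothetical variable names (age, score). It does not call metasnf and may differ from the package's implementation.

```r
# Simulated data for illustration only; variable names are hypothetical.
set.seed(42)
n <- 50
age <- rnorm(n, mean = 40, sd = 10)  # confounder
score <- 0.5 * age + rnorm(n)        # feature that depends on the confounder

# Regress the feature on the confounder and keep what is left over
fit <- lm(score ~ age)
adjusted <- signif(residuals(fit), digits = 3)  # 3 significant figures

# The adjusted feature is (up to rounding) uncorrelated with the confounder
cor(score, age)
cor(adjusted, age)
```

The signif() call mirrors the idea of limiting how many significant figures of the residuals are kept.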