% Generated by roxygen2: do not edit by hand
% Please edit documentation in R/find_optimal_n.R
\name{find_optimal_n}
\alias{find_optimal_n}
\title{Search for an optimal number of clusters in a list of bioregionalizations}
\usage{
find_optimal_n(
  bioregionalizations,
  metrics_to_use = "all",
  criterion = "elbow",
  step_quantile = 0.99,
  step_levels = NULL,
  step_round_above = TRUE,
  metric_cutoffs = c(0.5, 0.75, 0.9, 0.95, 0.99, 0.999),
  n_breakpoints = 1,
  plot = TRUE,
  verbose = TRUE
)
}
\arguments{
\item{bioregionalizations}{A \code{bioregion.bioregionalization.metrics} object
(output from
\code{\link[=bioregionalization_metrics]{bioregionalization_metrics()}}) or a \code{data.frame} with the first two
columns named \code{K} (bioregionalization name) and \code{n_clusters} (number of clusters),
followed by columns with numeric evaluation metrics.}

\item{metrics_to_use}{A \code{character} vector or single string specifying
which metrics in \code{bioregionalizations} to use when searching for optimal
numbers of clusters. Defaults to \code{"all"} (uses all metrics).}

\item{criterion}{A \code{character} string specifying the criterion to identify
optimal clusters. Options include \code{"elbow"}, \code{"increasing_step"},
\code{"decreasing_step"}, \code{"cutoff"}, \code{"breakpoints"}, \code{"min"}, or \code{"max"}.
Defaults to \code{"elbow"}. See Details.}

\item{step_quantile}{For \code{"increasing_step"} or \code{"decreasing_step"},
specifies the quantile of differences between consecutive bioregionalizations
used as the cutoff to identify significant steps in \code{eval_metric}.
Defaults to 0.99.}

\item{step_levels}{For \code{"increasing_step"} or \code{"decreasing_step"}, specifies
the number of largest steps to retain as cutoffs. Defaults to \code{NULL}.}

\item{step_round_above}{A \code{boolean} indicating whether the optimal clusters
are above (\code{TRUE}) or below (\code{FALSE}) identified steps. Defaults to \code{TRUE}.}

\item{metric_cutoffs}{For \code{criterion = "cutoff"}, specifies the cutoffs
of \code{eval_metric} to extract cluster counts.}

\item{n_breakpoints}{For \code{criterion = "breakpoints"}, specifies the number
of breakpoints to find in the curve. Defaults to 1.}

\item{plot}{A \code{boolean} indicating if a plot of the first \code{eval_metric}
with identified optimal clusters should be drawn.}

\item{verbose}{A \code{boolean} indicating whether to
display progress messages. Set to \code{FALSE} to suppress these messages.}
}
\value{
A \code{list} of class \code{bioregion.optimal.n} with these elements:
\itemize{
\item{\code{args}: Input arguments.}
\item{\code{evaluation_df}: The input evaluation \code{data.frame}, appended with
\code{boolean} columns for optimal cluster counts.}
\item{\code{optimal_nb_clusters}: A \code{list} with optimal cluster counts for each
metric in \code{metrics_to_use}, based on the chosen \code{criterion}.}
\item{\code{plot}: The plot (if requested).}}
}
\description{
This function searches for one or more optimal numbers of clusters in a set
of ordered bioregionalizations, according to one or several criteria. It is
typically used to decide where to cut a hierarchical tree, or to choose among
a range of bioregionalizations produced by k-means or PAM. Users should
exercise caution in other cases (e.g., unordered or unrelated
bioregionalizations).
}
\details{
This function explores the relationship between each evaluation metric and the
number of clusters, and applies a criterion to find optimal cluster counts.

\strong{Note on criteria:} Several criteria can return multiple optimal cluster
counts, which emphasizes hierarchical or nested bioregionalizations. This
approach aligns with modern recommendations for biological datasets, as
illustrated by the reanalysis of Holt et al. (2013) by Ficetola et al. (2017).

\strong{Criteria for optimal clusters:}
\itemize{
\item{\code{elbow}: Identifies the "elbow" point in the evaluation metric curve,
where incremental improvements diminish. The elbow is located as the point of
maximum distance from the straight line joining the two endpoints of the curve.}
\item{\code{increasing_step} or \code{decreasing_step}: Highlights significant
increases or decreases in metrics by analyzing pairwise differences between
bioregionalizations. Users specify \code{step_quantile} or \code{step_levels}.}
\item{\code{cutoff}: Derives clusters from the metric cutoffs specified in
\code{metric_cutoffs}, e.g., as in Holt et al. (2013). Adjust cutoffs based on
spatial scale.}
\item{\code{breakpoints}: Uses segmented regression to find breakpoints. Requires
specifying \code{n_breakpoints}.}
\item{\code{min} & \code{max}: Selects clusters at minimum or maximum metric values.}}
}
\note{
Please note that finding the optimal number of clusters is a procedure
that normally requires decisions from the user, and as such can hardly be
fully automated. Users are strongly advised to read the references
indicated below for guidance on how to choose their optimal
number(s) of clusters. Consider the "optimal" numbers of clusters returned
by this function as a first approximation of the best numbers for your
bioregionalization.
}
\examples{
comat <- matrix(sample(0:1000, size = 500, replace = TRUE, prob = 1/1:1001),
20, 25)
rownames(comat) <- paste0("Site",1:20)
colnames(comat) <- paste0("Species",1:25)

dissim <- dissimilarity(comat, metric = "all")

# User-defined number of clusters
tree <- hclu_hierarclust(dissim,
                          optimal_tree_method = "best",
                          n_clust = 5:10)
tree

a <- bioregionalization_metrics(tree,
                                dissimilarity = dissim,
                                species_col = "Node2",
                                site_col = "Node1",
                                eval_metric = "anosim")
                                   
find_optimal_n(a, criterion = 'increasing_step', plot = FALSE)
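
# A minimal base-R sketch of the idea behind criterion = "increasing_step".
# It ASSUMES (not checked against the package source) that steps are the
# differences between consecutive metric values and that step_quantile sets
# the cutoff on those differences. The helper name step_sketch and the toy
# values below are hypothetical and only serve as an illustration.
step_sketch <- function(n_clusters, metric, step_quantile = 0.99,
                        step_round_above = TRUE) {
  steps <- diff(metric)                            # consecutive differences
  cutoff <- stats::quantile(steps, step_quantile)  # threshold on step size
  big <- which(steps >= cutoff)                    # steps at or above threshold
  # Step i lies between positions i and i + 1: keep the upper or the lower one
  if (step_round_above) n_clusters[big + 1] else n_clusters[big]
}
step_n <- 2:8
step_metric <- c(0.10, 0.12, 0.40, 0.42, 0.43, 0.70, 0.71)
step_sketch(step_n, step_metric, step_quantile = 0.75)
step_sketch(step_n, step_metric, step_quantile = 0.75,
            step_round_above = FALSE)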

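# A minimal base-R sketch of the elbow idea described in Details: the point
# of maximum distance from the straight line joining the curve endpoints.
# The helper name elbow_sketch and the toy values are hypothetical, both axes
# are rescaled to [0, 1] as an assumption of this sketch, and this is not
# necessarily how find_optimal_n() computes the elbow internally.
elbow_sketch <- function(n_clusters, metric) {
  # Rescale both axes so that neither dominates the distance
  x <- (n_clusters - min(n_clusters)) / diff(range(n_clusters))
  y <- (metric - min(metric)) / diff(range(metric))
  n <- length(x)
  # Perpendicular distance of each point to the line through the endpoints
  d <- abs((y[n] - y[1]) * x - (x[n] - x[1]) * y + x[n] * y[1] - y[n] * x[1]) /
    sqrt((y[n] - y[1])^2 + (x[n] - x[1])^2)
  n_clusters[which.max(d)]
}
toy_n <- 2:10
toy_metric <- c(0.20, 0.45, 0.62, 0.70, 0.74, 0.76, 0.77, 0.78, 0.785)
elbow_sketch(toy_n, toy_metric)
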
}
\references{
Holt BG, Lessard J, Borregaard MK, Fritz SA, Araújo MB, Dimitrov D, Fabre P,
Graham CH, Graves GR, Jønsson KA, Nogués-Bravo D, Wang Z, Whittaker RJ,
Fjeldså J & Rahbek C (2013) An update of Wallace's zoogeographic regions of
the world. \emph{Science} 339, 74-78.

Ficetola GF, Mazel F & Thuiller W (2017) Global determinants of
zoogeographical boundaries. \emph{Nature Ecology & Evolution} 1, 0089.
}
\seealso{
For more details illustrated with a practical example,
see the vignette:
\url{https://biorgeo.github.io/bioregion/articles/a4_1_hierarchical_clustering.html#optimaln}.

Associated functions:
\link{hclu_hierarclust}
}
\author{
Boris Leroy (\email{leroy.boris@gmail.com}) \cr
Maxime Lenormand (\email{maxime.lenormand@inrae.fr}) \cr
Pierre Denelle (\email{pierre.denelle@gmail.com})
}
