FracFixR is a compositional statistical framework for analyzing RNA-seq data from fractionation experiments. It addresses the fundamental challenge where library preparation and sequencing depth obscure the original proportions of RNA fractions.
This vignette demonstrates:
Let’s simulate a simple polysome profiling experiment:
# Simulate count data for 500 genes across 12 samples
n_genes <- 100
n_samples <- 12
# Generate count matrix with varying expression levels
total_counts <- matrix(
rnbinom(n_genes * 4, mu = 100, size = 10),
nrow = n_genes,
dimnames = list(
paste0("Gene", 1:n_genes),
paste0("Sample", 1:4)
)
)
prob=c(1/4,1/4,1/2)
# distribute counts evenly accross different fractions in Control
multinom_samples <- apply(total_counts, c(1, 2), function(x) rmultinom(1, size = x, prob = prob))
reshaped <- aperm(multinom_samples, c(3, 2, 1)) # now (n, 4, 3)
reshaped_matrix1 <- matrix(reshaped, nrow = n_genes, ncol = 12)
colnames(reshaped_matrix1) <- paste0("V", rep(1:4, each = 3), "_p", 1:3)
counts<-cbind(total_counts[,1:2],reshaped_matrix1[,c(2:3,5:6)],total_counts[,3:4],reshaped_matrix1[,c(8:9,11:12)])
# Create annotation data frame
annotation <- data.frame(
Sample = colnames(counts),
Condition = rep(c("Control", "Treatment"), each = 6),
Type = rep(c("Total", "Total", "Light_Polysome", "Heavy_Polysome","Light_Polysome", "Heavy_Polysome"), 2),
Replicate = c("Rep1","Rep2", rep("Rep1", 2), rep("Rep2", 2), "Rep1","Rep2", rep("Rep1", 2), rep("Rep2", 2))
)
print(head(annotation))
#> Sample Condition Type Replicate
#> 1 Sample1 Control Total Rep1
#> 2 Sample2 Control Total Rep2
#> 3 V1_p2 Control Light_Polysome Rep1
#> 4 V1_p3 Control Heavy_Polysome Rep1
#> 5 V2_p2 Control Light_Polysome Rep2
#> 6 V2_p3 Control Heavy_Polysome Rep2
# Run the main analysis
results <- FracFixR(MatrixCounts = counts, Annotation = annotation)
#> Setting up parallel processing...
#> Filtering transcripts based on Total samples...
#> Retained 100 transcripts present in Total samples
#>
#> Processing condition 1/2: Control
#> Selecting transcripts with 70-96% quantiles (range: 214.3 - 296.1)
#> Selected 26 transcripts for regression
#> Processing replicate 1/2: Rep1
#> Processing replicate 2/2: Rep2
#>
#> Processing condition 2/2: Treatment
#> Selecting transcripts with 70-96% quantiles (range: 217.3 - 280.0)
#> Selected 26 transcripts for regression
#> Processing replicate 1/2: Rep1
#> Processing replicate 2/2: Rep2
#>
#> Finalizing results...
#> FracFixR analysis complete!
# View the structure of results
names(results)
#> [1] "OriginalData" "Annotation" "Propestimates" "Coefficients"
#> [5] "Fractions" "plots"
The FracFixR output contains several components:
# 1. Fraction proportions for each replicate
print(results$Fractions)
#> Light_Polysome Heavy_Polysome Lost Replicate
#> 1 0.08560270 0.00123615 0.9131612 Control_Rep1
#> 2 0.04790483 0.06823600 0.8838592 Control_Rep2
#> 3 0.03838147 0.05640819 0.9052103 Treatment_Rep1
#> 4 0.08793944 0.10989656 0.8021640 Treatment_Rep2
# 2. Regression coefficients
print(results$Coefficients)
#> Lost Light_Polysome Heavy_Polysome Replicate
#> 1 226.1183 0.6616193 0.009933183 Control_Rep1
#> 2 219.9686 0.3981329 0.547867870 Control_Rep2
#> 3 223.3712 0.3011167 0.234843211 Treatment_Rep1
#> 4 201.5765 0.3582424 0.424722648 Treatment_Rep2
# 3. Estimated Proportions (first 5 genes, first 6 samples)
print(results$Propestimates[1:5, 1:6])
#> Sample1 Sample2 V1_p2 V1_p3 V2_p2 V2_p3
#> Gene1 253 250 0.17673392 0.0021771360 0.05453876 0.15010079
#> Gene2 237 236 0.05319552 0.0006988169 0.04201403 0.03579036
#> Gene3 249 238 0.07548677 0.0007666551 0.02538431 0.03309269
#> Gene4 243 248 0.10875934 0.0031976685 0.05999263 0.12758567
#> Gene5 241 237 0.06175113 0.0011478345 0.02123376 0.05113433
The plot shows: - The proportion of RNA in each fraction - The “Lost” fraction (grey) representing unrecoverable material - Consistency across replicates
Now let’s identify genes with differential polysome association between conditions:
# Test for differential proportion in heavy polysomes
diff_heavy <- DiffPropTest(
NormObject = results,
Conditions = c("Control", "Treatment"),
Types = "Heavy_Polysome",
Test = "Logit"
)
#> Performing Logit test comparing Conditions Control and Treatment in the Heavy_Polysome fraction
#> Applying FDR correction...
#> Analysis complete. Found 14 transcripts with padj < 0.01
# View top differentially associated genes
top_genes <- diff_heavy[order(diff_heavy$padj), ]
print(head(top_genes, 10))
#> transcript mean_success_cond1 mean_success_cond2 mean_diff log2FC
#> 72 Gene72 0.02617524 0.1390912 0.11291599 2.828417
#> 82 Gene82 0.04533465 0.1535843 0.10824963 2.352886
#> 50 Gene50 0.06573491 0.1267624 0.06102752 2.633829
#> 6 Gene6 0.02014019 0.1492147 0.12907447 2.603087
#> 11 Gene11 0.01350989 0.1079133 0.09440345 3.669058
#> 29 Gene29 0.02035831 0.1017006 0.08134232 2.841302
#> 33 Gene33 0.01628909 0.1063028 0.09001371 3.070389
#> 35 Gene35 0.02986100 0.1431196 0.11325857 2.242306
#> 40 Gene40 0.03166690 0.1083089 0.07664199 2.710986
#> 46 Gene46 0.02526712 0.1347122 0.10944504 2.256062
#> pval padj
#> 72 3.099748e-05 0.001549874
#> 82 1.675630e-05 0.001549874
#> 50 5.916326e-05 0.001972109
#> 6 8.567553e-05 0.002141888
#> 11 7.595966e-04 0.006124418
#> 29 6.455671e-04 0.006124418
#> 33 7.840268e-04 0.006124418
#> 35 7.961743e-04 0.006124418
#> 40 6.586108e-04 0.006124418
#> 46 7.268264e-04 0.006124418
Positive mean_diff indicates higher proportion in Treatment.
volcano <- PlotComparison(
diff_heavy,
Conditions = c("Control", "Treatment"),
Types = "Heavy_Polysome",
cutoff = 20
)
print(volcano)
For real experiments, ensure your data meets these requirements:
sessionInfo()
#> R version 4.4.2 (2024-10-31)
#> Platform: x86_64-pc-linux-gnu
#> Running under: Ubuntu 24.04.3 LTS
#>
#> Matrix products: default
#> BLAS: /usr/lib/x86_64-linux-gnu/blas/libblas.so.3.12.0
#> LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.12.0
#>
#> locale:
#> [1] LC_CTYPE=fr_FR.UTF-8 LC_NUMERIC=C
#> [3] LC_TIME=fr_FR.UTF-8 LC_COLLATE=C
#> [5] LC_MONETARY=fr_FR.UTF-8 LC_MESSAGES=fr_FR.UTF-8
#> [7] LC_PAPER=fr_FR.UTF-8 LC_NAME=C
#> [9] LC_ADDRESS=C LC_TELEPHONE=C
#> [11] LC_MEASUREMENT=fr_FR.UTF-8 LC_IDENTIFICATION=C
#>
#> time zone: Australia/Sydney
#> tzcode source: system (glibc)
#>
#> attached base packages:
#> [1] stats graphics grDevices utils datasets methods base
#>
#> other attached packages:
#> [1] FracFixR_1.0.0
#>
#> loaded via a namespace (and not attached):
#> [1] future.apply_1.20.0 gtable_0.3.6 jsonlite_2.0.0
#> [4] dplyr_1.1.4 compiler_4.4.2 tidyselect_1.2.1
#> [7] parallel_4.4.2 tidyr_1.3.1 jquerylib_0.1.4
#> [10] globals_0.18.0 scales_1.4.0 yaml_2.3.10
#> [13] fastmap_1.2.0 ggplot2_3.5.2 R6_2.6.1
#> [16] labeling_0.4.3 generics_0.1.4 knitr_1.50
#> [19] future_1.67.0 tibble_3.3.0 bslib_0.9.0
#> [22] pillar_1.11.0 RColorBrewer_1.1-3 rlang_1.1.6
#> [25] cachem_1.1.0 xfun_0.52 nnls_1.6
#> [28] sass_0.4.10 aod_1.3.3 cli_3.6.5
#> [31] withr_3.0.2 magrittr_2.0.3 digest_0.6.37
#> [34] grid_4.4.2 lifecycle_1.0.4 vctrs_0.6.5
#> [37] evaluate_1.0.4 glue_1.8.0 farver_2.1.2
#> [40] listenv_0.9.1 codetools_0.2-20 parallelly_1.45.1
#> [43] purrr_1.1.0 rmarkdown_2.29 matrixStats_1.5.0
#> [46] tools_4.4.2 pkgconfig_2.0.3 htmltools_0.5.8.1
Cleynen A, Ravindran A, Shirokikh N (2025). FracFixR: A compositional statistical framework for absolute proportion estimation between fractions in RNA sequencing data.
For more examples and advanced usage, see the FracFixR GitHub repository.