RNA-clique

This is the repository for RNA-clique, a tool for computing pairwise genetic distances from RNA-seq data. The software accepts as input assembled transcriptomes from two or more samples and produces as its output a matrix containing pairwise distances ranging from 0 to 1.

Installation

This software is written in Python and additionally requires NCBI BLAST+ and several Python libraries. Guides are provided for installation on specific systems. To install on other systems, see the requirements.

Installation guides

Basic usage

To run RNA-clique on your assembled transcriptomes, first make sure that your data are in a format understood by RNA-clique.

Then, run rna-clique with the directories containing your transcriptomes, an output directory, and a setting for the number of top genes to select.

rna-clique -O my_rna_clique_out -n 50000 \
           path/to/transcriptome1_dir \
           path/to/transcriptome2_dir \
           path/to/transcriptome3_dir ...

RNA-clique produces an output matrix at my_rna_clique_out/matrix.h5. To see it in a human-readable format, use export_matrix.

python -m rna_clique.export_matrix -m my_rna_clique_out/matrix.h5

More details about the usage of RNA-clique can be found in the Command-line usage guide.

Downstream analyses

The export_matrix program prints the calculated matrix to the standard output, so you can use redirection or pipes to save the results to a file. You could then use the matrix in any downstream application capable of loading arbitrary matrices from files.
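If you would rather capture that output from a Python script than from the shell, the redirection can be sketched with the standard library's subprocess module. This is a minimal sketch, not part of RNA-clique: the helper names export_matrix_command and save_matrix_text are hypothetical, and the only command-line option used is the -m flag shown above.

```python
import subprocess
import sys


def export_matrix_command(matrix_path):
    """Build the export_matrix invocation shown above (hypothetical helper)."""
    return [
        sys.executable,
        "-m",
        "rna_clique.export_matrix",
        "-m",
        str(matrix_path),
    ]


def save_matrix_text(matrix_path, text_path):
    """Run export_matrix and write its standard output to text_path.

    Assumes rna_clique is installed in the current Python environment.
    """
    with open(text_path, "w") as out:
        subprocess.run(export_matrix_command(matrix_path), stdout=out, check=True)


# Example call (requires an existing matrix.h5):
# save_matrix_text("my_rna_clique_out/matrix.h5", "distances")
```

Passing check=True makes the script raise an error if export_matrix exits with a non-zero status, rather than silently leaving an empty or partial file.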

For example, if you output the matrix to a file named distances, you could load the matrix in R using the following code:

dis <- as.matrix(read.table("distances", sep=" "))
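A rough Python counterpart to the R snippet, using only the standard library, might look like the following. It is a sketch under the same assumption the R call makes: the file holds only space-separated numbers, with no header row or sample labels. The function name load_distance_matrix is hypothetical.

```python
def load_distance_matrix(path):
    """Read a whitespace-separated numeric matrix into a list of rows.

    Assumes the file contains only numbers (no header or row labels).
    """
    with open(path) as handle:
        rows = [
            [float(value) for value in line.split()]
            for line in handle
            if line.strip()  # skip blank lines
        ]
    # A pairwise distance matrix should have as many columns as rows.
    if any(len(row) != len(rows) for row in rows):
        raise ValueError("expected a square matrix")
    return rows


# Example call:
# dis = load_distance_matrix("distances")
```

The square-shape check is a cheap guard against loading a file that includes labels or was truncated; it does not verify that the values lie between 0 and 1.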

Using RNA-clique in Python code

You can use RNA-clique directly from your Python code. For example,

from rna_clique.rna_clique import rna_clique
from pathlib import Path

out_dir = Path("rna_clique_out")
out_dir.mkdir(exist_ok=True)
# Get the SampleSimilarity object and a dict mapping paths to their sample
# names.
sim, path_to_sample = rna_clique(
    [
        Path("path/to/transcriptome1_dir"),
        Path("path/to/transcriptome2_dir"),
        Path("path/to/transcriptome3_dir"),
    ],
    out_dir_1=out_dir / "od1",
    out_dir_2=out_dir / "od2",
    cache_dir=out_dir / "db_cache",
    output_graph=out_dir / "graph.pkl",
    output_matrix=out_dir / "matrix.h5",
    top_genes=50000
)
print(sim.get_dissimilarity_df())

For information on finer-grained control via RNA-clique's Python API, see the API Guide.

License

All code is licensed under the MIT license, which may be found in the LICENSE file at the root of this repository.

A machine-readable copyright file in Debian format may also be found at copyright.

Citation

If you use RNA-clique for your work, please cite "RNA-clique: a method for computing genetic distances from RNA-seq data".

@article{tapia2024rna,
  title={{RNA-clique: a method for computing genetic distances from RNA-seq data}},
  author={Tapia, Andrew C and Jaromczyk, Jerzy W and Moore, Neil and Schardl, Christopher L},
  journal={BMC Bioinformatics},
  volume={25},
  year={2024},
  publisher={BioMed Central},
  keywords={pub}
}

Additional documentation