RNA-clique
This is the repository for RNA-clique, a tool for computing pairwise genetic distances from RNA-seq data. The software accepts as input assembled transcriptomes from two or more samples and produces as its output a matrix containing pairwise distances ranging from 0 to 1.
Installation
This software is written in Python. It additionally requires NCBI BLAST+ and several Python libraries. Guides are provided for installation on specific systems. To install on other systems, see the requirements.
Installation guides
Basic usage
To run RNA-clique on your assembled transcriptomes, first make sure that your data are in a format understood by RNA-clique.
Then, run rna-clique with the directories containing your transcriptomes, an
output directory, and a setting for the number of top genes to select.
rna-clique -O my_rna_clique_out -n 50000 \
    path/to/transcriptome1_dir \
    path/to/transcriptome2_dir \
    path/to/transcriptome3_dir ...
RNA-clique produces an output matrix at my_rna_clique_out/matrix.h5. To see it
in a human-readable format, use export_matrix.
More details about the usage of RNA-clique can be found in the Command-line usage guide.
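When there are many sample directories, the command shown above can be assembled in a script rather than typed by hand. The sketch below builds the argument list using only the -O and -n options that appear in this README. The helper name build_rna_clique_command is hypothetical, and the call that would launch the run is left commented out so the example works without RNA-clique installed.

```python
def build_rna_clique_command(out_dir, top_genes, transcriptome_dirs):
    """Return the rna-clique argument list for the given inputs.

    Hypothetical helper: it only mirrors the invocation shown in this
    README (-O for the output directory, -n for the number of top genes).
    """
    dirs = [str(d) for d in transcriptome_dirs]
    if len(dirs) < 2:
        raise ValueError("expected transcriptomes from two or more samples")
    return ["rna-clique", "-O", str(out_dir), "-n", str(top_genes), *dirs]


command = build_rna_clique_command(
    "my_rna_clique_out",
    50000,
    ["path/to/transcriptome1_dir", "path/to/transcriptome2_dir"],
)
print(" ".join(command))

# To actually launch the run (requires rna-clique on your PATH):
# import subprocess
# subprocess.run(command, check=True)
```

Passing the arguments as a list, rather than as a single shell string, avoids quoting problems when directory names contain spaces.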
Downstream analyses
The export_matrix program prints the calculated matrix to the standard
output, so you can use redirection or pipes to save the results to a file. You
could then use the matrix in any downstream application capable of loading
arbitrary matrices from files.
For example, if you output the matrix to a file named distances, you could
then read that file into R.
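The same loading step can be sketched in Python using only the standard library. The layout assumed here (a header row of sample names, then one row per sample beginning with its name, all separated by whitespace) is an assumption about what export_matrix prints, so check it against your own output before relying on it. The function name load_distance_matrix and the sample values are hypothetical.

```python
import io


def load_distance_matrix(handle):
    """Parse a labeled square matrix from whitespace-delimited text.

    Assumed layout (not confirmed by the RNA-clique documentation):
    a header row of sample names, then one row per sample that starts
    with the sample's name. Names must not contain whitespace.
    """
    rows = [line.split() for line in handle if line.strip()]
    header, body = rows[0], rows[1:]
    matrix = {}
    for fields in body:
        name, values = fields[0], fields[1:]
        if len(values) != len(header):
            raise ValueError(f"row {name!r} has {len(values)} values, "
                             f"expected {len(header)}")
        matrix[name] = dict(zip(header, (float(v) for v in values)))
    return matrix


# Made-up values standing in for the contents of the "distances" file.
example = io.StringIO(
    "     s1   s2   s3\n"
    "s1  0.0  0.2  0.4\n"
    "s2  0.2  0.0  0.3\n"
    "s3  0.4  0.3  0.0\n"
)
matrix = load_distance_matrix(example)
print(matrix["s1"]["s2"])
```

To read a real file, replace the StringIO object with an open file handle, for example `with open("distances") as handle:`.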
Using RNA-clique in Python code
You can use RNA-clique directly from your Python code. For example,
from rna_clique.rna_clique import rna_clique
from pathlib import Path
out_dir = Path("rna_clique_out")
out_dir.mkdir(exist_ok=True)
# Get the SampleSimilarity object and a dict mapping paths to their sample
# names.
sim, path_to_sample = rna_clique(
    [
        Path("path/to/transcriptome1_dir"),
        Path("path/to/transcriptome2_dir"),
        Path("path/to/transcriptome3_dir"),
    ],
    out_dir_1=out_dir / "od1",
    out_dir_2=out_dir / "od2",
    cache_dir=out_dir / "db_cache",
    output_graph=out_dir / "graph.pkl",
    output_matrix=out_dir / "matrix.h5",
    top_genes=50000,
)
print(sim.get_dissimilarity_df())
For information on finer-grained control via RNA-clique's Python API, see the API Guide.
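One simple thing to do with the dissimilarities is to rank sample pairs from closest to most distant. The sketch below works on a plain nested mapping so that it runs without RNA-clique installed. It assumes, without confirmation from this README, that get_dissimilarity_df() returns a pandas DataFrame, in which case calling .to_dict() on it would yield a mapping of this shape. The function name rank_pairs and the values are hypothetical.

```python
from itertools import combinations


def rank_pairs(dist):
    """Return (distance, sample_a, sample_b) tuples, closest pair first.

    `dist` is a nested mapping such that dist[a][b] is the distance
    between samples a and b. Each unordered pair is reported once.
    """
    samples = sorted(dist)
    return sorted((dist[a][b], a, b) for a, b in combinations(samples, 2))


# Made-up values for illustration only.
dist = {
    "s1": {"s1": 0.0, "s2": 0.2, "s3": 0.4},
    "s2": {"s1": 0.2, "s2": 0.0, "s3": 0.3},
    "s3": {"s1": 0.4, "s2": 0.3, "s3": 0.0},
}
for d, a, b in rank_pairs(dist):
    print(f"{a}\t{b}\t{d:.3f}")
```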
License
All code is licensed under the MIT license, which may be found in the LICENSE file at the root of this repository.
A machine-readable copyright file in Debian format may also be found at copyright.
Citation
If you use RNA-clique for your work, please cite "RNA-clique: a method for computing genetic distances from RNA-seq data".
@article{tapia2024rna,
  title={{RNA-clique: a method for computing genetic distances from RNA-seq data}},
  author={Tapia, Andrew C and Jaromczyk, Jerzy W and Moore, Neil and Schardl, Christopher L},
  journal={BMC Bioinformatics},
  volume={25},
  year={2024},
  publisher={BioMed Central},
  keywords={pub}
}