select_top_genes_all

select_top_and_save(out_dir, transcripts, x, *args)

Select the top n genes by k-mer coverage from transcripts and save them.

This function wraps TopGeneSelector and saves the top transcripts, selected by k-mer coverage, in the specified directory. The output file is named "{sample}_top.fasta", where the sample name is taken from the name of the directory containing the transcripts FASTA file. The source derives it with x.stem, so a trailing dot-suffix in the directory name is dropped.

Any additional positional arguments (*args) are passed to the TopGeneSelector.from_path classmethod, which constructs the TopGeneSelector object.
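The output naming described above can be sketched with pathlib alone. This is a minimal illustration, not part of the package: `top_output_path` is a hypothetical helper that mirrors only the path logic visible in the source (`out_dir / (x.stem + "_top.fasta")`), and the directory names are made up. It reads and writes no sequences.

```python
from pathlib import Path


def top_output_path(out_dir: Path, x: Path) -> tuple[Path, str]:
    """Hypothetical helper: reproduce only the naming step of select_top_and_save."""
    # x.stem is the final path component with its last ".suffix" removed.
    sample = x.stem
    return out_dir / f"{sample}_top.fasta", sample


# A plain directory name is used unchanged as the sample name.
out, sample = top_output_path(Path("results"), Path("data/sample_A"))
print(out, sample)

# A directory name containing a dot loses its final suffix.
dotted_out, dotted_sample = top_output_path(Path("results"), Path("data/sample.v2"))
print(dotted_out, dotted_sample)
```

Here `sample` is "sample_A", while `dotted_sample` is "sample" rather than "sample.v2". Sample directories whose names differ only after the last dot would therefore map to the same output file.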

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| out_dir | Path | Location in which to save top n genes. | required |
| transcripts | str | Name of the FASTA file containing transcripts. | required |
| x | Path | Directory containing the transcripts FASTA file. | required |

Returns:

| Type | Description |
| --- | --- |
| tuple[Path, str] | Path to the output file and the inferred sample name. |

Source code in src/rna_clique/select_top_genes_all.py
def select_top_and_save(
        out_dir: Path,
        transcripts: str,
        x: Path,
        *args
) -> tuple[Path, str]:
    """Select the top n genes by k-mer coverage from transcripts and save them.

    This function wraps TopGeneSelector, saving the selected top transcripts by
    k-mer coverage in a specified directory. The name given to the output file
    is "{sample}_top.fasta", where the sample is assumed to be the name of the
    directory in which the transcripts FASTA file is located.

    Additional arguments provided will be passed to the
    TopGeneSelector.from_path classmethod used to construct a TopGeneSelector
    object.

    Parameters:
        out_dir:           Location in which to save top n genes.
        transcripts (str): Name of the FASTA file containing transcripts.
        x:                 Directory containing the transcripts FASTA file.

    Returns:
        Path to the output file and the inferred sample name.
    """
    out = out_dir / (x.stem + "_top.fasta")
    Bio.SeqIO.write(
        TopGeneSelector.from_path(
            x / transcripts,
            *args
        ).get_top_gene_seqs(),
        out,
        "fasta"
    )
    return (out, x.stem)