specimen.util package

specimen.util submodules

specimen.util.set_up module

Collection of functions for setting up the environment for the pipelines.

specimen.util.set_up.CMPB_CONFIG_PATHS_REQUIRED = ['mediapath']

specimen.util.set_up.HQTB_CONFIG_PATH_OPTIONAL = ['media_gap', 'ncbi_map', 'biocyc', 'universal', 'pan-core', 'fasta', 'gff', 'dmnd-database', 'database-mapping']

specimen.util.set_up.HQTB_CONFIG_PATH_REQUIRED = ['annotated_genome', 'full_sequence', 'model', 'diamond', 'media_analysis']

specimen.util.set_up.PIPELINE_PATHS_OPTIONAL = {'cmpb': ['modelpath', 'full_genome_sequence', 'gff', 'protein_fasta', 'gene-table', 'reacs-table', 'gff', 'dmnd-database', 'database-mapping', 'reaction_direction'], 'hqtb': ['media_gap', 'ncbi_map', 'biocyc', 'universal', 'pan-core', 'fasta', 'gff', 'dmnd-database', 'database-mapping']}

specimen.util.set_up.PIPELINE_PATHS_REQUIRED = {'cmpb': ['mediapath'], 'hqtb': ['annotated_genome', 'full_sequence', 'model', 'diamond', 'media_analysis']}
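
As an illustration of how these constants can be used, the following minimal sketch reports which required path keys are missing from a dictionary of user-supplied paths. The dictionary literal copies the documented value of PIPELINE_PATHS_REQUIRED. The helper `missing_required_paths` and the flat layout of `user_paths` are assumptions made for this example and are not part of specimen.

```python
# Copied from the documented value of specimen.util.set_up.PIPELINE_PATHS_REQUIRED.
PIPELINE_PATHS_REQUIRED = {
    "cmpb": ["mediapath"],
    "hqtb": ["annotated_genome", "full_sequence", "model", "diamond", "media_analysis"],
}


def missing_required_paths(user_paths: dict, pipeline: str) -> list:
    """Return the required path keys that are absent or empty (hypothetical helper)."""
    if pipeline not in PIPELINE_PATHS_REQUIRED:
        raise ValueError(f"Unknown input for parameter pipeline: {pipeline}")
    # Keep the documented order of the required keys in the result.
    return [key for key in PIPELINE_PATHS_REQUIRED[pipeline] if not user_paths.get(key)]


# Example: an hqtb configuration that sets only two of the five required paths.
example_paths = {"model": "model.xml", "diamond": "db.dmnd"}
print(missing_required_paths(example_paths, "hqtb"))
```

The same pattern would work for PIPELINE_PATHS_OPTIONAL, for instance to list which optional inputs a run will proceed without.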

specimen.util.set_up.build_data_directories(pipeline: Literal['hqtb', 'high-quality template based', 'cmpb', 'carveme modelpolisher based'], parent_dir: str)

Set up the necessary directory structure and download files if possible for the given pipeline.

Args:

pipeline (Literal[‘hqtb’,’high-quality template based’,’cmpb’,’carveme modelpolisher based’]):
The pipeline for which to build the directory structure.
parent_dir (str):
Path to the parent directory in which to create the structure.

Raises:

ValueError: Unknown input for parameter pipeline
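
The signature accepts both a short and a long name for each pipeline. A minimal sketch of how such aliases could be resolved before any directories are created is shown below. The mapping `PIPELINE_ALIASES` and the helper `resolve_pipeline` are hypothetical names introduced for this example; specimen's internal handling may differ.

```python
# Hypothetical alias table built from the values listed in the signature.
PIPELINE_ALIASES = {
    "hqtb": "hqtb",
    "high-quality template based": "hqtb",
    "cmpb": "cmpb",
    "carveme modelpolisher based": "cmpb",
}


def resolve_pipeline(pipeline: str) -> str:
    """Map a documented pipeline name to its short form, or raise ValueError."""
    try:
        return PIPELINE_ALIASES[pipeline]
    except KeyError:
        # Mirrors the documented error for an unknown pipeline value.
        raise ValueError(f"Unknown input for parameter pipeline: {pipeline}") from None


print(resolve_pipeline("carveme modelpolisher based"))
```

Resolving the name first keeps the ValueError ahead of any file system changes, so an invalid value would leave the parent directory untouched.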

specimen.util.set_up.download_config(filename: str = 'my_basic_config.yaml', type: Literal['hqtb-basic', 'hqtb-advanced', 'hqtb-defaults', 'media', 'cmpb'] = 'hqtb basic')

Load a configuration file from the package and save a copy for the user to edit.

The media config and the config for the cmpb (CarveMe + ModelPolisher based) pipeline can be downloaded using ‘media’ and ‘cmpb’, respectively.

For the hqtb / high-quality template based pipeline:

Depending on the user's level of experience, either an ‘hqtb-basic’ or an ‘hqtb-advanced’ configuration file can be downloaded (or ‘hqtb-defaults’ for developers).

Args:

filename (str, optional):
Filename/filepath to save the downloaded config file under. Defaults to ‘my_basic_config.yaml’.
type (Literal[‘hqtb-basic’,’hqtb-advanced’,’hqtb-defaults’,’media’,’cmpb’], optional):
The type of file to download. One of ‘hqtb-basic’, ‘hqtb-advanced’, ‘hqtb-defaults’, ‘media’ or ‘cmpb’. Defaults to ‘hqtb basic’.

Raises:

ValueError: Unknown type of config file detected.

specimen.util.set_up.save_cmpb_user_input(configpath: str | None = None) → dict

Guide the user step by step through the creation of the configuration for a cmpb pipeline run (via the command line).

Args:

configpath (Union[str,None], optional):
Path to a file to save the config under. Defaults to None.

Returns:

dict: The configuration in dictionary format.

specimen.util.set_up.validate_config(userc: str, pipeline: Literal['hqtb', 'cmpb'] = 'hqtb') → dict

Validate a user config file for use in the given pipeline.

Note

Currently, not everything is checked; validation mainly covers the required files.

Args:

userc (str):
Path to the user configuration file.
pipeline (Literal[‘hqtb’,’cmpb’], optional):
The pipeline the configuration file is intended for. Defaults to ‘hqtb’.

Raises:

FileNotFoundError: Directory set for config:data:data:direc does not exist.

Returns:

dict: The validated configuration as a nested dictionary (the read-in YAML file).

specimen.util.util module

Utility functions.

specimen.util.util.create_DIAMOND_db_from_folder(dir: str, out: str, name: str = 'database', extension: str = 'faa', threads: int = 2)

Build a DIAMOND database from a folder containing FASTA files.

Args:

dir (str):
Path to the directory to search for FASTA files for the database (recursive file search).
out (str):
Path to the output directory.
name (str, optional):
Name of the created database. Defaults to ‘database’.
extension (str, optional):
File extension of the FASTA files (to determine which files to search for). Defaults to ‘faa’.
threads (int, optional):
Number of threads to use for DIAMOND. Defaults to 2.
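
The two ingredients described by these arguments, a recursive search for FASTA files and a call to DIAMOND's `makedb` command, can be sketched as follows. This is a minimal illustration and not specimen's implementation: the helpers `find_fasta_files` and `makedb_command` are hypothetical, the command is only assembled and never executed, and how specimen combines several FASTA files into one database is not covered here.

```python
import tempfile
from pathlib import Path


def find_fasta_files(directory: str, extension: str = "faa") -> list:
    """Recursively collect all files below `directory` with the given extension."""
    return sorted(Path(directory).rglob(f"*.{extension}"))


def makedb_command(fasta, out: str, name: str = "database", threads: int = 2) -> list:
    """Build (but do not run) a `diamond makedb` call for a single FASTA file."""
    return [
        "diamond", "makedb",
        "--in", str(fasta),              # input protein FASTA file
        "--db", str(Path(out) / name),   # output database path (prefix)
        "--threads", str(threads),       # number of CPU threads
    ]


# Demonstration on a temporary folder with one FASTA file and one unrelated file.
with tempfile.TemporaryDirectory() as tmp:
    genome_dir = Path(tmp) / "genomes" / "strain_a"
    genome_dir.mkdir(parents=True)
    (genome_dir / "proteins.faa").write_text(">p1\nMKT\n")
    (Path(tmp) / "genomes" / "notes.txt").write_text("not a FASTA file\n")

    fasta_files = find_fasta_files(tmp)
    print([f.name for f in fasta_files])
    print(makedb_command(fasta_files[0], out=tmp))
```

To actually build a database, the resulting list could be passed to `subprocess.run`, provided the `diamond` executable is installed and on the PATH.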

specimen.util.util.create_NCBIinfo_mapping(dir: str, out: str, extension: Literal['gbff'] = 'gbff')

Create an NCBI information mapping file from a folder containing, e.g., gbff files.

Args:

dir (str):
Path to the directory for the recursive file search for the mapping.
out (str):
Path to the output directory.
extension (Literal[‘gbff’], optional):
File extension to search for. Currently, it is advised to leave this at the default value. Defaults to ‘gbff’.