specimen.util package

specimen.util submodules

specimen.util.set_up module

Collection of functions for setting up the environment for the pipelines.

specimen.util.set_up.CMPB_CONFIG_PATHS_REQUIRED = ['mediapath']
specimen.util.set_up.HQTB_CONFIG_PATH_OPTIONAL = ['media_gap', 'ncbi_map', 'biocyc', 'universal', 'pan-core', 'fasta', 'gff', 'dmnd-database', 'database-mapping']
specimen.util.set_up.HQTB_CONFIG_PATH_REQUIRED = ['annotated_genome', 'full_sequence', 'model', 'diamond', 'media_analysis']
specimen.util.set_up.PIPELINE_PATHS_OPTIONAL = {'cmpb': ['modelpath', 'full_genome_sequence', 'gff', 'protein_fasta', 'gene-table', 'reacs-table', 'gff', 'dmnd-database', 'database-mapping', 'reaction_direction'], 'hqtb': ['media_gap', 'ncbi_map', 'biocyc', 'universal', 'pan-core', 'fasta', 'gff', 'dmnd-database', 'database-mapping']}
specimen.util.set_up.PIPELINE_PATHS_REQUIRED = {'cmpb': ['mediapath'], 'hqtb': ['annotated_genome', 'full_sequence', 'model', 'diamond', 'media_analysis']}
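
As an illustration of how these constants might be used, the following minimal sketch checks which required path entries are missing for a pipeline. The helper `missing_required_paths` and the flat `example_paths` dictionary are hypothetical and not part of the package; only the values of `PIPELINE_PATHS_REQUIRED` are copied from the listing above.

```python
# Values copied from the constant documented above.
PIPELINE_PATHS_REQUIRED = {
    "cmpb": ["mediapath"],
    "hqtb": ["annotated_genome", "full_sequence", "model", "diamond", "media_analysis"],
}


def missing_required_paths(paths: dict, pipeline: str) -> list:
    """Hypothetical helper: list required keys that are absent or empty."""
    return [key for key in PIPELINE_PATHS_REQUIRED[pipeline] if not paths.get(key)]


# A flat, made-up mapping of path keys to file locations (assumption:
# the real configuration file may nest these entries differently).
example_paths = {"annotated_genome": "genome.gbff", "model": "template.xml"}
print(missing_required_paths(example_paths, "hqtb"))
# prints ['full_sequence', 'diamond', 'media_analysis']
```
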
specimen.util.set_up.build_data_directories(pipeline: Literal['hqtb', 'high-quality template based', 'cmpb', 'carveme modelpolisher based'], parent_dir: str)[source]

Set up the necessary directory structure and download files if possible for the given pipeline.

Args:
  • pipeline (Literal[‘hqtb’,’high-quality template based’,’cmpb’,’carveme modelpolisher based’]):

    The pipeline to build the directory structure for.

  • parent_dir (str):

    Path to the parent directory to write the structure to.

Raises:
  • ValueError: Unknown input for parameter pipeline
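
The accepted pipeline names and the ValueError described above can be sketched as follows. This is not the package's implementation: the helper `prepare_parent_dir` and the `PIPELINE_ALIASES` mapping are hypothetical, and the sketch only creates the parent directory, since the actual subdirectory layout and downloads are not specified here.

```python
from pathlib import Path

# Long and short pipeline names accepted according to the signature above.
PIPELINE_ALIASES = {
    "hqtb": "hqtb",
    "high-quality template based": "hqtb",
    "cmpb": "cmpb",
    "carveme modelpolisher based": "cmpb",
}


def prepare_parent_dir(pipeline: str, parent_dir: str) -> tuple:
    """Hypothetical helper: validate the pipeline name and create parent_dir."""
    if pipeline not in PIPELINE_ALIASES:
        raise ValueError(f"Unknown input for parameter pipeline: {pipeline}")
    target = Path(parent_dir)
    # Create the parent directory (and any missing ancestors) if needed.
    target.mkdir(parents=True, exist_ok=True)
    return PIPELINE_ALIASES[pipeline], target
```

Validating the name before touching the file system means an unknown pipeline leaves no partial directory behind.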

specimen.util.set_up.download_config(filename: str = 'my_basic_config.yaml', type: Literal['hqtb-basic', 'hqtb-advanced', 'hqtb-defaults', 'media', 'cmpb'] = 'hqtb basic')[source]

Load a configuration file from the package and save a copy for the user to edit.

The media config and the config for the cmpb (CarveMe + ModelPolisher based) pipeline can be downloaded using ‘media’ and ‘cmpb’, respectively.

For the hqtb (high-quality template based) pipeline:

Depending on the user's level of knowledge, either an ‘hqtb-basic’ or an ‘hqtb-advanced’ configuration file can be downloaded (or ‘hqtb-defaults’ for developers).

Args:
  • filename (str, optional):

    Filename/filepath to save the downloaded config file under. Defaults to ‘my_basic_config.yaml’.

  • type (Literal[‘hqtb-basic’,’hqtb-advanced’,’hqtb-defaults’,’media’,’cmpb’], optional):

    The type of file to download: ‘hqtb-basic’, ‘hqtb-advanced’, ‘hqtb-defaults’, ‘media’ or ‘cmpb’. Defaults to ‘hqtb basic’.

Raises:
  • ValueError: Unknown type of config file detected.
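
A possible way to call this function is sketched below, using only the signature documented above. The wrapper `fetch_config` is hypothetical; it guards the import so the snippet also runs where the package is not installed. The type is passed explicitly because the documented default string ‘hqtb basic’ differs from the listed option ‘hqtb-basic’.

```python
def fetch_config(filename: str, config_type: str) -> bool:
    """Call download_config if the package is importable; report whether it ran."""
    try:
        from specimen.util.set_up import download_config
    except ImportError:
        return False  # the package is not installed in this environment
    # Keyword names follow the documented signature: filename, type.
    download_config(filename=filename, type=config_type)
    return True


if __name__ == "__main__":
    # Save an editable copy of the cmpb configuration in the working directory.
    fetch_config("my_cmpb_config.yaml", "cmpb")
```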

specimen.util.set_up.save_cmpb_user_input(configpath: str | None = None) → dict[source]

Guide the user step by step through the creation of the configuration for a cmpb pipeline run (via the command line).

Args:
  • configpath (Union[str,None], optional):

    Path to a file to save the config under. Defaults to None.

Returns:
dict:

The configuration in dictionary format.

specimen.util.set_up.validate_config(userc: str, pipeline: Literal['hqtb', 'cmpb'] = 'hqtb') → dict[source]

Validate a user config file for use in the given pipeline (hqtb by default).

Note

Currently, not everything is checked; mainly the required files are.

Args:
  • userc (str):

    Path to the user configuration file.

  • pipeline (Literal[‘hqtb’,’cmpb’], optional):

    The pipeline the configuration file is intended for. Defaults to ‘hqtb’.

Raises:
  • FileNotFoundError: Directory set for config:data:data:direc does not exist.

Returns:
dict:

The validated, read-in configuration file, nested (read-in yaml file).

specimen.util.util module

Utility functions.

specimen.util.util.create_DIAMOND_db_from_folder(dir: str, out: str, name: str = 'database', extension: str = 'faa', threads: int = 2)[source]

Build a DIAMOND database from a folder containing FASTA files.

Args:
  • dir (str):

    Path to the directory to search for FASTA files for the database (recursive file search).

  • out (str):

    Path to the output directory.

  • name (str, optional):

    Name of the created database. Defaults to ‘database’.

  • extension (str, optional):

    File extension of the FASTA files (to determine which files to search for). Defaults to ‘faa’.

  • threads (int, optional):

    Number of threads to use for DIAMOND. Defaults to 2.
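
The recursive file search and the database build can be sketched as follows. This is a hedged illustration, not the package's code: `find_fasta_files` and `makedb_command` are hypothetical helpers, and the sketch only assembles a `diamond makedb` command (using DIAMOND's `--in`, `--db` and `--threads` options) for a single FASTA file without running it. How the package combines several FASTA files into one database is left open here.

```python
from pathlib import Path


def find_fasta_files(directory: str, extension: str = "faa") -> list:
    """Recursively collect files ending in the given extension, sorted by path."""
    return sorted(Path(directory).rglob(f"*.{extension}"))


def makedb_command(fasta: str, out: str, name: str = "database", threads: int = 2) -> list:
    """Build (but do not run) a `diamond makedb` call for one FASTA file."""
    return [
        "diamond", "makedb",
        "--in", str(fasta),                # input FASTA file
        "--db", str(Path(out) / name),     # output database path without suffix
        "--threads", str(threads),
    ]
```

The resulting list could be passed to `subprocess.run(..., check=True)` on a system where DIAMOND is installed.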

specimen.util.util.create_NCBIinfo_mapping(dir: str, out: str, extension: Literal['gbff'] = 'gbff')[source]

Create an NCBI information mapping file from a folder containing, e.g., gbff files.

Args:
  • dir (str):

    Path to the directory for the recursive file search for the mapping.

  • out (str):

    Path to the output directory.

  • extension (Literal[‘gbff’], optional):

    File extension to search for. Currently, it is advised to leave this at the default. Defaults to ‘gbff’.