specimen.util package
specimen.util submodules
specimen.util.set_up module
Collection of functions for setting up the environment for the pipelines.
- specimen.util.set_up.CMPB_CONFIG_PATHS_REQUIRED = ['mediapath']
- specimen.util.set_up.HQTB_CONFIG_PATH_OPTIONAL = ['media_gap', 'ncbi_map', 'biocyc', 'universal', 'pan-core', 'fasta', 'gff', 'dmnd-database', 'database-mapping']
- specimen.util.set_up.HQTB_CONFIG_PATH_REQUIRED = ['annotated_genome', 'full_sequence', 'model', 'diamond', 'media_analysis']
- specimen.util.set_up.PIPELINE_PATHS_OPTIONAL = {'cmpb': ['modelpath', 'full_genome_sequence', 'gff', 'protein_fasta', 'gene-table', 'reacs-table', 'gff', 'dmnd-database', 'database-mapping', 'reaction_direction'], 'hqtb': ['media_gap', 'ncbi_map', 'biocyc', 'universal', 'pan-core', 'fasta', 'gff', 'dmnd-database', 'database-mapping']}
- specimen.util.set_up.PIPELINE_PATHS_REQUIRED = {'cmpb': ['mediapath'], 'hqtb': ['annotated_genome', 'full_sequence', 'model', 'diamond', 'media_analysis']}
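To illustrate how these constants might be used, the sketch below checks a dictionary of paths against the required keys of a pipeline. The constant values are copied from the listing above. The helper `missing_required_paths` and the flat dictionary layout are assumptions made for this example and are not part of the package.

```python
# Values copied from the constants documented above.
PIPELINE_PATHS_REQUIRED = {
    "cmpb": ["mediapath"],
    "hqtb": ["annotated_genome", "full_sequence", "model", "diamond", "media_analysis"],
}


def missing_required_paths(pipeline: str, paths: dict) -> list:
    """Return the required keys that are absent or empty in `paths`.

    Hypothetical helper: a flat dict is assumed here for simplicity,
    whereas the real configuration file is nested.
    """
    required = PIPELINE_PATHS_REQUIRED[pipeline]
    return [key for key in required if not paths.get(key)]


# Example input with made-up file names.
example = {"annotated_genome": "genome.gbff", "model": "template.xml"}
print(missing_required_paths("hqtb", example))
# → ['full_sequence', 'diamond', 'media_analysis']
```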
- specimen.util.set_up.build_data_directories(pipeline: Literal['hqtb', 'high-quality template based', 'cmpb', 'carveme modelpolisher based'], parent_dir: str)[source]
Set up the necessary directory structure and download files if possible for the given pipeline.
- Args:
- pipeline (Literal[‘hqtb’,’high-quality template based’,’cmpb’,’carveme modelpolisher based’]):
The pipeline to build the directory structure for.
- parent_dir (str):
Path to the parent directory in which the structure is created.
- Raises:
ValueError: Unknown input for parameter pipeline
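The signature accepts both a short and a long name for each pipeline. A minimal sketch of how such input could be normalised and rejected is shown below. The accepted strings come from the documented signature, while the mapping `PIPELINE_ALIASES` and the helper `resolve_pipeline` are hypothetical and may not reflect how the package handles this internally.

```python
# Accepted values taken from the documented signature; the mapping is an assumption.
PIPELINE_ALIASES = {
    "hqtb": "hqtb",
    "high-quality template based": "hqtb",
    "cmpb": "cmpb",
    "carveme modelpolisher based": "cmpb",
}


def resolve_pipeline(pipeline: str) -> str:
    """Map a short or long pipeline name to its short form (hypothetical helper)."""
    try:
        return PIPELINE_ALIASES[pipeline]
    except KeyError:
        # Mirrors the documented error for unknown pipeline names.
        raise ValueError(f"Unknown input for parameter pipeline: {pipeline}") from None


print(resolve_pipeline("carveme modelpolisher based"))
# → cmpb
```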
- specimen.util.set_up.download_config(filename: str = 'my_basic_config.yaml', type: Literal['hqtb-basic', 'hqtb-advanced', 'hqtb-defaults', 'media', 'cmpb'] = 'hqtb basic')[source]
Load a configuration file from the package and save a copy for the user to edit.
The media config and the config for the cmpb / CarveMe + Modelpolisher based pipeline can be downloaded using ‘media’ and ‘cmpb’, respectively.
For the hqtb / high-quality template based pipeline, either an ‘hqtb-basic’ or an ‘hqtb-advanced’ configuration file can be downloaded, depending on the user's experience (or ‘hqtb-defaults’ for developers).
- Args:
- filename (str, optional):
Filename/filepath to save the downloaded config file under. Defaults to ‘my_basic_config.yaml’.
- type (Literal[‘hqtb-basic’,’hqtb-advanced’,’hqtb-defaults’,’media’,’cmpb’], optional):
The type of file to download: ‘hqtb-basic’, ‘hqtb-advanced’, ‘hqtb-defaults’, ‘media’ or ‘cmpb’. Defaults to ‘hqtb basic’.
- Raises:
ValueError: Unknown type of config file detected.
- specimen.util.set_up.save_cmpb_user_input(configpath: str | None = None) dict[source]
Guide the user step by step through creating the configuration for a cmpb pipeline run (via the command line).
- Args:
- configpath (Union[str,None], optional):
Path to a file to save the config under. Defaults to None.
- Returns:
- dict:
The configuration in dictionary format.
- specimen.util.set_up.validate_config(userc: str, pipeline: Literal['hqtb', 'cmpb'] = 'hqtb') dict[source]
Validate a user config file for use in the selected pipeline (‘hqtb’ or ‘cmpb’).
Note
Currently, not everything is checked; validation mainly covers the required files.
- Args:
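- userc (str):
Path to the user configuration file.
- pipeline (Literal[‘hqtb’,’cmpb’], optional):
The pipeline the configuration is validated for. Defaults to ‘hqtb’.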
- Raises:
FileNotFoundError: Directory set for config:data:data:direc does not exist.
- Returns:
- dict:
The validated configuration, read from the YAML file, as a nested dictionary.
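Since validation mainly concerns the required files, the core check can be pictured as follows. This is only a sketch under stated assumptions: the helper `check_required_files`, the flat dictionary and the error messages are hypothetical, and the real function reads a nested YAML file and may check more than this.

```python
import os
import tempfile


def check_required_files(paths: dict, required: list) -> dict:
    """Raise if a required entry is missing or points to a non-existent path.

    Hypothetical helper; a flat dict stands in for the nested config.
    """
    for key in required:
        value = paths.get(key)
        if not value:
            raise ValueError(f"Missing required config entry: {key}")
        if not os.path.exists(value):
            raise FileNotFoundError(f"Path set for {key} does not exist: {value}")
    return paths


# Demonstration with a temporary, empty media file.
with tempfile.TemporaryDirectory() as tmp:
    media = os.path.join(tmp, "media.yaml")
    open(media, "w").close()
    checked = check_required_files({"mediapath": media}, ["mediapath"])
    checked_keys = sorted(checked)

print(checked_keys)
# → ['mediapath']
```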
specimen.util.util module
Utility functions.
- specimen.util.util.create_DIAMOND_db_from_folder(dir: str, out: str, name: str = 'database', extension: str = 'faa', threads: int = 2)[source]
Build a DIAMOND database from a folder containing FASTA files.
- Args:
- dir (str):
Path to the directory to search for FASTA files for the database (recursive file search).
- out (str):
Path to the output directory.
- name (str, optional):
Name of the created database. Defaults to ‘database’.
- extension (str, optional):
File extension of the FASTA files (to determine which files to search for). Defaults to ‘faa’.
- threads (int, optional):
Number of threads to use for DIAMOND. Defaults to 2.
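The two steps described above, a recursive search for FASTA files followed by a DIAMOND database build, can be sketched roughly as follows. The helpers `find_fasta_files` and `build_makedb_command` are hypothetical and not part of the package. The sketch only assembles a `diamond makedb` command for a single, already combined FASTA file and does not run it; how the package merges the files it finds is not documented here, so that step is left out.

```python
from pathlib import Path


def find_fasta_files(directory: str, extension: str = "faa") -> list:
    """Recursively collect all files with the given extension (hypothetical helper)."""
    return sorted(Path(directory).rglob(f"*.{extension}"))


def build_makedb_command(fasta: str, out: str, name: str = "database", threads: int = 2) -> list:
    """Assemble a `diamond makedb` call as an argument list (not executed here)."""
    return [
        "diamond", "makedb",
        "--in", str(fasta),               # input FASTA file
        "--db", str(Path(out) / name),    # output database path (without suffix)
        "--threads", str(threads),
    ]


# The file and directory names below are placeholders.
command = build_makedb_command("combined.faa", "results", threads=4)
print(" ".join(command))
```

Passing the list to `subprocess.run(command, check=True)` would execute it, provided DIAMOND is installed and on the `PATH`.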
- specimen.util.util.create_NCBIinfo_mapping(dir: str, out: str, extension: Literal['gbff'] = 'gbff')[source]
Create an NCBI information mapping file from a folder containing, e.g., gbff files.
- Args:
- dir (str):
Path to the directory for the recursive file search for the mapping.
- out (str):
Path to the output directory.
- extension (Literal[‘gbff’], optional):
File extension to search for. It is currently advised to keep the default. Defaults to ‘gbff’.