specimen.hqtb.core package

specimen.hqtb.core submodules

specimen.hqtb.core.analysis

Analyse a model (step 5 of the workflow).

specimen.hqtb.core.analysis.run(model_path: str, dir: str, media_path: str = None, namespace: Literal['BiGG'] = 'BiGG', pc_model_path: str = None, pc_based_on: Literal['id'] = 'id', test_aa_auxotrophies: bool = True, pathway: bool = True)[source]

SPECIMEN Step 5: Analyse the generated model.

Args:

model_path (str):
Path to the model.
dir (str):
Path to the output directory.
media_path (str, optional):
Path to a media config file. Using this enables growth simulation. Defaults to None.
namespace (Literal[‘BiGG’], optional):
Namespace to work on. Defaults to ‘BiGG’.
pc_model_path (str, optional):
Path to a core-pan model. Defaults to None.
pc_based_on (Literal[‘id’], optional):
How to compare the model to the core-pan model. Defaults to ‘id’.
test_aa_auxotrophies (bool, optional):
Option to enable the amino acid auxotrophy simulation. Defaults to True.
pathway (bool, optional):
Option to enable KEGG pathway analysis. Defaults to True.

specimen.hqtb.core.bidirectional_blast

Perform a bidirectional blastp using DIAMOND on an input and a template (annotated genomes).

specimen.hqtb.core.bidirectional_blast.bdbp_diamond(dir: str, template_name: str, input_name: str, template_path: str, input_path: str, sensitivity='sensitive', threads=2)[source]

Perform bidirectional blastp using DIAMOND.

Args:

dir (str):
Path to the directory parent to in/out.
template_name (str):
Name of the template genome.
input_name (str):
Name of the input genome.
template_path (str):
Path to the CDS FASTA-file of the template.
input_path (str):
Path to the CDS FASTA-file of the input.
sensitivity (str, optional):
Sensitivity mode for DIAMOND. Defaults to ‘sensitive’.
threads (int, optional):
Number of threads to use when running DIAMOND. Defaults to 2.

specimen.hqtb.core.bidirectional_blast.create_diamond_db(dir: str, name: str, path: str, threads: int)[source]

Create a DIAMOND database for a given protein FASTA file.

Args:

dir (str):
Path to the data directory.
name (str):
Name of the genome/database.
path (str):
Path to the FASTA-file.
threads (int):
Number of threads to use.
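
As a rough illustration of what creating a DIAMOND database involves, the sketch below only assembles a `diamond makedb` command line and does not run it. The helper name `makedb_command` and the `<dir>/db/` layout are assumptions made for this example; the location actually used by create_diamond_db() may differ.

```python
from pathlib import Path


def makedb_command(dir: str, name: str, path: str, threads: int) -> list[str]:
    """Build (but do not execute) a `diamond makedb` command line."""
    # Hypothetical layout: databases are collected in <dir>/db/.
    db_prefix = Path(dir) / "db" / name
    return [
        "diamond", "makedb",
        "--in", path,              # protein FASTA file to index
        "--db", str(db_prefix),    # DIAMOND adds the .dmnd suffix itself
        "--threads", str(threads),
    ]


cmd = makedb_command("data", "template", "data/template.faa", threads=2)
print(" ".join(cmd))
```

The resulting list can be passed to `subprocess.run(cmd, check=True)` on a machine where DIAMOND is installed.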

specimen.hqtb.core.bidirectional_blast.extract_bestbdbp_hits(tvq: str, qvt: str, name: str, cov: float = 0.25)[source]

Extract the best bidirectional blastp hits from two TSV files generated by bdbp_diamond() or similar steps.

Args:

tvq (str):
Path to the template vs. query file.
qvt (str):
Path to the query vs. template file.
name (str):
Name (path) of the output file.
cov (float, optional):
Cut-off value for the coverage. All hits with coverage < cov will be excluded. Defaults to 0.25.
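
The idea of bidirectional best hits with a coverage cut-off can be sketched in plain Python. This is not the code used by extract_bestbdbp_hits(): the row layout `(query_id, subject_id, coverage, bitscore)`, ranking by bit score, and coverage expressed as a fraction are all assumptions made for the example.

```python
def best_hits(rows, cov):
    """Map each query ID to its best-scoring subject, skipping low-coverage hits."""
    best = {}
    for query, subject, coverage, bitscore in rows:
        if coverage < cov:  # hits with coverage < cov are excluded
            continue
        if query not in best or bitscore > best[query][1]:
            best[query] = (subject, bitscore)
    return {query: subject for query, (subject, _) in best.items()}


def bidirectional_best_hits(tvq, qvt, cov=0.25):
    """Return (template_id, query_id) pairs that are each other's best hit."""
    forward = best_hits(tvq, cov)   # template gene -> best query gene
    backward = best_hits(qvt, cov)  # query gene -> best template gene
    return sorted(
        (t_gene, q_gene)
        for t_gene, q_gene in forward.items()
        if backward.get(q_gene) == t_gene
    )


# Toy data (invented for illustration only).
tvq = [("t1", "q1", 0.90, 200.0), ("t1", "q2", 0.80, 150.0),
       ("t2", "q3", 0.10, 300.0), ("t3", "q2", 0.70, 120.0)]
qvt = [("q1", "t1", 0.90, 198.0), ("q2", "t1", 0.80, 149.0),
       ("q3", "t2", 0.95, 310.0)]
pairs = bidirectional_best_hits(tvq, qvt)
print(pairs)
```

In the toy data, `t2` drops out because its only hit falls below the coverage cut-off, and `t3` is discarded because its best partner `q2` prefers `t1`.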

specimen.hqtb.core.bidirectional_blast.extract_cds(file: str, name: str, dir: str, collect_info: list, identifier: str) → str[source]

Extract the CDS from a genbank file (annotated genome). Produces a FASTA-file.

Args:

file (str):
File to extract CDS from.
name (str):
Name of the genome.
dir (str):
Directory for the output.
collect_info (list):
Feature identifiers to collect information from.
identifier (str):
Feature identifier to use for the header of the FASTA.

Returns:

str:: Name of the FASTA-file

specimen.hqtb.core.bidirectional_blast.run(template: str, input: str, dir: str, template_name: str | None = None, input_name: str | None = None, temp_header: str | None = None, in_header: str | None = None, threads: int = 2, extra_info: list[str] = ['locus_tag', 'product', 'protein_id'], sensitivity: Literal['sensitive', 'more-sensitive', 'very-sensitive', 'ultra-sensitive'] = 'more-sensitive')[source]

Run the bidirectional blast on a template and input genome (annotated).

Args:

template (str):
Path to the annotated genome file used as a template.
input (str):
Path to the annotated genome file used as input.
dir (str):
Path to the output directory.
template_name (str, optional):
Name of the annotated genome file used as a template. Defaults to None.
input_name (str, optional):
Name of the annotated genome file used as input. Defaults to None.
temp_header (str, optional):
Feature qualifier of the gbff (NCBI) / faa (PROKKA) of the template to use as header for the FASTA files. If None is given, sets it based on file extension (currently only implemented for gbff and faa). Defaults to ‘protein_id’.
in_header (str, optional):
Feature qualifier of the gbff (NCBI) / faa (PROKKA) of the input to use as header for the FASTA files. If None is given, sets it based on file extension (currently only implemented for gbff and faa). Defaults to ‘locus_tag’.
threads (int, optional):
Number of threads to be used for DIAMOND. Defaults to 2.
extra_info (list[str], optional):
List of feature qualifiers to be extracted from the annotated genome files as additional information. Defaults to [‘locus_tag’, ‘product’, ‘protein_id’].
sensitivity (Literal[‘sensitive’, ‘more-sensitive’, ‘very-sensitive’, ‘ultra-sensitive’], optional):
Sensitivity mode for the DIAMOND blastp run. Defaults to ‘more-sensitive’.

Raises:

ValueError: Unknown file extension. Please set value for temp_header manually or check file.
ValueError: Unknown file extension. Please set value for in_header manually or check file.
ValueError: Unknown sensitive mode
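
The fallback for temp_header and in_header described above can be pictured as a lookup on the file extension. The sketch below is only an assumption about how such a fallback could look: the helper `resolve_header` and the extension-to-qualifier mapping are hypothetical and not taken from the SPECIMEN source.

```python
from pathlib import Path
from typing import Optional

# Assumed mapping, for illustration only; the library's actual choice
# per extension may differ.
DEFAULT_HEADERS = {".gbff": "protein_id", ".faa": "locus_tag"}


def resolve_header(path: str, header: Optional[str] = None,
                   role: str = "temp_header") -> str:
    """Return the given header, or derive one from the file extension."""
    if header is not None:
        return header  # an explicit value always wins
    extension = Path(path).suffix.lower()
    if extension not in DEFAULT_HEADERS:
        raise ValueError(
            f"Unknown file extension. Please set value for {role} "
            "manually or check file."
        )
    return DEFAULT_HEADERS[extension]


print(resolve_header("template.gbff"))
print(resolve_header("input.faa", header="gene"))
```

Passing the header explicitly avoids the lookup and the ValueError for files with other extensions.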

specimen.hqtb.core.bidirectional_blast.run_diamond_blastp(dir: str, db: str, query: str, fasta_path: str, sensitivity: str, threads: int)[source]

Run DIAMOND blastp for a given database name and FASTA - relies on the structure created by bidirectional_blast.

Args:

dir (str):
Parent directory of the place to save the files to.
db (str):
Name of the genome/database used as the database.
query (str):
Name of the genome used as the query.
fasta_path (str):
Path to the FASTA-file containing the CDS.
sensitivity (str):
Sensitivity mode to use for DIAMOND blastp.
threads (int):
Number of threads that will be used for running DIAMOND.
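
To make the parameters more concrete, here is a minimal sketch that assembles a `diamond blastp` command line without executing it. The helper `blastp_command`, the directory layout, the output file name, and the tabular output format (`--outfmt 6`) are assumptions for this example and may not match what run_diamond_blastp() does internally.

```python
from pathlib import Path

SENSITIVITY_MODES = ("sensitive", "more-sensitive",
                     "very-sensitive", "ultra-sensitive")


def blastp_command(dir: str, db: str, query: str, fasta_path: str,
                   sensitivity: str, threads: int) -> list[str]:
    """Build (but do not execute) a `diamond blastp` command line."""
    if sensitivity not in SENSITIVITY_MODES:
        raise ValueError(f"Unknown sensitivity mode: {sensitivity}")
    base = Path(dir)
    # Hypothetical layout: databases in <dir>/db/, results next to them.
    db_file = base / "db" / db
    out_file = base / f"{query}_vs_{db}.tsv"
    return [
        "diamond", "blastp",
        "--db", str(db_file),
        "--query", fasta_path,
        "--out", str(out_file),
        f"--{sensitivity}",        # e.g. --more-sensitive
        "--threads", str(threads),
        "--outfmt", "6",           # BLAST tabular output (assumed here)
    ]


cmd = blastp_command("out", "template", "input", "out/input.faa",
                     "more-sensitive", 2)
print(" ".join(cmd))
```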

specimen.hqtb.core.generate_draft_model

Generate a draft model from a template model.

The basic idea has been adapted from Norsigian et al. (2020).

specimen.hqtb.core.generate_draft_model.check_unchanged(draft: Model, bbh: DataFrame) → Model[source]

Check the gene names (more precisely, the IDs) for original col_names that still exist. Depending on the case, decide whether to keep or remove them.

Args:

draft (cobra.Model):
The draft model currently in the making.
bbh (pd.DataFrame):
The table from run() containing the bidirectional blastp best hits information.

Returns:

cobra.Model:: The model after the check and possible removal of genes.

specimen.hqtb.core.generate_draft_model.edit_template_identifiers(data: DataFrame, edit: Literal['no', 'dot-to-underscore']) → DataFrame[source]

Edit the subject IDs to fit the gene IDs of the template model. Requires further extension if the needed edits are not included.

Args:

data (pd.DataFrame):
The data frame containing the bidirectional blastp best hits information.
edit (Literal[‘no’,’dot-to-underscore’]):
Type of edit to perform. Currently possible options: no, dot-to-underscore.

Returns:

pd.DataFrame:: The (un)edited DataFrame.
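
The two edit options can be illustrated on a plain list of identifiers. This sketch assumes that ‘dot-to-underscore’ replaces every dot in an ID with an underscore; the helper `edit_ids` and the example IDs are hypothetical, and the real function operates on a DataFrame.

```python
def edit_ids(ids, edit):
    """Return the IDs unchanged ('no') or with dots replaced by underscores."""
    if edit == "no":
        return list(ids)
    if edit == "dot-to-underscore":
        return [identifier.replace(".", "_") for identifier in ids]
    raise ValueError(f"Unknown edit option: {edit}")


subject_ids = ["WP_000001.1", "WP_000002.2"]  # invented example IDs
print(edit_ids(subject_ids, "dot-to-underscore"))
```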

specimen.hqtb.core.generate_draft_model.gen_draft_model(model: Model, bbh: DataFrame, name: str, dir: str, edit: Literal['no', 'dot-to-underscore'], medium: str = 'default', namespace: Literal['BiGG'] = 'BiGG') → Model[source]

Generate a draft model from a template model and the results of a bidirectional blastp (blast best hits) table and save it as a new model.

Args:

model (cobra.Model):
The template model.
bbh (pd.DataFrame):
The bidirectional blastp best hits table.
name (str):
Name of the newly generated model.
dir (str):
Path to the directory to save the new model in.
edit (Literal[‘no’,’dot-to-underscore’]):
Type of edit to perform. Currently possible options: no, dot-to-underscore.
medium (str, optional):
Name of the medium to be loaded from the refineGEMs database or ‘default’ = the one from the template model. If given the keyword ‘exchanges’, will use all exchange reactions in the model as a medium. Defaults to ‘default’.
namespace (Literal[‘BiGG’], optional):
Namespace of the model. Defaults to ‘BiGG’.

Returns:

cobra.Model:: The generated draft model.

specimen.hqtb.core.generate_draft_model.pid_filter(data: DataFrame, pid: float) → DataFrame[source]

Filter the data based on PID threshold. Entries above the given value are retained.

Args:

data (pd.DataFrame):
The data from the previous step (see bidirectional_blast) containing at least a ‘PID’ column.
pid (float):
PID threshold value, given in percentage e.g. 80.0.

Returns:

pd.DataFrame:: The filtered data.
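
The filter itself is a simple threshold on the ‘PID’ column. The sketch below uses a list of dictionaries in place of a DataFrame and assumes a strict comparison (“above” read as `>`); whether the library keeps entries exactly at the threshold is not stated here. The helper `pid_filter_rows` and the sample rows are invented for illustration.

```python
def pid_filter_rows(rows, pid):
    """Keep only rows whose 'PID' value lies above the threshold."""
    return [row for row in rows if row["PID"] > pid]


hits = [
    {"query_ID": "q1", "subject_ID": "t1", "PID": 95.2},
    {"query_ID": "q2", "subject_ID": "t2", "PID": 80.0},
    {"query_ID": "q3", "subject_ID": "t3", "PID": 42.7},
]
kept = pid_filter_rows(hits, 80.0)
print([row["query_ID"] for row in kept])
```

With a pandas DataFrame, the same selection is a boolean mask on the ‘PID’ column.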

specimen.hqtb.core.generate_draft_model.remove_absent_genes(model: Model, genes: list[str]) → Model[source]

Remove a list of genes from a given model.

Note

Genes that are not found in the model are skipped.

Args:

model (cobra.Model):
A template model to delete genes from. A copy will be created before deleting.
genes (list[str]):
Gene identifiers of genes that should be deleted.

Returns:

cobra.Model:: A new model with the given genes deleted, if found in the original model.

specimen.hqtb.core.generate_draft_model.rename_found_homologs(draft: Model, bbh: DataFrame) → Model[source]

Rename the genes in the model according to the homologous ones found in the query.

Args:

draft (cobra.Model):
The draft model with the to-be-renamed genes.
bbh (pd.DataFrame):
The table from run() containing the bidirectional blastp best hits information.

Returns:

cobra.Model:: The draft model with renamed genes.

specimen.hqtb.core.generate_draft_model.run(template: str, bpbbh: str, dir: str, edit_names: Literal['no', 'dot-to-underscore'] = 'no', pid: float = 80.0, name: str | None = None, medium: str = 'default', namespace: str = 'BiGG', memote: bool = False)[source]

Generate a draft model from a blastp best hits tsv file and a template model.

Args:

template (str):
Path to the file containing the template model.
bpbbh (str):
Path to the blastp bidirectional best hits.
dir (str):
Path to output directory.
edit_names (Literal[‘no’,’dot-to-underscore’], optional):
Type of edit to perform. Currently possible options: no, dot-to-underscore. Defaults to ‘no’.
pid (float, optional):
Threshold value for determining, if a gene is counted as present or absent. Given in percentage, e.g. 80.0 = 80%. Defaults to 80.0.
name (Union[str,None], optional):
Name of the output model. If not given, takes name from filename. Defaults to None.
medium (str, optional):
Name of the medium to be loaded from the refineGEMs database or ‘default’ = the one from the template model. If given the keyword ‘exchanges’, will use all exchange reactions in the model as a medium. Defaults to ‘default’.
namespace (str, optional):
Namespace of the model. Defaults to ‘BiGG’.
memote (bool, optional):
Option to run memote after creating the draft model. Defaults to False.

Raises:

ValueError: ‘Edit_names value not in list of allowed values: no, dot-to-underscore’

specimen.hqtb.core.validation

Validate a model (step 4 of the workflow).

Implemented tests include:
- cobra/sbml check using cobrapy

specimen.hqtb.core.validation.run(dir: str, model_path: str, tests: None | str | list = None, run_all: bool = True)[source]

SPECIMEN Step 4: Validate the model.

Included tests (name : description):
- modelpolisher: Semantic control and BiGG annotation fixing with ModelPolisher
- cobra: SBML validation using COBRApy

Args:

dir (str):
Path to the output directory.
model_path (str):
Path to the model to be validated.
tests (Union[None, str, list], optional):
Tests to perform. If the test name is either in a string or an element in a list, the corresponding test will be run. Defaults to None.
run_all (bool, optional):
Run all available tests. If True, overwrites the previous parameter. Defaults to True.
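
How tests and run_all interact can be sketched as follows. The helper `select_tests` is hypothetical and only mirrors the description above: run_all takes precedence, and otherwise a test runs if its name occurs in the given string or list.

```python
AVAILABLE_TESTS = ("modelpolisher", "cobra")  # names from the list above


def select_tests(tests=None, run_all=True):
    """Return the names of the tests that would be run."""
    if run_all:
        return list(AVAILABLE_TESTS)  # run_all overrides `tests`
    if tests is None:
        return []
    # `in` is a substring check for a string and a membership
    # check for a list, matching the wording of the docstring.
    return [name for name in AVAILABLE_TESTS if name in tests]


print(select_tests())
print(select_tests("cobra", run_all=False))
print(select_tests(["modelpolisher"], run_all=False))
```

Because run_all defaults to True, it has to be set to False for a selection passed via tests to have any effect.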

specimen.hqtb.core.refinement subpackage

specimen.hqtb.core.refinement.annotation submodule

specimen.hqtb.core.refinement.cleanup submodule

specimen.hqtb.core.refinement.extension submodule

specimen.hqtb.core.refinement.smoothing submodule