The specimen.cmd_access submodule
=================================

.. warning:: 

   The ``HQTB`` workflow is under heavy construction due to 
   changes in ``refineGEMs``. It might not work as expected 
   (or throw errors). Please await the next update.

.. automodule:: specimen.cmd_access
   :members:
   :undoc-members:
   :show-inheritance:

After a successful installation, ``SPECIMEN`` can be accessed via the command line
from inside the Python environment it was installed in using:

.. code-block:: bash

    specimen [OPTIONS] COMMAND [ARGS]

The following commands are available:

- ``cmpb`` : Workflow for GEM curation based on CarveMe model and ModelPolisher.
- ``hqtb`` : Workflow for GEM curation based on a high-quality template.
- ``setup`` : Set up structure, data and more.


General Options
---------------

- ``--help``: Call the help page of the command.

specimen setup 
--------------

.. code:: bash

   specimen setup config 

Download a configuration file, either for the workflow or for media.

Options:

- ``--filename/-f``: Name/Path to save the config under.
- ``--type/-t``: Type of config to download. Can be media, or basic/advanced for the workflow config.
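
As a rough sketch, downloading a basic workflow configuration could look like this.
The file name ``my_hqtb_config.yaml`` is a hypothetical placeholder, and the value ``basic``
is taken from the option description above; check ``specimen setup config --help`` for the
exact choices of the installed version. The ``RUN`` variable is only a dry-run helper
for this example and not part of ``SPECIMEN``.

.. code:: bash

   # Hypothetical output file name; choose any path you like.
   CONFIG="my_hqtb_config.yaml"

   # Dry run: "echo" prints the command instead of executing it.
   # Set RUN="" to actually run the command.
   RUN="echo"

   $RUN specimen setup config --filename "$CONFIG" --type basic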

.. code:: bash

   specimen data structure [WORKFLOW]

Set up a directory with the basic structure for the data needed for the workflow.

Argument:

- ``WORKFLOW``: The name of the workflow to set up the structure for.

Options:

- ``--dir/-d``: Name/Path of the directory.
- ``--chunk-size/-s``: Parameter for downloading files from the web.

specimen hqtb 
-------------

.. code:: bash

   specimen hqtb run [CONFIG]

Run the complete workflow with a configuration file as input.

.. code:: bash 

   specimen hqtb wrapper [CONFIG]
   
Run the workflow using a config on a directory containing multiple input genomes.

Options:

- ``--dir/-d``: Name/Path of the directory that contains the input.
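
A minimal sketch of a wrapper call, assuming a configuration file and a directory of
input genomes already exist. Both names below are hypothetical placeholders, and ``RUN``
is only a dry-run helper for this example.

.. code:: bash

   # Hypothetical paths; replace them with your own config and genome directory.
   CONFIG="my_hqtb_config.yaml"
   GENOME_DIR="input_genomes"

   # Dry run: "echo" prints the command instead of executing it.
   # Set RUN="" to actually run the command.
   RUN="echo"

   $RUN specimen hqtb wrapper "$CONFIG" --dir "$GENOME_DIR"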

.. code:: bash 

   specimen hqtb bdb [TEMPLATE] [INPUT]

Run step 1 of the workflow: bidirectional BLAST. Requires the template and input genomes as input.

Options:

- ``--template-name`` : Name of the annotated genome of the template, if it should not be extracted from the filename.
- ``--input-name`` : Name of the annotated genome of the input, if it should not be extracted from the filename.
- ``--temp-header``: Feature qualifier of the gbff of the template to use as header for the FASTA files. Defaults to ``protein_id``.
- ``--in-header``: Feature qualifier of the gbff of the input to use as header for the FASTA files. Defaults to ``locus_tag``.
- ``--dir/-d``: Name/Path of the directory to save the output to.
- ``--threads/-t``: Number of threads to use for DIAMOND.
- ``--sensitivity/-s``: Sensitivity mode to use for DIAMOND.
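
The options above could be combined as in the following sketch. The genome file names and
the output directory are hypothetical, the header values are simply the documented defaults
written out, and ``sensitive`` is assumed to be an accepted sensitivity mode here (it is a
DIAMOND mode, but check ``specimen hqtb bdb --help`` for the allowed choices).

.. code:: bash

   # Hypothetical annotated genome files (GenBank flat file format).
   TEMPLATE="template_genome.gbff"
   INPUT="input_genome.gbff"

   # Dry run: "echo" prints the command instead of executing it.
   # Set RUN="" to actually run the command.
   RUN="echo"

   $RUN specimen hqtb bdb "$TEMPLATE" "$INPUT" \
       --temp-header protein_id \
       --in-header locus_tag \
       --dir bdb_out \
       --threads 4 \
       --sensitivity sensitive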

.. code:: bash

   specimen hqtb draft [TEMPLATE] [BPBBH]

Run step 2 of the workflow: generate a draft model. Requires the results of the bidirectional BLAST
and the template model as input.

Options:

- ``--dir/-d``: Name/Path of the directory to save the output to.
- ``--edit-names``: Choose an option to change the IDs inside the bpbbh-file to fit the model, if necessary.
- ``--pid``: Threshold value for the PID. Defaults to 80.0 (80 percent).
- ``--name``: Name of the output model.
- ``--medium``: Medium for the new model. Can be the name of a medium in the refineGEMs database, the keyword *default*, which uses the medium from the template, or the keyword *exchanges*, which constructs a medium from all available exchange reactions.
- ``--namespace``: Namespace to use for the model.
- ``--memote``: Run Memote after constructing the draft model.
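
For illustration, a draft model could be generated as sketched below. The input file names,
the model name and the output directory are hypothetical; ``80.0`` and ``default`` are the
values described in the option list above.

.. code:: bash

   # Hypothetical inputs: the template model and the BLAST result from step 1.
   TEMPLATE_MODEL="template_model.xml"
   BPBBH="bpbbh_results.tsv"

   # Dry run: "echo" prints the command instead of executing it.
   # Set RUN="" to actually run the command.
   RUN="echo"

   $RUN specimen hqtb draft "$TEMPLATE_MODEL" "$BPBBH" \
       --dir draft_out \
       --pid 80.0 \
       --name my_draft_model \
       --medium default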

.. code:: bash 

   specimen hqtb validation [MODEL]

Run step 4: validation on a model.

Options:

- ``--dir/-d``: Name/Path of the directory to save the output to.
- ``--run-test/-t``: Specify validation tests to be run (multiple can be set). If the keyword *all* is given, runs all available tests.
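
A short sketch of a validation call that runs every available test via the keyword *all*.
The model file name and output directory are hypothetical placeholders.

.. code:: bash

   # Hypothetical model file to validate.
   MODEL="my_model.xml"

   # Dry run: "echo" prints the command instead of executing it.
   # Set RUN="" to actually run the command.
   RUN="echo"

   $RUN specimen hqtb validation "$MODEL" --dir validation_out --run-test all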

.. code:: bash 

   specimen hqtb analysis [MODEL]

Run step 5: analysis on a model.

Options:

- ``--dir``: Path to a directory for the output.
- ``--pan-core-comparison/--pcc``: Option on which feature the comparison of pan-core model and model should be based. Default is "id".
- ``--pan-core-model/--pcm``: Path to a pan-core model.
- ``--namespace/-n``: Namespace used by the given model. Defaults to BiGG.
- ``--media-path/--mp``: Path to a media config file. Enables growth analysis if given.
- ``--test-aa-auxotrophies/--taa``: Option to test media/model for auxotrophies.
- ``--pathway/--pathway-analysis``: Option to perform a pathway analysis using KEGG pathway identifiers.
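
As a sketch, an analysis run with a pan-core comparison and growth analysis might be called
as follows. All file names are hypothetical; ``id`` and ``BiGG`` are the documented defaults,
written out only for clarity.

.. code:: bash

   # Hypothetical input files.
   MODEL="my_model.xml"
   PAN_CORE="pan_core_model.xml"
   MEDIA="media_config.yaml"

   # Dry run: "echo" prints the command instead of executing it.
   # Set RUN="" to actually run the command.
   RUN="echo"

   $RUN specimen hqtb analysis "$MODEL" \
       --dir analysis_out \
       --pan-core-model "$PAN_CORE" \
       --pan-core-comparison id \
       --namespace BiGG \
       --media-path "$MEDIA"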

specimen hqtb refinement
^^^^^^^^^^^^^^^^^^^^^^^^

Run the different parts of step 3 of the workflow: refinement.

.. code:: bash

   specimen hqtb refinement extension 

Run the first part, extension.

Required options:

- ``--draft``: Path to the draft model.
- ``--gene-list/-g``: Path to a csv file containing information on all the genes found in the annotated genome.
- ``--fasta/-f``: Path to the (protein) FASTA file containing the CDS sequences
- ``--db/--database``: Path to the database used for running DIAMOND.
- ``--mnx-chem-prop``: Path to the MetaNetX chem_prop namespace file.
- ``--mnx-chem-xref``: Path to the MetaNetX chem_xref namespace file.
- ``--mnx-reac-prop``: Path to the MetaNetX reac_prop namespace file.
- ``--mnx-reac-xref``: Path to the MetaNetX reac_xref namespace file.

Further options:

- ``--ncbi_map``: Path to the NCBI information mapping file. Optional, but recommended.
- ``--ncbi_dat``: Path to the NCBI database information file. Optional, but recommended.
- ``--dir/-d``: Path to the directory for the output (directories)
- ``--id/-i``: Name of the column of the csv file that contains the entries that were used as gene identifiers in the draft model.
- ``--sensitivity/-s``: Sensitivity mode for DIAMOND blastp run. Default is sensitive.
- ``--coverage/-c``: Threshold value for the query coverage for DIAMOND. Default is 80.0.
- ``--pid``: PID (percentage identity value) to filter the BLAST hits by. Default is 90.0; only hits equal to or above the given value are kept.
- ``--threads/-t``: Number of threads to be used.
- ``--include_dna``: Include reactions with DNA in their name when added (developer information: True == excluded).
- ``--include_rna``: Include reactions with RNA in their name when added (developer information: True == excluded).
- ``--memote``: Use memote on the extended model.
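
Because the extension step takes many required paths, collecting them in variables first
can keep the call readable. The sketch below assumes hypothetical file locations throughout;
the threshold values are the documented defaults, written out only for illustration.

.. code:: bash

   # Hypothetical input files; replace them with your own data.
   DRAFT="draft_model.xml"
   GENES="gene_list.csv"
   FASTA="proteins.faa"
   DB="diamond_db.dmnd"
   MNX_DIR="MetaNetX"   # hypothetical folder holding the namespace files

   # Dry run: "echo" prints the command instead of executing it.
   # Set RUN="" to actually run the command.
   RUN="echo"

   $RUN specimen hqtb refinement extension \
       --draft "$DRAFT" \
       --gene-list "$GENES" \
       --fasta "$FASTA" \
       --db "$DB" \
       --mnx-chem-prop "$MNX_DIR/chem_prop.tsv" \
       --mnx-chem-xref "$MNX_DIR/chem_xref.tsv" \
       --mnx-reac-prop "$MNX_DIR/reac_prop.tsv" \
       --mnx-reac-xref "$MNX_DIR/reac_xref.tsv" \
       --dir extension_out \
       --coverage 80.0 \
       --pid 90.0 \
       --threads 4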

.. code:: bash

   specimen hqtb refinement cleanup [MODEL]

Based on a draft model, run the second part of refinement, cleanup.

Options:

- ``--dir/-d``: Path to the directory for the output (directories)
- ``--biocyc_db``: Path to the BioCyc (MetaCyc) database information file (for reactions). Optional, but recommended. Necessary for checking directionality
- ``--check_dupl_reac/--cdr``: Check for duplicate reactions.
- ``--check_dupl_meta/--cdm``: Check for duplicate metabolites. Can be "default" (starting point MetaNetX), "exhaustive" (iterate over all annotations as starting points) or "skip". Default is "default".
- ``--objective_function/--of``: Name or ID of the objective function of the model. Default is "Growth".
- ``--remove_dupl_meta/--rdm``: Option for removing/replacing duplicate metabolites.
- ``--remove_unused_meta/--rum``: Option for removing unused metabolites from the model. Only used when cdm is not skipped.
- ``--remove_dupl_reac/--rdr``: Option for removing duplicate reactions from the model.
- ``--universal/-u``: Path to a universal model containing reactions used for gapfilling.
- ``--media-path/--mp``: Path to a media config to use for gapfilling.
- ``--namespace/--nsp``: Namespace to use for the model.
- ``--growth_threshold/-gt``: Threshold value for a model to be considered growing.
- ``--iterations/-i``: Number of iterations for the gapfilling. If 0 is passed, uses full set of reactions instead of heuristic.
- ``--chunk_size``: Number of reactions to be tested simultaneously if using the heuristic version of gapfilling. If this is 0, heuristic will not be applied.
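
A possible cleanup call with gapfilling inputs is sketched below. All file names are
hypothetical, and only options that take a value according to the list above are used;
``exhaustive`` and ``Growth`` are values named in the option descriptions.

.. code:: bash

   # Hypothetical input files.
   MODEL="extended_model.xml"
   UNIVERSAL="universal_model.xml"
   MEDIA="media_config.yaml"

   # Dry run: "echo" prints the command instead of executing it.
   # Set RUN="" to actually run the command.
   RUN="echo"

   $RUN specimen hqtb refinement cleanup "$MODEL" \
       --dir cleanup_out \
       --check_dupl_meta exhaustive \
       --objective_function Growth \
       --universal "$UNIVERSAL" \
       --media-path "$MEDIA"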

.. code:: bash

   specimen hqtb refinement annotation [MODEL]
   
Run the third part of the refinement, annotation, on a given model.

Options:

- ``--dir``: Path to a directory for the output.
- ``--kegg-via-ec/--via-ec``: Try to map EC numbers to KEGG pathway, if KEGG reaction cannot be mapped directly.
- ``--kegg-via-rc/--via-rc``: Try to map RC numbers to KEGG pathway, if KEGG reaction cannot be mapped directly.
- ``--memote``: Use memote on the extended model.

.. code:: bash 

   specimen hqtb refinement smoothing [MODEL]

Run the fourth part of the refinement, smoothing, on a given model.

Required options:

- ``--genome/-g``: Path to the genome FASTA (e.g. .fna) file of your genome.

Further options:

- ``--dir/-d``: Path to a directory for the output.
- ``--egc-solver/--egc``: String sets the type of solver to use to solve EGCs. Otherwise just reports existing EGCs.
- ``--namespace/--nsp``: Namespace of the model.
- ``--mcc``: Option to perform MassChargeCuration on the model. Can be used directly on model or as extra information. Choices are "apply", "extra" and "skip". Default is "skip".
- ``--dna_weight_frac``: DNA macromolecular weight fraction for your organism. Default is 0.023 for Klebsiella based on Liao et al.
- ``--ion_weight_frac``: Weight fraction for the coenzymes and ions. Default is 0.05 based on the default of BOFdat.
- ``--memote``: Use memote on the extended model.
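
A sketch of a smoothing call follows. The model and genome file names are hypothetical;
the weight fractions and the ``--mcc`` choice are the documented defaults, shown only to
illustrate where organism-specific values would go.

.. code:: bash

   # Hypothetical input files.
   MODEL="annotated_model.xml"
   GENOME="genome.fna"

   # Dry run: "echo" prints the command instead of executing it.
   # Set RUN="" to actually run the command.
   RUN="echo"

   $RUN specimen hqtb refinement smoothing "$MODEL" \
       --genome "$GENOME" \
       --dir smoothing_out \
       --mcc skip \
       --dna_weight_frac 0.023 \
       --ion_weight_frac 0.05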

specimen cmpb
-------------

.. code:: bash

   specimen cmpb run [CONFIG]

Run the complete CMPB workflow with a configuration file as input.