Command Line Interface
Command-line interfaces (CLIs) are provided for the two most common tasks: conformer generation and fingerprinting. At the moment, using the CLI requires downloading the E3FP source.
In the examples below, we assume the E3FP repository is located at $E3FP_REPO.
Conformer Generation CLI
To see all available options, run
$ python $E3FP_REPO/e3fp/conformer/generate.py --help
usage: Generate conformers from mol2 or SMILES [-h] [-m MOL2 [MOL2 ...]]
[-s SMILES [SMILES ...]]
[--standardise STANDARDISE]
[-n NUM_CONF] [--first FIRST]
[--pool_multiplier POOL_MULTIPLIER]
[-r RMSD_CUTOFF]
[-e MAX_ENERGY_DIFF]
[-f {uff,mmff94,mmff94s}]
[--seed SEED] [-o OUT_DIR]
[-C {0,1,2,None}] [-O]
[--values_file VALUES_FILE]
[--prioritize]
[--params PARAMS] [-l LOG]
[-p NUM_PROC]
[--parallel_mode {mpi,processes,threads,serial}]
[-v]
optional arguments:
-h, --help show this help message and exit
-m MOL2 [MOL2 ...], --mol2 MOL2 [MOL2 ...]
Path to mol2 file(s), each with one molecule.
(default: None)
-s SMILES [SMILES ...], --smiles SMILES [SMILES ...]
Path to file(s) with SMILES and name. (space-
separated) (default: None)
--standardise STANDARDISE
Clean molecules before generating conformers by
standardisation. (default: False)
-n NUM_CONF, --num_conf NUM_CONF
Set single number of conformers to use. -1 results in
auto choosing. (default: -1)
--first FIRST Set maximum number of first conformers to accept.
Conformer generation is unaffected, except it may
terminate early when this number of conformers is
reached. (default: -1)
--pool_multiplier POOL_MULTIPLIER
Factor to multiply `num_conf` by to generate
conformers. Results are then pruned to `num_conf`.
(default: 1)
-r RMSD_CUTOFF, --rmsd_cutoff RMSD_CUTOFF
Choose RMSD cutoff between conformers (default: 0.5)
-e MAX_ENERGY_DIFF, --max_energy_diff MAX_ENERGY_DIFF
Maximum energy difference between lowest energy
conformer and any accepted conformer. (default: None)
-f {uff,mmff94,mmff94s}, --forcefield {uff,mmff94,mmff94s}
Choose forcefield for minimization. (default: uff)
--seed SEED Random seed for conformer generation. (default: -1)
-o OUT_DIR, --out_dir OUT_DIR
Directory to save conformers. (default: conformers)
-C {0,1,2,None}, --compress {0,1,2,None}
Compression to use for SDF files. None and 0 default
to uncompressed ".sdf". 1 and 2 result in gzipped and
bzipped SDF files, respectively. (default: 2)
-O, --overwrite Overwrite existing conformer files. (default: False)
--values_file VALUES_FILE
Save RMSDs and energies to specified hdf5 file.
(default: None)
--prioritize Prioritize likely fast molecules first. (default:
False)
--params PARAMS INI formatted file with parameters. If provided, all
parameters controlling conformer generation are
ignored. (default: None)
-l LOG, --log LOG Generate logfile. (default: None)
-p NUM_PROC, --num_proc NUM_PROC
Set number of processors to use. (default: None)
--parallel_mode {mpi,processes,threads,serial}
Set number of processors to use. (default: None)
-v, --verbose Run with extra verbosity. (default: False)
We will generate conformers for the molecule whose SMILES string is defined in caffeine.smi:
CN1C=NC2=C1C(=O)N(C(=O)N2C)C caffeine
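The input format shown above (one molecule per line, a SMILES string followed by whitespace and a name) is simple enough to write and check from a script. The following is a minimal sketch using only the Python standard library. The helper names `write_smiles_file` and `read_smiles_file` are hypothetical and are not part of E3FP.

```python
from pathlib import Path


def write_smiles_file(path, molecules):
    """Write (smiles, name) pairs, one space-separated pair per line.

    Hypothetical helper, not an E3FP function.
    """
    lines = [f"{smiles} {name}" for smiles, name in molecules]
    Path(path).write_text("\n".join(lines) + "\n")


def read_smiles_file(path):
    """Read the file back as a list of (smiles, name) tuples."""
    pairs = []
    for line in Path(path).read_text().splitlines():
        if not line.strip():
            continue  # skip blank lines
        # Split on the first run of whitespace: SMILES first, name second.
        smiles, name = line.split(None, 1)
        pairs.append((smiles, name.strip()))
    return pairs
```

Calling `write_smiles_file("caffeine.smi", [("CN1C=NC2=C1C(=O)N(C(=O)N2C)C", "caffeine")])` would produce a file with the single line shown above.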
The example below generates at most 3 conformers for this molecule.
$ python $E3FP_REPO/e3fp/conformer/generate.py -s caffeine.smi --num_conf 3 -o ./
2017-07-17 00:11:05,743|WARNING|Only 1 processes available. 'mpi' mode not available.
2017-07-17 00:11:05,748|INFO|num_proc is not specified. 'processes' mode will use all 8 processes
2017-07-17 00:11:05,748|INFO|Parallelizer initialized with mode 'processes' and 8 processors.
2017-07-17 00:11:05,748|INFO|Input type: Detected SMILES file(s)
2017-07-17 00:11:05,748|INFO|Input file number: 1
2017-07-17 00:11:05,748|INFO|Parallel Type: processes
2017-07-17 00:11:05,748|INFO|Out Directory: ./
2017-07-17 00:11:05,749|INFO|Overwrite Existing Files: False
2017-07-17 00:11:05,749|INFO|Target Conformer Number: 3
2017-07-17 00:11:05,749|INFO|First Conformers Number: all
2017-07-17 00:11:05,749|INFO|Pool Multiplier: 1
2017-07-17 00:11:05,749|INFO|RMSD Cutoff: 0.5
2017-07-17 00:11:05,749|INFO|Maximum Energy Difference: None
2017-07-17 00:11:05,749|INFO|Forcefield: UFF
2017-07-17 00:11:05,749|INFO|Starting.
2017-07-17 00:11:05,779|INFO|Generating conformers for caffeine.
2017-07-17 00:11:05,823|INFO|Generated 1 conformers for caffeine.
2017-07-17 00:11:05,829|INFO|Saved conformers for caffeine to ./caffeine.sdf.bz2.
The result is a multi-conformer SDF file called caffeine.sdf.bz2 in the current directory.
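As a quick sanity check, the number of conformers in the output can be counted without any chemistry toolkit. The sketch below relies on the standard SDF convention that each record ends with a line containing `$$$$`, and it assumes that each conformer is stored as its own record. The function name `count_sdf_records` is hypothetical and is not part of E3FP.

```python
import bz2


def count_sdf_records(path):
    """Count records in a bzip2-compressed SDF file.

    Assumes one record per conformer, each terminated by a "$$$$" line.
    Hypothetical helper, not an E3FP function.
    """
    count = 0
    # Open in text mode so the file is decompressed and decoded line by line.
    with bz2.open(path, "rt") as handle:
        for line in handle:
            if line.strip() == "$$$$":
                count += 1
    return count
```

For the run logged above, `count_sdf_records("caffeine.sdf.bz2")` should agree with the number of conformers reported in the log.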
Fingerprinting CLI
To see all available options, run
$ python $E3FP_REPO/e3fp/fingerprint/generate.py --help
usage: Generate E3FP fingerprints from SDF files. [-h] [-b BITS]
[--first FIRST] [-m LEVEL]
[-r RADIUS_MULTIPLIER]
[--stereo STEREO]
[--counts COUNTS]
[--params PARAMS]
[-o OUT_DIR_BASE]
[--out_ext {.fp.pkl,.fp.gz,.fp.bz2}]
[-d DB_FILE] [--all_iters]
[-O] [-l LOG] [-p NUM_PROC]
[--parallel_mode {mpi,processes,threads,serial}]
[-v]
sdf_files [sdf_files ...]
positional arguments:
sdf_files Path to SDF file(s), each with one molecule and
multiple conformers.
optional arguments:
-h, --help show this help message and exit
-b BITS, --bits BITS Set number of bits for final folded fingerprint. If -1
or None, unfolded (2^32-bit) fingerprints are
generated. (default: 4294967296)
--first FIRST Set maximum number of first conformers for which to
generate fingerprints. (default: 3)
-m LEVEL, --level LEVEL, --max_iterations LEVEL
Maximum number of iterations for fingerprint
generation. If -1, fingerprinting is run until
termination, and `all_iters` is set to False.
(default: 5)
-r RADIUS_MULTIPLIER, --radius_multiplier RADIUS_MULTIPLIER, --shell_radius RADIUS_MULTIPLIER
Distance to increment shell radius at around each
atom, starting at 0.0. (default: 1.718)
--stereo STEREO Differentiate by stereochemistry. (default: True)
--counts COUNTS Store counts-based E3FC instead of default bit-based.
(default: False)
--params PARAMS INI formatted file with parameters. If provided, all
parameters controlling conformer generation are
ignored. (default: None)
-o OUT_DIR_BASE, --out_dir_base OUT_DIR_BASE
Basename for output directory to save fingerprints.
Iteration number is appended to basename. (default:
None)
--out_ext {.fp.pkl,.fp.gz,.fp.bz2}
Extension for fingerprint pickles. (default: .fp.bz2)
-d DB_FILE, --db_file DB_FILE
Output file containing FingerprintDatabase object
(default: fingerprints.fpz)
--all_iters Save fingerprints from all iterations to file(s).
(default: False)
-O, --overwrite Overwrite existing file(s). (default: False)
-l LOG, --log LOG Log filename. (default: None)
-p NUM_PROC, --num_proc NUM_PROC
Set number of processors to use. (default: None)
--parallel_mode {mpi,processes,threads,serial}
Set parallelization mode to use. (default: None)
-v, --verbose Run with extra verbosity. (default: False)
To continue the above example, we will fingerprint our caffeine conformers.
$ python $E3FP_REPO/e3fp/fingerprint/generate.py caffeine.sdf.bz2 --bits 1024
2017-07-17 00:12:33,797|WARNING|Only 1 processes available. 'mpi' mode not available.
2017-07-17 00:12:33,801|INFO|num_proc is not specified. 'processes' mode will use all 8 processes
2017-07-17 00:12:33,801|INFO|Parallelizer initialized with mode 'processes' and 8 processors.
2017-07-17 00:12:33,801|INFO|Initializing E3FP generation.
2017-07-17 00:12:33,801|INFO|Getting SDF files
2017-07-17 00:12:33,801|INFO|SDF File Number: 1
2017-07-17 00:12:33,802|INFO|Database File: fingerprints.fpz
2017-07-17 00:12:33,802|INFO|Max First Conformers: 3
2017-07-17 00:12:33,802|INFO|Bits: 1024
2017-07-17 00:12:33,802|INFO|Level/Max Iterations: 5
2017-07-17 00:12:33,802|INFO|Shell Radius Multiplier: 1.718
2017-07-17 00:12:33,802|INFO|Stereo Mode: True
2017-07-17 00:12:33,802|INFO|Connected-only mode: on
2017-07-17 00:12:33,802|INFO|Invariant type: Daylight
2017-07-17 00:12:33,802|INFO|Parallel Mode: processes
2017-07-17 00:12:33,802|INFO|Starting
2017-07-17 00:12:33,829|INFO|Generating fingerprints for caffeine.
2017-07-17 00:12:33,935|INFO|Generated 1 fingerprints for caffeine.
2017-07-17 00:12:34,011|INFO|Saved FingerprintDatabase with fingerprints to fingerprints.fpz
The result is a file fingerprints.fpz containing a FingerprintDatabase. To use such a database, consult Fingerprint Storage.
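The two commands used in this walkthrough can also be chained from a Python script. The following is a minimal sketch that only uses the script paths and flags shown in the examples above. The function names are hypothetical, and the use of `sys.executable` assumes that E3FP and its dependencies are importable from the interpreter running the script.

```python
import os
import subprocess
import sys


def conformer_command(repo, smiles_file, num_conf, out_dir):
    """Build the conformer generation command as an argument list."""
    script = os.path.join(repo, "e3fp", "conformer", "generate.py")
    # Assumption: the current interpreter is the one with E3FP available.
    return [sys.executable, script, "-s", smiles_file,
            "--num_conf", str(num_conf), "-o", out_dir]


def fingerprint_command(repo, sdf_file, bits):
    """Build the fingerprinting command as an argument list."""
    script = os.path.join(repo, "e3fp", "fingerprint", "generate.py")
    return [sys.executable, script, sdf_file, "--bits", str(bits)]


def run_pipeline(repo, smiles_file, sdf_file, num_conf=3, bits=1024,
                 out_dir="./"):
    """Run conformer generation, then fingerprinting, stopping on failure."""
    commands = (
        conformer_command(repo, smiles_file, num_conf, out_dir),
        fingerprint_command(repo, sdf_file, bits),
    )
    for command in commands:
        # check=True raises CalledProcessError if a step exits non-zero.
        subprocess.run(command, check=True)
```

With `E3FP_REPO` set in the environment, `run_pipeline(os.environ["E3FP_REPO"], "caffeine.smi", "caffeine.sdf.bz2")` would issue the same two commands as the caffeine example. Passing argument lists rather than a shell string avoids quoting problems with file names.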