Command Line Interface

Command-line interfaces (CLIs) are provided for the two most common tasks: conformer generation and fingerprinting. At the moment, using the CLIs requires downloading the E3FP source.

In the examples below, we assume the E3FP repository is located at $E3FP_REPO.

Conformer Generation CLI

To see all available options, run

$ python $E3FP_REPO/e3fp/conformer/generate.py --help
usage: Generate conformers from mol2 or SMILES [-h] [-m MOL2 [MOL2 ...]]
                                               [-s SMILES [SMILES ...]]
                                               [--standardise STANDARDISE]
                                               [-n NUM_CONF] [--first FIRST]
                                               [--pool_multiplier POOL_MULTIPLIER]
                                               [-r RMSD_CUTOFF]
                                               [-e MAX_ENERGY_DIFF]
                                               [-f {uff,mmff94,mmff94s}]
                                               [--seed SEED] [-o OUT_DIR]
                                               [-C {0,1,2,None}] [-O]
                                               [--values_file VALUES_FILE]
                                               [--prioritize]
                                               [--params PARAMS] [-l LOG]
                                               [-p NUM_PROC]
                                               [--parallel_mode {mpi,processes,threads,serial}]
                                               [-v]

optional arguments:
  -h, --help            show this help message and exit
  -m MOL2 [MOL2 ...], --mol2 MOL2 [MOL2 ...]
                        Path to mol2 file(s), each with one molecule.
                        (default: None)
  -s SMILES [SMILES ...], --smiles SMILES [SMILES ...]
                        Path to file(s) with SMILES and name. (space-
                        separated) (default: None)
  --standardise STANDARDISE
                        Clean molecules before generating conformers by
                        standardisation. (default: False)
  -n NUM_CONF, --num_conf NUM_CONF
                        Set single number of conformers to use. -1 results in
                        auto choosing. (default: -1)
  --first FIRST         Set maximum number of first conformers to accept.
                        Conformer generation is unaffected, except it may
                        terminate early when this number of conformers is
                        reached. (default: -1)
  --pool_multiplier POOL_MULTIPLIER
                        Factor to multiply `num_conf` by to generate
                        conformers. Results are then pruned to `num_conf`.
                        (default: 1)
  -r RMSD_CUTOFF, --rmsd_cutoff RMSD_CUTOFF
                        Choose RMSD cutoff between conformers (default: 0.5)
  -e MAX_ENERGY_DIFF, --max_energy_diff MAX_ENERGY_DIFF
                        Maximum energy difference between lowest energy
                        conformer and any accepted conformer. (default: None)
  -f {uff,mmff94,mmff94s}, --forcefield {uff,mmff94,mmff94s}
                        Choose forcefield for minimization. (default: uff)
  --seed SEED           Random seed for conformer generation. (default: -1)
  -o OUT_DIR, --out_dir OUT_DIR
                        Directory to save conformers. (default: conformers)
  -C {0,1,2,None}, --compress {0,1,2,None}
                        Compression to use for SDF files. None and 0 default
                        to uncompressed ".sdf". 1 and 2 result in gzipped and
                        bzipped SDF files, respectively. (default: 2)
  -O, --overwrite       Overwrite existing conformer files. (default: False)
  --values_file VALUES_FILE
                        Save RMSDs and energies to specified hdf5 file.
                        (default: None)
  --prioritize          Prioritize likely fast molecules first. (default:
                        False)
  --params PARAMS       INI formatted file with parameters. If provided, all
                        parameters controlling conformer generation are
                        ignored. (default: None)
  -l LOG, --log LOG     Generate logfile. (default: None)
  -p NUM_PROC, --num_proc NUM_PROC
                        Set number of processors to use. (default: None)
  --parallel_mode {mpi,processes,threads,serial}
                        Set number of processors to use. (default: None)
  -v, --verbose         Run with extra verbosity. (default: False)

We will generate conformers for the molecule whose SMILES string is defined in caffeine.smi.

caffeine.smi
CN1C=NC2=C1C(=O)N(C(=O)N2C)C caffeine
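
For larger sets of molecules, a SMILES file in this layout (SMILES string, a space, then the name, one molecule per line) can be written from a script. The following is a minimal sketch under that assumption; `write_smiles_file` is a hypothetical helper for illustration and is not part of E3FP.

```python
from pathlib import Path


def write_smiles_file(path, molecules):
    """Write one 'SMILES name' pair per line, separated by a single space.

    `molecules` maps a molecule name to its SMILES string.
    This helper is illustrative only; it is not an E3FP function.
    """
    lines = []
    for name, smiles in molecules.items():
        # Whitespace inside either field would break the space-separated layout.
        if not name or not smiles or any(c.isspace() for c in name + smiles):
            raise ValueError(f"invalid entry: {name!r} -> {smiles!r}")
        lines.append(f"{smiles} {name}\n")
    Path(path).write_text("".join(lines))


# Recreate the caffeine.smi file shown above.
write_smiles_file("caffeine.smi", {"caffeine": "CN1C=NC2=C1C(=O)N(C(=O)N2C)C"})
```

Rejecting names that contain whitespace is a design choice of this sketch, made so that each line splits cleanly into exactly two fields.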

The example below generates at most 3 conformers for this molecule.

$ python $E3FP_REPO/e3fp/conformer/generate.py -s caffeine.smi --num_conf 3 -o ./
2017-07-17 00:11:05,743|WARNING|Only 1 processes available. 'mpi' mode not available.
2017-07-17 00:11:05,748|INFO|num_proc is not specified. 'processes' mode will use all 8 processes
2017-07-17 00:11:05,748|INFO|Parallelizer initialized with mode 'processes' and 8 processors.
2017-07-17 00:11:05,748|INFO|Input type: Detected SMILES file(s)
2017-07-17 00:11:05,748|INFO|Input file number: 1
2017-07-17 00:11:05,748|INFO|Parallel Type: processes
2017-07-17 00:11:05,748|INFO|Out Directory: ./
2017-07-17 00:11:05,749|INFO|Overwrite Existing Files: False
2017-07-17 00:11:05,749|INFO|Target Conformer Number: 3
2017-07-17 00:11:05,749|INFO|First Conformers Number: all
2017-07-17 00:11:05,749|INFO|Pool Multiplier: 1
2017-07-17 00:11:05,749|INFO|RMSD Cutoff: 0.5
2017-07-17 00:11:05,749|INFO|Maximum Energy Difference: None
2017-07-17 00:11:05,749|INFO|Forcefield: UFF
2017-07-17 00:11:05,749|INFO|Starting.
2017-07-17 00:11:05,779|INFO|Generating conformers for caffeine.
2017-07-17 00:11:05,823|INFO|Generated 1 conformers for caffeine.
2017-07-17 00:11:05,829|INFO|Saved conformers for caffeine to ./caffeine.sdf.bz2.

The result is a multi-conformer SDF file called caffeine.sdf.bz2 in the current directory. In this run, the log reports that a single conformer was generated and saved.
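
To check how many conformers were actually saved, the records in the output file can be counted using only the Python standard library. The sketch below assumes each conformer is stored as one SDF record and relies on the SDF convention that every record ends with a `$$$$` line; `count_sdf_records` is a hypothetical helper, not an E3FP function.

```python
import bz2
import gzip
import os


def count_sdf_records(path):
    """Count records in a plain, gzipped, or bzipped SDF file.

    Each SDF record is terminated by a line containing only '$$$$',
    so counting those lines gives the number of stored records.
    """
    path = str(path)
    if path.endswith(".bz2"):
        opener = bz2.open
    elif path.endswith(".gz"):
        opener = gzip.open
    else:
        opener = open
    with opener(path, "rt") as handle:
        return sum(1 for line in handle if line.strip() == "$$$$")


# Only runs if the file from the example above is present.
if os.path.exists("caffeine.sdf.bz2"):
    print(count_sdf_records("caffeine.sdf.bz2"))
```

The extension check mirrors the three output formats selectable with `--compress`.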

Fingerprinting CLI

To see all available options, run

$ python $E3FP_REPO/e3fp/fingerprint/generate.py --help
usage: Generate E3FP fingerprints from SDF files. [-h] [-b BITS]
                                                  [--first FIRST] [-m LEVEL]
                                                  [-r RADIUS_MULTIPLIER]
                                                  [--stereo STEREO]
                                                  [--counts COUNTS]
                                                  [--params PARAMS]
                                                  [-o OUT_DIR_BASE]
                                                  [--out_ext {.fp.pkl,.fp.gz,.fp.bz2}]
                                                  [-d DB_FILE] [--all_iters]
                                                  [-O] [-l LOG] [-p NUM_PROC]
                                                  [--parallel_mode {mpi,processes,threads,serial}]
                                                  [-v]
                                                  sdf_files [sdf_files ...]

positional arguments:
  sdf_files             Path to SDF file(s), each with one molecule and
                        multiple conformers.

optional arguments:
  -h, --help            show this help message and exit
  -b BITS, --bits BITS  Set number of bits for final folded fingerprint. If -1
                        or None, unfolded (2^32-bit) fingerprints are
                        generated. (default: 4294967296)
  --first FIRST         Set maximum number of first conformers for which to
                        generate fingerprints. (default: 3)
  -m LEVEL, --level LEVEL, --max_iterations LEVEL
                        Maximum number of iterations for fingerprint
                        generation. If -1, fingerprinting is run until
                        termination, and `all_iters` is set to False.
                        (default: 5)
  -r RADIUS_MULTIPLIER, --radius_multiplier RADIUS_MULTIPLIER, --shell_radius RADIUS_MULTIPLIER
                        Distance to increment shell radius at around each
                        atom, starting at 0.0. (default: 1.718)
  --stereo STEREO       Differentiate by stereochemistry. (default: True)
  --counts COUNTS       Store counts-based E3FC instead of default bit-based.
                        (default: False)
  --params PARAMS       INI formatted file with parameters. If provided, all
                        parameters controlling conformer generation are
                        ignored. (default: None)
  -o OUT_DIR_BASE, --out_dir_base OUT_DIR_BASE
                        Basename for output directory to save fingerprints.
                        Iteration number is appended to basename. (default:
                        None)
  --out_ext {.fp.pkl,.fp.gz,.fp.bz2}
                        Extension for fingerprint pickles. (default: .fp.bz2)
  -d DB_FILE, --db_file DB_FILE
                        Output file containing FingerprintDatabase object
                        (default: fingerprints.fpz)
  --all_iters           Save fingerprints from all iterations to file(s).
                        (default: False)
  -O, --overwrite       Overwrite existing file(s). (default: False)
  -l LOG, --log LOG     Log filename. (default: None)
  -p NUM_PROC, --num_proc NUM_PROC
                        Set number of processors to use. (default: None)
  --parallel_mode {mpi,processes,threads,serial}
                        Set parallelization mode to use. (default: None)
  -v, --verbose         Run with extra verbosity. (default: False)
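
The `--bits` option folds the sparse 2^32-bit fingerprint into a shorter one. As a rough illustration only, the sketch below assumes that folding maps each "on" bit index to its remainder modulo the target length, a common scheme for hashed fingerprints. It is not taken from the E3FP source, and `fold_indices` is a hypothetical name.

```python
def fold_indices(on_indices, bits=1024):
    """Fold sparse 'on' bit positions into a fingerprint of length `bits`.

    Assumption: folding is done by taking each index modulo `bits`.
    Indices that collide after folding collapse into a single 'on' bit.
    """
    if bits <= 0:
        raise ValueError("bits must be a positive integer")
    return sorted({index % bits for index in on_indices})


# Three 'on' bits in a 2^32-bit fingerprint; 5 and 1029 collide at position 5.
unfolded = [5, 1029, 2**32 - 1]
folded = fold_indices(unfolded, bits=1024)
print(folded)  # [5, 1023]
```

Under this assumption, shorter fingerprints save space at the cost of more collisions between originally distinct bits.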

Continuing the example above, we fingerprint the caffeine conformers, folding the fingerprints to 1024 bits.

$ python $E3FP_REPO/e3fp/fingerprint/generate.py caffeine.sdf.bz2 --bits 1024
2017-07-17 00:12:33,797|WARNING|Only 1 processes available. 'mpi' mode not available.
2017-07-17 00:12:33,801|INFO|num_proc is not specified. 'processes' mode will use all 8 processes
2017-07-17 00:12:33,801|INFO|Parallelizer initialized with mode 'processes' and 8 processors.
2017-07-17 00:12:33,801|INFO|Initializing E3FP generation.
2017-07-17 00:12:33,801|INFO|Getting SDF files
2017-07-17 00:12:33,801|INFO|SDF File Number: 1
2017-07-17 00:12:33,802|INFO|Database File: fingerprints.fpz
2017-07-17 00:12:33,802|INFO|Max First Conformers: 3
2017-07-17 00:12:33,802|INFO|Bits: 1024
2017-07-17 00:12:33,802|INFO|Level/Max Iterations: 5
2017-07-17 00:12:33,802|INFO|Shell Radius Multiplier: 1.718
2017-07-17 00:12:33,802|INFO|Stereo Mode: True
2017-07-17 00:12:33,802|INFO|Connected-only mode: on
2017-07-17 00:12:33,802|INFO|Invariant type: Daylight
2017-07-17 00:12:33,802|INFO|Parallel Mode: processes
2017-07-17 00:12:33,802|INFO|Starting
2017-07-17 00:12:33,829|INFO|Generating fingerprints for caffeine.
2017-07-17 00:12:33,935|INFO|Generated 1 fingerprints for caffeine.
2017-07-17 00:12:34,011|INFO|Saved FingerprintDatabase with fingerprints to fingerprints.fpz

The result is a file fingerprints.fpz containing a FingerprintDatabase. To use such a database, consult Fingerprint Storage.
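
The two commands above can also be chained from a script. The following is a minimal sketch that only assembles and runs the same command lines shown in this section, using flags listed in the help output; the function names are hypothetical, and the repository location is read from the `E3FP_REPO` environment variable as assumed throughout.

```python
import os
import subprocess
import sys


def conformer_command(repo, smiles_file, num_conf, out_dir):
    """Build the conformer generation command line as an argument list."""
    script = os.path.join(repo, "e3fp", "conformer", "generate.py")
    return [sys.executable, script, "-s", smiles_file,
            "--num_conf", str(num_conf), "-o", out_dir]


def fingerprint_command(repo, sdf_files, bits):
    """Build the fingerprinting command line as an argument list."""
    script = os.path.join(repo, "e3fp", "fingerprint", "generate.py")
    return [sys.executable, script, *sdf_files, "--bits", str(bits)]


def run_pipeline(repo):
    """Run both steps in order, stopping if either exits with an error."""
    subprocess.run(conformer_command(repo, "caffeine.smi", 3, "./"), check=True)
    subprocess.run(fingerprint_command(repo, ["caffeine.sdf.bz2"], 1024), check=True)


# Only runs when executed as a script with E3FP_REPO set.
if __name__ == "__main__" and "E3FP_REPO" in os.environ:
    run_pipeline(os.environ["E3FP_REPO"])
```

Passing `check=True` makes a failed conformer generation step raise an exception before fingerprinting is attempted.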