Fingerprint Comparison
The e3fp.fingerprint.metrics sub-package provides several useful methods for batch comparison of fingerprints in various representations.
Fingerprint Metrics
These metrics operate directly on pairs of Fingerprint objects, pairs of FingerprintDatabase objects, or a combination of the two. If only a single argument is given, the input is compared against itself. The implemented methods are fingerprint similarity functions commonly used in the literature.
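The calling convention described above can be illustrated with a minimal, library-free sketch. It is not the e3fp implementation: the names `set_tanimoto` and `pairwise_tanimoto` are hypothetical, and each fingerprint is assumed to be stored simply as the set of its "on" bit indices.

```python
def set_tanimoto(a, b):
    """Tanimoto coefficient between two sets of "on" bit indices."""
    union = len(a | b)
    # Two empty fingerprints share no bits; define their similarity as 0.
    return len(a & b) / union if union else 0.0


def pairwise_tanimoto(A, B=None):
    """Return the len(A) x len(B) similarity matrix as a list of rows.

    If B is omitted, A is compared against itself (self-comparison).
    """
    if B is None:
        B = A
    return [[set_tanimoto(a, b) for b in B] for a in A]


# Hypothetical fingerprints, each given as the set of its "on" bit indices.
fps = [{1, 5, 9}, {1, 5, 12}, {3, 7}]

self_sim = pairwise_tanimoto(fps)                # 3 x 3, symmetric, ones on the diagonal
cross_sim = pairwise_tanimoto(fps[:1], fps[1:])  # 1 x 2
```

With one argument the result is a square, symmetric matrix; with two arguments it has one row per fingerprint in the first input and one column per fingerprint in the second.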
Array Metrics
To compare fingerprint databases efficiently, we provide array metrics that operate directly on the internal sparse matrix representation without the need to “densify” it. We describe them separately here because they have several additional features.
The array metrics in e3fp.fingerprint.metrics.array_metrics accept any combination of dense and sparse inputs. They are also designed to work as scikit-learn-compatible kernels for machine learning tasks. For example, one might train a support vector machine (SVM) with a Tanimoto kernel.
>>> from sklearn.svm import SVC
>>> from e3fp.fingerprint.metrics.array_metrics import tanimoto
>>> # X, y and test are assumed to be defined already:
>>> # X is the training fingerprint matrix and y holds its class labels.
>>> clf = SVC(kernel=tanimoto)
>>> clf.fit(X, y)
...
>>> # test is the fingerprint matrix of the molecules to classify.
>>> clf.predict(test)
...
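A callable kernel passed to scikit-learn receives two matrices with the same number of columns and returns one similarity value per pair of rows. The sketch below shows that contract for a Tanimoto kernel computed from dot products, touching only the stored entries of each row. It is an illustration under stated assumptions, not e3fp's code: `sparse_dot` and `tanimoto_kernel` are hypothetical names, and each sparse row is represented as a plain `{column index: value}` dict rather than a SciPy sparse matrix.

```python
def sparse_dot(x, y):
    """Dot product of two sparse rows stored as {column index: value} dicts."""
    if len(x) > len(y):
        x, y = y, x  # iterate over the row with fewer stored entries
    return sum(v * y[k] for k, v in x.items() if k in y)


def tanimoto_kernel(rows_a, rows_b):
    """Gram matrix K[i][j] = <a_i, b_j> / (<a_i, a_i> + <b_j, b_j> - <a_i, b_j>)."""
    norms_a = [sparse_dot(a, a) for a in rows_a]
    norms_b = [sparse_dot(b, b) for b in rows_b]
    K = []
    for a, aa in zip(rows_a, norms_a):
        row = []
        for b, bb in zip(rows_b, norms_b):
            ab = sparse_dot(a, b)
            denom = aa + bb - ab
            # All-zero pairs have no defined ratio; report 0 similarity.
            row.append(ab / denom if denom else 0.0)
        K.append(row)
    return K


# Hypothetical binary fingerprints; only the set bits are stored.
rows_a = [{3: 1, 17: 1, 900: 1}, {3: 1, 42: 1}]
rows_b = [{3: 1, 17: 1}, {500: 1}]

K = tanimoto_kernel(rows_a, rows_b)  # 2 x 2 matrix of similarities
```

The cost of each entry depends on the number of stored bits rather than on the full fingerprint length, which is the reason for avoiding a dense representation.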
Most common fingerprint comparison metrics apply only to binary fingerprints. We include several that work equally well on count- and float-based fingerprints. One of these is Soergel similarity, the analog of the Tanimoto coefficient for non-binary fingerprints. To our knowledge, ours is the only open source implementation of it that operates efficiently on sparse inputs.
>>> from sklearn.svm import SVC
>>> from e3fp.fingerprint.metrics.array_metrics import soergel
>>> # X, y and test are assumed to be defined as in the Tanimoto example.
>>> clf = SVC(kernel=soergel)
>>> clf.fit(X, y)
...
>>> clf.predict(test)
...
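For reference, Soergel similarity between two non-negative vectors is the sum of their element-wise minima divided by the sum of their element-wise maxima. The following sketch computes it for a single pair of vectors. It is a plain-Python illustration rather than the library's implementation; the function name `soergel_similarity` and the `{index: value}` dict storage are assumptions made for the example.

```python
def soergel_similarity(x, y):
    """Sum of element-wise minima over sum of element-wise maxima.

    x and y are sparse non-negative vectors stored as {index: value} dicts;
    indices missing from a dict are treated as zero.
    """
    keys = x.keys() | y.keys()
    num = sum(min(x.get(k, 0), y.get(k, 0)) for k in keys)
    den = sum(max(x.get(k, 0), y.get(k, 0)) for k in keys)
    return num / den if den else 0.0


# Hypothetical count-based fingerprints.
counts_a = {3: 2, 17: 1, 900: 4}
counts_b = {3: 1, 17: 3, 42: 2}
sim_counts = soergel_similarity(counts_a, counts_b)  # minima sum to 2, maxima to 11

# On binary inputs the same formula gives the Tanimoto coefficient:
# 2 shared bits out of 3 bits set in either fingerprint.
bits_a = {3: 1, 17: 1, 900: 1}
bits_b = {3: 1, 17: 1}
sim_bits = soergel_similarity(bits_a, bits_b)
```

Only indices stored in at least one of the two vectors contribute to either sum, so the computation never needs the dense form.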