e3fp.fingerprint.metrics.array_metrics module

Fingerprint array comparison metrics.

Each metric accepts both dense arrays and sparse matrices as input.

Author: Seth Axen
E-mail: seth.axen@gmail.com

cosine(X, Y=None, assume_binary=False)

Compute the Cosine similarities between X and Y.

Parameters
  • X (array_like or sparse matrix) – with shape (n_fprints_X, n_bits).

  • Y (array_like or sparse matrix, optional) – with shape (n_fprints_Y, n_bits).

  • assume_binary (bool, optional) – Treat the data as binary, which allows a faster computation. If the data is not actually binary, the result will be incorrect.

Returns

cosine

Return type

array of shape (n_fprints_X, n_fprints_Y)

See also

dice, soergel, tanimoto
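
As an illustration of the quantity described above, the following is a minimal NumPy sketch of the textbook cosine similarity between fingerprint rows. It is not the library's implementation. The name cosine_sketch is hypothetical, dense input is assumed, and treating an omitted Y as a comparison of X against itself is an assumption of this sketch. The assume_binary branch shows one way a binary shortcut can work: for 0/1 data the squared norm of a row equals its number of on bits.

```python
import numpy as np


def cosine_sketch(X, Y=None, assume_binary=False):
    """Hypothetical reference version: cosine similarity between rows of X and Y."""
    X = np.asarray(X, dtype=float)
    # Assumption of this sketch: with Y omitted, X is compared against itself.
    Y = X if Y is None else np.asarray(Y, dtype=float)
    dot = X @ Y.T  # shape (n_fprints_X, n_fprints_Y)
    if assume_binary:
        # For 0/1 data, the squared norm of a row is just its count of on bits.
        x_sq = X.sum(axis=1)
        y_sq = Y.sum(axis=1)
    else:
        x_sq = (X * X).sum(axis=1)
        y_sq = (Y * Y).sum(axis=1)
    denom = np.sqrt(np.outer(x_sq, y_sq))
    # Convention chosen here: pairs involving an all-zero row get similarity 0.
    return np.divide(dot, denom, out=np.zeros_like(dot), where=denom > 0)


bits = np.array([[1, 1, 0, 0], [0, 1, 1, 0]])
counts = np.array([[2, 0], [0, 3]])
binary_result = cosine_sketch(bits, assume_binary=True)
wrong_result = cosine_sketch(counts, assume_binary=True)  # counts are not binary
```

In this sketch, the shortcut agrees with the general formula on bits, while on counts it produces values that are not valid cosine similarities, which mirrors the warning attached to assume_binary.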

dice(X, Y=None)

Compute the Dice coefficients between X and Y.

Data must be binary. This is not checked.

Parameters
  • X (array_like or sparse matrix) – with shape (n_fprints_X, n_bits).

  • Y (array_like or sparse matrix, optional) – with shape (n_fprints_Y, n_bits).

Returns

dice

Return type

array of shape (n_fprints_X, n_fprints_Y)
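
For reference, the Dice coefficient of two binary fingerprints is twice the number of shared on bits divided by the total number of on bits in both. The following is a minimal NumPy sketch of that definition, not the library's code. The name dice_sketch is hypothetical, dense input is assumed, and both the handling of an omitted Y and the value returned for two empty fingerprints are assumptions of this sketch.

```python
import numpy as np


def dice_sketch(X, Y=None):
    """Hypothetical reference version: Dice coefficient between binary rows."""
    X = np.asarray(X, dtype=float)
    # Assumption of this sketch: with Y omitted, X is compared against itself.
    Y = X if Y is None else np.asarray(Y, dtype=float)
    common = X @ Y.T  # number of bits set in both fingerprints
    total = X.sum(axis=1)[:, None] + Y.sum(axis=1)[None, :]  # |A| + |B|
    # Convention chosen here: two empty fingerprints get a coefficient of 0.
    return np.divide(2.0 * common, total, out=np.zeros_like(common), where=total > 0)


a = np.array([[1, 1, 1, 0]])
b = np.array([[0, 1, 1, 0]])
dice_ab = dice_sketch(a, b)  # 2 shared bits, 3 + 2 on bits in total
```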

pearson(X, Y=None)

Compute the Pearson correlation between X and Y.

Parameters
  • X (array_like or sparse matrix) – with shape (n_fprints_X, n_bits).

  • Y (array_like or sparse matrix, optional) – with shape (n_fprints_Y, n_bits).

Returns

pearson

Return type

array of shape (n_fprints_X, n_fprints_Y)

See also

soergel

Soergel similarity for non-binary data

cosine, dice, tanimoto
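
The Pearson correlation between two fingerprints can be viewed as the cosine similarity of the rows after each row has had its own mean subtracted. The following is a minimal NumPy sketch of that definition, not the library's implementation. The name pearson_sketch is hypothetical, dense input is assumed, and both the handling of an omitted Y and the value returned for constant rows are assumptions of this sketch.

```python
import numpy as np


def pearson_sketch(X, Y=None):
    """Hypothetical reference version: Pearson correlation between rows of X and Y."""
    X = np.asarray(X, dtype=float)
    # Assumption of this sketch: with Y omitted, X is compared against itself.
    Y = X if Y is None else np.asarray(Y, dtype=float)
    # Center every fingerprint on its own mean over the n_bits axis.
    Xc = X - X.mean(axis=1, keepdims=True)
    Yc = Y - Y.mean(axis=1, keepdims=True)
    cov = Xc @ Yc.T
    denom = np.sqrt(np.outer((Xc * Xc).sum(axis=1), (Yc * Yc).sum(axis=1)))
    # Convention chosen here: constant rows (zero variance) get correlation 0.
    return np.divide(cov, denom, out=np.zeros_like(cov), where=denom > 0)


fprints = np.array([[1.0, 2.0, 3.0, 4.0],
                    [2.0, 4.0, 6.0, 8.0],
                    [4.0, 3.0, 2.0, 1.0]])
corr = pearson_sketch(fprints)
```

On inputs without constant rows, the result of this sketch can be cross-checked against np.corrcoef, which treats each row as one variable.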

soergel(X, Y=None)

Compute the Soergel similarities between X and Y.

Soergel similarity is the complement of Soergel distance and can be thought of as the analog of the Tanimoto coefficient for count/float-based data. For binary data, it is equivalent to the Tanimoto coefficient.

Parameters
  • X (array_like or sparse matrix) – with shape (n_fprints_X, n_bits).

  • Y (array_like or sparse matrix, optional) – with shape (n_fprints_Y, n_bits).

Returns

soergel

Return type

array of shape (n_fprints_X, n_fprints_Y)

Notes

If Numba is available, this function is JIT-compiled and runs much faster.

See also

tanimoto

A fast version of this function for binary data.

pearson

Pearson correlation, also appropriate for non-binary data.

cosine, dice
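
As the complement of Soergel distance, Soergel similarity for non-negative data reduces to the sum of element-wise minima divided by the sum of element-wise maxima. The following is a minimal NumPy sketch of that ratio, not the library's (optionally Numba-compiled) implementation. The name soergel_sketch is hypothetical, dense non-negative input is assumed, and the handling of an omitted Y is an assumption of this sketch. It builds an intermediate array of shape (n_fprints_X, n_fprints_Y, n_bits), so it is only suitable for small inputs.

```python
import numpy as np


def soergel_sketch(X, Y=None):
    """Hypothetical reference version: Soergel similarity between rows of X and Y."""
    X = np.asarray(X, dtype=float)
    # Assumption of this sketch: with Y omitted, X is compared against itself.
    Y = X if Y is None else np.asarray(Y, dtype=float)
    # Broadcast every row of X against every row of Y along a new middle axis.
    mins = np.minimum(X[:, None, :], Y[None, :, :]).sum(axis=2)
    maxs = np.maximum(X[:, None, :], Y[None, :, :]).sum(axis=2)
    # Convention chosen here: two all-zero fingerprints get similarity 0.
    return np.divide(mins, maxs, out=np.zeros_like(mins), where=maxs > 0)


count_a = np.array([[2, 0, 1]])
count_b = np.array([[1, 1, 1]])
sim_counts = soergel_sketch(count_a, count_b)  # minima sum to 2, maxima sum to 4

bit_a = np.array([[1, 1, 1, 0]])
bit_b = np.array([[0, 1, 1, 0]])
sim_bits = soergel_sketch(bit_a, bit_b)  # 2 shared bits out of 3 in the union
```

For the binary pair, the minima count the shared bits and the maxima count the bits in the union, which is why the value matches the Tanimoto coefficient.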

tanimoto(X, Y=None)

Compute the Tanimoto coefficients between X and Y.

Data must be binary. This is not checked.

Parameters
  • X (array_like or sparse matrix) – with shape (n_fprints_X, n_bits).

  • Y (array_like or sparse matrix, optional) – with shape (n_fprints_Y, n_bits).

Returns

tanimoto

Return type

array of shape (n_fprints_X, n_fprints_Y)

See also

soergel

Analog of Tanimoto for non-binary data.

cosine, dice, pearson
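
For reference, the Tanimoto coefficient of two binary fingerprints is the number of shared on bits divided by the number of bits set in either fingerprint. The following is a minimal NumPy sketch of that definition, not the library's code. The name tanimoto_sketch is hypothetical, dense input is assumed, and both the handling of an omitted Y and the value returned for two empty fingerprints are assumptions of this sketch.

```python
import numpy as np


def tanimoto_sketch(X, Y=None):
    """Hypothetical reference version: Tanimoto coefficient between binary rows."""
    X = np.asarray(X, dtype=float)
    # Assumption of this sketch: with Y omitted, X is compared against itself.
    Y = X if Y is None else np.asarray(Y, dtype=float)
    common = X @ Y.T  # number of bits set in both fingerprints
    union = X.sum(axis=1)[:, None] + Y.sum(axis=1)[None, :] - common
    # Convention chosen here: two empty fingerprints get a coefficient of 0.
    return np.divide(common, union, out=np.zeros_like(common), where=union > 0)


a = np.array([[1, 1, 1, 0]])
b = np.array([[0, 1, 1, 0]])
tc_bits = tanimoto_sketch(a, b)  # 2 shared bits, 3 bits in the union

# Count data violates the binary requirement, and nothing in the sketch checks it.
c = np.array([[2, 0, 1]])
d = np.array([[1, 1, 1]])
tc_counts = tanimoto_sketch(c, d)
```

In this sketch, the count vectors c and d differ yet receive a coefficient of 1.0, which illustrates why the binary requirement matters even though it is not checked.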