Welcome to Procrustes’s Documentation!
Procrustes is a free, open-source, and cross-platform Python library for (generalized) Procrustes problems with the goal of finding the optimal transformation(s) that makes two matrices as close as possible to each other. Please use the following citation in any publication using Procrustes library:
“Procrustes: A Python Library to Find Transformations that Maximize the Similarity Between Matrices”, F. Meng, M. Richer, A. Tehrani, J. La, T. D. Kim, P. W. Ayers, F. Heidar-Zadeh, Computer Physics Communications, 276(108334), 2022..
The Procrustes source code is hosted on GitHub and is released under the GNU General Public License v3.0. We welcome any contributions to the Procrustes library in accordance with our Code of Conduct; please see our Contributing Guidelines. Please report any issues you encounter while using Procrustes library on GitHub Issues. For further information and inquiries please contact us at qcdevs@gmail.com.
Description of Procrustes Methods
Procrustes problems arise when one wishes to find one or two transformations, \(\mathbf{T} \in \mathbb{R}^{n \times n}\) and \(\mathbf{S} \in \mathbb{R}^{m \times m}\), that make matrix \(\mathbf{A} \in \mathbb{R}^{m \times n}\) (input matrix) resemble matrix \(\mathbf{B} \in \mathbb{R}^{m \times n}\) (target or reference matrix) as closely as possible:
where, the \(\| \cdot \|_{F}\) denotes the Frobenius norm defined as,
Here \(a_{ij}\) and \(\text{Tr}(\mathbf{A})\) denote the \(ij\)-th element and trace of matrix \(\mathbf{A}\), respectively. When \(\mathbf{S}\) is an identity matrix, this is called a one-sided Procrustes problem, and when it is equal to \(\mathbf{T}\), this becomes two-sided Procrustes problem with one transformation, otherwise, it is called two-sided Procrustes problem. Different Procrustes problems use different choices for the transformation matrices \(\mathbf{S}\) and \(\mathbf{T}\) which are commonly taken to be orthogonal/unitary matrices, rotation matrices, symmetric matrices, or permutation matrices. The table below summarizes various Procrustes methods supported:
Procrustes Type |
\(\mathbf{S}\) |
\(\mathbf{T}\) |
Constraints |
\(\mathbf{I}\) |
\(\mathbf{T}\) |
None |
|
\(\mathbf{I}\) |
\(\mathbf{Q}\) |
\({\mathbf{Q}^{-1} = {\mathbf{Q}}^\dagger}\) |
|
\(\mathbf{I}\) |
\(\mathbf{R}\) |
\(\begin{cases} \mathbf{R}^{-1} = {\mathbf{R}}^\dagger \\ \left | \mathbf{R} \right | = 1 \\ \end{cases}\) |
|
\(\mathbf{I}\) |
\(\mathbf{X}\) |
\(\mathbf{X} = \mathbf{X}^\dagger\) |
|
\(\mathbf{I}\) |
\(\mathbf{P}\) |
\(\begin{cases} [\mathbf{P}]_{ij} \in \{0, 1\} \\ \sum_{i=1}^n [\mathbf{P}]_{ij} = \sum_{j=1}^n [\mathbf{P}]_{ij} = 1 \\ \end{cases}\) |
|
\(\mathbf{Q}_1^\dagger\) |
\(\mathbf{Q}_2\) |
\(\begin{cases} \mathbf{Q}_1^{-1} = \mathbf{Q}_1^\dagger \\ \mathbf{Q}_2^{-1} = \mathbf{Q}_2^\dagger \\ \end{cases}\) |
|
\(\mathbf{Q}^{\dagger}\) |
\(\mathbf{Q}\) |
\(\mathbf{Q}^{-1} = \mathbf{Q}^\dagger\) |
|
\(\mathbf{P}_1^{\dagger}\) |
\(\mathbf{P}_2\) |
\(\begin{cases} [\mathbf{P}_1]_{ij} \in \{0, 1\} \\ [\mathbf{P}_2]_{ij} \in \{0, 1\} \\ \sum_{i=1}^n [\mathbf{P}_1]_{ij} = \sum_{j=1}^n [\mathbf{P}_1]_{ij} = 1 \\ \sum_{i=1}^n [\mathbf{P}_2]_{ij} = \sum_{j=1}^n [\mathbf{P}_2]_{ij} = 1 \\ \end{cases}\) |
|
\(\mathbf{P}^{\dagger}\) |
\(\mathbf{P}\) |
\(\begin{cases} [\mathbf{P}]_{ij} \in \{0, 1\} \\ \sum_{i=1}^n [\mathbf{P}]_{ij} = \sum_{j=1}^n [\mathbf{P}]_{ij} = 1 \\ \end{cases}\) |
In addition to these Procrustes methods, summarized in the table above, the generalized Procrustes analysis (GPA) [15][1][16][17][18] and softassign algorithm [19][20][21] are also implemented in our package. The GPA algorithm seeks the optimal transformation matrices \(\mathbf{T}\) to superpose the given objects (usually more than 2) with minimum distance,
where \(\mathbf{A}_i\) and \(\mathbf{A}_j\) are the configurations and \(\mathbf{T}_i\) and \(\mathbf{T}_j\) denotes the transformation matrices for \(\mathbf{A}_i\) and \(\mathbf{A}_j\) respectively. When only two objects are given, the problem shrinks to generic Procrustes.
The softassign algorithm was first proposed to deal with quadratic assignment problem [19] inspired by statistical physics algorithms and has subsequently been developed theoretically [20][21] and extended to many other applications [22][20][23][24][25]. Because the two-sided permutation Procrustes problem is a special case of the quadratic assignment problem, it can be used here. The objective function is to minimize \(E_{qap} (\mathbf{M}, \mu, \nu)\), [20][26], which is defined as follows,
Procrustes problems arise when aligning molecules and other objects, when evaluating optimal basis transformations, when determining optimal mappings between sets, and in many other contexts. This package includes the options to translate, scale, and zero-pad matrices, so that matrices with different centers/scaling/sizes can be considered.
Installation
Downloading Code
The latest code can be obtained through theochem (https://github.com/theochem/procrustes) in Github,
git clone git@github.com:theochem/procrustes.git
Dependencies
The following dependencies will be necessary for Procrustes to build properly,
Python >= 3.6: https://www.python.org/
SciPy >= 1.5.0: https://www.scipy.org/
NumPy >= 1.18.5: https://www.numpy.org/
Pip >= 19.0: https://pip.pypa.io/
PyTest >= 5.4.3: https://docs.pytest.org/
PyTest-Cov >= 2.8.0: https://pypi.org/project/pytest-cov/
Sphinx >= 2.3.0, if one wishes to build the documentation locally: https://www.sphinx-doc.org/
Installation
The stable release of the package can be easily installed through the pip and conda package management systems, which install the dependencies automatically, if not available. To use pip, simply run the following command:
pip install qc-procrustes
To use conda, one can either install the package through Anaconda Navigator or run the following command in a desired conda environment:
conda install -c theochem qc-procrustes
Alternatively, the Procrustes source code can be download from GitHub (either the stable version or the development version) and then installed from source. For example, one can download the latest source code using git by:
# download source code
git clone git@github.com:theochem/procrustes.git
cd procrustes
From the parent directory, the dependencies can either be installed using pip by:
# install dependencies using pip
pip install -r requirements.txt
or, through conda by:
# create and activate myenv environment
# Procruste works with Python 3.6, 3.7, and 3.8
conda create -n myenv python=3.6
conda activate myenv
# install dependencies using conda
conda install --yes --file requirements.txt
Finally, the Procrustes package can be installed (from source) by:
# install Procrustes from source
pip install .
Testing
To make sure that the package is installed properly, the Procrustes tests should be executed using pytest from the parent directory:
# testing without coverage report
pytest -v .
In addition, to generate a coverage report alongside testing, one can use:
# testing with coverage report
pytest --cov-config=.coveragerc --cov=procrustes procrustes/test
Quick Start
Orthogonal Procrustes
Given matrix \(\mathbf{A}_{m \times n}\) and a reference matrix \(\mathbf{B}_{m\times n}\), find the orthogonal transformation matrix \(\mathbf{Q}_{n\times n}\) that makes \(\mathbf{AQ}\) as close as possible to \(\mathbf{B}\), i.e.,
\begin{equation} \underbrace{\min}_{\left\{\mathbf{Q} | \mathbf{Q}^{-1} = {\mathbf{Q}}^\dagger \right\}} \|\mathbf{A}\mathbf{Q} - \mathbf{B}\|_{F}^2 \end{equation}
The code block below gives an example of the orthogonal Procrustes problem for random matrices \(\mathbf{A}\) and \(\mathbf{B}\). Here, matrix \(\mathbf{B}\) is constructed by shifting an orthogonal transformation of matrix \(\mathbf{A}\), so the matrices can be perfectly matched. As is the case with all Procrustes flavours, the user can specify whether the matrices should be translated (so that both are centered at origin) and/or scaled (so that both are normalized to unity with respect to the Frobenius norm). In addition, the other optional arguments (not appearing in the code-block below) specify whether the zero columns (on the right-hand side) and rows (at the bottom) should be removed prior to transformation.
[1]:
import numpy as np
from scipy.stats import ortho_group
from procrustes import orthogonal
# random input 10x7 matrix A
a = np.random.rand(10, 7)
# random orthogonal 7x7 matrix T (acting as Q)
t = ortho_group.rvs(7)
# target matrix B (which is a shifted AT)
b = np.dot(a, t) + np.random.rand(1, 7)
# orthogonal Procrustes analysis with translation
result = orthogonal(a, b, scale=True, translate=True)
# compute transformed matrix A (i.e., A x Q)
aq = np.dot(result.new_a, result.t)
# display Procrustes results
print("Procrustes Error = ", result.error)
print("\nDoes the obtained transformation match variable t? ", np.allclose(t, result.t))
print("Does AQ and B matrices match?", np.allclose(aq, result.new_b))
print("Transformation Matrix T = ")
print(result.t)
print("")
print("Matrix A (after translation and scaling) = ")
print(result.new_a)
print("")
print("Matrix AQ = ")
print(aq)
print("")
print("Matrix B (after translation and scaling) = ")
print(result.new_b)
Procrustes Error = 2.4289236009459004e-30
Does the obtained transformation match variable t? True
Does AQ and B matrices match? True
Transformation Matrix T =
[[-0.05485823 -0.21987681 0.71054591 -0.25910115 0.54815253 -0.11057903
-0.25285756]
[-0.52504088 0.57359009 0.08125893 -0.38628239 -0.16691215 0.36377965
-0.28162755]
[-0.13427477 -0.35050672 0.5114134 -0.07667139 -0.71683955 0.04718091
0.27496942]
[ 0.19650197 0.65967521 0.42739622 0.33295066 0.08143714 -0.10148903
0.46449961]
[ 0.45237142 -0.06057057 -0.04420777 -0.48171449 0.15466827 0.61863151
0.38866554]
[-0.0206257 0.12956287 -0.16552502 -0.64649468 -0.03687109 -0.66740601
0.30107121]
[ 0.67794881 0.21015927 0.12230059 -0.13005245 -0.35481889 -0.12155183
-0.56892541]]
Matrix A (after translation and scaling) =
[[-0.00055723 -0.01377505 -0.08991782 -0.16477186 0.14750515 0.00704046
-0.20861997]
[ 0.13709756 -0.02863268 -0.04751636 0.07709446 -0.14385904 0.10799371
-0.0032753 ]
[-0.16853924 -0.17772675 0.06648402 -0.16374724 0.04464136 0.09435058
0.03478612]
[ 0.11811112 -0.01507476 -0.06673255 0.139217 -0.0976698 -0.11659321
-0.03462358]
[-0.15903118 -0.01484409 -0.16695146 0.11677978 -0.19205803 0.01023694
0.10550636]
[ 0.02382673 -0.01263136 0.05182648 -0.10599329 -0.08805356 -0.07338563
0.00990156]
[-0.16720248 -0.04508176 0.03784015 0.10676988 0.04590812 0.00066062
-0.03174471]
[-0.06525965 0.14948796 0.07651695 0.02690549 -0.13166814 -0.24761947
-0.19722096]
[ 0.12908648 0.16851992 -0.01048313 -0.21501857 0.21002468 0.11819247
0.14823474]
[ 0.15246788 -0.01024143 0.14893372 0.18276434 0.20522923 0.09912353
0.17705575]]
Matrix AQ =
[[-0.08789303 -0.13682353 -0.15112392 -0.09097679 0.14960896 0.1194413
0.08089827]
[-0.04048379 0.04296137 0.09182028 0.00475755 0.09519934 -0.19631547
-0.02539345]
[ 0.10328743 -0.17937652 -0.1835174 -0.03432131 -0.13263092 -0.06584305
0.06085581]
[-0.02749881 0.06414488 0.12745368 0.15361799 0.12791064 -0.01422005
-0.03266849]
[ 0.04631795 0.19713971 -0.12997567 0.17079912 -0.02302644 -0.14601308
-0.07885918]
[-0.05406852 -0.10266454 0.01435798 0.04801396 -0.04504059 -0.00072564
-0.09940151]
[ 0.04797432 0.05870918 -0.06350457 0.07497127 -0.08421826 0.02485656
0.1510763 ]
[-0.26805587 0.02546819 0.03909608 0.21141601 -0.05464003 0.17025779
-0.00558307]
[ 0.05666167 -0.03614484 -0.00256305 -0.36619058 0.00816506 0.1013846
-0.14997974]
[ 0.22375865 0.06658609 0.25795658 -0.17208723 -0.04132777 0.00717706
0.09905505]]
Matrix B (after translation and scaling) =
[[-0.08789303 -0.13682353 -0.15112392 -0.09097679 0.14960896 0.1194413
0.08089827]
[-0.04048379 0.04296137 0.09182028 0.00475755 0.09519934 -0.19631547
-0.02539345]
[ 0.10328743 -0.17937652 -0.1835174 -0.03432131 -0.13263092 -0.06584305
0.06085581]
[-0.02749881 0.06414488 0.12745368 0.15361799 0.12791064 -0.01422005
-0.03266849]
[ 0.04631795 0.19713971 -0.12997567 0.17079912 -0.02302644 -0.14601308
-0.07885918]
[-0.05406852 -0.10266454 0.01435798 0.04801396 -0.04504059 -0.00072564
-0.09940151]
[ 0.04797432 0.05870918 -0.06350457 0.07497127 -0.08421826 0.02485656
0.1510763 ]
[-0.26805587 0.02546819 0.03909608 0.21141601 -0.05464003 0.17025779
-0.00558307]
[ 0.05666167 -0.03614484 -0.00256305 -0.36619058 0.00816506 0.1013846
-0.14997974]
[ 0.22375865 0.06658609 0.25795658 -0.17208723 -0.04132777 0.00717706
0.09905505]]
Accessing Scaling- and Translation-Related Information
In order to access the scaling factor of the initial two input matrices, we can use _scale_array
function in utils
module.
[2]:
# accessing the scaling factor when preparing two input matrices
from procrustes.utils import _scale_array
# array_a is scaled to match array_b's norm; after scaling the norm of $a$ equals the norm of $b$
scaled_a, scaling_factor = _scale_array(a, b)
# print(f"The scaled matrix $a$ is {scaled_a}.")
print(f"The scaling factor is {scaling_factor}.")
The scaling factor is 0.8623410931853571.
The user may also want to get the translation matrix or the translated matrix. This can be achieved by using _translate_array
function in the utils
module.
[3]:
# accessing the translation matrix
from procrustes.utils import _translate_array
# array_a is translated to centroid of array_b
# after translation, the centroid of $a$ will match the centroid of $b$
trans_array_a, centroid = _translate_array(a, b)
# print(f"The translated array $a$ is {trans_array_a}.")
print(f"The centroid is {centroid}.")
The centroid is [-0.36907112 -1.20438156 -0.25281312 -0.3612893 -1.00236417 -0.150441
-0.93179808].
The corresponding file can be obtained from:
Jupyter Notebook:
Quick_Start.ipynb
Tutorials
In order to show how to use Procrustes, we have implemented some practical examples.
Chemical Structure Alignment
Molecular alignment is a fundamental problem in cheminformatics and can be used for structure determination, similarity based searching, and ligand-based drug design. This problem can be formulated as an orthogonal Procrustes problem where the two matrices represent three-dimensional Cartesian coordinates of molecules.
Orthogonal Procrustes
Given matrix \(\mathbf{A}_{m \times n}\) and a reference \(\mathbf{B}_{m \times n}\), find the find the orthogonal transformation matrix \(\mathbf{Q}_{n \times n}\) that makes \(\mathbf{A}\) as close as possible to \(\mathbf{B}\), i.e.,
\begin{equation} \underbrace{\min}_{\left\{\mathbf{Q} | \mathbf{Q}^{-1} = {\mathbf{Q}}^\dagger \right\}} \|\mathbf{A}\mathbf{Q} - \mathbf{B}\|_{F}^2 \end{equation}
In the code block below, we use the procrustes
library for protein structure alignment. We use the 2HHB, which has cyclic-\(C_2\) global symmetry, and load its PDB file using IOData library to obtain the 3D-Cartesian coordinates atoms. In 2HHB, the A and C (or B and D) are hemoglobin deoxy-alpha (beta) chains shown in Fig. (i), and their \(C_{\alpha}\) atoms in Fig. (ii) are aligned in Fig. (iii) to show that they are homologous.
The results in Fig. (iii) are obtained with orthogonal Procrustes which implements the Kabsch algorithm. The root-mean-square deviation (RMSD) is used to assess the discrepancy between structures before and after the translation-rotation transformation.
Download IOData & Matplotlib Libraries & Example Files
Install IOData library and Matplotlib, if there are not available on your system; this is required on Binder.
Download 2hhb.pdb file used in the example below which is stored in Procrustes GitHub repository example files.
[ ]:
# If needed, install IOData library (this is required on Binder)
import sys
# See https://jakevdp.github.io/blog/2017/12/05/installing-python-packages-from-jupyter/
!{sys.executable} -m pip install qc-iodata
!{sys.executable} -m pip install matplotlib
[ ]:
# If needed, download the example files
import os
from urllib.request import urlretrieve
fpath = "notebook_data/chemical_strcuture_alignment/"
if not os.path.exists(fpath):
os.makedirs(fpath, exist_ok=True)
urlretrieve(
"https://raw.githubusercontent.com/theochem/procrustes/master/doc/notebooks/notebook_data/chemical_strcuture_alignment/2hhb.pdb",
os.path.join(fpath, "2hhb.pdb")
)
Procrustes Analysis
[1]:
# chemical structure alignment with orthogonal Procrustes
from pathlib import Path
import numpy as np
from iodata import load_one
from iodata.utils import angstrom
from procrustes import rotational
# load PDB
pdb = load_one(Path("notebook_data/chemical_strcuture_alignment/2hhb.pdb"))
# get coordinates of C_alpha atoms in chains A & C (in angstrom)
chainid = pdb.extra['chainids']
attypes = pdb.atffparams['attypes']
# alpha carbon atom coordinates in chain A
ca_a = pdb.atcoords[(chainid == 'A') & (attypes == 'CA')] / angstrom
# alpha carbon atom coordinates in chain A
ca_c = pdb.atcoords[(chainid == 'C') & (attypes == 'CA')] / angstrom
# compute root-mean-square deviation of original chains A and C
rmsd_before = np.sqrt(np.mean(np.sum((ca_a - ca_c)**2, axis=1)))
print("RMSD of initial coordinates:", rmsd_before)
# rotational Procrustes analysis
result = rotational(ca_a, ca_c, translate=True)
# compute transformed (translated & rotated) coordinates of chain A
ca_at = np.dot(result.new_a, result.t)
# compute root-mean-square deviation of trainsformed chains A and C
rmsd_after = np.sqrt(np.mean(np.sum((ca_at - result.new_b)**2, axis=1)))
print("RMSD of transformed coordinates:", rmsd_after)
RMSD of initial coordinates: 39.46851987559469
RMSD of transformed coordinates: 0.23003870483785113
Plot Procrustes Results
[2]:
# Plot outputs of Procrustes
import matplotlib.pyplot as plt
fig = plt.figure(figsize=(12, 10))
# =============
# First Subplot
# =============
# set up the axes for the first plot
ax = fig.add_subplot(1, 2, 1, projection='3d')
# coordinates of chains A & C (before alignment)
coords1, coords2 = ca_a, ca_c
title = "Original Chains: RMSD={:0.2f} $\AA$".format(rmsd_before)
ax.scatter(xs=coords1[:, 0], ys=coords1[:, 1], zs=coords1[:, 2],
marker="o", color="blue", s=55, label="Chain A")
ax.scatter(xs=coords2[:, 0], ys=coords2[:, 1], zs=coords2[:, 2],
marker="o", color="red", s=55, label="Chain C")
ax.set_xlabel("X/$\AA$", fontsize=14)
ax.set_ylabel("Y/$\AA$", fontsize=14)
ax.set_zlabel("Z/$\AA$", fontsize=14)
ax.legend(fontsize=12, loc="best")
plt.title(title, fontsize=14)
# ==============
# Second Subplot
# ==============
# set up the axes for the second plot
ax = fig.add_subplot(1, 2, 2, projection='3d')
# coordinates of chains A & C after translation and rotation
coords1, coords2 = ca_at, result.new_b
title="Aligned Chains: RMSD={:0.2f} $\AA$".format(rmsd_after)
ax.scatter(xs=coords1[:, 0], ys=coords1[:, 1], zs=coords1[:, 2],
marker="o", color="blue", s=55, label="Chain A")
ax.scatter(xs=coords2[:, 0], ys=coords2[:, 1], zs=coords2[:, 2],
marker="o", color="red", s=55, label="Chain C")
ax.set_xlabel("X/$\AA$", fontsize=14)
ax.set_ylabel("Y/$\AA$", fontsize=14)
ax.set_zlabel("Z/$\AA$", fontsize=14)
ax.legend(fontsize=12, loc="best")
plt.title(title, fontsize=14)
plt.show()

The corresponding file can be obtained from:
Jupyter Notebook:
Chemical_Structure_Alignment.ipynb
Chirality Check
In chemistry, a molecule is chiral if it cannot be superimposed onto its mirror image by any combination of translation and rotation. These non-superposable mirror images are called enantiomers which share identical chemical and physical properties, but have distinct chemical reactivity and optical rotation properties. Checking whether two structures are enantiomers can be formulated as a Procrustes problem.
Rotational Procrustes
Given matrix \(\mathbf{A}_{m \times n}\) and a reference \(\mathbf{B}_{m \times n}\), find the rotational transformation matrix \(\mathbf{R}_{n \times n}\) that makes \(\mathbf{A}\) as close as possible to \(\mathbf{B}\), i.e.,
\begin{equation} \underbrace{\min}_{\left\{\mathbf{R} \left| {\mathbf{R}^{-1} = {\mathbf{R}}^\dagger \atop \left| \mathbf{R} \right| = 1} \right. \right\}} \|\mathbf{A}\mathbf{R} - \mathbf{B}\|_{F}^2 \end{equation}Orthogonal Procrustes
Given matrix \(\mathbf{A}_{m \times n}\) and a reference \(\mathbf{B}_{m \times n}\), find the find the orthogonal transformation matrix \(\mathbf{Q}_{n \times n}\) that makes \(\mathbf{A}\) as close as possible to \(\mathbf{B}\), i.e.,
\begin{equation} \underbrace{\min}_{\left\{\mathbf{Q} | \mathbf{Q}^{-1} = {\mathbf{Q}}^\dagger \right\}} \|\mathbf{A}\mathbf{Q} - \mathbf{B}\|_{F}^2 \end{equation}
In the code block below, we use the procrustes
library to check whether two geometries of the CHFClBr molecule are enantiomers; see Fig. (i). The 3D-Cartesian coordinates of molecules are loaded from their XYZ files using IOData library. Testing whether the coordinates can be matched through translation and rotation (i.e., rotational Procrustes) reveals that these two structures are not identical; see Fig (ii). However, the two coordinates are
enantiomers because they can be matched through translation, rotation, and reflection (i.e., orthogonal Procrustes) as shown in Fig (iii).
Download IOData & Matplotlib Libraries & Example Files
Install IOData Package and Matplotlib, if there are not available on your system; this is required on Binder.
Download enantiomer1.xyz and enantiomer2.xyz files used in the example below which are stored in Procrustes GitHub repository example files.
[ ]:
# If needed, install IOData library (this is required on Binder)
import sys
# See https://jakevdp.github.io/blog/2017/12/05/installing-python-packages-from-jupyter/
!{sys.executable} -m pip install qc-iodata
!{sys.executable} -m pip install matplotlib
[ ]:
# If needed, download the example files
import os
import urllib.request
fpath = "notebook_data/chirality_checking/"
if not os.path.exists(fpath):
os.makedirs(fpath, exist_ok=True)
urllib.request.urlretrieve(
"https://raw.githubusercontent.com/theochem/procrustes/master/doc/notebooks/notebook_data/chirality_checking/enantiomer1.xyz",
os.path.join(fpath, "enantiomer1.xyz")
)
urllib.request.urlretrieve(
"https://raw.githubusercontent.com/theochem/procrustes/master/doc/notebooks/notebook_data/chirality_checking/enantiomer2.xyz",
os.path.join(fpath, "enantiomer2.xyz")
)
Procrustes Analysis
[1]:
# chirality check with rotational and orthogonal Procrustes
from pathlib import Path
import numpy as np
from iodata import load_one
from procrustes import orthogonal, rotational
# load CHClFBr enantiomers' coordinates from XYZ files
a = load_one(Path("notebook_data/chirality_checking/enantiomer1.xyz")).atcoords
b = load_one(Path("notebook_data/chirality_checking/enantiomer2.xyz")).atcoords
# rotational Procrustes on a & b coordinates
result_rot = rotational(a, b, translate=True, scale=False)
print("Rotational Procrustes Error = ", result_rot.error) # output: 26.085545
# orthogonal Procrustes on a & b coordinates
result_ortho = orthogonal(a, b, translate=True, scale=False)
print("Orthogonal Procrustes Error = ", result_ortho.error) # output: 4.432878e-08
Rotational Procrustes Error = 26.08554575402178
Orthogonal Procrustes Error = 4.432878638501341e-08
Plot Procrustes Results
[2]:
# Plot outputs of Procrustes
import matplotlib.pyplot as plt
fig = plt.figure(figsize=(12, 10))
# =============
# First Subplot
# =============
# set up the axes for the first plot
ax = fig.add_subplot(1, 2, 1, projection='3d')
# coordinates of rotated molecule A and molecule B
a_rot = np.dot(result_rot.new_a, result_rot.t)
coords1, coords2 = a_rot, result_rot.new_b
title = "Rotational Procrustes Error={:0.2f} $\AA$".format(result_rot.error)
ax.scatter(xs=coords1[:, 0], ys=coords1[:, 1], zs=coords1[:, 2],
marker="o", color="blue", s=55, label="enantiomer1 with rotation")
ax.scatter(xs=coords2[:, 0], ys=coords2[:, 1], zs=coords2[:, 2],
marker="o", color="red", s=55, label="enantiomer2")
ax.set_xlabel("X/$\AA$", fontsize=14)
ax.set_ylabel("Y/$\AA$", fontsize=14)
ax.set_zlabel("Z/$\AA$", fontsize=14)
ax.legend(fontsize=12, loc="best")
plt.title(title, fontsize=14)
# ==============
# Second Subplot
# ==============
# set up the axes for the second plot
ax = fig.add_subplot(1, 2, 2, projection='3d')
# coordinates of rotated-and-refelcted molecule A and molecule B
a_rot = np.dot(result_ortho.new_a, result_ortho.t)
coords1, coords2 = a_rot, result_rot.new_b
title="Orthogonal Procrustes Error={:0.2f} $\AA$".format(result_ortho.error)
ax.scatter(xs=coords1[:, 0], ys=coords1[:, 1], zs=coords1[:, 2],
marker="o", color="blue", s=55, label="enantiomer1 with rotation and reflection")
ax.scatter(xs=coords2[:, 0], ys=coords2[:, 1], zs=coords2[:, 2],
marker="o", color="red", s=55, label="enantiomer2")
ax.set_xlabel("X/$\AA$", fontsize=14)
ax.set_ylabel("Y/$\AA$", fontsize=14)
ax.set_zlabel("Z/$\AA$", fontsize=14)
ax.legend(fontsize=12, loc="best")
plt.title(title, fontsize=14)
plt.show()

The corresponding file can be obtained from:
Jupyter Notebook:
Chirality_Check.ipynb
Atom-Atom Mapping
Given two molecular structures, it is important to identify atoms that are chemically similar. This a commonly used in 3D-QSAR pharmacore analysis, substructure searching, metabolic pathway identification, and chemical machine learning. This problem can be formulated as a 2-sided permutation Procrustes with single transformation.
Permutation Procrustes 2-Sided with Single-Transformation
Given matrix \(\mathbf{A}_{n \times n}\) and a reference \(\mathbf{B}_{n \times n}\), find a permutation of rows/columns of \(\mathbf{A}_{n \times n}\) that makes it as close as possible to \(\mathbf{B}_{n \times n}\), i.e.,
\begin{equation} \underbrace{\text{min}}_{\left\{\mathbf{P} \left| {p_{ij} \in \{0, 1\} \atop \sum_{i=1}^n p_{ij} = \sum_{j=1}^n p_{ij} = 1} \right. \right\}} \|\mathbf{P}^\dagger \mathbf{A} \mathbf{P} - \mathbf{B}\|_{F}^2 \\ \end{equation}
In the code block below, we use the procrustes
library to map atoms of but-1-en-3-yne (molecule A) and 3,3-dimethylpent-1-en-4-yne (molecule B) in Fig. (i). Based on our chemical intuition, we can tell that the triple and double bonds of the molecules “match”; however, simple (geometric) molecular alignment based on three-dimensional coordinates does not identify that. The key step is defining a representation that contains bonding information before applying permutation
Procrustes to match atoms.
Fig. (ii): Inspired by graph theory, we represent each molecule with an “adjacency” matrix where the diagonal elements are the atomic numbers and the off-diagonal elements are the bond orders. This results in matrices \(\mathbf{A} \in \mathbb{R}^{4 \times 4}\) and \(\mathbf{B} \in \mathbb{R}^{7 \times 7}\). Note that the permutation Procrustes requires the two matrices to be of the same size, so the smaller matrix \(\mathbf{A}\) is padded with zero rows and columns to have same shape as matrix \(\mathbf{B}\).
Fig. (iii): The mapping between atoms can be also directly deduced from the optimal permutation matrix \(\mathbf{P}\). Specifically, the transformed matrix \(\mathbf{P^{\top}AP}\) should be compared to matrix \(\mathbf{B}\) to identify the matching atoms; the zero rows/columns in \(\mathbf{A}\) (colored in blue) correspond to atoms in \(\mathbf{B}\) for which there are no corresponding atoms.
[1]:
# atom-atom mapping with 2-sided permutation procrustes (with single transformation)
import numpy as np
from procrustes import permutation_2sided
# Define molecule A representing but‐1‐en‐3‐yne
A = np.array([[6, 3, 0, 0],
[3, 6, 1, 0],
[0, 1, 6, 2],
[0, 0, 2, 6]])
# Define molecule B representing 3,3‐dimethylpent‐1‐en‐4‐yne
B = np.array([[6, 3, 0, 0, 0, 0, 0],
[3, 6, 1, 0, 0, 0, 0],
[0, 1, 6, 1, 0, 1, 1],
[0, 0, 1, 6, 2, 0, 0],
[0, 0, 0, 2, 6, 0, 0],
[0, 0, 1, 0, 0, 6, 0],
[0, 0, 1, 0, 0, 0, 6]])
# Two-sided permutation Procrustes (with single transformation)
result = permutation_2sided(A, B, method="approx-normal1", single=True, pad=True)
# Compute the transformed molecule A using transformation matrix P
P = result.t
new_A = np.dot(P.T, np.dot(result.new_a, P)).astype(int)
print("Permutation Matrix:\n", P)
print("\nTransformed A: \n", new_A)
print("\nCompare to Original (padded) B:\n", result.new_b)
print("\nProcrustes Error:", result.error)
Permutation Matrix:
[[1. 0. 0. 0. 0. 0. 0.]
[0. 1. 0. 0. 0. 0. 0.]
[0. 0. 0. 1. 0. 0. 0.]
[0. 0. 0. 0. 1. 0. 0.]
[0. 0. 1. 0. 0. 0. 0.]
[0. 0. 0. 0. 0. 1. 0.]
[0. 0. 0. 0. 0. 0. 1.]]
Transformed A:
[[6 3 0 0 0 0 0]
[3 6 0 1 0 0 0]
[0 0 0 0 0 0 0]
[0 1 0 6 2 0 0]
[0 0 0 2 6 0 0]
[0 0 0 0 0 0 0]
[0 0 0 0 0 0 0]]
Compare to Original (padded) B:
[[6 3 0 0 0 0 0]
[3 6 1 0 0 0 0]
[0 1 6 1 0 1 1]
[0 0 1 6 2 0 0]
[0 0 0 2 6 0 0]
[0 0 1 0 0 6 0]
[0 0 1 0 0 0 6]]
Procrustes Error: 118.0
The corresponding file can be obtained from:
Jupyter Notebook:
Atom_Atom_Mapping.ipynb
Ranking by Reordering
The problem of ranking a set of objects is ubiquitous not only in everyday life, but also for many scientific problems such as information retrieval, recommender systems, natural language processing, and drug discovery. This problem can be formulated as a 2-sided permutation Procrustes with single transformation.
Permutation Procrustes 2-Sided with Single-Transformation
Given matrix \(\mathbf{A}_{n \times n}\) and a reference \(\mathbf{B}_{n \times n}\), find a permutation of rows/columns of \(\mathbf{A}_{n \times n}\) that makes it as close as possible to \(\mathbf{B}_{n \times n}\), i.e.,
\begin{equation} \underbrace{\text{min}}_{\left\{\mathbf{P} \left| {p_{ij} \in \{0, 1\} \atop \sum_{i=1}^n p_{ij} = \sum_{j=1}^n p_{ij} = 1} \right. \right\}} \|\mathbf{P}^\dagger \mathbf{A} \mathbf{P} - \mathbf{B}\|_{F}^2 \\ \end{equation}
The code block below, we use the procrustes
library to rank five American collegiate football teams, where each team plays one game against every other team, using their score-differentials as summarized below (data taken from A. N. Langville, C. D. Meyer, Ranking by Reordering Methods, Princeton University Press, 2012, Ch. 8, pp. 97–112). Here, each team is given a zero score for a game they lost (e.g., Duke lost to every other team) and the score difference is calculated for games won
(e.g., Miami beat Duke by 45 points and UNC by 18 points). These results are also summarized in the square score-differential matrix \(\mathbf{A}\) in Fig (i).
Team |
Duke |
Miami |
UNC |
UVA |
VT |
---|---|---|---|---|---|
Duke |
0 |
0 |
0 |
0 |
0 |
Miami |
45 |
0 |
18 |
8 |
20 |
UNC |
3 |
0 |
0 |
2 |
0 |
UVA |
31 |
0 |
0 |
0 |
0 |
VT |
45 |
0 |
27 |
38 |
0 |
Before applying Procrustes, one needs to define a proper target matrix. Traditionally, the rank-differential matrix has been used for this purpose and is defined for \(n\) teams as,
The rank-differential matrix \(\mathbf{R} \in \mathbb{R}^{n \times n}\) is an upper-triangular matrix and its \(ij\)-th element specifies the difference in ranking between team \(i\) and team \(j\). Considering the rank-differential matrix in Fig. (ii) as the target matrix \(\mathbf{B}\), the two-sided permutation Procrustes finds the single permutation matrix that maximizes the similarity between the score-differential matrix \(\mathbf{A}\) and the rank-differential matrix \(\mathbf{B}\). This results to \([5,2,4,3,1]\) as the final rankings of the teams in Fig. (iii).
[1]:
# ranking by reordering with 2-sided permutation procrustes (with single transformation)
import numpy as np
from procrustes import permutation_2sided
# input score-differential matrix
A = np.array([[ 0, 0, 0 , 0, 0 ], # Duke
[45, 0, 18, 8, 20], # Miami
[ 3, 0, 0 , 2, 0 ], # UNC
[31, 0, 0 , 0, 0 ], # UVA
[45, 0, 27, 38, 0 ]]) # VT
# make rank-differential matrix
n = A.shape[0]
B = np.zeros((n, n))
for index in range(n):
B[index, index:] = range(0, n - index)
# rank teams using two-sided Procrustes
result = permutation_2sided(A, B, single=True, method="approx-normal1")
# compute teams' ranks (by adding 1 because Python's list index starts from 0)
_, ranks = np.where(result.t == 1)
ranks += 1
print("Ranks = ", ranks) # displays [5, 2, 4, 3, 1]
Ranks = [5 2 4 3 1]
The corresponding file can be obtained from:
Jupyter Notebook:
Ranking_by_Reordering.ipynb
The user can download the jupyter notebook from our github repo or just use the online version hosted by MyBinder.
References
John C. Gower. Procrustes methods. WIREs Computational Statistics, 2(4):503–508, 2010. doi:10.1002/wics.107.
Peter H. Schönemann. A generalized solution of the orthogonal procrustes problem. Psychometrika, 31(1):1–10, Mar 1966. doi:10.1007/BF02289451.
Z. Zhang. A flexible new technique for camera calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence, 22(11):1330–1334, Nov 2000. doi:10.1109/34.888718.
John C Gower and Garmt B Dijksterhuis. Procrustes Problems. Volume 30. Oxford University Press, 2004. doi:10.1093/acprof:oso/9780198510581.001.0001.
Frank B Brokken. Orthogonal procrustes rotation maximizing congruence. Psychometrika, 48(3):343–352, 1983. doi:10.1007/BF02293679.
JL Farrell, JC Stuelpnagel, RH Wessner, JR Velman, and JE Brook. A least squares estimate of satellite attitude (grace wahba). SIAM Review, 8(3):384–386, 1966. doi:10.1137/1008080.
Nicholas J Higham. The symmetric procrustes problem. BIT Numerical Mathematics, 28(1):133–143, 1988. doi:10.1007/BF01934701.
René Escalante and Marcos Raydan. Dykstra’s algorithm for constrained least-squares rectangular matrix problems. Computers & Mathematics with Applications, 35(6):73–79, 1998. doi:10.1016/S0898-1221(98)00020-0.
Juan Peng, Xi-Yan Hu, and Lei Zhang. The (m, n)-symmetric procrustes problem. Applied Mathematics and Computation, 198(1):24–34, 2008. doi:10.1016/j.amc.2007.08.094.
Farnaz Heidar Zadeh and Paul W Ayers. Molecular alignment as a penalized permutation procrustes problem. Journal of Mathematical Chemistry, 51(3):927–936, 2013. doi:10.1007/s10910-012-0119-2.
Peter H Schönemann. On two-sided orthogonal procrustes problems. Psychometrika, 33(1):19–33, 1968. doi:10.1007/BF02289673.
Shinji Umeyama. An eigendecomposition approach to weighted graph matching problems. IEEE Transactions on Pattern Analysis and Machine Intelligence, 10(5):695–703, 1988. doi:10.1109/34.6778.
Pythagoras Papadimitriou. Parallel Solution of SVD-Related Problems, with Applications. PhD thesis, University of Manchester, 1993. URL: http://vummath.ma.man.ac.uk/~higham/links/theses/papad93.pdf.
Chris Ding, Tao Li, and Michael I Jordan. Nonnegative matrix factorization for combinatorial optimization: spectral clustering, graph matching, and clique finding. In ICDM ‘08: Proceedings of the 2008 Eighth IEEE International Conference on Data Mining, 183–192. IEEE, 2008. doi:10.1109/ICDM.2008.130.
Mikkel B Stegmann and David Delgado Gomez. A brief introduction to statistical shape analysis. Informatics and Mathematical Modelling, Technical University of Denmark, DTU, 2002. URL: http://www2.compute.dtu.dk/pubdb/pubs/403-full.html.
John C Gower. Generalized procrustes analysis. Psychometrika, 40(1):33–51, 1975. doi:10.1007/BF02291478.
Jos MF Ten Berge. Orthogonal procrustes rotation for two or more matrices. Psychometrika, 42(2):267–276, 1977. doi:10.1007/BF02294053.
Ingwer Borg and Patrick JF Groenen. Modern Multidimensional Scaling: Theory and Applications. Springer Science & Business Media, 2005. doi:10.1007/0-387-28981-X.
JJ Kosowsky and Alan L Yuille. The invisible hand algorithm: solving the assignment problem with statistical physics. Neural Networks, 7(3):477–490, 1994. doi:10.1016/0893-6080(94)90081-7.
Steven Gold and Anand Rangarajan. Softassign versus softmax: benchmarks in combinatorial optimization. In Advances in Neural Information Processing Systems, 626–632. 1996. doi:10.5555/2998828.2998917.
Anand Rangarajan, Alan L Yuille, Steven Gold, and Eric Mjolsness. A convergence proof for the softassign quadratic assignment algorithm. In Advances in Neural Information Processing Systems, 620–626. 1997. doi:10.5555/2998981.2999069.
Jiahui Wang, Xiaoshuang Zeng, Wenjie Luo, and Wei An. The application of neural network in multiple object tracking. DEStech Transactions on Computer Science and Engineering, pages 358–264, 2018. doi:10.12783/dtcse/csse2018/24504.
Steven Gold, Anand Rangarajan, and others. Softmax to softassign: neural network algorithms for combinatorial optimization. Journal of Artificial Neural Networks, 2(4):381–399, 1996. doi:10.5555/235912.235919.
Yu Tian, Junchi Yan, Hequan Zhang, Ya Zhang, Xiaokang Yang, and Hongyuan Zha. On the convergence of graph matching: graduated assignment revisited. In European Conference on Computer Vision, 821–835. Springer, 2012. doi:10.1007/978-3-642-33712-3_59.
Z Sheikhbahaee, R Nakajima, T Erben, P Schneider, H Hildebrandt, and AC Becker. Photometric calibration of the combo-17 survey with the softassign procrustes matching method. Monthly Notices of the Royal Astronomical Society, 471(3):3443–3455, 2017. doi:10.1093/mnras/stx1810.
Alan L Yuille and JJ Kosowsky. Statistical physics algorithms that converge. Neural Computation, 6(3):341–356, 1994. doi:10.1162/neco.1994.6.3.341.