PDB_Numpy package

Submodules

`Coor` Class

class pdb_numpy.coor.Coor(coor_in=None, pdb_lines=None, pdb_id=None)[source]

Bases: object

The Coor class is a coordinate and topology object based on coordinates like pdb or gro files. It has attributes such as ‘models’, ‘crystal_pack’, and ‘active_model’ that store a list of Model objects, the crystal packing as a string, and the index of the active model, respectively.

The class provides methods to read pdb, pqr, mmcif files, parse pdb lines, download pdb files from the PDB database, and write pdb and pqr files. Additionally, it has methods to select atoms, retrieve amino acid sequences, and change the order of atoms in the model.

Attributes:

modelslist: List of Model objects
crystal_packstr: Crystal Packing as a string
active_modelint: Index of the active model

Methods

read(file_in)	Read a pdb file and store atom informations as a dictionnary od numpy array. The fonction can also read pqr files if the file extension is .pqr
select_atoms(select)	Return a list of atom index corresponding to the selection
simple_select_atoms(select)	Return a list of atom index corresponding to a simple selection
select_index(select)	Return a list of atom index corresponding to the selection
dist_under_index(index, dist)	Return a list of atom index corresponding to the distance
get_index_select(select)	Return a list of atom index corresponding to the selection
get_aa_seq(model_num=0)	Return a string of amino acid sequence
get_aa_DL_seq(model_num=0)	Return a string of amino acid sequence with disulfide bonds

add_symmetry()[source]

Apply the symmetry matrix to the coordinates. Add a model for each symmetry matrix.

Parameters:

None

Returns:

Coor: A new Coor object with the symmetry added.

property alterloc

apply_transformation(index_list=None)[source]

Apply the transformation matrix to the coordinates. Add a new model with the transformed coordinates.

Parameters:

index_listlist, optional: Index list of the transformation matrix to apply, by default all

Returns:

Coor: A new Coor object with the transformation added.

property beta

property chain

change_order(field, order_list)[source]

Change the order of the atoms in the model. The change_order() function takes in two arguments, field and order_list, which are used to change the order of the atoms in the model. The field argument specifies which field to change the order by, while order_list is a list of the new order. The function then modifies the atom dictionary in each model to match the new order.

Parameters:

fieldstr: The field to change the order by.
order_listlist: List of the new order

Returns:

None: Change the order of the atoms in the model

Examples

>>> test = Coor(pdb_id='1jd4')
>>> test.change_order('chain', ['B', 'C', 'A'])

property com: Return the center of mass of the active model

compute_chains_CA(Ca_cutoff=4.2)[source]

Correct the chain ID’s of a coor object, by checking consecutive Calphas atoms distance. If the distance is higher than Ca_cutoff , the former atoms are considered as in a different chain.

Parameters:

Ca_cutofffloat, optional: Cutoff distance between consecutive Calpha atoms, by default 4.5 Angstrom

Returns:

None

copy_box(x=1, y=1, z=1)[source]

Copy the box in the x, y and z direction.

Parameters:

xint, optional: Number of copy in the x direction, by default 1
yint, optional: Number of copy in the y direction, by default 1
zint, optional: Number of copy in the z direction, by default 1

Returns:

Coor: A new Coor object with the box copied.

property elem

property field

get_aa_DL_seq(gap_in_seq=True, frame=0)[source]

Get the amino acid sequence from a coor object. The function calculates the amino acid sequence based on the CA atoms in the structure.

L or D form is determined using CA-N-C-CB angle Angle should take values around +34° and -34° for L- and D-amino acid residues. if amino acid is in D form it will be in lower case.

Reference: https://onlinelibrary.wiley.com/doi/full/10.1002/prot.10320

Parameters:

selfCoor: Coor object
gap_in_seqbool, optional: if True, add gaps in the sequence, by default True
frameint: Frame number for the selection, default is 0

Returns:

dict: Dictionary with chain as key and sequence as value.

Examples

>>> prot_coor = Coor('1y0m.pdb')
Succeed to read file 1y0m.pdb ,  648 atoms found
>>> prot_coor.get_aa_DL_seq()
{'A': 'TFKSAVKALFDYKAQREDELTFTKSAIIQNVEKQDGGWWRGDYGGKKQLWFPSNYVEEMIN'}
>>> prot_coor = Coor('6be9_frame_0.pdb')
Succeed to read file 6be9_frame_0.pdb ,  104 atoms found
>>> prot_coor.get_aa_DL_seq()
Residue K2 is in D form
Residue N6 is in D form
Residue P7 is in D form
{'A': 'TkNDTnp'}

Warning

If atom chains are not arranged sequentially (A,A,A,B,B,A,A,A …), the first atom seq will be overwritten by the last one.

get_aa_seq(gap_in_seq=True, frame=0)[source]

Get the amino acid sequence from a Coor object. This function takes a Coor object, selects the CA atoms using the select_atoms() method with the argument name CA, and returns a dictionary with the amino acid sequence of each chain in the protein, where the key is the chain identifier and the value is the sequence.

The function first creates empty dictionaries seq_dict and aa_num_dict. These will be used to store the sequence and the number of the last amino acid in each chain, respectively. Then, it loops through each CA atom in the selected atoms and retrieves the chain identifier, residue name, and residue number. If the chain identifier is not in seq_dict, the function initializes an empty string for the sequence of that chain and stores the residue number in aa_num_dict. Then, if the residue name is recognized as a standard amino acid in AA_DICT, the function adds the corresponding one-letter code to the sequence string for the corresponding chain in seq_dict, and updates the last amino acid number in aa_num_dict. If the residue name is not recognized, the function logs a warning message. If the residue number is not consecutive to the previous one and gap_in_seq is set to True, the function adds gaps to the sequence.

Finally, the function returns seq_dict, the dictionary with the amino acid sequences of each chain in the protein.

Parameters:

selfCoor: Coor object
gap_in_seqbool, optional: if True, add gaps in the sequence, by default True
frameint: Frame number for the selection, default is 0

Returns:

dict: Dictionary with chain as key and sequence as value.

Examples

>>> prot_coor = Coor('1y0m.pdb')
>>> prot_coor.get_aa_seq()
{'A': 'TFKSAVKALFDYKAQREDELTFTKSAIIQNVEKQDGGWWRGDYGGKKQLWFPSNYVEEMIN'}

Warning

If atom chains are not arranged sequentially (A,A,A,B,B,A,A,A …), the first atom seq will be overwritten by the last one.

get_index_select(selection, frame=0)[source]

Return index from the Coor object based on the selection string.

Parameters:

selfCoor: Coor object
selectionstr: Selection string
frameint: Frame number for the selection, default is 0

Returns:

list: a list of indexes

property insertres

property len

merge_models()[source]

Merge all models into the first model.

Returns:

None

property model_num

property name

property num

property occ

read(file_in)[source]

Read a pdb/pqr/gro/cif file and return atom information as a Coor object. It determines the file format based on the file extension and parses the lines accordingly. If the file extension is not recognized, it assumes it is a pdb file.

Parameters:

file_instr: Path of the pdb file to read

Returns:

None

Examples

>>> prot_coor = Coor('1y0m.pdb')
Succeed to read file ...1y0m.pdb ,  648 atoms found
>>> prot_coor.read('1y0m.gro')
Succeed to read file ...1y0m.gro ,  648 atoms found

remove_overlap_chain(cutoff=1.0, frame=0)[source]

Remove atoms that are closer than cutoff from another atom.

Parameters:

cutofffloat, optional: Cutoff distance, by default 1.0 Angstrom

Returns:

Coor: A new Coor object with the overlapping atoms removed.

reset_residue_index()[source]

Reset the residue index to the original index of the pdb file. This function resets the residue index in the model to the original index of the pdb file. It loops through each model in the Coor object and for each atom in the model, it compares the residue number with the last residue number seen. If they are different, it increments the residue number, and sets the residue number of the current atom to the new residue number. This effectively resets the residue index to the original index of the pdb file. The function does not return anything, it simply modifies the residue number of each atom in the Coor object.

Returns:

None: Change the residue index in the model

property resid

property residue

property resname

select_atoms(selection, frame=0)[source]

This method allows selecting atoms from the PDB file based on a selection string. The selection string follows a syntax similar to that used in the VMD molecular visualization software.

Parameters:

selfCoor: Coor object
selectionstr: Selection string
frameint: Frame number for the selection, default is 0

Returns:

Coor: a new Coor object with the selected atoms

select_index(indexes)[source]

The select_index() function selects atoms from a Coor object based on the provided indexes and returns a new Coor object with the selected atoms.

Parameters:

selfCoor: Coor object
indexeslist: List of indexes
frameint: Frame number for the selection, default is 0

Returns:

Coor: a new Coor object with the selected atoms

property total_len

property uniq_resid

write(file_out, overwrite=False)[source]

Write a pdb/pqr/gro/cif file from a Coor object. It determines the file format based on the file extension and writes the lines accordingly. If the file extension is not recognized, it assumes it is a pdb file.

Parameters:

file_outstr: Path of the pdb file to write
overwritebool: If False, check if the file exists and don’t overwrite it. If True, overwrite the file without asking for confirmation.

Returns:

None

property x

property xyz

property y

property z

`Model` Class

class pdb_numpy.model.Model[source]

Bases: object

Model class for pdb_numpy

Attributes:

atom_dictdict: Dictionary containing the atom information
lenint: Number of atoms in the model
fieldnumpy.ndarray: Array containing the field of the atom
numnumpy.ndarray: Array containing the atom number
namenumpy.ndarray: Array containing the atom name
resnamenumpy.ndarray: Array containing the residue name
alterlocnumpy.ndarray: Array containing the alternate location
chainnumpy.ndarray: Array containing the chain
insertresnumpy.ndarray: Array containing the insertion code
elemnumpy.ndarray: Array containing the element
residnumpy.ndarray: Array containing the residue number
uniq_residnumpy.ndarray: Array containing the unique residue id
xnumpy.ndarray: Array containing the x coordinate
ynumpy.ndarray: Array containing the y coordinate
znumpy.ndarray: Array containing the z coordinate
occupancynumpy.ndarray: Array containing the occupancy
bfactornumpy.ndarray: Array containing the bfactor
xyznumpy.ndarray: Array containing the x, y and z coordinates

Methods

select(selection)	Select atoms from the model
select_index(selection)	Select atoms from the model and return the index

add_atom(index, name, resname, num, resid, uniq_resid, chain, xyz, bfactor=0, occupancy=0, altloc='', insertres='', elem='')[source]

Add an atom to the Model object.

Parameters:

selfModel: Model object
atomAtom: Atom object

property alterloc

property beta

property chain

dist_under_index(sel_2, cutoff)[source]

Select atoms from the PDB file based on distance.

Parameters:

selfModel: Model object for the first selection
sel_2Model: Model object for the second selection
cutofffloat: Cutoff distance for the selection

Returns:

List: list of boolean values for each atom in the PDB file

property elem

property field

get_index_select(selection)[source]

Return index from the PDB file based on the selection string.

Parameters:

selfModel: Model object
selectionstr: Selection string
frameint: Frame number for the selection, default is 0

Returns:

list: a list of indexes

property insertres

property len

property name

property num

property occ

property resid

property residue

property resname

select_atoms(selection)[source]

Select atoms from the PDB file based on the selection string.

Parameters:

selfModel: Model object
selectionstr: Selection string
frameint: Frame number for the selection, default is 0

Returns:

Coor: a new Model object with the selected atoms

select_index(indexes)[source]

Select atoms from the PDB file based on the selection indexes.

Parameters:

selfModel: Model object
indexeslist: List of indexes
frameint: Frame number for the selection, default is 0

Returns:

Coor: a new Coor object with the selected atoms

select_tokens(tokens)[source]

Select atoms from the PDB file based on the selection tokens. Selection tokens are a list of tokens that can be either simple selection or nested selection. A simple selection contains only one keyword, operator, and values. A nested selection contains a list or sub-list of tokens.

Parameters:

selfModel: Model object
tokenslist: List of nested tokens
frameint: Frame number for the selection, default is 0

Returns:

list: a list of boolean values for each atom in the PDB file

simple_select_atoms(column, values, operator='==')[source]

Select atoms from the PDB file based on the selection tokens. Selection tokens are simple selection containing only one keyword, operator, and values.

The keywords :

“resname”
“chain”
“name”
“altloc”
“resid”
“residue”
“beta”
“occupancy”
“x”, “y”, “z”.

The operators are:

“==”
“!=”
“>”
“>=”
“<”
“<=”
“isin”

Parameters:

selfModel: Model object
columnstr: Keyword for the selection
valueslist: List of values for the selection
operatorstr: Operator for the selection
frameint: Frame number for the selection, default is 0

Returns:

list: a list of boolean values for each atom in the PDB file

property uniq_resid

property x

property xyz

property y

property z

Analysis module

pdb_numpy.analysis.compute_pdockQ(coor, rec_chain=None, lig_chain=None, cutoff=8.0)[source]

Compute the pdockQ score as define in [1].

\[pDockQ = \frac{L}{1 + e^{-k (x-x_{0})}} + b\]

where

\[x = \overline{plDDT_{interface}} \cdot log(number \: of \: interface \: contacts)\]

\(L = 0.724\) is the maximum value of the sigmoid, \(k = 0.052\) is the slope of the sigmoid, \(x_{0} = 152.611\) is the midpoint of the sigmoid, and \(b = 0.018\) is the y-intercept of the sigmoid.

Implementation was inspired from https://gitlab.com/ElofssonLab/FoldDock/-/blob/main/src/pdockq.py

Parameters:

coorCoor: object containing the coordinates of the model
rec_chainlist: list of receptor chain
lig_chainlist: list of ligand chain
cutofffloat: cutoff for native contacts, default is 8.0 A

Returns:

list: pdockQ scores

References

[1]

Bryant P, Pozzati G and Elofsson A. Improved prediction of protein-protein interactions using AlphaFold2. Nature Communications. vol. 13, 1265 (2022) https://www.nature.com/articles/s41467-022-28865-w

pdb_numpy.analysis.compute_pdockQ2(coor, pae_array, rec_chain=None, lig_chain=None, cutoff=8.0)[source]

Compute the pdockQ2 score as define in [1].

\[pDockQ_2 = \frac{L}{1 + exp [-k*(X_i-X_0)]} + b\]

whith:

\[X_i = \langle \frac{1}{1+(\frac{PAE_{int}}{d_0})^2} \rangle - \langle pLDDT \rangle_{int}\]

\(L = 0.724\) is the maximum value of the sigmoid, \(k = 0.052\) is the slope of the sigmoid, \(x_{0} = 152.611\) is the midpoint of the sigmoid, and \(b = 0.018\) is the y-intercept of the sigmoid.

Implementation was inspired from https://gitlab.com/ElofssonLab/afm-benchmark/-/blob/main/src/pdockq2.py

Parameters:

coorCoor: object containing the coordinates of the model
rec_chainlist: list of receptor chain
lig_chainlist: list of ligand chain
cutofffloat: cutoff for native contacts, default is 8.0 A

Returns:

list: pdockQ scores

References

[1]

Zhu W, Shenoy A, Kundrotras P and Elofsson A. Evaluation of AlphaFold-Multimer prediction on multi-chain protein complexes. Bioinformatics. vol. 39, 7 (2023) btad424 https://academic.oup.com/bioinformatics/article/39/7/btad424/7219714

pdb_numpy.analysis.compute_pdockQ_sel(coor, rec_sel, lig_sel, cutoff=8.0)[source]

Compute the pdockQ score as define in [1]. Using two selection strings.

\[pDockQ = \frac{L}{1 + e^{-k (x-x_{0})}} + b\]

where

\[x = \overline{plDDT_{interface}} \cdot log(number \: of \: interface \: contacts)\]

\(L = 0.724\) is the maximum value of the sigmoid, \(k = 0.052\) is the slope of the sigmoid, \(x_{0} = 152.611\) is the midpoint of the sigmoid, and \(b = 0.018\) is the y-intercept of the sigmoid.

Implementation was inspired from https://gitlab.com/ElofssonLab/FoldDock/-/blob/main/src/pdockq.py

Parameters:

coorCoor: object containing the coordinates of the model
rec_selstr: selection string for receptor
lig_selstr: selection string for ligand
cutofffloat: cutoff for native contacts, default is 8.0 A

Returns:

float: pdockQ score

References

[1]

Bryant P, Pozzati G and Elofsson A. Improved prediction of protein-protein interactions using AlphaFold2. Nature Communications. vol. 13, 1265 (2022) https://www.nature.com/articles/s41467-022-28865-w

pdb_numpy.analysis.dockQ(coor, native_coor, rec_chain=None, lig_chain=None, native_rec_chain=None, native_lig_chain=None, back_atom=['CA', 'N', 'C', 'O'])[source]

The dockQ function computes the docking quality score between a model and a native structure. It takes as input the Cartesian coordinates of the model (coor) and the native structure (native_coor) and various optional parameters such as the receptor chain (rec_chain), the ligand chain (lig_chain), the native receptor chain (native_rec_chain), the native ligand chain (native_lig_chain) and the list of backbone atoms (back_atom).

The docking quality score is computed as follows:

Align the receptor on the native receptor using the backbone atoms
Compute the RMSD between the ligand and the native ligand
Compute the RMSD between the receptor and the native receptor

The function returns the docking quality score as a float value.

The function first gets the shortest chain’s sequence to identify the peptide and receptor chains for both the model and native structure. It then removes hydrogens and non-protein atoms as well as alternate locations from the coordinates of both the model and the native structure.

The function then removes incomplete backbone residues from both the model and native structure coordinates. The ligand chain is then put at the end of the dictionary. The model is then aligned on the native structure using the model back atoms. The function computes the RMSD between the ligand and the native ligand and the RMSD between the receptor and the native receptor.

The same residue in common is set and the interface coordinates and interface native coordinates are selected. Finally, the docking quality score is returned.

Parameters:

coorCoor: Model coordinates
native_coorCoor: Native coordinates
rec_chainlist, default=None: List of receptor chain
lig_chainlist, default=None: List of ligand chain
native_rec_chainlist, default=None: List of native receptor chain
native_lig_chainlist, default=None: List of native ligand chain
back_atomlist, default=[‘CA’, ‘N’, ‘C’, ‘O’]: List of backbone atoms

Returns:

float: Docking quality score

pdb_numpy.analysis.interface_rmsd(coor_1, coor_2, rec_chain, lig_chain, cutoff=10.0, back_atom=['CA', 'N', 'C', 'O'])[source]

Compute the interface RMSD between two models. The interface is defined as the distance between the ligand and the receptor below the cutoff distance. The RMSD is computed between the ligand and the receptor of the two models.

Parameters:

coor_1Coor: First coordinates
coor_2Coor: Second coordinates
rec_chainlist: List of receptor chain
lig_chainlist: List of ligand chain
cutofffloat, default=10.0: Cutoff distance for the interface
back_atomlist, default=[‘CA’, ‘N’, ‘C’, ‘O’]: List of backbone atoms to use for the RMSD calculation

Returns:

float: Interface RMSD value

Notes

Both coor object must have equivalent residues number in both chains. Chain order must also be equivalent.

pdb_numpy.analysis.native_contact(coor, native_coor, rec_chain, lig_chain, native_rec_chain, native_lig_chain, cutoff=5.0)[source]

Compute the native contact score between a model and a native structure.

The function returns two lists: fnat_list and fnonnat_list. fnat_list contains the fraction of native contacts, and fnonnat_list contains the fraction of non-native contacts.

The function first selects the atoms within the cutoff distance from the native receptor-ligand interface in the native structure, and then selects the atoms within the cutoff distance from the receptor-ligand interface in the model structure. It then compares the selected atoms in the model structure with the native atoms to determine the native contacts and non-native contacts.

The function uses the np.unique() function to get unique residue numbers, and then iterates over these residue numbers to compute the native contacts. It then computes the fraction of native contacts and non-native contacts for each model and adds them to the respective lists.

Finally, the function returns the fnat_list and fnonnat_list.

Parameters:

coorCoor: Model coordinates
native_coorCoor: Native coordinates
rec_chainlist: List of receptor chain
lig_chainlist: List of ligand chain
native_rec_chainlist: List of native receptor chain
native_lig_chainlist: List of native ligand chain
cutofffloat, default=5.0: Cutoff distance for the native contact

Returns:

float: Native contact score

pdb_numpy.analysis.rmsd(coor_1, coor_2, selection='name CA', index_list=None, frame_ref=0)[source]

Compute RMSD between two sets of coordinates.

The RMSD (Root Mean Square Deviation) measures the similarity between two sets of coordinates by calculating the average distance between the corresponding atoms. It is given by the formula:

\[RMSD(v,w) = \sqrt{ \frac{1}{N} \sum_{i=1}^{N} (v_i - w_i)^2 }\]

where v and w are the two sets of coordinates, and N is the total number of atoms.

Parameters:

coor_1Coor: First set of coordinates.
coor_2Coor: Second set of coordinates.
selectionstr, optional: A selection string specifying which atoms to include in the RMSD calculation. By default, it selects the alpha carbon atoms (‘name CA’).
index_listlist, optional: A list of two arrays containing the indices of the atoms to include in the RMSD calculation. This option is provided as an alternative to the selection argument, and can be used if you need to calculate the RMSD for a custom set of atoms. If this argument is provided, the selection argument is ignored.
frame_refint, optional: The frame number of the reference structure. By default, it is set to 0.

Returns:

float: RMSD value

DSSP module

pdb_numpy.DSSP.add_NH(coor)[source]

Add NH atoms to a protein structure.

This function adds NH atoms to a protein structure by computing the coordinates of the hydrogen atoms using neighboring C and CA atoms. The new NH atoms are added to the protein structure.

Parameters:

coorCoor: Protein coordinates.

Returns:

TO FIX IS SHOULD RETURN A NEW COOR OBJECT

pdb_numpy.DSSP.compute_DSSP(coor)[source]

Compute DSSP for a protein.

The compute_DSSP function takes in a coor parameter which represents the coordinates of a protein. It computes the secondary structure of the protein based on the input coordinates and returns a sequence of secondary structure elements. The output sequence consists of the following elements:

“H” represents a 4-helix (\(\alpha\)-helix)
“B” represents a residue in an isolated \(\beta\)-bridge
“E” represents an extended strand that participates in a \(\beta\)-ladder
“G” represents a 3-helix (\(3_{10}\)-helix)
“I” represents a 5-helix (\(\pi\)-helix)
“T” represents an H-bonded turn
“S” represents a bend

The function first adds NH hydrogen atoms to the protein coordinates and selects the \(\alpha\)-carbons of the protein. It then computes a distance matrix between all residues and uses this to compute the secondary structure of the protein. It first computes turns, then \(\beta\)-sheets, and finally assigns \(\alpha\)-helices, \(\pi\)-helices, and \(3_{10}\)-helices. Finally, it joins overlapping helices and returns the secondary structure sequence.

Parameters:

coorCoor: Coor object containing the protein coordinates.

Returns:

SS_seqnumpy.ndarray: A numpy array of shape (n_res,) where n_res is the number of residues

Note

96 % accuracy compared to DSSP (3eam)
omitted β-bulge annotation

pdb_numpy.DSSP.compute_Hbond_matrix(model)[source]

Compute Hbond matrix for a protein.

Parameters:

modelModel: model containing the protein.

Returns:

Hbond_matnumpy.ndarray: A boolean matrix of shape (n_res, n_res) where n_res is the number of

Notes

The cutoff distance for the Hbond neighbor search is 8 Angstrom.

pdb_numpy.DSSP.compute_bend(CA_sel)[source]

Compute bend for a protein.

Parameters:

CA_selModel: AtomGroup containing only CA atoms.

Returns:

bendnumpy.ndarray

An array of boolean values indicating whether the corresponding: residue has a bend (True) or not (False). The length of the array is equal to the number of residues in the protein.

Notes

The bend is computed as the angle between the vectors connecting a residue’s CA atom to the CA atoms of the residues separated by two positions in sequence. A residue is considered to have a bend if the angle is greater than 70 degrees.

pdb_numpy.DSSP.get_NH_xyz(model)[source]

computes and returns the nitrogen-hydrogen (NH) coordinates of a protein.

Parameters:

modelModel: Model object containing protein coordinates.

Returns:

NH_listnumpy.ndarray: 2D array of shape (n_residues, 3) containing the NH coordinates for each residue in the protein. The first row contains None values to align with the residue numbering in the protein.

Notes

The function first selects the protein atoms excluding alternate locations B, C, D and E. It then extracts the C, CA and N atoms and the corresponding residue names. It computes the NH vectors for each residue and adds them to the N atom position. Finally, the NH coordinates are stored in a 2D numpy array where each row represents a residue and the columns correspond to the x, y, and z coordinates. The first row contains None values to align with the residue numbering in the protein.

pdb_numpy.DSSP.hbond_energy(vec_N, vec_H, vec_O, vec_C)[source]

Compute HBond energy based on ON, CH, OH and CN distances.

\[E = q_1 q_2 \left( \frac{1}{r(ON)} + \frac{l}{r(CH)} - \frac{l}{r(OH)} - \frac{l}{r(CN)} \right) f\]

where: \(q1 = 0.42 e\) and \(q2 = 0.20 e\) f is a dimensional factor \(f = 332\) and \(r\) is in Angstrom.

The function returns the calculated hydrogen bond energy in units of kcal/mol.

Parameters:

vec_Nnumpy.ndarray: Nitrogen coordinates.
vec_Hnumpy.ndarray: Hydrogen coordinates.
vec_Onumpy.ndarray: Oxygen coordinates.
vec_Cnumpy.ndarray: Carbon coordinates.

Returns:

energyfloat: HBond energy (kcal/mol).

Geom module

pdb_numpy.geom.angle_vec(vec_a, vec_b)[source]

Compute angle between two vectors.

Parameters:

vec_anumpy.ndarray: vector a
vec_bnumpy.ndarray: vector b

Returns:

float: angle between vec_a and vec_b in radians

Examples

>>> angle = Coor.angle_vec([1, 0, 0], [0, 1, 0])
>>> print(f'angle = {np.degrees(angle):.2f}')
angle = 90.00
>>> angle = Coor.angle_vec([1, 0, 0], [1, 0, 0])
>>> print(f'angle = {np.degrees(angle):.2f}')
angle = 0.00
>>> angle = Coor.angle_vec([1, 0, 0], [1, 1, 0])
>>> print(f'angle = {np.degrees(angle):.2f}')
angle = 45.00
>>> angle = Coor.angle_vec([1, 0, 0], [-1, 0, 0])
>>> print(f'angle = {np.degrees(angle):.2f}')
angle = 180.00

pdb_numpy.geom.atom_dihed_angle(atom_a, atom_b, atom_c, atom_d)[source]

Compute the dihedral anlge using 4 atoms.

Parameters:

atom_anp.array: Coordinates of the first atom
atom_bnp.array: Coordinates of the second atom
atom_cnp.array: Coordinates of the third atom
atom_dnp.array: Coordinates of the fourth atom

Returns:

float: Diheral angle in degrees

Examples

>>> atom_1 = {'xyz': np.array([0.0, -1.0, 0.0])}
>>> atom_2 = {'xyz': np.array([0.0, 0.0, 0.0])}
>>> atom_3 = {'xyz': np.array([1.0, 0.0, 0.0])}
>>> atom_4 = {'xyz': np.array([1.0, 1.0, 0.0])}
>>> atom_5 = {'xyz': np.array([1.0, -1.0, 0.0])}
>>> atom_6 = {'xyz': np.array([1.0, -1.0, 1.0])}
>>> angle_1 = Coor.atom_dihed_angle(atom_1, atom_2, atom_3, atom_4)
>>> print('{:.3f}'.format(angle_1))
180.000
>>> angle_2 = Coor.atom_dihed_angle(atom_1, atom_2, atom_3, atom_5)
>>> print('{:.3f}'.format(angle_2))
0.000
>>> angle_3 = Coor.atom_dihed_angle(atom_1, atom_2, atom_3, atom_6)
>>> print('{:.3f}'.format(angle_3))
-45.000

pdb_numpy.geom.compute_unit_cell_vectors(alpha, beta, gamma, a, b, c)[source]

Compute unit cell vectors. Formula from:

\[\begin{split}a = \begin{bmatrix} \alpha \\ 0 \\ 0 \end{bmatrix} \;\; b = \begin{bmatrix} \beta \cos(\gamma) \\ \beta \sin(\gamma) \\ 0 \end{bmatrix}\end{split}\]

\[\begin{split}c = \begin{bmatrix} c \cos(\beta) \\ \frac{c}{\sin(\gamma)} \left( \cos(\alpha) - \cos(\beta) \cos(\gamma) \right) \\ \frac{c}{\sin(\gamma)} \sqrt{1 - \cos^2(\alpha) - \cos^2(\beta) - \cos^2(\gamma) + 2 \cos(\alpha) \cos(\beta) \cos(\gamma)} \end{bmatrix}\end{split}\]

Parameters:

alphafloat: alpha angle in degrees
betafloat: beta angle in degrees
gammafloat: gamma angle in degrees
afloat: a length in Angstrom
bfloat: b length in Angstrom
cfloat: c length in Angstrom

Returns:

tuple: unit cell vectors (in Angstrom)

pdb_numpy.geom.cryst_convert(crystal_pack, format_out='pdb')[source]

PDB format: https://www.wwpdb.org/documentation/file-format-content/format33/sect8.html Gro to pdb: https://mailman-1.sys.kth.se/pipermail/gromacs.org_gmx-users/2008-May/033944.html https://en.wikipedia.org/wiki/Fractional_coordinates

Parameters:

crystal_packstr: line of the pdb file containing the crystal information
format_outstr, optional: format of the output, by default ‘pdb’

Returns:

str: line of the pdb/gro file containing the crystal information

Examples

>>> prot_coor = Coor()
>>> prot_coor.read_file(os.path.join(TEST_PATH, '1y0m.gro'))    >>> prot_coor.cryst_convert(format_out='pdb')
'CRYST1   28.748   30.978   29.753  90.00  92.12  90.00 P         1\n'

pdb_numpy.geom.cryst_convert_mmCIF(data_mmCIF, format_out='pdb')[source]

PDB format: https://www.wwpdb.org/documentation/file-format-content/format33/sect8.html Gro to pdb: https://mailman-1.sys.kth.se/pipermail/gromacs.org_gmx-users/2008-May/033944.html https://en.wikipedia.org/wiki/Fractional_coordinates

Parameters:

crystal_packstr: line of the pdb file containing the crystal information
format_outstr, optional: format of the output, by default ‘pdb’

Returns:

str: line of the pdb/gro file containing the crystal information

>>> prot_coor = Coor()
    ..

>>> prot_coor.read_file(os.path.join(TEST_PATH, '1y0m.gro'))    >>> prot_coor.cryst_convert(format_out='pdb')
    ..

‘CRYST1 28.748 30.978 29.753 90.00 92.12 90.00 P 1n’

PDB_Numpy package

Submodules

Coor Class

Model Class

Analysis module

DSSP module

Geom module

`Coor` Class

`Model` Class