PDB_Numpy package

Submodules

Coor Class

class pdb_numpy.coor.Coor(coor_in=None, pdb_lines=None, pdb_id=None)[source]

Bases: object

The Coor class is a coordinate and topology object based on coordinates like pdb or gro files. It has attributes such as ‘models’, ‘crystal_pack’, and ‘active_model’ that store a list of Model objects, the crystal packing as a string, and the index of the active model, respectively.

The class provides methods to read pdb, pqr, mmcif files, parse pdb lines, download pdb files from the PDB database, and write pdb and pqr files. Additionally, it has methods to select atoms, retrieve amino acid sequences, and change the order of atoms in the model.

Attributes:
modelslist

List of Model objects

crystal_packstr

Crystal Packing as a string

active_modelint

Index of the active model

Methods

read(file_in)

Read a pdb file and store atom informations as a dictionnary od numpy array. The fonction can also read pqr files if the file extension is .pqr

select_atoms(select)

Return a list of atom index corresponding to the selection

simple_select_atoms(select)

Return a list of atom index corresponding to a simple selection

select_index(select)

Return a list of atom index corresponding to the selection

dist_under_index(index, dist)

Return a list of atom index corresponding to the distance

get_index_select(select)

Return a list of atom index corresponding to the selection

get_aa_seq(model_num=0)

Return a string of amino acid sequence

get_aa_DL_seq(model_num=0)

Return a string of amino acid sequence with disulfide bonds

add_symmetry()[source]

Apply the symmetry matrix to the coordinates. Add a model for each symmetry matrix.

Parameters:
None
Returns:
Coor

A new Coor object with the symmetry added.

property alterloc
apply_transformation(index_list=None)[source]

Apply the transformation matrix to the coordinates. Add a new model with the transformed coordinates.

Parameters:
index_listlist, optional

Index list of the transformation matrix to apply, by default all

Returns:
Coor

A new Coor object with the transformation added.

property beta
property chain
change_order(field, order_list)[source]

Change the order of the atoms in the model. The change_order() function takes in two arguments, field and order_list, which are used to change the order of the atoms in the model. The field argument specifies which field to change the order by, while order_list is a list of the new order. The function then modifies the atom dictionary in each model to match the new order.

Parameters:
fieldstr

The field to change the order by.

order_listlist

List of the new order

Returns:
None

Change the order of the atoms in the model

Examples

>>> test = Coor(pdb_id='1jd4')
>>> test.change_order('chain', ['B', 'C', 'A'])
property com

Return the center of mass of the active model

compute_chains_CA(Ca_cutoff=4.2)[source]

Correct the chain ID’s of a coor object, by checking consecutive Calphas atoms distance. If the distance is higher than Ca_cutoff , the former atoms are considered as in a different chain.

Parameters:
Ca_cutofffloat, optional

Cutoff distance between consecutive Calpha atoms, by default 4.5 Angstrom

Returns:
None
copy_box(x=1, y=1, z=1)[source]

Copy the box in the x, y and z direction.

Parameters:
xint, optional

Number of copy in the x direction, by default 1

yint, optional

Number of copy in the y direction, by default 1

zint, optional

Number of copy in the z direction, by default 1

Returns:
Coor

A new Coor object with the box copied.

property elem
property field
get_aa_DL_seq(gap_in_seq=True, frame=0)[source]

Get the amino acid sequence from a coor object. The function calculates the amino acid sequence based on the CA atoms in the structure.

L or D form is determined using CA-N-C-CB angle Angle should take values around +34° and -34° for L- and D-amino acid residues. if amino acid is in D form it will be in lower case.

Reference: https://onlinelibrary.wiley.com/doi/full/10.1002/prot.10320

Parameters:
selfCoor

Coor object

gap_in_seqbool, optional

if True, add gaps in the sequence, by default True

frameint

Frame number for the selection, default is 0

Returns:
dict

Dictionary with chain as key and sequence as value.

Examples

>>> prot_coor = Coor('1y0m.pdb')
Succeed to read file 1y0m.pdb ,  648 atoms found
>>> prot_coor.get_aa_DL_seq()
{'A': 'TFKSAVKALFDYKAQREDELTFTKSAIIQNVEKQDGGWWRGDYGGKKQLWFPSNYVEEMIN'}
>>> prot_coor = Coor('6be9_frame_0.pdb')
Succeed to read file 6be9_frame_0.pdb ,  104 atoms found
>>> prot_coor.get_aa_DL_seq()
Residue K2 is in D form
Residue N6 is in D form
Residue P7 is in D form
{'A': 'TkNDTnp'}

Warning

If atom chains are not arranged sequentially (A,A,A,B,B,A,A,A …), the first atom seq will be overwritten by the last one.

get_aa_seq(gap_in_seq=True, frame=0)[source]

Get the amino acid sequence from a Coor object. This function takes a Coor object, selects the CA atoms using the select_atoms() method with the argument name CA, and returns a dictionary with the amino acid sequence of each chain in the protein, where the key is the chain identifier and the value is the sequence.

The function first creates empty dictionaries seq_dict and aa_num_dict. These will be used to store the sequence and the number of the last amino acid in each chain, respectively. Then, it loops through each CA atom in the selected atoms and retrieves the chain identifier, residue name, and residue number. If the chain identifier is not in seq_dict, the function initializes an empty string for the sequence of that chain and stores the residue number in aa_num_dict. Then, if the residue name is recognized as a standard amino acid in AA_DICT, the function adds the corresponding one-letter code to the sequence string for the corresponding chain in seq_dict, and updates the last amino acid number in aa_num_dict. If the residue name is not recognized, the function logs a warning message. If the residue number is not consecutive to the previous one and gap_in_seq is set to True, the function adds gaps to the sequence.

Finally, the function returns seq_dict, the dictionary with the amino acid sequences of each chain in the protein.

Parameters:
selfCoor

Coor object

gap_in_seqbool, optional

if True, add gaps in the sequence, by default True

frameint

Frame number for the selection, default is 0

Returns:
dict

Dictionary with chain as key and sequence as value.

Examples

>>> prot_coor = Coor('1y0m.pdb')
>>> prot_coor.get_aa_seq()
{'A': 'TFKSAVKALFDYKAQREDELTFTKSAIIQNVEKQDGGWWRGDYGGKKQLWFPSNYVEEMIN'}

Warning

If atom chains are not arranged sequentially (A,A,A,B,B,A,A,A …), the first atom seq will be overwritten by the last one.

get_index_select(selection, frame=0)[source]

Return index from the Coor object based on the selection string.

Parameters:
selfCoor

Coor object

selectionstr

Selection string

frameint

Frame number for the selection, default is 0

Returns:
list

a list of indexes

property insertres
property len
merge_models()[source]

Merge all models into the first model.

Returns:
None
property model_num
property name
property num
property occ
read(file_in)[source]

Read a pdb/pqr/gro/cif file and return atom information as a Coor object. It determines the file format based on the file extension and parses the lines accordingly. If the file extension is not recognized, it assumes it is a pdb file.

Parameters:
file_instr

Path of the pdb file to read

Returns:
None

Examples

>>> prot_coor = Coor('1y0m.pdb')
Succeed to read file ...1y0m.pdb ,  648 atoms found
>>> prot_coor.read('1y0m.gro')
Succeed to read file ...1y0m.gro ,  648 atoms found
remove_overlap_chain(cutoff=1.0, frame=0)[source]

Remove atoms that are closer than cutoff from another atom.

Parameters:
cutofffloat, optional

Cutoff distance, by default 1.0 Angstrom

Returns:
Coor

A new Coor object with the overlapping atoms removed.

reset_residue_index()[source]

Reset the residue index to the original index of the pdb file. This function resets the residue index in the model to the original index of the pdb file. It loops through each model in the Coor object and for each atom in the model, it compares the residue number with the last residue number seen. If they are different, it increments the residue number, and sets the residue number of the current atom to the new residue number. This effectively resets the residue index to the original index of the pdb file. The function does not return anything, it simply modifies the residue number of each atom in the Coor object.

Returns:
None

Change the residue index in the model

property resid
property residue
property resname
select_atoms(selection, frame=0)[source]

This method allows selecting atoms from the PDB file based on a selection string. The selection string follows a syntax similar to that used in the VMD molecular visualization software.

Parameters:
selfCoor

Coor object

selectionstr

Selection string

frameint

Frame number for the selection, default is 0

Returns:
Coor

a new Coor object with the selected atoms

select_index(indexes)[source]

The select_index() function selects atoms from a Coor object based on the provided indexes and returns a new Coor object with the selected atoms.

Parameters:
selfCoor

Coor object

indexeslist

List of indexes

frameint

Frame number for the selection, default is 0

Returns:
Coor

a new Coor object with the selected atoms

property total_len
property uniq_resid
write(file_out, overwrite=False)[source]

Write a pdb/pqr/gro/cif file from a Coor object. It determines the file format based on the file extension and writes the lines accordingly. If the file extension is not recognized, it assumes it is a pdb file.

Parameters:
file_outstr

Path of the pdb file to write

overwritebool

If False, check if the file exists and don’t overwrite it. If True, overwrite the file without asking for confirmation.

Returns:
None
property x
property xyz
property y
property z

Model Class

class pdb_numpy.model.Model[source]

Bases: object

Model class for pdb_numpy

Attributes:
atom_dictdict

Dictionary containing the atom information

lenint

Number of atoms in the model

fieldnumpy.ndarray

Array containing the field of the atom

numnumpy.ndarray

Array containing the atom number

namenumpy.ndarray

Array containing the atom name

resnamenumpy.ndarray

Array containing the residue name

alterlocnumpy.ndarray

Array containing the alternate location

chainnumpy.ndarray

Array containing the chain

insertresnumpy.ndarray

Array containing the insertion code

elemnumpy.ndarray

Array containing the element

residnumpy.ndarray

Array containing the residue number

uniq_residnumpy.ndarray

Array containing the unique residue id

xnumpy.ndarray

Array containing the x coordinate

ynumpy.ndarray

Array containing the y coordinate

znumpy.ndarray

Array containing the z coordinate

occupancynumpy.ndarray

Array containing the occupancy

bfactornumpy.ndarray

Array containing the bfactor

xyznumpy.ndarray

Array containing the x, y and z coordinates

Methods

select(selection)

Select atoms from the model

select_index(selection)

Select atoms from the model and return the index

add_atom(index, name, resname, num, resid, uniq_resid, chain, xyz, bfactor=0, occupancy=0, altloc='', insertres='', elem='')[source]

Add an atom to the Model object.

Parameters:
selfModel

Model object

atomAtom

Atom object

property alterloc
property beta
property chain
dist_under_index(sel_2, cutoff)[source]

Select atoms from the PDB file based on distance.

Parameters:
selfModel

Model object for the first selection

sel_2Model

Model object for the second selection

cutofffloat

Cutoff distance for the selection

Returns:
List

list of boolean values for each atom in the PDB file

property elem
property field
get_index_select(selection)[source]

Return index from the PDB file based on the selection string.

Parameters:
selfModel

Model object

selectionstr

Selection string

frameint

Frame number for the selection, default is 0

Returns:
list

a list of indexes

property insertres
property len
property name
property num
property occ
property resid
property residue
property resname
select_atoms(selection)[source]

Select atoms from the PDB file based on the selection string.

Parameters:
selfModel

Model object

selectionstr

Selection string

frameint

Frame number for the selection, default is 0

Returns:
Coor

a new Model object with the selected atoms

select_index(indexes)[source]

Select atoms from the PDB file based on the selection indexes.

Parameters:
selfModel

Model object

indexeslist

List of indexes

frameint

Frame number for the selection, default is 0

Returns:
Coor

a new Coor object with the selected atoms

select_tokens(tokens)[source]

Select atoms from the PDB file based on the selection tokens. Selection tokens are a list of tokens that can be either simple selection or nested selection. A simple selection contains only one keyword, operator, and values. A nested selection contains a list or sub-list of tokens.

Parameters:
selfModel

Model object

tokenslist

List of nested tokens

frameint

Frame number for the selection, default is 0

Returns:
list

a list of boolean values for each atom in the PDB file

simple_select_atoms(column, values, operator='==')[source]

Select atoms from the PDB file based on the selection tokens. Selection tokens are simple selection containing only one keyword, operator, and values.

The keywords :

  • “resname”

  • “chain”

  • “name”

  • “altloc”

  • “resid”

  • “residue”

  • “beta”

  • “occupancy”

  • “x”, “y”, “z”.

The operators are:

  • “==”

  • “!=”

  • “>”

  • “>=”

  • “<”

  • “<=”

  • “isin”

Parameters:
selfModel

Model object

columnstr

Keyword for the selection

valueslist

List of values for the selection

operatorstr

Operator for the selection

frameint

Frame number for the selection, default is 0

Returns:
list

a list of boolean values for each atom in the PDB file

property uniq_resid
property x
property xyz
property y
property z

Analysis module

pdb_numpy.analysis.compute_pdockQ(coor, rec_chain=None, lig_chain=None, cutoff=8.0)[source]

Compute the pdockQ score as define in [1].

\[pDockQ = \frac{L}{1 + e^{-k (x-x_{0})}} + b\]

where

\[x = \overline{plDDT_{interface}} \cdot log(number \: of \: interface \: contacts)\]

\(L = 0.724\) is the maximum value of the sigmoid, \(k = 0.052\) is the slope of the sigmoid, \(x_{0} = 152.611\) is the midpoint of the sigmoid, and \(b = 0.018\) is the y-intercept of the sigmoid.

Implementation was inspired from https://gitlab.com/ElofssonLab/FoldDock/-/blob/main/src/pdockq.py

Parameters:
coorCoor

object containing the coordinates of the model

rec_chainlist

list of receptor chain

lig_chainlist

list of ligand chain

cutofffloat

cutoff for native contacts, default is 8.0 A

Returns:
list

pdockQ scores

References

[1]

Bryant P, Pozzati G and Elofsson A. Improved prediction of protein-protein interactions using AlphaFold2. Nature Communications. vol. 13, 1265 (2022) https://www.nature.com/articles/s41467-022-28865-w

pdb_numpy.analysis.compute_pdockQ2(coor, pae_array, rec_chain=None, lig_chain=None, cutoff=8.0)[source]

Compute the pdockQ2 score as define in [1].

\[pDockQ_2 = \frac{L}{1 + exp [-k*(X_i-X_0)]} + b\]

whith:

\[X_i = \langle \frac{1}{1+(\frac{PAE_{int}}{d_0})^2} \rangle - \langle pLDDT \rangle_{int}\]

\(L = 0.724\) is the maximum value of the sigmoid, \(k = 0.052\) is the slope of the sigmoid, \(x_{0} = 152.611\) is the midpoint of the sigmoid, and \(b = 0.018\) is the y-intercept of the sigmoid.

Implementation was inspired from https://gitlab.com/ElofssonLab/afm-benchmark/-/blob/main/src/pdockq2.py

Parameters:
coorCoor

object containing the coordinates of the model

rec_chainlist

list of receptor chain

lig_chainlist

list of ligand chain

cutofffloat

cutoff for native contacts, default is 8.0 A

Returns:
list

pdockQ scores

References

[1]

Zhu W, Shenoy A, Kundrotras P and Elofsson A. Evaluation of AlphaFold-Multimer prediction on multi-chain protein complexes. Bioinformatics. vol. 39, 7 (2023) btad424 https://academic.oup.com/bioinformatics/article/39/7/btad424/7219714

pdb_numpy.analysis.compute_pdockQ_sel(coor, rec_sel, lig_sel, cutoff=8.0)[source]

Compute the pdockQ score as define in [1]. Using two selection strings.

\[pDockQ = \frac{L}{1 + e^{-k (x-x_{0})}} + b\]

where

\[x = \overline{plDDT_{interface}} \cdot log(number \: of \: interface \: contacts)\]

\(L = 0.724\) is the maximum value of the sigmoid, \(k = 0.052\) is the slope of the sigmoid, \(x_{0} = 152.611\) is the midpoint of the sigmoid, and \(b = 0.018\) is the y-intercept of the sigmoid.

Implementation was inspired from https://gitlab.com/ElofssonLab/FoldDock/-/blob/main/src/pdockq.py

Parameters:
coorCoor

object containing the coordinates of the model

rec_selstr

selection string for receptor

lig_selstr

selection string for ligand

cutofffloat

cutoff for native contacts, default is 8.0 A

Returns:
float

pdockQ score

References

[1]

Bryant P, Pozzati G and Elofsson A. Improved prediction of protein-protein interactions using AlphaFold2. Nature Communications. vol. 13, 1265 (2022) https://www.nature.com/articles/s41467-022-28865-w

pdb_numpy.analysis.dockQ(coor, native_coor, rec_chain=None, lig_chain=None, native_rec_chain=None, native_lig_chain=None, back_atom=['CA', 'N', 'C', 'O'])[source]

The dockQ function computes the docking quality score between a model and a native structure. It takes as input the Cartesian coordinates of the model (coor) and the native structure (native_coor) and various optional parameters such as the receptor chain (rec_chain), the ligand chain (lig_chain), the native receptor chain (native_rec_chain), the native ligand chain (native_lig_chain) and the list of backbone atoms (back_atom).

The docking quality score is computed as follows:

  1. Align the receptor on the native receptor using the backbone atoms

  2. Compute the RMSD between the ligand and the native ligand

  3. Compute the RMSD between the receptor and the native receptor

The function returns the docking quality score as a float value.

The function first gets the shortest chain’s sequence to identify the peptide and receptor chains for both the model and native structure. It then removes hydrogens and non-protein atoms as well as alternate locations from the coordinates of both the model and the native structure.

The function then removes incomplete backbone residues from both the model and native structure coordinates. The ligand chain is then put at the end of the dictionary. The model is then aligned on the native structure using the model back atoms. The function computes the RMSD between the ligand and the native ligand and the RMSD between the receptor and the native receptor.

The same residue in common is set and the interface coordinates and interface native coordinates are selected. Finally, the docking quality score is returned.

Parameters:
coorCoor

Model coordinates

native_coorCoor

Native coordinates

rec_chainlist, default=None

List of receptor chain

lig_chainlist, default=None

List of ligand chain

native_rec_chainlist, default=None

List of native receptor chain

native_lig_chainlist, default=None

List of native ligand chain

back_atomlist, default=[‘CA’, ‘N’, ‘C’, ‘O’]

List of backbone atoms

Returns:
float

Docking quality score

pdb_numpy.analysis.interface_rmsd(coor_1, coor_2, rec_chain, lig_chain, cutoff=10.0, back_atom=['CA', 'N', 'C', 'O'])[source]

Compute the interface RMSD between two models. The interface is defined as the distance between the ligand and the receptor below the cutoff distance. The RMSD is computed between the ligand and the receptor of the two models.

Parameters:
coor_1Coor

First coordinates

coor_2Coor

Second coordinates

rec_chainlist

List of receptor chain

lig_chainlist

List of ligand chain

cutofffloat, default=10.0

Cutoff distance for the interface

back_atomlist, default=[‘CA’, ‘N’, ‘C’, ‘O’]

List of backbone atoms to use for the RMSD calculation

Returns:
float

Interface RMSD value

Notes

Both coor object must have equivalent residues number in both chains. Chain order must also be equivalent.

pdb_numpy.analysis.native_contact(coor, native_coor, rec_chain, lig_chain, native_rec_chain, native_lig_chain, cutoff=5.0)[source]

Compute the native contact score between a model and a native structure.

The function returns two lists: fnat_list and fnonnat_list. fnat_list contains the fraction of native contacts, and fnonnat_list contains the fraction of non-native contacts.

The function first selects the atoms within the cutoff distance from the native receptor-ligand interface in the native structure, and then selects the atoms within the cutoff distance from the receptor-ligand interface in the model structure. It then compares the selected atoms in the model structure with the native atoms to determine the native contacts and non-native contacts.

The function uses the np.unique() function to get unique residue numbers, and then iterates over these residue numbers to compute the native contacts. It then computes the fraction of native contacts and non-native contacts for each model and adds them to the respective lists.

Finally, the function returns the fnat_list and fnonnat_list.

Parameters:
coorCoor

Model coordinates

native_coorCoor

Native coordinates

rec_chainlist

List of receptor chain

lig_chainlist

List of ligand chain

native_rec_chainlist

List of native receptor chain

native_lig_chainlist

List of native ligand chain

cutofffloat, default=5.0

Cutoff distance for the native contact

Returns:
float

Native contact score

pdb_numpy.analysis.rmsd(coor_1, coor_2, selection='name CA', index_list=None, frame_ref=0)[source]

Compute RMSD between two sets of coordinates.

The RMSD (Root Mean Square Deviation) measures the similarity between two sets of coordinates by calculating the average distance between the corresponding atoms. It is given by the formula:

\[RMSD(v,w) = \sqrt{ \frac{1}{N} \sum_{i=1}^{N} (v_i - w_i)^2 }\]

where v and w are the two sets of coordinates, and N is the total number of atoms.

Parameters:
coor_1Coor

First set of coordinates.

coor_2Coor

Second set of coordinates.

selectionstr, optional

A selection string specifying which atoms to include in the RMSD calculation. By default, it selects the alpha carbon atoms (‘name CA’).

index_listlist, optional

A list of two arrays containing the indices of the atoms to include in the RMSD calculation. This option is provided as an alternative to the selection argument, and can be used if you need to calculate the RMSD for a custom set of atoms. If this argument is provided, the selection argument is ignored.

frame_refint, optional

The frame number of the reference structure. By default, it is set to 0.

Returns:
float

RMSD value

DSSP module

pdb_numpy.DSSP.add_NH(coor)[source]

Add NH atoms to a protein structure.

This function adds NH atoms to a protein structure by computing the coordinates of the hydrogen atoms using neighboring C and CA atoms. The new NH atoms are added to the protein structure.

Parameters:
coorCoor

Protein coordinates.

Returns:
TO FIX IS SHOULD RETURN A NEW COOR OBJECT
pdb_numpy.DSSP.compute_DSSP(coor)[source]

Compute DSSP for a protein.

The compute_DSSP function takes in a coor parameter which represents the coordinates of a protein. It computes the secondary structure of the protein based on the input coordinates and returns a sequence of secondary structure elements. The output sequence consists of the following elements:

  • “H” represents a 4-helix (\(\alpha\)-helix)

  • “B” represents a residue in an isolated \(\beta\)-bridge

  • “E” represents an extended strand that participates in a \(\beta\)-ladder

  • “G” represents a 3-helix (\(3_{10}\)-helix)

  • “I” represents a 5-helix (\(\pi\)-helix)

  • “T” represents an H-bonded turn

  • “S” represents a bend

The function first adds NH hydrogen atoms to the protein coordinates and selects the \(\alpha\)-carbons of the protein. It then computes a distance matrix between all residues and uses this to compute the secondary structure of the protein. It first computes turns, then \(\beta\)-sheets, and finally assigns \(\alpha\)-helices, \(\pi\)-helices, and \(3_{10}\)-helices. Finally, it joins overlapping helices and returns the secondary structure sequence.

Parameters:
coorCoor

Coor object containing the protein coordinates.

Returns:
SS_seqnumpy.ndarray

A numpy array of shape (n_res,) where n_res is the number of residues

Note

  • 96 % accuracy compared to DSSP (3eam)

  • omitted β-bulge annotation

pdb_numpy.DSSP.compute_Hbond_matrix(model)[source]

Compute Hbond matrix for a protein.

Parameters:
modelModel

model containing the protein.

Returns:
Hbond_matnumpy.ndarray

A boolean matrix of shape (n_res, n_res) where n_res is the number of

Notes

The cutoff distance for the Hbond neighbor search is 8 Angstrom.

pdb_numpy.DSSP.compute_bend(CA_sel)[source]

Compute bend for a protein.

Parameters:
CA_selModel

AtomGroup containing only CA atoms.

Returns:
bendnumpy.ndarray
An array of boolean values indicating whether the corresponding

residue has a bend (True) or not (False). The length of the array is equal to the number of residues in the protein.

Notes

The bend is computed as the angle between the vectors connecting a residue’s CA atom to the CA atoms of the residues separated by two positions in sequence. A residue is considered to have a bend if the angle is greater than 70 degrees.

pdb_numpy.DSSP.get_NH_xyz(model)[source]

computes and returns the nitrogen-hydrogen (NH) coordinates of a protein.

Parameters:
modelModel

Model object containing protein coordinates.

Returns:
NH_listnumpy.ndarray

2D array of shape (n_residues, 3) containing the NH coordinates for each residue in the protein. The first row contains None values to align with the residue numbering in the protein.

Notes

The function first selects the protein atoms excluding alternate locations B, C, D and E. It then extracts the C, CA and N atoms and the corresponding residue names. It computes the NH vectors for each residue and adds them to the N atom position. Finally, the NH coordinates are stored in a 2D numpy array where each row represents a residue and the columns correspond to the x, y, and z coordinates. The first row contains None values to align with the residue numbering in the protein.

pdb_numpy.DSSP.hbond_energy(vec_N, vec_H, vec_O, vec_C)[source]

Compute HBond energy based on ON, CH, OH and CN distances.

\[E = q_1 q_2 \left( \frac{1}{r(ON)} + \frac{l}{r(CH)} - \frac{l}{r(OH)} - \frac{l}{r(CN)} \right) f\]

where: \(q1 = 0.42 e\) and \(q2 = 0.20 e\) f is a dimensional factor \(f = 332\) and \(r\) is in Angstrom.

The function returns the calculated hydrogen bond energy in units of kcal/mol.

Parameters:
vec_Nnumpy.ndarray

Nitrogen coordinates.

vec_Hnumpy.ndarray

Hydrogen coordinates.

vec_Onumpy.ndarray

Oxygen coordinates.

vec_Cnumpy.ndarray

Carbon coordinates.

Returns:
energyfloat

HBond energy (kcal/mol).

Geom module

pdb_numpy.geom.angle_vec(vec_a, vec_b)[source]

Compute angle between two vectors.

Parameters:
vec_anumpy.ndarray

vector a

vec_bnumpy.ndarray

vector b

Returns:
float

angle between vec_a and vec_b in radians

Examples

>>> angle = Coor.angle_vec([1, 0, 0], [0, 1, 0])
>>> print(f'angle = {np.degrees(angle):.2f}')
angle = 90.00
>>> angle = Coor.angle_vec([1, 0, 0], [1, 0, 0])
>>> print(f'angle = {np.degrees(angle):.2f}')
angle = 0.00
>>> angle = Coor.angle_vec([1, 0, 0], [1, 1, 0])
>>> print(f'angle = {np.degrees(angle):.2f}')
angle = 45.00
>>> angle = Coor.angle_vec([1, 0, 0], [-1, 0, 0])
>>> print(f'angle = {np.degrees(angle):.2f}')
angle = 180.00
pdb_numpy.geom.atom_dihed_angle(atom_a, atom_b, atom_c, atom_d)[source]

Compute the dihedral anlge using 4 atoms.

Parameters:
atom_anp.array

Coordinates of the first atom

atom_bnp.array

Coordinates of the second atom

atom_cnp.array

Coordinates of the third atom

atom_dnp.array

Coordinates of the fourth atom

Returns:
float

Diheral angle in degrees

Examples

>>> atom_1 = {'xyz': np.array([0.0, -1.0, 0.0])}
>>> atom_2 = {'xyz': np.array([0.0, 0.0, 0.0])}
>>> atom_3 = {'xyz': np.array([1.0, 0.0, 0.0])}
>>> atom_4 = {'xyz': np.array([1.0, 1.0, 0.0])}
>>> atom_5 = {'xyz': np.array([1.0, -1.0, 0.0])}
>>> atom_6 = {'xyz': np.array([1.0, -1.0, 1.0])}
>>> angle_1 = Coor.atom_dihed_angle(atom_1, atom_2, atom_3, atom_4)
>>> print('{:.3f}'.format(angle_1))
180.000
>>> angle_2 = Coor.atom_dihed_angle(atom_1, atom_2, atom_3, atom_5)
>>> print('{:.3f}'.format(angle_2))
0.000
>>> angle_3 = Coor.atom_dihed_angle(atom_1, atom_2, atom_3, atom_6)
>>> print('{:.3f}'.format(angle_3))
-45.000
pdb_numpy.geom.compute_unit_cell_vectors(alpha, beta, gamma, a, b, c)[source]

Compute unit cell vectors. Formula from:

\[\begin{split}a = \begin{bmatrix} \alpha \\ 0 \\ 0 \end{bmatrix} \;\; b = \begin{bmatrix} \beta \cos(\gamma) \\ \beta \sin(\gamma) \\ 0 \end{bmatrix}\end{split}\]
\[\begin{split}c = \begin{bmatrix} c \cos(\beta) \\ \frac{c}{\sin(\gamma)} \left( \cos(\alpha) - \cos(\beta) \cos(\gamma) \right) \\ \frac{c}{\sin(\gamma)} \sqrt{1 - \cos^2(\alpha) - \cos^2(\beta) - \cos^2(\gamma) + 2 \cos(\alpha) \cos(\beta) \cos(\gamma)} \end{bmatrix}\end{split}\]
Parameters:
alphafloat

alpha angle in degrees

betafloat

beta angle in degrees

gammafloat

gamma angle in degrees

afloat

a length in Angstrom

bfloat

b length in Angstrom

cfloat

c length in Angstrom

Returns:
tuple

unit cell vectors (in Angstrom)

pdb_numpy.geom.cryst_convert(crystal_pack, format_out='pdb')[source]

PDB format: https://www.wwpdb.org/documentation/file-format-content/format33/sect8.html Gro to pdb: https://mailman-1.sys.kth.se/pipermail/gromacs.org_gmx-users/2008-May/033944.html https://en.wikipedia.org/wiki/Fractional_coordinates

Parameters:
crystal_packstr

line of the pdb file containing the crystal information

format_outstr, optional

format of the output, by default ‘pdb’

Returns:
str

line of the pdb/gro file containing the crystal information

Examples

>>> prot_coor = Coor()
>>> prot_coor.read_file(os.path.join(TEST_PATH, '1y0m.gro'))    >>> prot_coor.cryst_convert(format_out='pdb')
'CRYST1   28.748   30.978   29.753  90.00  92.12  90.00 P         1\n'
pdb_numpy.geom.cryst_convert_mmCIF(data_mmCIF, format_out='pdb')[source]

PDB format: https://www.wwpdb.org/documentation/file-format-content/format33/sect8.html Gro to pdb: https://mailman-1.sys.kth.se/pipermail/gromacs.org_gmx-users/2008-May/033944.html https://en.wikipedia.org/wiki/Fractional_coordinates

Parameters:
crystal_packstr

line of the pdb file containing the crystal information

format_outstr, optional

format of the output, by default ‘pdb’

Returns:
str

line of the pdb/gro file containing the crystal information

>>> prot_coor = Coor()
    ..
>>> prot_coor.read_file(os.path.join(TEST_PATH, '1y0m.gro'))    >>> prot_coor.cryst_convert(format_out='pdb')
    ..
‘CRYST1 28.748 30.978 29.753 90.00 92.12 90.00 P 1n’