PDB_Numpy package
Submodules
Coor Class
- class pdb_numpy.coor.Coor(coor_in=None, pdb_lines=None, pdb_id=None)[source]
Bases:
objectThe Coor class is a coordinate and topology object based on coordinates like pdb or gro files. It has attributes such as ‘models’, ‘crystal_pack’, and ‘active_model’ that store a list of Model objects, the crystal packing as a string, and the index of the active model, respectively.
The class provides methods to read pdb, pqr, mmcif files, parse pdb lines, download pdb files from the PDB database, and write pdb and pqr files. Additionally, it has methods to select atoms, retrieve amino acid sequences, and change the order of atoms in the model.
- Attributes:
- modelslist
List of Model objects
- crystal_packstr
Crystal Packing as a string
- active_modelint
Index of the active model
Methods
read(file_in)
Read a pdb file and store atom informations as a dictionnary od numpy array. The fonction can also read pqr files if the file extension is .pqr
select_atoms(select)
Return a list of atom index corresponding to the selection
simple_select_atoms(select)
Return a list of atom index corresponding to a simple selection
select_index(select)
Return a list of atom index corresponding to the selection
dist_under_index(index, dist)
Return a list of atom index corresponding to the distance
get_index_select(select)
Return a list of atom index corresponding to the selection
get_aa_seq(model_num=0)
Return a string of amino acid sequence
get_aa_DL_seq(model_num=0)
Return a string of amino acid sequence with disulfide bonds
- add_symmetry()[source]
Apply the symmetry matrix to the coordinates. Add a model for each symmetry matrix.
- Parameters:
- None
- Returns:
- Coor
A new Coor object with the symmetry added.
- property alterloc
- apply_transformation(index_list=None)[source]
Apply the transformation matrix to the coordinates. Add a new model with the transformed coordinates.
- Parameters:
- index_listlist, optional
Index list of the transformation matrix to apply, by default all
- Returns:
- Coor
A new Coor object with the transformation added.
- property beta
- property chain
- change_order(field, order_list)[source]
Change the order of the atoms in the model. The change_order() function takes in two arguments, field and order_list, which are used to change the order of the atoms in the model. The field argument specifies which field to change the order by, while order_list is a list of the new order. The function then modifies the atom dictionary in each model to match the new order.
- Parameters:
- fieldstr
The field to change the order by.
- order_listlist
List of the new order
- Returns:
- None
Change the order of the atoms in the model
Examples
>>> test = Coor(pdb_id='1jd4') >>> test.change_order('chain', ['B', 'C', 'A'])
- property com
Return the center of mass of the active model
- compute_chains_CA(Ca_cutoff=4.2)[source]
Correct the chain ID’s of a coor object, by checking consecutive Calphas atoms distance. If the distance is higher than
Ca_cutoff, the former atoms are considered as in a different chain.- Parameters:
- Ca_cutofffloat, optional
Cutoff distance between consecutive Calpha atoms, by default 4.5 Angstrom
- Returns:
- None
- copy_box(x=1, y=1, z=1)[source]
Copy the box in the x, y and z direction.
- Parameters:
- xint, optional
Number of copy in the x direction, by default 1
- yint, optional
Number of copy in the y direction, by default 1
- zint, optional
Number of copy in the z direction, by default 1
- Returns:
- Coor
A new Coor object with the box copied.
- property elem
- property field
- get_aa_DL_seq(gap_in_seq=True, frame=0)[source]
Get the amino acid sequence from a coor object. The function calculates the amino acid sequence based on the CA atoms in the structure.
L or D form is determined using CA-N-C-CB angle Angle should take values around +34° and -34° for L- and D-amino acid residues. if amino acid is in D form it will be in lower case.
Reference: https://onlinelibrary.wiley.com/doi/full/10.1002/prot.10320
- Parameters:
- selfCoor
Coor object
- gap_in_seqbool, optional
if True, add gaps in the sequence, by default True
- frameint
Frame number for the selection, default is 0
- Returns:
- dict
Dictionary with chain as key and sequence as value.
Examples
>>> prot_coor = Coor('1y0m.pdb') Succeed to read file 1y0m.pdb , 648 atoms found >>> prot_coor.get_aa_DL_seq() {'A': 'TFKSAVKALFDYKAQREDELTFTKSAIIQNVEKQDGGWWRGDYGGKKQLWFPSNYVEEMIN'} >>> prot_coor = Coor('6be9_frame_0.pdb') Succeed to read file 6be9_frame_0.pdb , 104 atoms found >>> prot_coor.get_aa_DL_seq() Residue K2 is in D form Residue N6 is in D form Residue P7 is in D form {'A': 'TkNDTnp'}
Warning
If atom chains are not arranged sequentially (A,A,A,B,B,A,A,A …), the first atom seq will be overwritten by the last one.
- get_aa_seq(gap_in_seq=True, frame=0)[source]
Get the amino acid sequence from a Coor object. This function takes a Coor object, selects the CA atoms using the select_atoms() method with the argument name CA, and returns a dictionary with the amino acid sequence of each chain in the protein, where the key is the chain identifier and the value is the sequence.
The function first creates empty dictionaries seq_dict and aa_num_dict. These will be used to store the sequence and the number of the last amino acid in each chain, respectively. Then, it loops through each CA atom in the selected atoms and retrieves the chain identifier, residue name, and residue number. If the chain identifier is not in seq_dict, the function initializes an empty string for the sequence of that chain and stores the residue number in aa_num_dict. Then, if the residue name is recognized as a standard amino acid in AA_DICT, the function adds the corresponding one-letter code to the sequence string for the corresponding chain in seq_dict, and updates the last amino acid number in aa_num_dict. If the residue name is not recognized, the function logs a warning message. If the residue number is not consecutive to the previous one and gap_in_seq is set to True, the function adds gaps to the sequence.
Finally, the function returns seq_dict, the dictionary with the amino acid sequences of each chain in the protein.
- Parameters:
- selfCoor
Coor object
- gap_in_seqbool, optional
if True, add gaps in the sequence, by default True
- frameint
Frame number for the selection, default is 0
- Returns:
- dict
Dictionary with chain as key and sequence as value.
Examples
>>> prot_coor = Coor('1y0m.pdb') >>> prot_coor.get_aa_seq() {'A': 'TFKSAVKALFDYKAQREDELTFTKSAIIQNVEKQDGGWWRGDYGGKKQLWFPSNYVEEMIN'}
Warning
If atom chains are not arranged sequentially (A,A,A,B,B,A,A,A …), the first atom seq will be overwritten by the last one.
- get_index_select(selection, frame=0)[source]
Return index from the Coor object based on the selection string.
- Parameters:
- selfCoor
Coor object
- selectionstr
Selection string
- frameint
Frame number for the selection, default is 0
- Returns:
- list
a list of indexes
- property insertres
- property len
- property model_num
- property name
- property num
- property occ
- read(file_in)[source]
Read a pdb/pqr/gro/cif file and return atom information as a Coor object. It determines the file format based on the file extension and parses the lines accordingly. If the file extension is not recognized, it assumes it is a pdb file.
- Parameters:
- file_instr
Path of the pdb file to read
- Returns:
- None
Examples
>>> prot_coor = Coor('1y0m.pdb') Succeed to read file ...1y0m.pdb , 648 atoms found >>> prot_coor.read('1y0m.gro') Succeed to read file ...1y0m.gro , 648 atoms found
- remove_overlap_chain(cutoff=1.0, frame=0)[source]
Remove atoms that are closer than
cutofffrom another atom.- Parameters:
- cutofffloat, optional
Cutoff distance, by default 1.0 Angstrom
- Returns:
- Coor
A new Coor object with the overlapping atoms removed.
- reset_residue_index()[source]
Reset the residue index to the original index of the pdb file. This function resets the residue index in the model to the original index of the pdb file. It loops through each model in the Coor object and for each atom in the model, it compares the residue number with the last residue number seen. If they are different, it increments the residue number, and sets the residue number of the current atom to the new residue number. This effectively resets the residue index to the original index of the pdb file. The function does not return anything, it simply modifies the residue number of each atom in the Coor object.
- Returns:
- None
Change the residue index in the model
- property resid
- property residue
- property resname
- select_atoms(selection, frame=0)[source]
This method allows selecting atoms from the PDB file based on a selection string. The selection string follows a syntax similar to that used in the VMD molecular visualization software.
- Parameters:
- selfCoor
Coor object
- selectionstr
Selection string
- frameint
Frame number for the selection, default is 0
- Returns:
- Coor
a new Coor object with the selected atoms
- select_index(indexes)[source]
The select_index() function selects atoms from a Coor object based on the provided indexes and returns a new Coor object with the selected atoms.
- Parameters:
- selfCoor
Coor object
- indexeslist
List of indexes
- frameint
Frame number for the selection, default is 0
- Returns:
- Coor
a new Coor object with the selected atoms
- property total_len
- property uniq_resid
- write(file_out, overwrite=False)[source]
Write a pdb/pqr/gro/cif file from a Coor object. It determines the file format based on the file extension and writes the lines accordingly. If the file extension is not recognized, it assumes it is a pdb file.
- Parameters:
- file_outstr
Path of the pdb file to write
- overwritebool
If False, check if the file exists and don’t overwrite it. If True, overwrite the file without asking for confirmation.
- Returns:
- None
- property x
- property xyz
- property y
- property z
Model Class
- class pdb_numpy.model.Model[source]
Bases:
objectModel class for pdb_numpy
- Attributes:
- atom_dictdict
Dictionary containing the atom information
- lenint
Number of atoms in the model
- fieldnumpy.ndarray
Array containing the field of the atom
- numnumpy.ndarray
Array containing the atom number
- namenumpy.ndarray
Array containing the atom name
- resnamenumpy.ndarray
Array containing the residue name
- alterlocnumpy.ndarray
Array containing the alternate location
- chainnumpy.ndarray
Array containing the chain
- insertresnumpy.ndarray
Array containing the insertion code
- elemnumpy.ndarray
Array containing the element
- residnumpy.ndarray
Array containing the residue number
- uniq_residnumpy.ndarray
Array containing the unique residue id
- xnumpy.ndarray
Array containing the x coordinate
- ynumpy.ndarray
Array containing the y coordinate
- znumpy.ndarray
Array containing the z coordinate
- occupancynumpy.ndarray
Array containing the occupancy
- bfactornumpy.ndarray
Array containing the bfactor
- xyznumpy.ndarray
Array containing the x, y and z coordinates
Methods
select(selection)
Select atoms from the model
select_index(selection)
Select atoms from the model and return the index
- add_atom(index, name, resname, num, resid, uniq_resid, chain, xyz, bfactor=0, occupancy=0, altloc='', insertres='', elem='')[source]
Add an atom to the Model object.
- Parameters:
- selfModel
Model object
- atomAtom
Atom object
- property alterloc
- property beta
- property chain
- dist_under_index(sel_2, cutoff)[source]
Select atoms from the PDB file based on distance.
- Parameters:
- selfModel
Model object for the first selection
- sel_2Model
Model object for the second selection
- cutofffloat
Cutoff distance for the selection
- Returns:
- List
list of boolean values for each atom in the PDB file
- property elem
- property field
- get_index_select(selection)[source]
Return index from the PDB file based on the selection string.
- Parameters:
- selfModel
Model object
- selectionstr
Selection string
- frameint
Frame number for the selection, default is 0
- Returns:
- list
a list of indexes
- property insertres
- property len
- property name
- property num
- property occ
- property resid
- property residue
- property resname
- select_atoms(selection)[source]
Select atoms from the PDB file based on the selection string.
- Parameters:
- selfModel
Model object
- selectionstr
Selection string
- frameint
Frame number for the selection, default is 0
- Returns:
- Coor
a new Model object with the selected atoms
- select_index(indexes)[source]
Select atoms from the PDB file based on the selection indexes.
- Parameters:
- selfModel
Model object
- indexeslist
List of indexes
- frameint
Frame number for the selection, default is 0
- Returns:
- Coor
a new Coor object with the selected atoms
- select_tokens(tokens)[source]
Select atoms from the PDB file based on the selection tokens. Selection tokens are a list of tokens that can be either simple selection or nested selection. A simple selection contains only one keyword, operator, and values. A nested selection contains a list or sub-list of tokens.
- Parameters:
- selfModel
Model object
- tokenslist
List of nested tokens
- frameint
Frame number for the selection, default is 0
- Returns:
- list
a list of boolean values for each atom in the PDB file
- simple_select_atoms(column, values, operator='==')[source]
Select atoms from the PDB file based on the selection tokens. Selection tokens are simple selection containing only one keyword, operator, and values.
The keywords :
“resname”
“chain”
“name”
“altloc”
“resid”
“residue”
“beta”
“occupancy”
“x”, “y”, “z”.
The operators are:
“==”
“!=”
“>”
“>=”
“<”
“<=”
“isin”
- Parameters:
- selfModel
Model object
- columnstr
Keyword for the selection
- valueslist
List of values for the selection
- operatorstr
Operator for the selection
- frameint
Frame number for the selection, default is 0
- Returns:
- list
a list of boolean values for each atom in the PDB file
- property uniq_resid
- property x
- property xyz
- property y
- property z
Analysis module
- pdb_numpy.analysis.compute_pdockQ(coor, rec_chain=None, lig_chain=None, cutoff=8.0)[source]
Compute the pdockQ score as define in [1].
\[pDockQ = \frac{L}{1 + e^{-k (x-x_{0})}} + b\]where
\[x = \overline{plDDT_{interface}} \cdot log(number \: of \: interface \: contacts)\]\(L = 0.724\) is the maximum value of the sigmoid, \(k = 0.052\) is the slope of the sigmoid, \(x_{0} = 152.611\) is the midpoint of the sigmoid, and \(b = 0.018\) is the y-intercept of the sigmoid.
Implementation was inspired from https://gitlab.com/ElofssonLab/FoldDock/-/blob/main/src/pdockq.py
- Parameters:
- coorCoor
object containing the coordinates of the model
- rec_chainlist
list of receptor chain
- lig_chainlist
list of ligand chain
- cutofffloat
cutoff for native contacts, default is 8.0 A
- Returns:
- list
pdockQ scores
References
[1]Bryant P, Pozzati G and Elofsson A. Improved prediction of protein-protein interactions using AlphaFold2. Nature Communications. vol. 13, 1265 (2022) https://www.nature.com/articles/s41467-022-28865-w
- pdb_numpy.analysis.compute_pdockQ2(coor, pae_array, rec_chain=None, lig_chain=None, cutoff=8.0)[source]
Compute the pdockQ2 score as define in [1].
\[pDockQ_2 = \frac{L}{1 + exp [-k*(X_i-X_0)]} + b\]whith:
\[X_i = \langle \frac{1}{1+(\frac{PAE_{int}}{d_0})^2} \rangle - \langle pLDDT \rangle_{int}\]\(L = 0.724\) is the maximum value of the sigmoid, \(k = 0.052\) is the slope of the sigmoid, \(x_{0} = 152.611\) is the midpoint of the sigmoid, and \(b = 0.018\) is the y-intercept of the sigmoid.
Implementation was inspired from https://gitlab.com/ElofssonLab/afm-benchmark/-/blob/main/src/pdockq2.py
- Parameters:
- coorCoor
object containing the coordinates of the model
- rec_chainlist
list of receptor chain
- lig_chainlist
list of ligand chain
- cutofffloat
cutoff for native contacts, default is 8.0 A
- Returns:
- list
pdockQ scores
References
[1]Zhu W, Shenoy A, Kundrotras P and Elofsson A. Evaluation of AlphaFold-Multimer prediction on multi-chain protein complexes. Bioinformatics. vol. 39, 7 (2023) btad424 https://academic.oup.com/bioinformatics/article/39/7/btad424/7219714
- pdb_numpy.analysis.compute_pdockQ_sel(coor, rec_sel, lig_sel, cutoff=8.0)[source]
Compute the pdockQ score as define in [1]. Using two selection strings.
\[pDockQ = \frac{L}{1 + e^{-k (x-x_{0})}} + b\]where
\[x = \overline{plDDT_{interface}} \cdot log(number \: of \: interface \: contacts)\]\(L = 0.724\) is the maximum value of the sigmoid, \(k = 0.052\) is the slope of the sigmoid, \(x_{0} = 152.611\) is the midpoint of the sigmoid, and \(b = 0.018\) is the y-intercept of the sigmoid.
Implementation was inspired from https://gitlab.com/ElofssonLab/FoldDock/-/blob/main/src/pdockq.py
- Parameters:
- coorCoor
object containing the coordinates of the model
- rec_selstr
selection string for receptor
- lig_selstr
selection string for ligand
- cutofffloat
cutoff for native contacts, default is 8.0 A
- Returns:
- float
pdockQ score
References
[1]Bryant P, Pozzati G and Elofsson A. Improved prediction of protein-protein interactions using AlphaFold2. Nature Communications. vol. 13, 1265 (2022) https://www.nature.com/articles/s41467-022-28865-w
- pdb_numpy.analysis.dockQ(coor, native_coor, rec_chain=None, lig_chain=None, native_rec_chain=None, native_lig_chain=None, back_atom=['CA', 'N', 'C', 'O'])[source]
The dockQ function computes the docking quality score between a model and a native structure. It takes as input the Cartesian coordinates of the model (coor) and the native structure (native_coor) and various optional parameters such as the receptor chain (rec_chain), the ligand chain (lig_chain), the native receptor chain (native_rec_chain), the native ligand chain (native_lig_chain) and the list of backbone atoms (back_atom).
The docking quality score is computed as follows:
Align the receptor on the native receptor using the backbone atoms
Compute the RMSD between the ligand and the native ligand
Compute the RMSD between the receptor and the native receptor
The function returns the docking quality score as a float value.
The function first gets the shortest chain’s sequence to identify the peptide and receptor chains for both the model and native structure. It then removes hydrogens and non-protein atoms as well as alternate locations from the coordinates of both the model and the native structure.
The function then removes incomplete backbone residues from both the model and native structure coordinates. The ligand chain is then put at the end of the dictionary. The model is then aligned on the native structure using the model back atoms. The function computes the RMSD between the ligand and the native ligand and the RMSD between the receptor and the native receptor.
The same residue in common is set and the interface coordinates and interface native coordinates are selected. Finally, the docking quality score is returned.
- Parameters:
- coorCoor
Model coordinates
- native_coorCoor
Native coordinates
- rec_chainlist, default=None
List of receptor chain
- lig_chainlist, default=None
List of ligand chain
- native_rec_chainlist, default=None
List of native receptor chain
- native_lig_chainlist, default=None
List of native ligand chain
- back_atomlist, default=[‘CA’, ‘N’, ‘C’, ‘O’]
List of backbone atoms
- Returns:
- float
Docking quality score
- pdb_numpy.analysis.interface_rmsd(coor_1, coor_2, rec_chain, lig_chain, cutoff=10.0, back_atom=['CA', 'N', 'C', 'O'])[source]
Compute the interface RMSD between two models. The interface is defined as the distance between the ligand and the receptor below the cutoff distance. The RMSD is computed between the ligand and the receptor of the two models.
- Parameters:
- coor_1Coor
First coordinates
- coor_2Coor
Second coordinates
- rec_chainlist
List of receptor chain
- lig_chainlist
List of ligand chain
- cutofffloat, default=10.0
Cutoff distance for the interface
- back_atomlist, default=[‘CA’, ‘N’, ‘C’, ‘O’]
List of backbone atoms to use for the RMSD calculation
- Returns:
- float
Interface RMSD value
Notes
Both coor object must have equivalent residues number in both chains. Chain order must also be equivalent.
- pdb_numpy.analysis.native_contact(coor, native_coor, rec_chain, lig_chain, native_rec_chain, native_lig_chain, cutoff=5.0)[source]
Compute the native contact score between a model and a native structure.
The function returns two lists: fnat_list and fnonnat_list. fnat_list contains the fraction of native contacts, and fnonnat_list contains the fraction of non-native contacts.
The function first selects the atoms within the cutoff distance from the native receptor-ligand interface in the native structure, and then selects the atoms within the cutoff distance from the receptor-ligand interface in the model structure. It then compares the selected atoms in the model structure with the native atoms to determine the native contacts and non-native contacts.
The function uses the np.unique() function to get unique residue numbers, and then iterates over these residue numbers to compute the native contacts. It then computes the fraction of native contacts and non-native contacts for each model and adds them to the respective lists.
Finally, the function returns the fnat_list and fnonnat_list.
- Parameters:
- coorCoor
Model coordinates
- native_coorCoor
Native coordinates
- rec_chainlist
List of receptor chain
- lig_chainlist
List of ligand chain
- native_rec_chainlist
List of native receptor chain
- native_lig_chainlist
List of native ligand chain
- cutofffloat, default=5.0
Cutoff distance for the native contact
- Returns:
- float
Native contact score
- pdb_numpy.analysis.rmsd(coor_1, coor_2, selection='name CA', index_list=None, frame_ref=0)[source]
Compute RMSD between two sets of coordinates.
The RMSD (Root Mean Square Deviation) measures the similarity between two sets of coordinates by calculating the average distance between the corresponding atoms. It is given by the formula:
\[RMSD(v,w) = \sqrt{ \frac{1}{N} \sum_{i=1}^{N} (v_i - w_i)^2 }\]where v and w are the two sets of coordinates, and N is the total number of atoms.
- Parameters:
- coor_1Coor
First set of coordinates.
- coor_2Coor
Second set of coordinates.
- selectionstr, optional
A selection string specifying which atoms to include in the RMSD calculation. By default, it selects the alpha carbon atoms (‘name CA’).
- index_listlist, optional
A list of two arrays containing the indices of the atoms to include in the RMSD calculation. This option is provided as an alternative to the selection argument, and can be used if you need to calculate the RMSD for a custom set of atoms. If this argument is provided, the selection argument is ignored.
- frame_refint, optional
The frame number of the reference structure. By default, it is set to 0.
- Returns:
- float
RMSD value
DSSP module
- pdb_numpy.DSSP.add_NH(coor)[source]
Add NH atoms to a protein structure.
This function adds NH atoms to a protein structure by computing the coordinates of the hydrogen atoms using neighboring C and CA atoms. The new NH atoms are added to the protein structure.
- Parameters:
- coorCoor
Protein coordinates.
- Returns:
- TO FIX IS SHOULD RETURN A NEW COOR OBJECT
- pdb_numpy.DSSP.compute_DSSP(coor)[source]
Compute DSSP for a protein.
The compute_DSSP function takes in a coor parameter which represents the coordinates of a protein. It computes the secondary structure of the protein based on the input coordinates and returns a sequence of secondary structure elements. The output sequence consists of the following elements:
“H” represents a 4-helix (\(\alpha\)-helix)
“B” represents a residue in an isolated \(\beta\)-bridge
“E” represents an extended strand that participates in a \(\beta\)-ladder
“G” represents a 3-helix (\(3_{10}\)-helix)
“I” represents a 5-helix (\(\pi\)-helix)
“T” represents an H-bonded turn
“S” represents a bend
The function first adds NH hydrogen atoms to the protein coordinates and selects the \(\alpha\)-carbons of the protein. It then computes a distance matrix between all residues and uses this to compute the secondary structure of the protein. It first computes turns, then \(\beta\)-sheets, and finally assigns \(\alpha\)-helices, \(\pi\)-helices, and \(3_{10}\)-helices. Finally, it joins overlapping helices and returns the secondary structure sequence.
- Parameters:
- coorCoor
Coor object containing the protein coordinates.
- Returns:
- SS_seqnumpy.ndarray
A numpy array of shape (n_res,) where n_res is the number of residues
Note
96 % accuracy compared to DSSP (3eam)
omitted β-bulge annotation
- pdb_numpy.DSSP.compute_Hbond_matrix(model)[source]
Compute Hbond matrix for a protein.
- Parameters:
- modelModel
model containing the protein.
- Returns:
- Hbond_matnumpy.ndarray
A boolean matrix of shape (n_res, n_res) where n_res is the number of
Notes
The cutoff distance for the Hbond neighbor search is 8 Angstrom.
- pdb_numpy.DSSP.compute_bend(CA_sel)[source]
Compute bend for a protein.
- Parameters:
- CA_selModel
AtomGroup containing only CA atoms.
- Returns:
- bendnumpy.ndarray
- An array of boolean values indicating whether the corresponding
residue has a bend (True) or not (False). The length of the array is equal to the number of residues in the protein.
Notes
The bend is computed as the angle between the vectors connecting a residue’s CA atom to the CA atoms of the residues separated by two positions in sequence. A residue is considered to have a bend if the angle is greater than 70 degrees.
- pdb_numpy.DSSP.get_NH_xyz(model)[source]
computes and returns the nitrogen-hydrogen (NH) coordinates of a protein.
- Parameters:
- modelModel
Model object containing protein coordinates.
- Returns:
- NH_listnumpy.ndarray
2D array of shape (n_residues, 3) containing the NH coordinates for each residue in the protein. The first row contains None values to align with the residue numbering in the protein.
Notes
The function first selects the protein atoms excluding alternate locations B, C, D and E. It then extracts the C, CA and N atoms and the corresponding residue names. It computes the NH vectors for each residue and adds them to the N atom position. Finally, the NH coordinates are stored in a 2D numpy array where each row represents a residue and the columns correspond to the x, y, and z coordinates. The first row contains None values to align with the residue numbering in the protein.
- pdb_numpy.DSSP.hbond_energy(vec_N, vec_H, vec_O, vec_C)[source]
Compute HBond energy based on ON, CH, OH and CN distances.
\[E = q_1 q_2 \left( \frac{1}{r(ON)} + \frac{l}{r(CH)} - \frac{l}{r(OH)} - \frac{l}{r(CN)} \right) f\]where: \(q1 = 0.42 e\) and \(q2 = 0.20 e\) f is a dimensional factor \(f = 332\) and \(r\) is in Angstrom.
The function returns the calculated hydrogen bond energy in units of kcal/mol.
- Parameters:
- vec_Nnumpy.ndarray
Nitrogen coordinates.
- vec_Hnumpy.ndarray
Hydrogen coordinates.
- vec_Onumpy.ndarray
Oxygen coordinates.
- vec_Cnumpy.ndarray
Carbon coordinates.
- Returns:
- energyfloat
HBond energy (kcal/mol).
Geom module
- pdb_numpy.geom.angle_vec(vec_a, vec_b)[source]
Compute angle between two vectors.
- Parameters:
- vec_anumpy.ndarray
vector a
- vec_bnumpy.ndarray
vector b
- Returns:
- float
angle between vec_a and vec_b in radians
Examples
>>> angle = Coor.angle_vec([1, 0, 0], [0, 1, 0]) >>> print(f'angle = {np.degrees(angle):.2f}') angle = 90.00 >>> angle = Coor.angle_vec([1, 0, 0], [1, 0, 0]) >>> print(f'angle = {np.degrees(angle):.2f}') angle = 0.00 >>> angle = Coor.angle_vec([1, 0, 0], [1, 1, 0]) >>> print(f'angle = {np.degrees(angle):.2f}') angle = 45.00 >>> angle = Coor.angle_vec([1, 0, 0], [-1, 0, 0]) >>> print(f'angle = {np.degrees(angle):.2f}') angle = 180.00
- pdb_numpy.geom.atom_dihed_angle(atom_a, atom_b, atom_c, atom_d)[source]
Compute the dihedral anlge using 4 atoms.
- Parameters:
- atom_anp.array
Coordinates of the first atom
- atom_bnp.array
Coordinates of the second atom
- atom_cnp.array
Coordinates of the third atom
- atom_dnp.array
Coordinates of the fourth atom
- Returns:
- float
Diheral angle in degrees
Examples
>>> atom_1 = {'xyz': np.array([0.0, -1.0, 0.0])} >>> atom_2 = {'xyz': np.array([0.0, 0.0, 0.0])} >>> atom_3 = {'xyz': np.array([1.0, 0.0, 0.0])} >>> atom_4 = {'xyz': np.array([1.0, 1.0, 0.0])} >>> atom_5 = {'xyz': np.array([1.0, -1.0, 0.0])} >>> atom_6 = {'xyz': np.array([1.0, -1.0, 1.0])} >>> angle_1 = Coor.atom_dihed_angle(atom_1, atom_2, atom_3, atom_4) >>> print('{:.3f}'.format(angle_1)) 180.000 >>> angle_2 = Coor.atom_dihed_angle(atom_1, atom_2, atom_3, atom_5) >>> print('{:.3f}'.format(angle_2)) 0.000 >>> angle_3 = Coor.atom_dihed_angle(atom_1, atom_2, atom_3, atom_6) >>> print('{:.3f}'.format(angle_3)) -45.000
- pdb_numpy.geom.compute_unit_cell_vectors(alpha, beta, gamma, a, b, c)[source]
Compute unit cell vectors. Formula from:
\[\begin{split}a = \begin{bmatrix} \alpha \\ 0 \\ 0 \end{bmatrix} \;\; b = \begin{bmatrix} \beta \cos(\gamma) \\ \beta \sin(\gamma) \\ 0 \end{bmatrix}\end{split}\]\[\begin{split}c = \begin{bmatrix} c \cos(\beta) \\ \frac{c}{\sin(\gamma)} \left( \cos(\alpha) - \cos(\beta) \cos(\gamma) \right) \\ \frac{c}{\sin(\gamma)} \sqrt{1 - \cos^2(\alpha) - \cos^2(\beta) - \cos^2(\gamma) + 2 \cos(\alpha) \cos(\beta) \cos(\gamma)} \end{bmatrix}\end{split}\]- Parameters:
- alphafloat
alpha angle in degrees
- betafloat
beta angle in degrees
- gammafloat
gamma angle in degrees
- afloat
a length in Angstrom
- bfloat
b length in Angstrom
- cfloat
c length in Angstrom
- Returns:
- tuple
unit cell vectors (in Angstrom)
- pdb_numpy.geom.cryst_convert(crystal_pack, format_out='pdb')[source]
PDB format: https://www.wwpdb.org/documentation/file-format-content/format33/sect8.html Gro to pdb: https://mailman-1.sys.kth.se/pipermail/gromacs.org_gmx-users/2008-May/033944.html https://en.wikipedia.org/wiki/Fractional_coordinates
- Parameters:
- crystal_packstr
line of the pdb file containing the crystal information
- format_outstr, optional
format of the output, by default ‘pdb’
- Returns:
- str
line of the pdb/gro file containing the crystal information
Examples
>>> prot_coor = Coor() >>> prot_coor.read_file(os.path.join(TEST_PATH, '1y0m.gro')) >>> prot_coor.cryst_convert(format_out='pdb') 'CRYST1 28.748 30.978 29.753 90.00 92.12 90.00 P 1\n'
- pdb_numpy.geom.cryst_convert_mmCIF(data_mmCIF, format_out='pdb')[source]
PDB format: https://www.wwpdb.org/documentation/file-format-content/format33/sect8.html Gro to pdb: https://mailman-1.sys.kth.se/pipermail/gromacs.org_gmx-users/2008-May/033944.html https://en.wikipedia.org/wiki/Fractional_coordinates
- Parameters:
- crystal_packstr
line of the pdb file containing the crystal information
- format_outstr, optional
format of the output, by default ‘pdb’
- Returns:
- str
line of the pdb/gro file containing the crystal information
>>> prot_coor = Coor() ..
>>> prot_coor.read_file(os.path.join(TEST_PATH, '1y0m.gro')) >>> prot_coor.cryst_convert(format_out='pdb') ..
- ‘CRYST1 28.748 30.978 29.753 90.00 92.12 90.00 P 1n’