TY - JOUR
T1 - The protein common assembly database (ProtCAD): a comprehensive structural resource of protein complexes
AU - Xu, Qifang
AU - Dunbrack, Roland L.
N1 - Publisher Copyright: © The Author(s) 2022. Published by Oxford University Press on behalf of Nucleic Acids Research.
PY - 2023/1/6
Y1 - 2023/1/6
N2 - Proteins often act through oligomeric interactions with other proteins. X-ray crystallography and cryoelectron microscopy provide detailed information on the structures of biological assemblies, defined as the most likely biologically relevant structures derived from experimental data. In crystal structures, the most relevant assembly may be ambiguously determined, since multiple assemblies observed in the crystal lattice may be plausible. It is estimated that 10–15% of PDB entries may have incorrect or ambiguous assembly annotations. Accurate assemblies are required for understanding functional data and training of deep learning methods for predicting assembly structures. As with any other kind of biological data, replication via multiple independent experiments provides important validation for the determination of biological assembly structures. Here we present the Protein Common Assembly Database (ProtCAD), which presents clusters of protein assembly structures observed in independent structure determinations of homologous proteins in the Protein Data Bank (PDB). ProtCAD is searchable by PDB entry, UniProt identifiers, or Pfam domain designations and provides downloads of coordinate files, PyMol scripts, and publicly available assembly annotations for each cluster of assemblies. About 60% of PDB entries contain assemblies in clusters of at least 2 independent experiments. All clusters and coordinates are available on ProtCAD web site (http://dunbrack2.fccc.edu/protcad).
AB - Proteins often act through oligomeric interactions with other proteins. X-ray crystallography and cryoelectron microscopy provide detailed information on the structures of biological assemblies, defined as the most likely biologically relevant structures derived from experimental data. In crystal structures, the most relevant assembly may be ambiguously determined, since multiple assemblies observed in the crystal lattice may be plausible. It is estimated that 10–15% of PDB entries may have incorrect or ambiguous assembly annotations. Accurate assemblies are required for understanding functional data and training of deep learning methods for predicting assembly structures. As with any other kind of biological data, replication via multiple independent experiments provides important validation for the determination of biological assembly structures. Here we present the Protein Common Assembly Database (ProtCAD), which presents clusters of protein assembly structures observed in independent structure determinations of homologous proteins in the Protein Data Bank (PDB). ProtCAD is searchable by PDB entry, UniProt identifiers, or Pfam domain designations and provides downloads of coordinate files, PyMol scripts, and publicly available assembly annotations for each cluster of assemblies. About 60% of PDB entries contain assemblies in clusters of at least 2 independent experiments. All clusters and coordinates are available on ProtCAD web site (http://dunbrack2.fccc.edu/protcad).
KW - Cryoelectron Microscopy
KW - Crystallography, X-Ray
KW - Databases, Protein
KW - Multiprotein Complexes/chemistry
KW - Proteins/chemistry
UR - http://www.scopus.com/inward/record.url?scp=85145966657&partnerID=8YFLogxK
UR - https://www.webofscience.com/api/gateway?GWVersion=2&SrcApp=purepublist2023&SrcAuth=WosAPI&KeyUT=WOS:000873868100001&DestLinkType=FullRecord&DestApp=WOS
U2 - 10.1093/nar/gkac937
DO - 10.1093/nar/gkac937
M3 - Article
C2 - 36300618
SN - 0305-1048
VL - 51
SP - D466-D478
JO - Nucleic Acids Research
JF - Nucleic Acids Research
IS - D1
ER -