$\begingroup$

When I started programming against the PDB, I was confused and frustrated by the fact that certain CIF files contain two actual structures (assemblies) of the object; I take it for granted now.

However, after years of usage I now observe that, in the case of the ribosome, X-ray/NMR structures frequently have exactly 2 assemblies stuck together, with very little difference between them (~2-3 atoms in a ~150,000-atom structure). Is there a reason for that? Does it have to do with the way microscopy is done, or is it just an artifact of the deposition process? Why not deposit them as separate structures, or as just one if they are nearly identical?

An example is 4V5D, which has 2 assemblies (see its assemblies data).

PS. The relevant GraphQL snippet for the RCSB Data API:

{
  entry(entry_id:"4v5d"){
    assemblies{
      rcsb_assembly_info {
        assembly_id
        atom_count
        branched_atom_count
        branched_entity_count
        branched_entity_instance_count
        hydrogen_atom_count
        modeled_polymer_monomer_count
        na_polymer_entity_types
        nonpolymer_atom_count
        nonpolymer_entity_count
        nonpolymer_entity_instance_count
        num_heterologous_interface_entities
        num_heteromeric_interface_entities
        num_homomeric_interface_entities
        num_interface_entities
        num_interfaces
        num_isologous_interface_entities
        num_na_interface_entities
        num_prot_na_interface_entities
        num_protein_interface_entities
        polymer_atom_count
        polymer_composition
        polymer_entity_count
        polymer_entity_count_DNA
        polymer_entity_count_RNA
        polymer_entity_count_nucleic_acid
        polymer_entity_count_nucleic_acid_hybrid
        polymer_entity_count_protein
        polymer_entity_instance_count
        polymer_entity_instance_count_DNA
        polymer_entity_instance_count_RNA
        polymer_entity_instance_count_nucleic_acid
        polymer_entity_instance_count_nucleic_acid_hybrid
        polymer_entity_instance_count_protein
        polymer_monomer_count
        selected_polymer_entity_types
        solvent_atom_count
        solvent_entity_count
        solvent_entity_instance_count
        total_assembly_buried_surface_area
        total_number_interface_residues
        unmodeled_polymer_monomer_count
      }
    }
  }
}
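
For anyone who wants to reproduce the comparison, here is a minimal Python sketch of how a trimmed version of the query above could be sent and the two assemblies compared field by field. It assumes the public GraphQL endpoint at https://data.rcsb.org/graphql; the helper names (`build_assembly_query`, `fetch_assembly_info`, `count_differences`) are hypothetical and only for illustration.

```python
import json
import urllib.request

# Assumption: the public RCSB Data API GraphQL endpoint.
RCSB_GRAPHQL_URL = "https://data.rcsb.org/graphql"


def build_assembly_query(entry_id, fields=("assembly_id", "atom_count", "polymer_atom_count")):
    """Build a GraphQL query for a subset of rcsb_assembly_info fields."""
    field_block = "\n        ".join(fields)
    return (
        "{\n"
        f'  entry(entry_id:"{entry_id}"){{\n'
        "    assemblies{\n"
        "      rcsb_assembly_info {\n"
        f"        {field_block}\n"
        "      }\n"
        "    }\n"
        "  }\n"
        "}"
    )


def fetch_assembly_info(entry_id, timeout=30):
    """POST the query and return one rcsb_assembly_info dict per assembly."""
    payload = json.dumps({"query": build_assembly_query(entry_id)}).encode("utf-8")
    request = urllib.request.Request(
        RCSB_GRAPHQL_URL,
        data=payload,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        body = json.load(response)
    return [a["rcsb_assembly_info"] for a in body["data"]["entry"]["assemblies"]]


def count_differences(info_a, info_b):
    """Return {field: (value_a, value_b)} for shared numeric fields that differ."""
    diffs = {}
    for key in sorted(set(info_a) & set(info_b)):
        a, b = info_a[key], info_b[key]
        if isinstance(a, (int, float)) and isinstance(b, (int, float)) and a != b:
            diffs[key] = (a, b)
    return diffs
```

Calling `fetch_assembly_info("4v5d")` needs network access and should return one dict per assembly; passing the first two to `count_differences` lists only the counts that differ between them.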
$\endgroup$
  • 1
    $\begingroup$ ..... Refinement was conducted using CNS first through a rigid body refinement of each of the two 70S molecules in the asymmetric unit; an additional rigid body refinement where each domain of the ribosome, the tRNAs and ribosomal proteins were defined as separate rigid-body groups, followed by two rounds of energy minimization and B-factor refinement. .... ncbi.nlm.nih.gov/pmc/articles/PMC2679717 $\endgroup$
    – pippo1980
    Commented Feb 23, 2023 at 11:00
  • 1
    $\begingroup$ PS: your GraphQL link returns an error. $\endgroup$
    – pippo1980
    Commented Feb 23, 2023 at 11:02
  • 1
    $\begingroup$ As a result, the biological assembly may either be composed of one copy of the macromolecule/complex or it may be composed of two or more symmetry related molecules/complexes coming together to form a larger assembly. Copies of the macromolecule or complex take on slightly different conformations and occupy unique positions in the crystal asymmetric unit. As a result, each of the different positions of the macromolecule/complex may correspond to structurally similar but not identical biological assemblies. $\endgroup$
    – pippo1980
    Commented Feb 24, 2023 at 6:38
  • 1
    $\begingroup$ See pdb101.rcsb.org/learn/guide-to-understanding-pdb-data/… ; I believe that nowadays all the molecules present in the asymmetric unit (AU) are shipped in the CIF. $\endgroup$
    – pippo1980
    Commented Feb 24, 2023 at 6:39
  • 1
    $\begingroup$ Also interesting: ncbi.nlm.nih.gov/pmc/articles/PMC4157440, "Molecular replacement with a large number of molecules in the asymmetric unit". $\endgroup$
    – pippo1980
    Commented Feb 24, 2023 at 6:45
