When I started programming against the PDB, I was both confused and frustrated by the fact that certain CIF
files contain two actual structures (aka assemblies) of the object; I take it for granted now.
However, after years of usage I have noticed that, in the case of the ribosome, X-ray/NMR structures frequently have exactly 2 assemblies stuck together, with very little difference between them (a ~2-3 atom difference in a ~150,000 atom structure). Is there a reason for that? Does it have to do with the way microscopy is done, or is it just an artifact of the deposition process? Why not deposit them as separate structures, or as just one if they are nearly identical?
An example is 4V5D, which has 2 assemblies (see its assemblies data).
P.S. The relevant GraphQL snippet for the RCSB Data API:
{
  entry(entry_id: "4v5d") {
    assemblies {
      rcsb_assembly_info {
        assembly_id
        atom_count
        branched_atom_count
        branched_entity_count
        branched_entity_instance_count
        hydrogen_atom_count
        modeled_polymer_monomer_count
        na_polymer_entity_types
        nonpolymer_atom_count
        nonpolymer_entity_count
        nonpolymer_entity_instance_count
        num_heterologous_interface_entities
        num_heteromeric_interface_entities
        num_homomeric_interface_entities
        num_interface_entities
        num_interfaces
        num_isologous_interface_entities
        num_na_interface_entities
        num_prot_na_interface_entities
        num_protein_interface_entities
        polymer_atom_count
        polymer_composition
        polymer_entity_count
        polymer_entity_count_DNA
        polymer_entity_count_RNA
        polymer_entity_count_nucleic_acid
        polymer_entity_count_nucleic_acid_hybrid
        polymer_entity_count_protein
        polymer_entity_instance_count
        polymer_entity_instance_count_DNA
        polymer_entity_instance_count_RNA
        polymer_entity_instance_count_nucleic_acid
        polymer_entity_instance_count_nucleic_acid_hybrid
        polymer_entity_instance_count_protein
        polymer_monomer_count
        selected_polymer_entity_types
        solvent_atom_count
        solvent_entity_count
        solvent_entity_instance_count
        total_assembly_buried_surface_area
        total_number_interface_residues
        unmodeled_polymer_monomer_count
      }
    }
  }
}
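
For anyone who wants to reproduce the comparison, here is a minimal Python sketch of how I would send a trimmed version of the query above and compare the atom counts of the assemblies. It assumes the GraphQL endpoint is `https://data.rcsb.org/graphql` and that it accepts a JSON POST body with a `query` key. The helper names (`build_query`, `extract_infos`, `fetch_assembly_info`, `atom_count_spread`) are my own and are not part of any RCSB client library.

```python
import json
import urllib.request

# Assumed endpoint of the RCSB Data API (GraphQL).
RCSB_GRAPHQL_URL = "https://data.rcsb.org/graphql"


def build_query(entry_id):
    """Build a trimmed version of the query above for one entry."""
    # json.dumps produces a correctly quoted string literal for the ID.
    return (
        "{ entry(entry_id: %s) { assemblies { rcsb_assembly_info "
        "{ assembly_id atom_count polymer_atom_count } } } }"
        % json.dumps(entry_id)
    )


def extract_infos(response):
    """Pull the rcsb_assembly_info dicts out of a decoded GraphQL response."""
    assemblies = response["data"]["entry"]["assemblies"]
    return [a["rcsb_assembly_info"] for a in assemblies]


def fetch_assembly_info(entry_id):
    """POST the query and return one info dict per assembly (needs network)."""
    body = json.dumps({"query": build_query(entry_id)}).encode("utf-8")
    request = urllib.request.Request(
        RCSB_GRAPHQL_URL,
        data=body,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request) as reply:
        return extract_infos(json.load(reply))


def atom_count_spread(infos):
    """Difference between the largest and smallest atom_count."""
    counts = [info["atom_count"] for info in infos]
    return max(counts) - min(counts)
```

Calling `atom_count_spread(fetch_assembly_info("4v5d"))` should then report how many atoms separate the two assemblies; I have left the actual number out because it depends on the live API response.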