$\begingroup$

What do the consecutive A and B in the 4th column identify in a PDB file downloaded from rcsb.org?

The PDB ID is 3CZ3 (consider the RNA molecule only). On visualizing it, A and B denote the two strands of a dsRNA molecule. I always thought the 6th column (the one having E) is where the chain ID is defined.

ATOM      3  P  A  C E   1       4.436  40.379 -15.386  0.50129.95           P  
ATOM      4  P  B  C E   1      15.290  -0.357  16.460  0.50123.26           P  
ATOM      5  O1PA  C E   1       2.978  40.404 -15.652  0.50130.05           O  
ATOM      6  O1PB  C E   1      14.336  -0.918  15.481  0.50123.33           O  
ATOM      7  O2PA  C E   1       4.862  41.771 -15.118  0.50129.91           O  
ATOM      8  O2PB  C E   1      16.239  -1.432  16.825  0.50123.24           O  
ATOM      9  O5'A  C E   1       5.166  39.855 -16.710  0.50129.89           O  
ATOM     10  O5'B  C E   1      14.451   0.058  17.755  0.50123.38           O  
ATOM     11  C5'A  C E   1       5.056  40.545 -17.954  0.50129.97           C  
ATOM     12  C5'B  C E   1      14.669  -0.584  19.010  0.50123.36           C  
ATOM     13  C4'A  C E   1       6.424  40.853 -18.544  0.50130.09           C  
ATOM     14  C4'B  C E   1      15.719   0.121  19.855  0.50123.10           C  
ATOM     15  O4'A  C E   1       6.820  42.211 -18.203  0.50130.44           O  
ATOM     16  O4'B  C E   1      17.036  -0.442  19.636  0.50123.30           O  
ATOM     17  C3'A  C E   1       7.593  40.026 -18.018  0.50129.97           C  
ATOM     18  C3'B  C E   1      15.991   1.586  19.561  0.50122.75           C  
ATOM     19  O3'A  C E   1       7.657  38.724 -18.580  0.50129.64           O  
ATOM     20  O3'B  C E   1      14.921   2.417  19.964  0.50121.76           O  
ATOM     21  C2'A  C E   1       8.783  40.893 -18.422  0.50130.06           C  
ATOM     22  C2'B  C E   1      17.243   1.792  20.405  0.50123.19           C  
ATOM     23  O2'A  C E   1       9.124  40.767 -19.792  0.50129.87           O  
ATOM     24  O2'B  C E   1      16.985   1.905  21.792  0.50123.34           O  
ATOM     25  C1'A  C E   1       8.240  42.290 -18.111  0.50130.23           C  
ATOM     26  C1'B  C E   1      17.990   0.488  20.131  0.50123.67           C  
ATOM     27  N1 A  C E   1       8.694  42.824 -16.759  0.50130.14           N  
ATOM     28  N1 B  C E   1      19.160   0.657  19.188  0.50124.11           N  
ATOM     29  C2 A  C E   1       9.911  43.528 -16.654  0.50130.12           C  
ATOM     30  C2 B  C E   1      20.401   1.113  19.686  0.50124.21           C  
ATOM     31  O2 A  C E   1      10.611  43.722 -17.658  0.50130.10           O  
ATOM     32  O2 B  C E   1      20.537   1.369  20.893  0.50124.22           O  
ATOM     33  N3 A  C E   1      10.300  43.991 -15.435  0.50130.18           N  
ATOM     34  N3 B  C E   1      21.445   1.262  18.825  0.50124.14           N  
ATOM     35  C4 A  C E   1       9.545  43.780 -14.352  0.50130.19           C  
ATOM     36  C4 B  C E   1      21.293   0.984  17.526  0.50124.09           C  
ATOM     37  N4 A  C E   1       9.982  44.261 -13.181  0.50130.17           N  
ATOM     38  N4 B  C E   1      22.357   1.153  16.733  0.50124.08           N  
ATOM     39  C5 A  C E   1       8.309  43.068 -14.428  0.50130.11           C  
ATOM     40  C5 B  C E   1      20.047   0.524  16.993  0.50124.23           C  
ATOM     41  C6 A  C E   1       7.934  42.617 -15.634  0.50130.15           C  
ATOM     42  C6 B  C E   1      19.022   0.378  17.850  0.50124.31           C  
ATOM     43  P  A  G E   2       8.321  37.573 -17.683  0.50129.52           P  
ATOM     44  P  B  G E   2      14.673   3.755  19.127  0.50120.94           P  
ATOM     45  O1PA  G E   2       8.217  36.290 -18.413  0.50129.58           O  
$\endgroup$

2 Answers

$\begingroup$

Your column count is probably influenced by the way awk* splits lines on whitespace. The .pdb file format, however, counts columns as character positions, as on a punch card. Seen this way, the atom block is a fixed-width format with 80 characters per line (a legacy of the time when Fortran was still FORTRAN). With a ruler added on top, as on a typewriter, it looks like this:

         1         2         3         4         5         6         7         8
12345678901234567890123456789012345678901234567890123456789012345678901234567890
ATOM     32  N  AARG A  -3      11.281  86.699  94.383  0.50 35.88           N  
ATOM     33  N  BARG A  -3      11.296  86.721  94.521  0.50 35.60           N
ATOM     34  CA AARG A  -3      12.353  85.696  94.456  0.50 36.67           C

To quote the wwPDB documentation further:

COLUMNS        DATA  TYPE    FIELD        DEFINITION
-------------------------------------------------------------------------------------
 1 -  6        Record name   "ATOM  "
 7 - 11        Integer       serial       Atom  serial number.
13 - 16        Atom          name         Atom name.
17             Character     altLoc       Alternate location indicator.
18 - 20        Residue name  resName      Residue name.
22             Character     chainID      Chain identifier.
23 - 26        Integer       resSeq       Residue sequence number.
27             AChar         iCode        Code for insertion of residues.
31 - 38        Real(8.3)     x            Orthogonal coordinates for X in Angstroms.
39 - 46        Real(8.3)     y            Orthogonal coordinates for Y in Angstroms.
47 - 54        Real(8.3)     z            Orthogonal coordinates for Z in Angstroms.
55 - 60        Real(6.2)     occupancy    Occupancy.
61 - 66        Real(6.2)     tempFactor   Temperature  factor.
77 - 78        LString(2)    element      Element symbol, right-justified.
79 - 80        LString(2)    charge       Charge  on the atom.
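As a minimal sketch (not taken from the wwPDB documentation), the table above can be translated into Python string slices. The function name `parse_atom_line` and the dictionary keys are illustrative assumptions. Python slices are 0-based and end-exclusive, so columns 23 - 26 become `line[22:26]`.

```python
def parse_atom_line(line):
    """Split one ATOM/HETATM record into fields by fixed character columns."""
    # Pad short lines to 80 characters so that every slice below exists.
    line = line.rstrip("\n").ljust(80)
    return {
        "record":     line[0:6].strip(),    # columns  1 -  6
        "serial":     int(line[6:11]),      # columns  7 - 11
        "name":       line[12:16].strip(),  # columns 13 - 16
        "altLoc":     line[16].strip(),     # column  17
        "resName":    line[17:20].strip(),  # columns 18 - 20
        "chainID":    line[21].strip(),     # column  22
        "resSeq":     int(line[22:26]),     # columns 23 - 26
        "iCode":      line[26].strip(),     # column  27
        "x":          float(line[30:38]),   # columns 31 - 38
        "y":          float(line[38:46]),   # columns 39 - 46
        "z":          float(line[46:54]),   # columns 47 - 54
        "occupancy":  float(line[54:60]),   # columns 55 - 60
        "tempFactor": float(line[60:66]),   # columns 61 - 66
        "element":    line[76:78].strip(),  # columns 77 - 78
        "charge":     line[78:80].strip(),  # columns 79 - 80
    }


# First ATOM record shown in the question.
example = "ATOM      3  P  A  C E   1       4.436  40.379 -15.386  0.50129.95           P  "
atom = parse_atom_line(example)
print(atom["altLoc"], atom["chainID"], atom["occupancy"], atom["tempFactor"])
```

Because the fields are cut by position, the occupancy and the temperature factor are read correctly even though no blank separates them in the text.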

So, let's overlay your file with the definition:

         1         2         3         4         5         6         7         8
12345678901234567890123456789012345678901234567890123456789012345678901234567890

ATOM      3  P  A  C E   1       4.436  40.379 -15.386  0.50129.95           P  

This puts the A you describe at column 17 (altLoc), and the E at column 22 (chainID), in line with how you currently use it.
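To illustrate why counting whitespace-separated fields is unreliable for this format, the following sketch splits two records from the question the way awk does by default and then reads the same records by character position. The variable names are arbitrary and chosen only for this illustration.

```python
# Two ATOM records copied from the question.
line_p = "ATOM      3  P  A  C E   1       4.436  40.379 -15.386  0.50129.95           P  "
line_o1p = "ATOM      5  O1PA  C E   1       2.978  40.404 -15.652  0.50130.05           O  "

# Whitespace splitting: the atom name "O1P" touches the altLoc "A" and fuses
# into "O1PA", so the number of fields and their meaning shift between lines.
fields_p = line_p.split()
fields_o1p = line_o1p.split()
print(len(fields_p), fields_p[3], fields_p[5])        # 12 A E
print(len(fields_o1p), fields_o1p[3], fields_o1p[5])  # 11 C 1

# Fixed character positions: column 17 (index 16) is always altLoc,
# column 22 (index 21) is always chainID.
for line in (line_p, line_o1p):
    print(line[16], line[21])                          # A E (for both lines)
```

The "4th column" and "6th column" of the question only exist for records whose atom name leaves a gap before the altLoc character. The positional reading is the same for every record.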

Both snippets are copied from the Coordinate Section of wwPDB's Atomic Coordinate Entry Format, version 3.3. The full document is available open access as Protein Data Bank Contents Guide: Atomic Coordinate Entry Format Description, version 3.3 (last revision as of 2022-08-29).

* One noteworthy extension to awk to work with .pdb files is bioawk. There might be some functional overlap with bioperl, and biopython.

$\endgroup$
$\begingroup$

On visualizing it, A and B denote the two strands of a dsRNA molecule.

No, A and B denote two alternate conformations. For example, the dsRNA made of chains G and H has almost two-fold symmetry, and the protein binding to it has two-fold symmetry, so the RNA binds in either orientation. The occupancy of all RNA atoms is 0.50 in each orientation (in the PDB text file this is partially obscured because the occupancy runs directly into the high B-factor in the next column, e.g. 0.50129.95).
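A short sketch of how the two alternate conformations could be separated programmatically, assuming the fixed-column layout of the PDB format. The helper name `group_by_conformer` is hypothetical; the four records are taken from the question.

```python
from collections import defaultdict


def group_by_conformer(pdb_lines):
    """Map (chainID, altLoc) to the serial numbers of the atoms it contains."""
    groups = defaultdict(list)
    for line in pdb_lines:
        if line.startswith(("ATOM  ", "HETATM")):
            chain_id = line[21]   # column 22
            alt_loc = line[16]    # column 17, blank if there is no alternate location
            groups[(chain_id, alt_loc)].append(int(line[6:11]))
    return dict(groups)


records = [
    "ATOM      3  P  A  C E   1       4.436  40.379 -15.386  0.50129.95           P  ",
    "ATOM      4  P  B  C E   1      15.290  -0.357  16.460  0.50123.26           P  ",
    "ATOM      5  O1PA  C E   1       2.978  40.404 -15.652  0.50130.05           O  ",
    "ATOM      6  O1PB  C E   1      14.336  -0.918  15.481  0.50123.33           O  ",
]

groups = group_by_conformer(records)
print(groups)

# Occupancy (columns 55 - 60) and B-factor (columns 61 - 66) look fused in the
# text, but slicing by position separates them.
occupancy = float(records[0][54:60])
b_factor = float(records[0][60:66])
print(occupancy, b_factor)
```

Both conformers belong to the same chain and differ only in the altLoc character, which is why the chain ID alone cannot tell them apart.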

The 3D viewers I tried ignore the second orientation when using a cartoon representation. I had to switch to ball-and-stick, and then I was able to visualize the second orientation in Jmol separately (with the command "display %B"). Alternatively, you can use the cartoon representation and use the command "configuration 2" to switch to the B-conformation.


I always thought the 6th column (the one having E) is where the chain ID is defined.

Yes, that's correct. The alternate conformation A of chain G makes a duplex with the alternate conformation A of chain H, and the alternate conformation B of chain G makes a duplex with the alternate conformation B of chain H.

Non-unit occupancies are a murky area in macromolecular crystallography, and software is often not fully capable of dealing with it.

$\endgroup$
Comment: You aren't kidding about "software is often not fully capable of dealing with it." I only heard about this in Spring 2022, so for ~20 years, Open Babel and Avogadro didn't know it. Avogadro2 should now add alternate conformations correctly in the latest releases. (Commented Feb 1, 2023 at 0:35)
