One way to compare Google's chips and IBM's chips is by computing detection fractions. Error correcting codes are built out of "detectors", which are sets of measurements with predictable parity. When you run an error correcting code, each detector either produces a detection event or doesn't, and you use these events to correct the errors that caused them. The detection fraction is the average number of detection events you see, divided by the number of detectors (i.e. the fraction of detectors that fire). James Wootton advocates for this idea in "Syndrome-Derived Error Rates as a Benchmark of Quantum Hardware". This is a nice simple comparison that is independent of classical postprocessing factors such as the choice of decoder.
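As a toy illustration (entirely hypothetical data, not from either experiment), the detection fraction is just the mean of the detection-event bits over all shots and detectors:

```python
import numpy as np

# Hypothetical detection event data: a shots x detectors boolean matrix.
# A True entry means that detector fired on that shot.
rng = np.random.default_rng(0)
detection_events = rng.random((1000, 24)) < 0.16  # each detector fires ~16% of the time

# Detection fraction = total detection events / (shots * detectors),
# i.e. the mean of the bits.
detection_fraction = detection_events.mean()
print(detection_fraction)
```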
A notable difference between the Google and IBM chips is that the IBM chip is far more spread out. The Google chip is a square grid, with each qubit having four neighbors. The IBM chip is a heavy hex grid, with most qubits having two neighbors and the rest having three. This makes it hard to run the surface code on IBM chips, because you need additional gates to move information around and form the necessary detectors, which makes the detection fraction (and the logical error rate) worse for a given gate error rate. It's not clear whether you would still use the surface code on such a chip, or a specialized code instead. IBM proposed a specific code called the heavy hex code; last year Microsoft published the honeycomb code, which can be placed on a heavy hex lattice; and just this month Google published new ways to compile the surface code, one effect of which is that the overhead incurred by running on a heavy hex lattice is not as bad as it used to be.
In terms of actual experimental data, the closest comparison I can think of is between the honeycomb code detection fractions from "Measurements of Floquet code plaquette stabilizers" on IBM's chips and the surface code detection fractions from Google's experiment, which you can compute from the published dataset.
Figure 5 of "Measurements of Floquet code plaquette stabilizers" indicates the best achieved detection fraction was 35%, on ibm_hanoi. I think this is a 27-qubit system. I think the largest device tested was the 127-qubit device ibmq_washington, which had a detection fraction of 45%. The threshold error rate for the honeycomb code corresponds to a detection fraction of roughly 15%.
![Detection fraction results from "Measurements of Floquet code plaquette stabilizers"](https://cdn.statically.io/img/i.sstatic.net/a2m8D.png)
Looking at the distance-5, 25-round X-basis experiment from the Google data shows it has a detection fraction of around 16%:
$ wget -q https://zenodo.org/record/6804040/files/google_qec3v5_experiment_data.zip
$ unzip -aq google_qec3v5_experiment_data.zip
$ cat surface_code_bX_d5_r25_center_5_5/detection_events.b8 | python -c "import sys; bs = sys.stdin.buffer.read(); print(sum((b >> k) & 1 for b in bs for k in range(8))/len(bs)/8)"
0.15932483333333333
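The same computation, written out as a short Python helper instead of a one-liner. It counts the set bits in the packed `.b8` detection event data and divides by the total number of bits (the function name and the synthetic two-byte example are mine, just for illustration):

```python
def detection_fraction_from_b8_bytes(data: bytes) -> float:
    """Fraction of set bits in packed .b8 detection-event data."""
    set_bits = sum(bin(byte).count("1") for byte in data)
    return set_bits / (len(data) * 8)

# Tiny synthetic example: 0b00000001 and 0b00001111 -> 5 of 16 bits set.
print(detection_fraction_from_b8_bytes(bytes([0b00000001, 0b00001111])))  # 0.3125

# On the real dataset you'd pass the file contents, e.g.:
# with open("surface_code_bX_d5_r25_center_5_5/detection_events.b8", "rb") as f:
#     print(detection_fraction_from_b8_bytes(f.read()))
```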
Keep in mind that these are different codes, and that they represent very different levels of effort going into the specific experiment. But I do think the huge difference in detection fraction (16% vs 35%-45%) is key. Note that in the past Google has literally made their chips smaller, going from the 72-qubit Bristlecone to the 53-qubit Sycamore, increasing quality at the cost of quantity. So, speaking very roughly, I would say the most visible distinction between IBM's and Google's strategies is that they are picking different tradeoffs between quality and quantity.
Disclaimer: I'm on the Google team, but these opinions are my own, as derived from public data.