> Is there some particular design theory or constraint that made a 32-bit word size attractive for IBM to migrate to?
It all comes down to the most basic data type, addressing constraints and, less importantly, the reuse of existing memory technology.
- The byte size had to be a multiple of 4 bits, as needed to hold BCD digits (4 bits each) without wasting space.
- So 8 bits was chosen as the byte size: it can hold a text character or two BCD digits with the least possible waste (see the BCD sketch after this list).
- The machine introduced not only the idea of 8-bit bytes, but also byte addressing as the basic address granularity (*1).
- As a result, the number of bytes within a word had to be a power of two (*2).
- Words (binary arithmetic) within the /360 were primarily meant for address handling.
- The smallest power-of-two multiple of one byte is two bytes, so 16 bits would have made sense as the word/register size.
- Except that the /360 design called for more addressable memory than the 2^16 = 64 KiB a 16-bit address can reach (*3).
- The next larger word size would be 32 bits - which in turn was quite future-proof (*4).
Bottom line: 32 bits is the first logical choice that satisfies all of the above - a word size that allows byte addressing, with a byte size that can hold a whole number of BCD digits without waste (*5).
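To make the byte-size argument concrete, here is a minimal C sketch of packed BCD - two decimal digits per 8-bit byte, zero wasted bits. The function name is mine, purely for illustration:

```c
#include <stdio.h>
#include <stdint.h>

/* Pack two decimal digits (0-9) into one 8-bit byte:
   high nibble = first digit, low nibble = second digit. */
static uint8_t bcd_pack(unsigned hi, unsigned lo)
{
    return (uint8_t)((hi << 4) | (lo & 0x0Fu));
}

int main(void)
{
    uint8_t b = bcd_pack(4, 2);  /* the number 42, zero wasted bits */
    printf("byte = 0x%02X, digits = %u%u\n",
           (unsigned)b, (unsigned)(b >> 4), (unsigned)(b & 0x0Fu));
    return 0;
}
```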
As a side effect, 32 bits also allowed the reuse of the existing 36-bit core module design as 32 data bits plus 4 check bits (one per byte).
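Four bits cannot error-correct 32 data bits (a Hamming SEC code would need six), so one check bit per byte is the plausible layout. A minimal C sketch under that assumption:

```c
#include <stdio.h>
#include <stdint.h>

/* One odd-parity check bit per 8-bit byte: the stored ninth bit is
   chosen so the nine bits together always carry an odd number of ones. */
static unsigned parity_bit(uint8_t b)
{
    unsigned ones = 0;
    for (int i = 0; i < 8; i++)
        ones += (b >> i) & 1u;
    return (ones & 1u) ^ 1u;   /* set only if the byte has an even count */
}

int main(void)
{
    uint32_t word = 0xDEADBEEF;
    /* Four check bits, one per byte: 32 + 4 = 36 bits per core word. */
    for (int i = 0; i < 4; i++) {
        uint8_t b = (uint8_t)(word >> (8 * i));
        printf("byte %d = 0x%02X, check bit = %u\n",
               i, (unsigned)b, parity_bit(b));
    }
    return 0;
}
```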
BTW: I don't think 7-bit ASCII justified 8-bit bytes.
At that time computers were not about text processing - especially not an ISA whose major use was replacing tabulating machinery. There, text was used to print table headers and item names, not much more; text storage was of much lesser concern. It was about "unit price ✕ units sold ✕ tax rate" - all done in BCD. That's where the market was and where the dollars would be spent, not on fancy university projects or fantasy stuff like word processing. :))
*1 - Before that, basically all machines used word addressing; byte manipulation was done with extraction instructions - or with rather complicated combined word and bit(-field) addressing.
*2 - While a power of two is not important for the number of bits within bytes or words, it is essential for addressing - at least on machines using binary addressing. So if a byte within a word is to be addressable without special means, the number of bytes in a word must be a power of two.
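Both footnotes in one minimal C sketch (illustrative only, not modeled on any real ISA): on a word-addressed machine (*1), fetching a character takes explicit shift-and-mask extraction; with byte addressing and a power-of-two number of bytes per word (*2), the address simply splits into a word index and a byte offset.

```c
#include <stdio.h>
#include <stdint.h>

/* Word-addressed machine (*1): memory is an array of words, and
   fetching character n means software shift-and-mask extraction. */
static uint8_t fetch_byte_word_machine(const uint32_t *mem, unsigned n)
{
    uint32_t word  = mem[n / 4];           /* which word holds it      */
    unsigned shift = 8 * (3 - n % 4);      /* which byte, big-endian   */
    return (uint8_t)(word >> shift);
}

int main(void)
{
    /* Byte-addressed machine (*2): with 4 = 2^2 bytes per word, the
       hardware splits a byte address by plain wiring - no division. */
    unsigned addr = 1234;
    unsigned word_index  = addr >> 2;      /* high bits select the word */
    unsigned byte_offset = addr & 3;       /* low 2 bits pick the byte  */
    printf("byte %u = word %u, offset %u\n", addr, word_index, byte_offset);

    uint32_t mem[2] = { 0x41424344, 0x45464748 };  /* "ABCDEFGH" */
    printf("byte 5 on a word machine = '%c'\n",
           fetch_byte_word_machine(mem, 5));
    return 0;
}
```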
*3 - Only the low-end -30 (and the extremely reduced -20) maxed out at 64 KiB. The -40 was expandable to 256 KiB, while the -50 already started at 64 KiB and went up to 512 KiB.
*4 - It was so gigantic that they decided to use only 24 bits thereof - and it took more than 10 years until the first /370 could be ordered with 16 MiB, and more than 25 years before 32 bits were reached.
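For scale: 2^24 = 16 MiB, against 2^32 = 4 GiB for the full word. A minimal C sketch of 24-bit addressing, where the high-order byte of a 32-bit address is simply ignored (the constant name is mine):

```c
#include <stdio.h>
#include <stdint.h>

#define ADDR_MASK_24 0x00FFFFFFu   /* low 24 bits of a 32-bit register */

int main(void)
{
    uint32_t reg = 0x89ABCDEF;               /* high byte plays no part  */
    uint32_t effective = reg & ADDR_MASK_24; /* hardware drops bits 0-7  */
    printf("effective address = 0x%06X\n", effective);
    printf("max memory = %u bytes (16 MiB)\n", ADDR_MASK_24 + 1u);
    return 0;
}
```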
*5 - The next bigger size would be 12-bit bytes and 48-bit words, which of course would make character storage quite wasteful. And no, putting a 6-bit character into a 12-bit byte would not help, as it would take away the ability to address characters: they would need packing and unpacking again (see the sketch below). Not to mention that 6 bits was already seen as insufficient.
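A sketch of that last point, assuming the hypothetical 12-bit-byte machine from the footnote: once two 6-bit characters share one byte, every character access is back to computed extraction instead of a plain byte address.

```c
#include <stdio.h>
#include <stdint.h>

/* Hypothetical 12-bit-byte machine: each byte stored in a uint16_t,
   two 6-bit characters packed per byte. Character n is no longer
   directly addressable - it needs unpacking again. */
static unsigned fetch_char_6bit(const uint16_t *bytes12, unsigned n)
{
    uint16_t byte = bytes12[n / 2] & 0x0FFF;   /* which 12-bit byte */
    return (n % 2 == 0) ? (byte >> 6)          /* high 6-bit char   */
                        : (byte & 0x3Fu);      /* low 6-bit char    */
}

int main(void)
{
    uint16_t bytes12[1] = { (1u << 6) | 2u };  /* chars 1 and 2 packed */
    printf("char 0 = %u, char 1 = %u\n",
           fetch_char_6bit(bytes12, 0), fetch_char_6bit(bytes12, 1));
    return 0;
}
```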