Partial answer (I have a pretty good lower bound; it may in fact be exact, but I don't have a proof of that, and it's bedtime):
There are two separate modes of mark-inflation here. One is the rounding; the other is the fact that averages of averages are not necessarily averages. So, e.g., suppose a student has marks of 20, 40, 60. They can average 20 and 40 to get 30, and then average this with 60 to get 45. No rounding is involved at all, but they have gained 5 points just by "changing the weightings".
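This weighting effect is easy to check numerically. A minimal sketch (the function name is mine, not part of the problem):

```python
def average(a, b):
    """Plain average of two marks -- no rounding involved."""
    return (a + b) / 2

marks = [20, 40, 60]
plain_mean = sum(marks) / len(marks)     # equal weights 1/3 each: 40.0
cheaty = average(average(20, 40), 60)    # weights 1/4, 1/4, 1/2: 45.0
gain = cheaty - plain_mean               # 5.0
print(plain_mean, cheaty, gain)
```

The nesting gives the best mark weight 1/2 instead of 1/3, which is the whole source of the 5-point gain.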
Now, let's first note that a student never has reason to average groups larger than 2. Any time you would have done (a+b+c)/3, for instance, you can instead do ((a+b)/2+c)/2, where c is the best of the three marks. Without rounding this is never worse, because it gives more weight to c and less to a and b; the extra "inner" rounding in the latter case can't make things worse; and the final rounding step is then applied to a number at least as large in the second case, and hence produces a result at least as large in the second case. All of this generalizes in the obvious way to groups larger than 3.
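The problem's exact rounding rule isn't restated here; the sketch below assumes R rounds a mark up to the next multiple of 10 whenever it is within 1.5 of it (so 8.5 becomes 10 and 98.5 becomes 100, matching the figures used later in this answer). Under that assumption, the claim that pairing beats a three-way average can be checked by brute force:

```python
from itertools import combinations_with_replacement

def R(x):
    # Assumed rounding rule (my reconstruction, not stated in this answer):
    # round up to the next multiple of 10 when within 1.5 of it.
    r = x % 10
    return x - r + 10 if r >= 8.5 else x

def avgR(a, b):
    """Average two marks, then apply the rounding rule."""
    return R((a + b) / 2)

# For all integer triples a <= b <= c in 0..100, check that ((ab)c) -- with
# c, the best mark, saved for last -- is at least as good as averaging all
# three at once and then rounding.
ok = all(
    avgR(avgR(a, b), c) >= R((a + b + c) / 3)
    for a, b, c in combinations_with_replacement(range(101), 3)
)
print(ok)  # prints True
```

This works because ((a+b)/2+c)/2 >= (a+b+c)/3 whenever c is the largest, the inner rounding only adds, and R is monotonic, exactly as argued above.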
So now let's consider the possible "grouping trees" a student can construct. The leaves of each tree are the individual marks; each time the student combines two marks, the two corresponding nodes get a new parent; the last such parent constructed, corresponding to the final averaging, is the root of the tree. The leaves are labelled with the original marks, and every other node is labelled with the result of the averaging-and-rounding operation that created it, so the root is labelled with the final mark. In what follows we'll write (ab) for the result of averaging-and-rounding a and b; here a,b might be original marks (leaf nodes) or already-averaged marks (non-leaf nodes). Larger pieces of tree get nested notations like ((a(bc))d), meaning: average-and-round b,c; then average-and-round a with the result; then average-and-round that with d. Of course (ba) = (ab).
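The notation can be made concrete with a tiny evaluator; trees are nested pairs, and the rounding rule is again my assumed reconstruction (round up to the next multiple of 10 when within 1.5 of it):

```python
def R(x):
    # Assumed rounding rule: round up to the next multiple of 10
    # when within 1.5 of it; otherwise leave the value alone.
    r = x % 10
    return x - r + 10 if r >= 8.5 else x

def evaluate(tree):
    """Evaluate a grouping tree.

    A leaf is just a number (an original mark); an internal node is a
    2-tuple (left, right) and means 'average-and-round its two children'.
    """
    if isinstance(tree, tuple):
        left, right = tree
        return R((evaluate(left) + evaluate(right)) / 2)
    return tree

# ((20 40) 60): average-and-round 20,40; then average-and-round with 60.
print(evaluate(((20, 40), 60)))   # 45.0
# (20 (40 60)): grouping the other way gives a different final mark.
print(evaluate((20, (40, 60))))   # 35.0
```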
We'll try to limit the trees that need to be considered as candidates for the best tree for a given set of original marks, by finding patterns of grouping such that some other grouping is definitely no worse. So, suppose we have the pattern ((ab)c), where c is not the largest of a,b,c. The order of a,b doesn't matter, so let's suppose a >= b and a >= c. Then I claim (a(bc)) >= ((ab)c). Why? Write R() for the rounding operation; I'm claiming R((a+R((b+c)/2))/2) >= R((R((a+b)/2)+c)/2). Since R is monotonic, this will follow from a+R((b+c)/2) >= R((a+b)/2)+c. Suppose R adds an amount h to (a+b)/2 and an amount k to (b+c)/2; then I am claiming a+b/2+c/2+k >= a/2+b/2+h+c, i.e., (a+b)/2 - (b+c)/2 >= h-k. In other words, I am claiming that the amount by which (a+b)/2 exceeds (b+c)/2 is at least the extra amount by which it is rounded up. Is that right? Well, since (a+b)/2 >= (b+c)/2, if it is rounded up more then it must be rounded to a strictly bigger value, which means that its unrounded advantage must be at least 10 - 1.5; but its rounding advantage h-k can't be bigger than 1.5, which is definitely smaller. On the other hand, if (a+b)/2 isn't rounded up more, i.e., if h-k <= 0, then trivially (a+b)/2 - (b+c)/2 >= h-k. So the claim holds either way, which is what we needed. We have proved that (a(bc)) >= ((ab)c) if a is the largest of a,b,c.
Now consider the largest of the original marks; call it z. At some point it must get averaged with something, giving (yz); y may or may not be one of the original marks. If we ever average that with something else, giving (x(yz)), then by the claim above ((xy)z) would have been at least as good; so in a best-possible tree we can assume this never happens. In other words: the largest of our original marks first enters at the root of the tree, when it is averaged with whatever we did with everything else.
But we can apply the same logic to "everything else", showing that the largest of those marks is averaged in at the end; and so on recursively, with the final conclusion being that the only tree we need to consider is the one that begins by averaging the two worst papers, then averages that result with the third-worst, and so on.
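The resulting strategy is just a fold over the sorted marks. A sketch, once more assuming my reconstruction of the rounding rule:

```python
from functools import reduce

def R(x):
    # Assumed rounding rule: round up to the next multiple of 10
    # when within 1.5 of it; otherwise leave the value alone.
    r = x % 10
    return x - r + 10 if r >= 8.5 else x

def best_tree_mark(marks):
    """Average the two worst marks, then fold in the third-worst, etc."""
    return reduce(lambda acc, m: R((acc + m) / 2), sorted(marks))

print(best_tree_mark([20, 40, 60]))   # 45.0, cf. the 20/40/60 example above
print(sum([20, 40, 60]) / 3)          # 40.0, the honest mean
```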
Now, in the absence of rounding, we would be done. The cunningly-grouped average gives weights 2^-(n-1), 2^-(n-1), 2^-(n-2), 2^-(n-3), ..., 2^-1 to the papers, in increasing order of original mark, whereas the plain mean gives every paper weight 1/n. So to maximize the difference we should give each paper a mark of 0 if its non-uniform weight is below 1/n and a mark of 100 if its non-uniform weight is above 1/n. The number of "good" papers will then be k = floor(log2(n)), the total cheaty mark will be (1-2^-k).100 versus (k/n).100 without clever grouping, and the difference will be (1-2^-k-k/n).100. For instance, if n=10 then k=3 and the difference is (7/8-3/10).100 = (0.875-0.300).100 = 57.5 marks. For large enough n, the difference becomes as close as we please to the full 100 marks, though of course it never reaches that figure. (With rounding, the final score will be 100 for large enough n, but the gain will never be 100, because in order to get this we do need to score some marks somewhere, and so the naive average can never be 0.)
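The weights and the gain formula are straightforward to check numerically (no rounding is involved here, matching the paragraph's assumption):

```python
from math import floor, log2

def fold_weights(n):
    """Weights the ascending fold gives to the n papers, worst first:
    2^-(n-1), 2^-(n-1), 2^-(n-2), ..., 2^-1."""
    return [2.0 ** -(n - 1)] + [2.0 ** -(n - j) for j in range(1, n)]

n = 10
w = fold_weights(n)
assert len(w) == n
assert abs(sum(w) - 1) < 1e-12     # the weights still form an average

k = floor(log2(n))                  # the "good" papers, as in the text
gain = (1 - 2 ** -k - k / n) * 100
print(k, round(gain, 6))            # 3 57.5
```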
Fortunately for the student, but unfortunately for me, there is also the rounding-up to consider. Its impact in the r'th step from the end (where r=0 for the last step, r=1 for the step before that, etc.; there are n-1 average-and-round steps, so r ranges from 0 to n-2) is at most 1.5 times 2^-r, so the total benefit of all the rounding is less than 3 marks. And in order to get any rounding up at all, the pre-rounding value needs to be at least 8.5. It feels as if these facts should make it possible to prove that most of the marks should be exactly zero, but I haven't yet managed to get my ducks sufficiently in a row for this.
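The geometric bound on the total rounding benefit is immediate to verify:

```python
# Each average-and-round step can add at most 1.5 (under the assumed rule:
# round up to the next multiple of 10 when within 1.5 of it), and a bonus
# earned r steps before the end is halved r times by the later averages.
def rounding_benefit_bound(n):
    """Upper bound on the total rounding benefit with n papers."""
    return sum(1.5 * 2 ** -r for r in range(n - 1))

for n in (2, 10, 100):
    assert rounding_benefit_bound(n) < 3

print(rounding_benefit_bound(10))   # 2.994140625
```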
It seems like it may be difficult to figure out the max-gain strategy in the presence of rounding, which feels as if it could complicate things a lot. But there is at least one easy thing to do: all those 100s could equally well be 98.5s, since they will immediately be rounded up to 100. And of course we should round things up at the end. So a lower bound on the maximum gain with n papers is R((1-2^-k).100) - (k/n).98.5, where k = floor(log2(n)) and R is the rounding function. For instance, for n=10 the maximum gain is at least 57.95, and for n=128 it is at least 94.61328125. But this doesn't take into account the effects of rounding at intermediate stages (a bit of simple-minded programming suggests that in fact this effect is literally zero for all n up to 10000, with the naively-optimal original marks), nor the possibility of changing our original marks in any way more sophisticated than using 98.5 instead of 100 (which I suspect doesn't actually help). So at present I suspect that the lower bound earlier in this paragraph may be tight, but I have nothing resembling a proof.
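The lower-bound formula is easy to evaluate. The rounding rule R below is my reconstruction (round up to the next multiple of 10 when within 1.5 of it); it reproduces both figures quoted above, 57.95 and 94.61328125:

```python
from math import floor, log2

def R(x):
    # Assumed rounding rule: round up to the next multiple of 10
    # when within 1.5 of it; otherwise leave the value alone.
    r = x % 10
    return x - r + 10 if r >= 8.5 else x

def gain_lower_bound(n):
    """The text's bound: R((1-2^-k).100) - (k/n).98.5, k = floor(log2(n))."""
    k = floor(log2(n))
    return R((1 - 2.0 ** -k) * 100) - (k / n) * 98.5

print(round(gain_lower_bound(10), 2))   # 57.95
print(gain_lower_bound(128))            # 94.61328125
```

Note that for n=10 the rounding at the end does nothing (R(87.5) = 87.5), while for n=128 it fires (R(99.21875) = 100), which is why the n=128 bound is so close to the no-rounding limit.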