This problem is very similar to a problem I answered here, but this was a special case of this problem so I will go again from the start.
Formalization of the problem
The shape of the waffle game is not important, so you can assume by reading it top to bottom left to right that the game is represented by a sequence of letters, some letters can eventually be repeated. Given this string, we want to reach the solution string (which is assumed to be known) by making only transposition of letters.
If all letters are distinct, the problem is known as the transposition number. The transposition number is $n-s$ where $s$ is the number of cycles in the cycle decomposition of the permutation that send the puzzle string to the solution string.. Basically, for each cycle in the decomposition, we perform a transposition of two adjacent elements in the cycle. When doing so, if the cycle has a lenghth greater than $2$, exactly one element will be put in its correct position. If the cycle has length $2$, it is a rule 2) transposition: both elements will get to their correct position.
For example, if the puzzle string is $BCAEFD$ and the solution string is $ABCDEF$, the permutation is a composition of two length three cycles. To remove the first cycle ($BCA$), we can first permute the two first letters, bringing $B$ in place, then swap $A$ and $C$, which puts both in the correct position, so we needed only two transpositions. Similarly, the other cycle will require two transpositions, for a total of $4$ transpositions.
In the general case, let $1$, $2$, $3$, ..., $n$ be the $n$ distinct letters that appears in the string (ok, these are numbers, not letters, but whatever...). Let's suppose letter $i$ appear $n_i$ times.
We will call canonical string the string $11...122...2...nn...n$, where letter $i$ appear $n_i$ times. By reordering the order of letters so that the solution string become the canonical string, the problem can be reduced to find reach the canonical string from a given string.
Lets call a valid labeling a labeling of a permutation using labels
$1_1, 1_2, ..., 1_{n_1}, 2_1, ..., 2_{n_2},...n_1,...n_{n_n}$, so that a letter $k$ can only get a label $k_i, 1\leq i\leq n_i$. Our problem becomes equivalent to finding the minimum transposition number over all valid labelings $\sigma$ of our permutation where the identity is labeled $\sigma_0 = 1_11_2...1_{n_1}2_12_2...2_{n_2}...n_1n_2...n_{n_n}$.
The takeaway is that an optimal solution will correspond to a set of transposition cycles that partitions the elements of the string
We can formalize this problem as a graph problem. We create a directed graph $G(V, A)$ where $v_i$ represent letter $i$. We have an arc between vertices $v_i$ and $v_j$ each time there is a position in the string such that it is the letter $i$ in the puzzle string and letter $b$ in the solution string (we can have several arcs in parallel). A cycle in this graph correspond to a valid cycle of transpositions. An optimal solution is obtained by a decomposition of edge-disjoint cycles with the most possible cycles. Note that this graph is Eulerian (indegree = outdegree), so a cycle decomposition always exist.
Rules 1) and 2) are sound
Unfortunately, this problem is NP-hard. However, we can answer your question:
Both rules 1) and 2) are sound in the sense that each time you perform one of the corresponding transpositions, there exist an optimal solution performing this transposition.
For rule $1$, this is due to the fact that the corresponding transposition must belong to a cycle of the decomposition. Since we start to transpose anywhere on the cycle, it does no harm to start on this particular transposition.
For rule $2$, an optimal decomposition might not perform a given rule 2 transposition. In the graph, a rule 2 transposition corresponds to a cycle of length $2$, so two arcs $e_1$ and $e_2$ in opposite direction. It is easy to see that if two distinct cycles $C_1$ and $C_2$ contain respectively $e_1$ and $e_2$, then we can rearrange the decomposition so that we have the 2-cycle $e_1e_2$, and a second cycle made by glueing $C_1\setminus e_1$ and $C_ 2\setminus e_2$. This transformation does not increase the number of cycles in the decomposition.
An example where the optimal solution is non-trivial
I believe that when you do not get the minimum, it is based on situations where there are no rules 1) or 2) possible and that you are making suboptimal choices for the cycles.
An example where we could make suboptimal choices:
Consider the puzzle string $CFADBECEAFBD$ and the solution string $AABBCCDDEEFF$.
We obtain the graph:
On optimal decomposition in cycles would be the Four 3-cycles $BCD$, $DFE$, $ABF$ and $AEC$. A not optimal solution would be for example a decomposition in the cycles $ABCDF$, $BFED$ and $AEC$. There is no rule 1) or 2) available, so we might be tempted to perform the transposition of the 3rd and 5th positions, which brings a $B$ into position, but then, there would be no way to achieve an optimal solution...