17
$\begingroup$

This type of puzzle goes by many names: Nonogram, Picross, and Griddlers are all mentioned on the Wikipedia page, Simon Tatham calls it Pattern, I was introduced to it as Descartes Rainbow, ...

The puzzle starts with a given arrangement of numbers around a grid. Let's assume it's a square, $n\times n$ grid, and that the pattern is a simple black and white one, no colours. Every possible pattern of black cells in the $n\times n$ grid gives some arrangement of numbers, but there are some different patterns which give the same arrangement of numbers. A good puzzle starts with an arrangement of numbers which has one unique solution. But if I choose a random (solvable) starting problem, what's the probability that it will have a unique solution?

Among all possible starting positions, how many have a unique solution?


Important note: I don't know the answer to this question. I don't even know if anyone knows: it could be an open mathematical problem. I'm also not expecting it to be an easy thing to answer off the cuff: it's more likely to be something found in a mathematical paper or so. But we have a good subcommunity of maths puzzle gurus here, and maybe someone will be better able to find the answer than I.

$\endgroup$
7
  • $\begingroup$ Just to check I understand: you're looking at numbers-around-the-outside that correspond to some configuration within the grid, and asking what fraction of those correspond to exactly one configuration? I wonder whether the following related question might be more approachable: consider all within-the-grid configurations; what fraction of those have around-the-grid numbers that correspond to just one within-the-grid configuration? (Probably both are very tough, though.) $\endgroup$
    – Gareth McCaughan
    Commented Jun 13, 2019 at 13:20
  • $\begingroup$ @Gareth Yes, but both problems are interesting, and potentially mutually entangled (a solution to one, or its proof, might quickly yield a solution to the other). I'd consider an answer to the second question at least a partial answer to this one. $\endgroup$ Commented Jun 13, 2019 at 13:22
  • $\begingroup$ I don't even have a confident guess at whether either Rand's fraction or mine (1) -> 0 as n->oo, (2) -> 1 as n->oo, (3) -> something else as n->oo, (4) doesn't converge but is bounded away from 0 and 1, or (5) doesn't converge and is sometimes close to 0 or 1 for large n. I'd guess either 1 or 3. $\endgroup$
    – Gareth McCaughan
    Commented Jun 13, 2019 at 13:35
  • 1
    $\begingroup$ This is a very great question, I hope there would be progress on this. (And if your colleague does find a solution, please update us here, too!) $\endgroup$
    – justhalf
    Commented Jun 13, 2019 at 14:02
  • 1
    $\begingroup$ @Namaskaram That would definitely make for an interesting answer, yes please! Partial answers are encouraged on this site, as long as they're not too partial (e.g. just one clue solved for a whole crossword). $\endgroup$ Commented Oct 29, 2021 at 12:39

4 Answers 4

9
$\begingroup$

Absurdly partial answer

Terminology: an "inner configuration" is an arrangement of filled and empty cells in our $n\times n$ grid; an "outer configuration" is the corresponding arrangement of run-lengths displayed outside the grid. Let $I_n=2^{n^2}$ be the number of possible inner configurations, $O_n$ be the number of distinct outer configurations arising from these, and $U_n$ be the number of these corresponding to a unique inner configuration. The question asks for $p_n:=U_n/O_n$; I suspect $q_n:=U_n/I_n$ may be less impossible to find, though probably both are hard. Obviously $O_n\leq I_n$ and so $q_n\leq p_n$.

For $n=0$: obviously $I_0=O_0=U_0=1$ so $p_0=q_0=1$.

For $n=1$: there are just two inner configurations, and each corresponds to a different outer configuration. So $I_1=O_1=U_1=2$ and $p_1=q_1=1$.

For $n=2$: there are two "bad" configurations, the ones with two diagonally-opposite empty cells. All other configurations either have an asymmetry making them easily solvable, or have all cells in the same state which is also easy. So $I_2=16,O_2=15,U_2=14$ and therefore $p_1=\frac{14}{15},q_1=\frac{7}{8}$.

The sequence of $U_n$ is A242876 in OEIS. Not very many terms are computed there, which suggests (at least) that others agree with my feeling that this is a hard problem.

$\endgroup$
4
  • $\begingroup$ The way I read the question, it seems to be asking for $U_n$ instead of $p_n$. (And did you mean $O_n \leq I_n$ instead of $U_n \leq I_n$?) $\endgroup$
    – justhalf
    Commented Jun 13, 2019 at 14:07
  • $\begingroup$ And thanks for posting the OEIS number. Seems like a hard problem indeed. $\endgroup$
    – justhalf
    Commented Jun 13, 2019 at 14:13
  • 1
    $\begingroup$ @justhalf It asks for both $U_n$ ("how many have ...") and $p_n$ ("what's the probability ..."). But you're right that I meant O not U when comparing the fractions; I'll fix that. [EDITED to add:] Done now. $\endgroup$
    – Gareth McCaughan
    Commented Jun 13, 2019 at 14:21
  • $\begingroup$ Is there OEIS for $O_n$? Maybe that is easier. $\endgroup$
    – justhalf
    Commented Jun 13, 2019 at 14:38
5
$\begingroup$

Answer in construction

From an algorithm point of view, the ways you can attempt to crack the problem are either too strenuous for decent results in an amount of time or redundant in comparison to a decent mathematician... so I don't expect anything useful to be yielded from using a computer algorithm.

That said, I have a simple starting point that may or may not be useful in the long run

Lets consider that a Nonogram can be represented as bits where $1=coloured$ etc. Nonagrams contain numbers outside the grid (lets casually call this $A_x$, where $A_x$ is an array of $S$ length) which validly represent the bits in the respective row or column. It is possible to determine if a Nonogram is unique by finding if there are other solutions that have the exact same $A_x$. Obviously, it is difficult to do this mathematically, lets break the problem down and see if it gets us anywhere {I intend for my answer to help others reach a final, accepted answer (if that even exists). I probably won't be able to continue past a certain level of maths}


Lets start by just thinking of a single trail of bits (called $B_i$). This trail is of length $N$ and can be represented by $A_x$. How many combinations of $B_i$ share $A_x$? It is easy to see that for $A_x$ with $S = 1$ there are $N-A_1+1$ of $B_i$. For $S>1$ it gets complicated since there can be a variable number of $0$ bits padding between $1$ clusters. What is certain is that there is always at least one $0$ bit between clusters. Consider a strategy where the padding can be distributed in many combinations. Let $T = \sum\limits_{r=1}^{S}A_r$ and $D=N-T$. The difference $D$ represents the total padding between clusters, it can be distributed for the $S+1$ paddings in between clusters in different ways. There are $S-1$ paddings that must be a size of at least one. This leaves $D-S+1$ to be distributed between all $S+1$ padding. The formula for this would then be ${D+1} \choose {S}$

So $A_x$ has ${N+1-\sum\limits_{r=1}^{S}A_r} \choose {S}$ corresponding $B_i$ of length $N$.

This also implies that $N+1-\sum\limits_{r=1}^{S}A_r=S$ when there is only one unique $B_i$. This tells us that ALL valid Nonograms, where all of the rows and columns satisfy this rule, have a unique solution! (for future reference, lets call these stuffed Nonograms)


Lets have a closer look into the stuffed Nonograms since it may give us a very vague idea of what the final function (if it exists) will look like. An equation to identify stuffed Nonograms can be derived if we rearrange the rule to $N=S-1+\sum\limits_{r=1}^{S}A_r$. The square Nonogram will have $2N$ of $B_i$ that follows this rule so by summing everything, $2N^2=\sum\limits_{i=1}^{2N}(S_i-1+\sum\limits_{r=1}^{S_i}A_{i_r})$ or the nicer equation $2N^2+2N=S_T+A_T$ using $S_T$ and $A_T$ to represent the amount of total items $A_x$ and the summation of all of them respectively. There is a constraint that $S_T\geq2N$ and $A_T\geq2N$.

Logically it can be concluded that stuffed Nonograms are Nonograms that have a completely/almost full unique grid as the solution. They have peppered padding of length one (single holes). This can be proven by revisiting $B_i$. Let $B_i$ have $C$ padding of one so that $T = N - C$ and $S = C + 1$. Subbing into $N+1-T=S$ we see no contradiction since $N+1-(N-C)=C+1$ so $C+1=C+1$ $\therefore$ peppered padding of one is always unique! There is an indirect implication that there cannot be any front or back padding (since it creates more solutions) so stuffed Nonograms must at least have a full outer rim of one. It is for this reason that the only area of stuffed Nonograms with importance is $(N-2)^2$. Now consider this; if the area is unwound to become many $B_i$ and the placement of the peppered padding is considered in a converse way of thinking then the amount of corresponding $B_i$ from $A_x$ formula can be repurposed to count the total amount of stuffed Nonograms possible. The formula will initially over approximate since the rewrapping of some $B_i$ configurations will lead to alignment of padding resulting in padding of length greater than one which is invalid. To overcome this, an extra formula that counts the number of invalid wraps must be developed.

Edit: The following two paragraphs are now redundant since I've managed to find what appears to be the solution by changing my approach. The paragraphs are still necessary to fully understand the tech used in my new approach so for historic reasons I'm keeping them for the now. (Thinking of shortening this answer by moving the "holes in grid" problem into its own Q&A question)

Lets over approximate. Consider a single $B_i$, the padding is all of length one so all of $A_x$ (in this perspective) will contain $1$. $S$ can vary from $0\to \lceil\frac{N-2}{2}\rceil$ (thanks odd numbers) and $T=S$. The formula for the amount of possible $B_i$ must then be $\sum\limits_{r=0}^{\lceil\frac{N-2}{2}\rceil} {{(N-2)+1-r} \choose {r}}$. This is actually equivalent to the Fibonacci series (thanks to @justhalf's insight). The function $Fi(x)$ will be used to represent the series from now on. There is $N-2$ of $B_i$ in the relevant area which can have different combinations at once. This is the same as a combination lock of length $N-2$ so the overall formula for the amount of configurations in this case is $Fi^{(N-2)}(N)$.

Now to count all of the invalid wraps. Lets use an easy model and say that the $B_i$ are stacked on top of each other. Invalid wraps must occur when a padding of length greater than one is created from stacking $B_i$ that have padding which align. Alignment can only happen between neighbour $B_i$ and all that is required is for padding to exist at the same position locally within the two $B_i$. Take $G$ to equal the amount of $B_i$ and $L$ to equal the length of all $B_i$ (just incase we can revisit this formula). There are $G-1$ seams between $B_i$ where alignment can occur and it is possible for more than one set of neighbour $B_i$ to align at once.

Edit: This is the new approach

Lets consider the valid combinations at one seam. In order for valid combinations to happen the padding must never overlap. Lets say that the top $B_i$ at this seam can be anything where it is valid within itself. It is logical to conclude that the bottom $B_i$ must only be valid when its own padding is contained within the clusters of the top $B_i$ whilst still being internally valid. It is for this reason that the bottom $B_i$ can be considered as any $A_x$ that fragments into small locally valid $B_i$! We can now further abuse the $A_x$ formula! The formula currently depends on arrangement of $A_x$ so to fix that to make the combinations countable it is now ${{L+1-\sum\limits_{r=1}^{S}A_r} \choose {S}}S!$ (permutations). The clusters can all be any combination of locally valid $B_i$ so this is $\prod\limits_{i=1}^{S}Fi(A_i+2)$. This gives us the final formula for the seam combinations (for $A_x$) to be ${{L+1-\sum\limits_{r=1}^{S}A_r} \choose {S}}S!\prod\limits_{i=1}^{S}Fi(A_i+2)$. The total combinations can be found by summing combinations for $A_x$'s. Initially, I thought it wouldn't be possible to account for multi-seam combinations however I think that by inversion, the next seam in the stack has the same possible combinations regardless of the affects of the seam beforehand. This would create a simple combination lock of length $G$. I will be thoroughly impressed if anyone manages to put the whole formula together!


Another group of unique, valid Nonograms exists that can immediately be counted. The group consists of Nonograms where the rows/columns can only be completely full, empty or something inbetween (the latter arises from the overlap of rows and columns that are full or empty which creates a 'different image'). For future reference, lets call these net Nonograms.

Anyone familiar with solving Nonograms will know how useful completely full/empty rows/columns are. It is a good idea to take this concept to its extremes because it may be used as a part of the final solution to split Nonograms into smaller Nonograms (well, it is way more complicated than that but still). Consider just the rows of net Nonograms, it is possible to use binary to represent the information. It is for this reason that $2^N$ is the amount of combinations of rows. The same can be said for the columns. If the information for the rows and columns is 'overlapped' then we have $2^{2N}-2$ total net Nonograms! The $-2$ takes into account the two repeated combinations (all full or all empty). Gonna need proof for how overlap is resolved and why it still leads to unique, valid Nonograms but I think it is safe to say that this was a simple idea for the books...


Ideas

This section is going to be used to document facts/ideas ranging from basic to abstract, I just want to put concepts on the record so that this can be used as a helpful guide when looking for the answer.

  • The final function will be a recurrence relation.
  • The answer to "How many invalid solutions exist for $A\times B$ Nonograms with a specific set of $A_x$?" will inevitably lead to the answer for "How many valid solutions for an $N\times N$ Nonogram exists?" and "How many unique $N\times N$ Nonograms exist?"
  • The answer to the above "invalid solutions exist..." will likely contain recursion.
  • The groups of Nonogram currently defined in my 'answer' both tend to zero as $N\to \infty$ when considering the approximate probability of the groups out of all possible random configurations $2^{N^2}$
$\endgroup$
6
  • $\begingroup$ You don't seem to mention the relation between row and column anywhere in your answer, which is necessary in any answer, I believe. I'm not sure if I follow your argument. $\endgroup$
    – justhalf
    Commented Jun 17, 2019 at 1:41
  • $\begingroup$ I'm responding to your second section (where you defined stuffed Nonogram). I feel like it is a simplification to try to answer How many combinations of Bi share Ax? simply by looking at a column or a row independently (which to my reading is what you seem to be doing), because uniqueness is also determined by the interaction between the numbers in the rows and in the columns. But it is also possible that I misunderstood your explanation. Hope you can clarify. $\endgroup$
    – justhalf
    Commented Jun 17, 2019 at 17:16
  • $\begingroup$ What you call stuffed Nonograms seems to be simply the number of possible bit arrangement of length $N$, and you count it by separating the cases on the number of partitions (running sequence of colored boxes) it has. So your final sum would be equal to $2^N$. Or, since you add padding, it would be $2^{N-2}$. $\endgroup$
    – justhalf
    Commented Jun 17, 2019 at 18:19
  • $\begingroup$ Ahh, I got what your stuffed Nonogram means. It is referring to the case where just based on looking at only the numbers at a given row/column, we can determine the solution for that row/column. This would be the same as the number of ways to put any number of uncolored boxes on the grid, such that no uncolored boxes touch each other or the boundary. This is a very specific case of Nonogram. Then the sequence cannot be 1, 1, 1, 6. As for $N=3$ there are two possibilities (one hole and no hole). $\endgroup$
    – justhalf
    Commented Jun 17, 2019 at 20:44
  • 1
    $\begingroup$ Just to let you know, in 1D, the solution to number of possible ways to put boxes that do not touch each other is the Fibonacci series. So I'm guessing in 2D it would be much more complicated. $\endgroup$
    – justhalf
    Commented Jun 17, 2019 at 22:52
2
$\begingroup$

This is an attempt to answer your secondary question:

[I]f I choose a random (solvable) starting problem, what's the probability that it will have a unique solution?


One answer is found in the paper On Determining Paint by Numbers Puzzles with Nonunique Solutions by Ryan Mullen (Journal of Integer Sequences, vol. 12, 2009, Article 09.6.5).

A natural starting approach to try to solve a nonogram is to see whether the colors of any cells in a row (or column) are determined by the sequence of numbers given for that row (or column), which shall be called a "hint sequence". The worst-case scenario is that no cell in a row has its color determined by its hint sequence (in which case one is compelled to use the data from the columns and the rest of the rows). With this motivation, the author makes the following definition:

We define a forceless sequence to be a sequence such that no partial coloring can be determined without the aid of the rest of the puzzle.

If a nonogram puzzle has $n$ columns, how many forceless sequences are possible? Denote this number by $G_n$. The first few values of $G_n$ are listed in the OEIS entry A304179. The author also shows (see Theorem 2.1) that the number of hint sequences possible for an $n$ column puzzle is $F_{n+2}$, where $(F_n)_{n \geq 1}$ is the Fibonacci sequence given by $$ F_1 = F_2 = 1, \quad F_{n} = F_{n - 1} + F_{n - 2} \text{ for all } n \geq 2. $$ Thus, the probability that a hint sequence of an $n$ column puzzle (picked uniformly at random from the set of all possible $n$-column hint sequences) is a forceless sequence is $G_n/F_{n+2}$.

Now, the author provides estimates for the growth of $G_n$ as follows (see Theorems 3.1 and 3.2): there exist constants $c_1, c_2 > 0$ such that $$ c_1 \frac{\phi^n}{n} \leq G_n \leq c_2 \frac{\phi^n \ln n}{n} $$ for all $n \geq 2$, where $\phi = \frac{1+\sqrt{5}}{2}$ is the golden ratio. In the paper The Asymptotics of Useless Hint Sequences in Nonograms, by Daniel Berend and Shira Zucker (Journal of Integer Sequences, vol. 21, 2018, Article 18.4.2), the authors tightened the above estimates to show that there exist constants $C_1, C_2 > 0$ such that $$ C_1 \frac{\phi^n}{n} \leq G_n \leq C_2 \frac{\phi^n}{n} $$ for all $n \geq 1$. Since it is known that the Fibonacci sequence grows as $F_{n} \approx \phi^n / \sqrt{5}$, the probability $G_n / F_{n+2}$ grows as $1/n$ upto some multiplicative constant (i.e., $G_n/F_{n+2} = \Theta(1/n)$).


A different model of randomness is considered in the paper Nonograms: Combinatorial questions and algorithms by Daniel Berend, Dolev Pomeranz, Ronen Rabini, and Ben Raziel (Discrete Applied Mathematics, vol. 169, 2014, pp. 30–42). In this paper, the authors primarily develop an improved algorithm for a "line-solver", that is, an algorithm that takes a row (or column) hint sequence as an input and finds out which cells in that row (or column) have their colors determined, as well as what those colors should be.

Additionally, they establish that for a "randomly" chosen $n \times n$ puzzle, it is highly likely for large $n$ that a line solver is unable to determine the color of even one cell. Specifically, they use the following definition of a "random" puzzle:

Our interpretation is that we start with an uncoloured grid, and colour each cell in white or black with probabilities of $1/2$ each, distinct cells being independent. … Note that this is very different from taking all puzzle descriptions and picking one of them uniformly at random. (The latter possibility is more in line with Mullen’s interpretation of randomness for single lines.) To emphasize the difference, we shall henceforth refer to our version of a random puzzle as a randomly generated puzzle. (Namely, the grid is coloured uniformly at random, but some puzzles have a much larger probability of being obtained than others.)

With the above definition, the authors prove the following:

Theorem 2.Let $C$ be the number of cells of an $n \times n$ randomly generated puzzle whose colour is determined by a line solver. Then $P(C = 0) \underset{n \to \infty}{\longrightarrow} 1$ exponentially fast.

This is obviously a very different estimate from the one mentioned previously due to Mullen and Berend–Zucker, but there is no contradiction because the probability spaces in these two cases are quite different.

The authors also establish a probabilistic estimate for how many $m \times n$ puzzles have multiple solutions.

Theorem 4.There exist effective positive constants $\delta, \varepsilon > 0$ such that the probability of a randomly generated $m \times n$ puzzle ($m, n \geq 2$) to have less than $2^{\delta m n}$ solutions is less than $2^{−\varepsilon m n}$.

In particular, their proof shows that

the theorem is valid with the choices $\delta = 2^{−18}$, $\varepsilon = 2^{−20}$.

So, if a nonogram puzzle is picked randomly by picking a picture in the $m \times n$ grid uniformly at random, then (for large puzzles) it is highly unlikely that a randomly picked puzzle does not have many solutions.


Addendum

This question seems to be fairly complicated. For instance:

  • In contrast with Theorem 2 of Berend–Pomeranz–Rabini–Raziel mentioned above, their Theorem 3 shows that if one considers all the hint sequences together, then there is at least one cell whose color can be specified with a probability bounded away from zero:

    Theorem 3.For sufficiently large $m$ and $n$, a randomly generated $m \times n$ puzzle has, with probability at least $1/6$, at least one cell whose colour is uniquely determined. Moreover, one can find such a cell (with probability at least $1/6$) in constant time.

  • Solving a general nonogram is known to be NP-hard, though it is possible to find fast algorithms in special cases (Kees Joost Batenburg and Walter A. Kosters, Solving nonograms by combining relaxations, Pattern Recognition, vol. 42, 2009, pp. 1672–1683).
  • Suppose a specific nonogram is known to have a solution. Determing whether the solution is unique is an NP-complete problem (Nobuhisa Ueda and Tadaaki Nagao, NP-completeness Results for NONOGRAM via Parsimonious Reductions, Technical Report TR96-0008, Tokyo Institute of Technology, 1996).
  • To the best of my knowledge, there is no known recurrence (let alone one that is amenable to efficient computation) for the number of $n \times n$ nonograms that have unique solutions, most probably because it is not known how (and whether) one can generate every $n \times n$ nonogram with a unique solution by piecing together information from smaller nonograms. This would explain why there are very few entries in the OEIS entry A242876 mentioned in Gareth McCaughan's answer.

All in all, my intuition is that it is, at best, quite optimistic to hope for an exact count for the number of $n \times n$ nongrams that are uniquely solvable. Probabilistic estimates are more feasible, in my opinion.

$\endgroup$
2
  • 1
    $\begingroup$ Thanks, great late answer! For the record, I mention this paper which I came across after doing this search in an attempt to answer my own question. (I've had those two tabs and this one open for the last two years, in case one day I found time to do some research into it and a self-answer. After this nice answer with citations to the literature, I think I'll finally close them.) $\endgroup$ Commented Nov 4, 2021 at 9:36
  • $\begingroup$ @Randal'Thor Oh, I recall that one! It was one of several other papers I combed through to write up this answer, but it didn’t make it to the final version of my answer. Had a lot of fun researching this, thanks for pointing me to your question :) $\endgroup$
    – Namaskaram
    Commented Nov 4, 2021 at 10:37
1
$\begingroup$

I doubt it could be calculated exactly except for very small grids, but it should be possible to Monte Carlo it.

One has to be careful about bias. I don't think there is any way to choose the numbers directly such that all solvable number patterns are equally likely.

Instead you could choose a random grid picture such that every picture is equally likely (i.e. each cell 50% probability of black). Take the corresponding number pattern, and see how many solutions it has by running it through a solver. Of course this selection method will be biased towards number patterns with multiple solutions, so to compensate you can give this a weight of $1/k$, where $k$ is the number of solutions.

Then you just generate lots of random bit grid patterns. Let $t$ be the total weight of all the patterns produced, and $u$ the total weight of the uniquely solvable patterns encountered. Then $u/t$ should approach the probability you are looking for if you go on long enough.

$\endgroup$
3
  • 1
    $\begingroup$ "I doubt it could be calculated exactly except for very small grids" - why? This seems like exactly the type of problem that could potentially be approached using mathematical techniques of combinatorics. (If an answer doesn't exist in the literature and nobody on PSE can find it, I will literally ask a colleague mathematician who may be able to solve it and publish the result.) $\endgroup$ Commented Jun 13, 2019 at 13:27
  • 1
    $\begingroup$ It feels hard to me too. The unique-solution constraint is a complicated one. Compare with the problem of enumerating partitions, which seems kinda related but much much easier; there's an explicit formula for that but it took Hardy and Ramanujan to find it. (In fact they didn't quite find the explicit formula; Rademacher found that by improving on their work.) $\endgroup$
    – Gareth McCaughan
    Commented Jun 13, 2019 at 13:33
  • 1
    $\begingroup$ @Randal'Thor The problem of deciding whether or not a number pattern has a unique solution has the feel of an NP-hard problem to me. If you have an ambiguous number pattern, and want to apply a tweak to make it uniquely solvable, such a tweak can be done almost anywhere on the grid. The fact that local changes have global effects is one feature usually seen in NP-hard problems. $\endgroup$ Commented Jun 14, 2019 at 6:04

Not the answer you're looking for? Browse other questions tagged or ask your own question.