4
$\begingroup$

Table of contents

  • [$1.$] Definition
  • [$2.$] Implication. (Motivation.)
  • [$3.$] Question. & Computed data.
  • [$4.$] Solutions of simplified variations.
  • [$5.$] Progress on solving the question.
  • [$6.$] Characterization of critical region?


[$1.$] Definition.

$T(k, n)$ = length of the longest consecutive run of sums of $k$-subsets of first $n$ primes.
Where $n\ge 0$ and $k=0,\dots,n$ and specially $T(0, n)=T(n, n)=1$.

Notice that this triangle is symmetric: $T(k,n)=T(n-k,n)$.

Example: If $n=4$, we have first four primes: {2,3,5,7}. Then for example, all possible $k=2$ subsets are: {2,3},{2,5},{2,7},{3,5},{3,7},{5,7}, and their sums are, when sorted: "5,7,8,9,10,12".

The longest consecutive streak there is "7,8,9,10", of length four $\implies T(2,4)=4$.


[$2.$] Implication. (Motivation.)

$T(k,n)$ is the length of longest consecutive run of sums of $k$-subsets of first $n$ primes.

Let $t_{k,n}$ be the smallest (first) sum of this longest consecutive run. It holds:

If $N$ is the number such that all prime gaps below it are $\le T(k,n)$, then all numbers in the interval $[t_{k,n}+p_{n+1},N]$ are "trivially" a sum of exactly $k+1$ distinct primes, where $p_{n+1}$ is the $(n+1)$th prime.

Example: If $(k=9,n=12)$, we get $T(9, 12)=42$ and $t_{k,n}=138$, where $p_{13}=41$. All prime gaps below $N=15683$ are $36\lt42$. This means all numbers $\in[179,15683]$ can be "trivially" represented as sums of exactly $10$ distinct primes.

Alternatively, we just simply observed that $9$-combinations of first $12$ primes are sufficient to cover all prime gaps in the range $[179,15683]$, and thus reach any number in that range when combined with some prime $p_{(i\gt 12)}$, since the set of those combinations contains $42$ consecutive values, which is more than enough, since largest prime gap in that range is $36\lt 42$.

This example was used in an answer to the linked question. You may notice that this is a generalization of the linked answer, which was an inspiration for defining $T(k,n)$.

We could now continue for example, to observe same $k$ but different $n$, to cover more ranges of numbers that can be "trivially" represented as a sum of exactly $k+1$ distinct primes.


[$3.$] Question. & Computed data.

Is a "closed form" for calculating (determining) the values of $T(k, n)$ possible?

Due to symmetry $T(k, n)=T(n-k, n)$, we can assume that $k\le \lfloor n/2 \rfloor +1$.

We already specially defined $k=0$. Moving on, it is not hard to see:

  • $T(1,n)=1;n=1$, and $T(1,n)=2,n\ge2$.
  • $T(2,n)=1,2,4,4;n=2,3,4,5$, and $T(2,n)=5,n\ge 6$.

But for $k= 3$ already, a closed form does not appear to be easy:

$$T(3,n)=1,2,4,6,10,18,22,22,40,42,46,60,66,70,70,70,100,100,106,120,132,\dots$$

Computed data for $n=0,\dots,100$ (rows) and all $k=0,\dots,n$ (columns) is listed here. - Thanks to vujazzman's suggestion of using dynamic programming, instead of re-calculating every step over and over, where I was initially wasting time.

Note that, if a "closed form" is possible, it must depend on prime gaps in some way.


[$4.$] Solutions of simplified variations.

Lets generalize the definition, to use some set $\mathbb A$, instead of the set of primes $\mathbb P$.

That is, define $T(k,n;\mathbb A)$ where $\mathbb A$ is some countable set of natural numbers, as the longest consecutive run of sums of $k$-subsets of first $n$ elements of the given set.

Then, for example, if we use natural numbers, we have a simple closed form:

$$T(k, n; \mathbb N)=k(n-k)+1$$

Another example, let $D=\{2,3,5,7,9,\dots\}$ be the set of odd numbers $\gt 1$ and $2$. Then:

$$ T(k, n; D)= 2[k(n-k)-n+2]$$

For $k,n\gt 0$. Otherwise, for either $n=0$ or $k=0$, we define it as $1$.

We can now keep removing numbers from the last example. That is, define $\mathbb P|_{r}$ as the set of first $r$ primes, and all numbers not divisible by them. Then, $ D = \mathbb P|_{1}$, and $\mathbb P|_{\infty}=\mathbb P$.

I've looked into finding closed forms of some $\mathbb P|_{r},r\in \mathbb N$, and observed that there are patterns related to finite subsets of prime gaps. (Based on computed terms.)

This motivated be to write $T(k, n)$ triangle as a $m\times m$ table, then transform it by subtracting consecutive terms horizontally and vertically. This is discussed in the next section.


[$5.$] Progress on solving the question.

We define a $m\times m$ table $T$, as a matrix obtained from $k=0,\dots,m$ and $n=k,\dots,k+m$ values of $T(k,n)$. Let $i,j=0,\dots,m$ be the indices of rows/columns.

Now, we get $T'$ by taking the differences of consecutive terms horizontally (vertically) of $T$, then $T''$ by taking differences of consecutive terms vertically (horizontally) of $T'$.

We can now use $T''$ to reconstruct $T'$ to reconstruct $T$, and finally to get $T(k,n)$.

Now we want to solve for a pattern in terms of the table (matrix) $T''$. Most of its values are now given explicitly as sequences of consecutive prime gaps, whose starting value (offset) is given by the row (column). A closed form seems possible!

But there is a problem. There exists a "critical" region of terms which are neither trivial (zero), nor given by prime gaps. These terms are the only thing now preventing a formulation of a "closed form".

I have made a script to compute and color $m=75$ table $T''$ in excel: (click to open, then click to zoom in, to see exact values, in this 2400x2400 picture of the table)

enter image description here

Where $\color{red}{\text{red}}$ region are the terms given by prime gaps, $\color{green}{\text{green}}$ are trivial (zero) terms, and black terms represent the "critical region", where I do not see any clear patterns.

My question here, now boils down to,

Can we find a "closed form" for the "critical region" terms? (To combine it with the pattern for prime gaps, and obtain a "closed form" pattern for the entire table $T''$.)

That is, can we calculate the terms in and near the critical region, without relying on computing the longest runs of subsets? - And instead, calculate them by defining pattern sequences, similar to prime gaps? (Prime gaps exactly represent the red region - can we solve the black region?).


[$6.$] Characterization of critical region?

If we can't easily fully characterize the critical region, are there things that we can say about it?

I have made some observations about the critical region, but I wasn't able to characterize it fully. Let "strips" refer to rows/columns of $T''$. I have observed that:

Sum invariant of strips. Looks like, the sums over individual strips (rows/columns), are invariant, regardless if terms belong to expected prime gap pattern or to the critical region pattern.

That is, we know the critical region terms differ from expected prime gaps. - Now, we also know that they still maintain the sum of those expected prime gaps.

For example, observing the included table image, in column $\text{G}$ we have an example of an isolated part of a critical strip, with values $(10,0)$, in $18,19$th rows, instead of the expected prime gaps $(4,6)$. But both, sum to the same expected value: $10+0=4+6$.

Or another example, in column $\text{E}$, in rows $15-21$, we have ciritcal terms $(20,12,-18,0,20,-4,0)$ instead of expected prime gaps $(2,6,4,2,6,4,6)$, and both sequences sum to $30$.

This seems to hold in all strips, and in individual isolated parts, if we observe a sufficient amount of surrounding terms. This means, we could assume all critical terms are prime gaps, and get a closed form approximation for $T''$, that will only be incorrect in the critical region.

The only exceptions seem to be, parts where critical region is "very mixed" with the trivial region.

This means, we can establish a "closed form" $T(k, n)$ approximation, that is ("almost") exactly correct only if $k$ is sufficiently close to $n$. (Which is most of the time, if you observe the ratio of areas of red and black regions in the included table image.)

We still don't have an exact "closed form", that is, a full characterization of $T(k, n)$, but, we can now analyse the asymptotics of $T(k, n)$, with such an approximation.

What remains, is, to characterize the beginnings and endings of isolated critical region strips, and structure inside them. - So far, I only know about the invariant sum property.

$\endgroup$
3
  • $\begingroup$ Did you consider the situation where $k$ is fixed and $n$ goes to infinity? The Goldbach conjecture says all numbers can be written as a sum of $3$ (not necessarily distinct) primes. This should be translatable into some lower bounds for $T(3,n)$ for $n$ sufficiently large. $\endgroup$
    – quarague
    Commented Oct 8, 2019 at 10:47
  • $\begingroup$ @quarague - very interesting idea! but "not necessarily distinct" is an obstacle though... In the definition of $T(k,n)$ we're always adding $k$ distinct primes. $\endgroup$
    – antkam
    Commented Oct 8, 2019 at 20:41
  • $\begingroup$ @quarague Looks like $T(k, n)$ can be approximated with sums of sums of offsetted prime gaps sequences. This approximation should be almost exactly correct, except for sufficiently (very) small $k$ (relative to size of $n$). $\endgroup$
    – Vepir
    Commented Oct 16, 2019 at 11:53

1 Answer 1

1
$\begingroup$

Write $A(n,k)$ to be the set of $k$ subset sums of the first $n$ primes (so that $T(n,k)$ is the length of the longest contiguous subsequence in $A(n,k)$). Then $A(n,k) = A(n-1,k) \cup (A(n-1,k-1) + p_n)$. One can structure a program to compute $A(n,k)$ via this recursive relationship in a dynamic programming style.

Edit: As pointed out in the comments, the original complexity bound I gave is wrong; the largest element of $A(n,k)$ is at most $k p_k \sim kn\log n \le n^2 \log n$. If $A(n,k)$ is represented as a dense bit vector, then all $T(n,k)$ for $k\le n\le N$ could be computed in $O(N^4\log N)$ time. Since only the $A(n,k)$ for the previous and current $n$ needs to be stored, the space needs are $O(N^3\log N)$. This still allows computation of the values in the regime OP wants within minutes.

$\endgroup$
7
  • $\begingroup$ This will be faster, but I think space is a problem regardless? Storing dynamically last two rows of $A(n,k)$ will be $O(2N+1)\sim O(N)$ "sets of $k$ sums", but there are $O(2^N)$ of "sums" in the $N$*th* row in total (sum binomial coefficients $k=0...N$), so we will be storing $O(2^N)$ sums of primes in the last $N$*th* row? - without dynamically storing $A(n,k)$, the space comes back to storing $O((N,N/2))$ sums when calculating only $T(n, k)=T(N,N/2)$, where $(N,N/2)$ is the largest binomial coefficient in the $N$*th* row; This is still too much sums to store for large $N$? - My idea be... $\endgroup$
    – Vepir
    Commented Oct 8, 2019 at 8:12
  • $\begingroup$ ...My idea behind posting this question was to try to find mathematical results that will allow us to calculate $T(n, k)$ without going over all of $(n, k)$ sums (subsets), where $(n,k)$ is the binomial coefficient. For example, for $k=1,2$ we have $T(n,k)=2,5$ always for all $n\ge 2,6$, respectively, so for $k=1,2$ and all $n$, we have $O(1)$ as these constant results can be proven. Similarly, I wonder if $T(n,k=3)=1,2,4,6,10,18,22,22,40,42,\dots$ or larger $n$ have some underlying structure so we do not need to examine all $k$-subsets and their sums? $\endgroup$
    – Vepir
    Commented Oct 8, 2019 at 8:53
  • $\begingroup$ But do correct me If I'm missing anything. (Also, too late to edit the last sentence in above comment: "larger $k\ge 3$" should be there instead of "larger $n$".) $\endgroup$
    – Vepir
    Commented Oct 8, 2019 at 9:09
  • $\begingroup$ @Vepir - just a comment on the space requirement: any sum in $A(n,k)$ must be $< k p_n \approx k n \log n$, so the space requirement for the entire row $A(n,*)$ is $< n^3 \log n$, i.e. never exponential. Also, in case it helps, you can store consecutive segments by a data structure with just two numbers. Once a segment forms, the recurrence never breaks it apart. $\endgroup$
    – antkam
    Commented Oct 8, 2019 at 13:42
  • 1
    $\begingroup$ @Vepir - you have $2^n$ summations, but they cannot all result in different integers. Each sum is $< k n \log n$, so the number of unique sum-values (for any $A(n,k)$ set) is at most also $k n \log n$, because these are all positive integers. E.g. consider the central coeff $k=n/2$. There are ${n \choose n/2}$ sums, which is an exponential number of summations, but the sums are all integers in the range from $2$ to $< k n \log n$, so the number of unique sums (which you have to store) is $< k n \log n /2 $ for $A(n, k)$. Perhaps I'm missing something? $\endgroup$
    – antkam
    Commented Oct 8, 2019 at 14:42

You must log in to answer this question.

Not the answer you're looking for? Browse other questions tagged .