Background
Some crypto algorithms rely principally on the difficulty of factoring two prime numbers $p$ and $q$. For the purpose of this discussion, assume $p<q$.
(The top of this article is background and and explanation with examples to frame my question; the questions for the SE community are at the bottom.)
What can we learn if we setup a quadratic equation "to win" as follows:
1. $(x-p)(x-q)=0$ such that the zeros of the function represent $p$ and $q$.
This factors and simplifies as:
2. $x^2-(p+q)x-pq$ where the term $pq$ is publicly known, but $p$ and $q$ are unknown.
We can solve for $p$ and $q$ by getting zeros from the quadratic equation, so solving for $p$ and $q$ is reduced to finding the unknown value of $(p+q)$:
3. $ -b \pm\sqrt{b^2-4pq}\over{2} $ where $b$ is $(p+q)$.
Thus, if we can find $(p+q)$ given $pq$ then we can find $p$ and $q$. Since $p$, $q$, and $(p+q)$ are unknown, lets rename for convenience as follows:
3a. $b = p+q$
Noting that pq is public, lets define it as $N$:
3b. $N = pq$
Thus, $b$ is the only unknown to solve for $p$ and $q$:
4a. $p = -b - \frac{\sqrt{b^2-4N}}{2} $
4b. $q = -b + \frac{\sqrt{b^2-4N}}{2} $
Since we don't know $b$, lets create function $F_p(b)$, which is the same as (4a) when $b=p+q$:
4c. $F_p(b) = -b + \frac{\sqrt{b^2-4N}}{2} $
Let us say that $b$ is "optimal" when $b=p+q$ since we can, then, trivially solve for $q$. Thus, when we guess $b$, $F_p(b)$ provides a guess of what $p$ would be if $b$ were optimal.
To understand the behavior of $F_p(b)$, we graph over $b=0..(p+q)$ with contrived and known values of $p$ and $q$ to see if any information about $(p+q)$ can be discerned. Of course when $b$ is optimal (we guess right), then $F_p(p+q) = p$ and we can trivially solve for $q$.
Example 1: When $\frac{q}{p} < 4$, we find $q < 2\sqrt{N}$, thus $q$ is bound by an asymptote, reducing the factoring search space:
For the purposes of graphing, lets choose:
- $p = 131$
- $q = 337$
Therefore, optimally:
- $b = 131+337 = 468$
The graph of $y=F_p(b)$ from $b=0..468$ is as folows:
$y=F_p(b)$ from 0..468" />
The inflection point can be found by taking the derivative and finding the asymptote:
5. $\frac{b}{2\sqrt{b^2 - 4N}} - \frac{1}{2}$ : derivative of $F_p(b)$
The derivative's asymptote is infinite as the bottom approaches 0, so solving $2\sqrt{b^2 - 4N}=0$ for $b$ we get:
6. $2\sqrt{N}$ : Asymptote of $F_p(b)$ given $N$.
Plugging in $N$ into (6) we find that the asymptote is at $b\approx 420.22$
Note that both $p$ and $q$ are on the left side of the asymptote which reduces our search space for $b$ as $b < 2\sqrt{N}$. This is possible because we can find the asymptote by (6) without actually knowing $p$ and $q$.
Example 2: When $\frac{q}{p} > 4$, we find $q > 2\sqrt{N}$, thus $q$ is unbound by an asymptote:
For this example lets choose $q$ as the next larger prime than $4p$:
- $p = 131$
- $q = 541$
Therefore, optimally:
- $b = 131+541 = 671$
Plugging in $N$ into (6) we find that the asymptote is at $b\approx 532.43$. Thus, $q$ is beyond the asymptote.
Summary
In Example 1 we could simply take $2\sqrt{pq}$ from (6) and, since $\frac{q}{p}<4$ then the search space for $q$ is limited by the asymptope from (6).
In Example 2, $q$ is beyond the asymptote from (6) and thus $q$ is harder to find as it is unbound by this evaluation.
In both examples, $2\sqrt{pq} < (p+q)$ so a good lower starting guess for $b$ is $2\sqrt{pq}$
Questions:
I know in practice that $p$ and $q$ are chosen on the order of $2^{2048}$ or bigger, so it is the relationship between the size of $p$ and $q$ with respect to each other I would like to understand, and therein lies my question:
- Is this behavior of $\frac{q}{p}<4$ that leads to a weaker product a known property?
- Is it related to quadratic residues? If so, how?
- If this is a weakness, has an attack been proposed?
- Does $2\sqrt{pq} < (p+q)$ always hold true as a lower-bound to begin the search for $b$? Counter-example?
- In modern cryptography is it already best practice to choose $q$ at least some notably larger multiple from $p$?
- What is the recommended practice? If different that $4p$, how was the proposed distance chosen?
- Is the search for $(p+q)$ going to be more, less, or equal in complexity to current prime factorization algorithms?