Links

Large Integer Factorization Algorithm and Its Practice

08/02/2023

Large Integer Factorization

In 1977, RSA (Rivest–Shamir–Adleman), a cryptosystem based on factorizing large integers was born. It was named after its three creators, Ron Rivest, Adi Shamir, and Leonard Adleman, using their surnames' initials.
With the significant contributions of the RSA algorithm to public-key cryptography and its profound impact worldwide, the three creators were then awarded the Turing Award in 2002.
Until today, the integer factorization problem remains one of the most crucial subjects in almost all aspects of public-key cryptography security.
The security of the RSA algorithm depends on large number factorization, which is
to factorize a large integer into a product of several large prime factors.
What's interesting is that, although this is a "well-known" challenge that has been practically demonstrated, it is still unknown whether it is a problem at
PP
class or
NPNP
class.
Some typical large integer factorization algorithms targeting small factors are:
  • Pollard's rho algorithm
  • Pollard’s p-1 algorithm
  • ECM algorithm (known as Lenstra elliptic curve factorization or elliptic-curve factorization method)
In this article, we will first introduce Pollard's p-1 algorithm and the ECM algorithm. Additionally, we will present a specific implementation of the ECM algorithm called GMP-ECM.

Pollard’s p-1 Algorithm

Powersmooth: If all the prime powers
pvp^v
of the number
mm
are B-smooth, that is
pvBp^v \le B
,then
m!m!
is B-smooth.
Pollard’s p-1 algorithm is an integer factorization algorithm using Fermat's little theorem, invented by John Pollard in 1974. According to Fermat's little theorem,
aK(p1)1modpa^{K(p-1)} \equiv 1 \mod p
, if
pp
is a factor of
nn
which is to be factorized, then we can express it as
pnp|n
. For
pp
, if
d=ap110modpd = a^{p-1} - 1 \equiv 0 \mod p
, then
dd
is a multiple of
pp
, therefore
p=gcd(d,n)=gcd(ap11,n)p = gcd(d, n) = gcd(a^{p-1} - 1, n)
For data selection, the Pollard p-1 algorithm employs random numbers. If no result is obtained through random selection, it then generates
dd
from the prime numbers (less than
BB
(B-powersmooth)) until the prime number list is exhausted.

STEPS

Input:
nn
(composite number)
Output:
nn
or a failed non-trivial factor
  1. 1.
    Select a smoothness bound
    BB
    .
  2. 2.
    Definition
    M=qBqlogqBM = \prod_{q \le B} {q^{\left \lfloor log_q{B} \right \rfloor }}
    ,
    qq
    is a prime number.
  3. 3.
    Randomly select a number
    aa
    that is coprime with
    nn
    .
  4. 4.
    Calculation
    g=gcd(aM1,n)g = gcd(a^M - 1, n)
  5. 5.
    • If
      1<g<n1 < g < n
      , return
      gg
      .
    • If
      g=1g=1
      , choose a larger value for
      BB
      and return to Step 2, or return failure.
    • If
      g=ng=n
      , choose a smaller value for
      BB
      and return to Step 2, or return failure.

ECM Algorithm

In Pollard's p-1 method, it requires that
nn
has a factor
pp
, and that
p1p-1
has relatively small factors, allowing us to find a factor of
nn
using
p=gcd(ap11,n)p=gcd(a^{p-1} - 1, n)
. However, when the factors of
p1p-1
are large, Pollard's p-1 method is less likely to successfully factorize
nn
.
To improve this method, the ECM (Elliptic Curve Method) introduces random elliptic curves to transform the original multiplicative group into a group consisting of points on the elliptic curve.
The elliptic curve method is a number-theoretic algorithm that utilizes the properties and arithmetic rules of points on an elliptic curve to search for factors that satisfy certain conditions.

Overview of ECM Algorithm

Assuming
pp
is a prime factor of
nn
, considering the elliptic curve
Ea,bmodpE_{a,b} \mod p
and according to Hasse's theorem, the order
gg
of the elliptic curve satisfies the following condition:
g(p+1)<2p|g-(p+1)|<2 \sqrt{p}
. When the parameters
aa
and
bb
of the elliptic curve vary, the order of the elliptic curve will fall within the range
[p+12p,p+1+2p][ p + 1 - 2\sqrt{p}, p + 1+ 2 \sqrt{p}]
When using ECM to find factors of
nn
, there are three steps:
  1. 1.
    Select a
    BB
    and calculate
    m=lcm(1,2,,B)m = lcm(1,2,\dots ,B)
    .
  2. 2.
    Select a random curve out of
    Z/NZZ/NZ
    and a random point
    PP
    on the curve.
  3. 3.
    Calculate
    mPmP
    . During the calculation, if the addition of points on the corresponding elliptic curve cannot be calculated, then a factor of
    nn
    has been found. Otherwise, you can select another curve for calculation, or return failure.
The principle behind the ECM algorithm finding factors of
nn
lies in the fact that when a randomly chosen elliptic curve order
gg
is B-smooth, during the calculation of
mPmP
, if we encounter
gcd(v,p)=pgcd(v,p) = p
or
gcd(v,q)=qgcd(v,q) = q
, it will result in a non-trivial factor of
gcd(v,n)gcd(v,n)
.

GMP-ECM: An Implementation of ECM Algorithm

The GMP-ECM implementation makes optimizations based on the original ECM method.
The overall algorithm process in GMP-ECM has been optimized. In GMP-ECM, the calculation of factors is divided into two steps, each step selecting two values
(B1,B2)(B1, B2)
.
The algorithm is divided into two stages, where calculations are performed using
B1B1
and
B2B2
, respectively. The calculations in stage 1 are similar to the ECM algorithm.

Stage 1

First, calculate the product of the point
PP
on the elliptic curve
Ea,bE_{a,b}
:
Q=ΠπB2π(logB1)/(logπ)P0Q = \Pi_{\pi \le B_2} \pi^{\left \lfloor (log B_1)/(log \pi) \right \rfloor} P_0
In Stage 1, GMP-ECM has optimized the selection and calculation of the elliptic curve.

Random Elliptic Curve Generation

During the factorization, when the order of the randomly generated elliptic curve varies within the range
[p+12p,p+1+2p][ p + 1 - 2\sqrt{p}, p + 1+ 2 \sqrt{p}]
, and the random elliptic curve is B1-smooth, it can factorize
nn
.
In GMP-ECM, certain methods are employed for elliptic curve generation to ensure that the order of the generated elliptic curve satisfies certain properties, for example:
  • Suyama's form: Ensure that the order
    gg
    of the elliptic curve is divisible by 12.
  • Montgomery's form: Ensure that the order
    gg
    of the elliptic curve is divisible by 3.

Elliptic Curve Operation

In GMP-ECM, the Montgomery curve is used, and the multiplication or addition of points follows the following rules:
  • P(xP::zP),Q(xQ::zQ)P+QP(x_P :: z_P), Q(x_Q :: z_Q) \rightarrow P + Q
  • Calculate:
    xP+Q=4zPQ(xPxQzPzQ)2x_{P+Q} = 4z_{P-Q} * (x_P x_Q - z_P z_Q)^2
  • Calculate:
    zP+Q=4xPQ(xPzQzPxQ)2z_{P+Q} = 4x_{P-Q} * (x_P z_Q - z_P x_Q)^2
  • P(xP::zP)2PP(x_P :: z_P) \rightarrow 2P
  • Calculate:
    x2P=(xp2zP2)2x_{2P} = (x_p^2 - z_P^2)^2
  • Calculate:
    z2P=(4xPzP)[(xpzP)2+d(4xPzP)]z_{2P} = (4x_P z_P)[(x_p - z_P)^2 + d(4x_P z_P)]
    ,
    d=(a+2)/4d = (a+2) / 4
    ,
    aa
    is a parameter in the elliptic curve.

Stage 2

In the first stage of GMP-ECM, the main task is to calculate the point
QQ
on the elliptic curve
EE
. If it fails, meaning that
gcd(n,zQ)=1gcd(n, zQ) = 1
, then the search range is expanded to
[B1B2][B1,B2]
in the second stage. In this new range, there exists a prime number
π\pi
,
πQ=OEmodp\pi Q = O_E \mod p
.
The main idea of Stage 2 is the meet-in-the-middle strategy:
  • In Stage 2, we calculate two points:
    σQ=(xσ:yσ)\sigma Q = ( x_\sigma : y_\sigma)
    and
    τQ=(xτ:yτ)\tau Q = ( x_\tau : y_\tau)
    .
  • If
    σQ+τQ=OEmodp\sigma Q + \tau Q = O_E \mod p
    , it implies that
    xσ=xτmodpx_\sigma = x_\tau \mod p
    . Therefore, calculating
    gcd(xσxτ,n)gcd(x_\sigma - x_\tau, n)
    will obtain the factor
    pp
    .

Algorithm Complexity

The ECM method is expected to take time of
O(L(p)2+o(1)M(logn))O(L(p)^{\sqrt{2} + o(1)} M(\log n))
to find a factor
pp
of a number
nn
, where
L(p)=elogploglogpL(p) = e^{\sqrt{\log p \log \log p}}
.
Here,
M(logn)M(\log n)
represents the complexity of multiplication modulo
nn
.
In the actual calculation process, due to the algorithm's randomness, the time to find the factor
pp
of
nn
may vary. Additionally, the ability of the algorithm to factorize and the time required for factorization also depend on the values of
B1B1
and
B2B2
.
For the calculations, you can refer to the recommended values of
B1B1
and
B2B2
as specified in the GMP-ECM documentation.

Example

Take a large integer
NN
(2048 bit) as an example:
N = 0xA2862FB145AC7CE580E0BFF3CEFF42646050F8C611D36A3026E6EFB433B5CDD9EEFBA893E3F25C23A4951BA20992680162934CBDB6876CC791738C140C6EA6EC82938F7E18C5F0760367CF20883DCAB1D6885E222BBC7A1B5D434FD9FC148E387FA9AD69CE708FDDDE3E963F9AB2AF2046A37D7DBA21BC92E6F2A631C3E7770C77C95FAD6F1DC60D9F0645AEF2CC5D03D2151E35443FE0F90F1E2EAE58489C4450C8281EDCC58DDB409C306797C616954ACABCE4881E40ABDD2689A9F7596F29CD5CB42C752DA9306E7DB87CAC8957E3CA165421CF9E1B7220759A10588B386E33FD8E762E92C9F79D50150EF5FCA5F411DE23B8DFFE47A95D48ADDCF4797565
Use the tool which uses GMP-ECM to factorize:
ecm -c 2 25e4 1.3e8 -mpzmod threads: 2 mod: 1
A2862FB145AC7CE580E0BFF3CEFF42646050F8C611D36A3026E6EFB433B5CDD9EEFBA893E3F25C23A4951BA20992680162934CBDB6876CC791738C140C6EA6EC82938F7E18C5F0760367CF20883DCAB1D6885E222BBC7A1B5D434FD9FC148E387FA9AD69CE708FDDDE3E963F9AB2AF2046A37D7DBA21BC92E6F2A631C3E7770C77C95FAD6F1DC60D9F0645AEF2CC5D03D2151E35443FE0F90F1E2EAE58489C4450C8281EDCC58DDB409C306797C616954ACABCE4881E40ABDD2689A9F7596F29CD5CB42C752DA9306E7DB87CAC8957E3CA165421CF9E1B7220759A10588B386E33FD8E762E92C9F79D50150EF5FCA5F411DE23B8DFFE47A95D48ADDCF4797565
********** Factor found in step 2: 1021791499165844943393503 21 52761ms
Then, we successfully find the prime factor 1021791499165844943393503.