Saturday, April 17, 2021

BigInteger Factoring

On a recent Sunday afternoon I was writing some code snippets in LINQPad to play with Fermat's Little Theorem and Carmichael Numbers. As part of the experiment I wrote a simple method to factor integers, just using trial division up to the square root. Although it wasn't completely dumb trial division, as my sequence of trial divisors is {2,3,6n±1}, which produces: 2 3 5 7 11 13 17 19 23 25 29 31 etc.

Note: A wheel could be used for more efficient trial division, but for the purposes of my experiments it was overkill.

What surprised me was the overall speed of the trial divisor factoring using the BigInteger struct. I expected factoring time to become unreasonably slow at about 32-bits, but it was factoring 48-bit (15-digit) numbers in sub-second time.

As a further experiment I fed progressively larger primes into the factor method to see when it would hit the "exponential wall" and become impractical.

NLog10SecsFactors
100,0035.00.00PRIME
1,000,0036.00.00PRIME
10,000,0197.00.00PRIME
1,000,000,0079.00.00PRIME
50,000,000,02110.70.01PRIME
100,000,000,00311.00.01PRIME
500,000,000,02311.70.04PRIME
1,000,000,000,03912.00.02PRIME
5,000,000,000,05312.70.05PRIME
10,000,000,000,03713.00.07PRIME
50,000,000,000,05313.70.16PRIME
100,000,000,000,03114.00.24PRIME
500,000,000,000,05714.70.52PRIME
10,000,000,000,000,06116.02.29PRIME
20,000,000,000,000,00316.33.26PRIME
500,000,000,000,000,02117.716.29PRIME
1,000,000,000,000,000,00318.022.84PRIME
2,000,000,000,000,000,05718.332.53PRIME

As the table shows, the simple algorithm hits the 30-second mark at around 18-digit numbers and I stopped there. My PC is a few years old and not particularly powerful, and the factoring was single-threaded, so I'm astonished that the BigInteger class performed so well for so long.

The code

// Naive factoring loops
IEnumerable<BigInteger> Factor(BigInteger n)
{
  double sqr = Math.Sqrt((double)n);
  foreach (long div in Divseq())
  {
    if (n == 1)
    {
      break;
    }
    if (div > sqr)
    {
      yield return n;
      break;
    }
    while (n % div == 0)
    {
      n = n / div;
      yield return div;
    }
  }
}

// Endless sequence of test divisors: 2, 3, 6N±1 ...
IEnumerable<long> Divseq()
{
  yield return 2;
  yield return 3;
  for (long i = 6; ; i += 6)
  {
    yield return i - 1;
    yield return i + 1;
  }
}

Out of curiosity I used Visual Studio 2022's performance profiler to find out what was happening with objects and garbage collection while running tight loops that created BigInteger structs. It turns out that vast numbers of uint arrays are created to hold the internal values of the BigIntegers. You can see the GC reclaiming large amounts of memory every couple of seconds, which results in a typical sawtooth shape in the memory graph. Despite this GC stress, the performance is still surprisingly good.

Note that the BigInteger class is not really needed for the experiments shown above because the long (Int64) is large enough to hold all of the numbers involved. However, I was playing around with much larger numbers in other code not shown in this article. If you convert the Factor method above to use long instead of BigInteger then it runs consistently about 4 times faster.


No comments:

Post a Comment