Thinking through details out loud more time.

1. Sieve with small primes (<10K) marking off all numbers that are composite.

2. Perform Fermat (base 2) primality test

The set of numbers that are composite after 1. and 2. are PSP(2), each N will have at least one factor p < sqrt(N) for which

N % (p * ord2(p)) = p.

3. So we augment the first sieve by marking off mark off `i * (p * ord2(p)) + p` for large primes, up to sqrt(limit), this is faster than sieving with these numbers because ord2(p) ~ p.

Any number that passes that would pass 2. (e.g. PSP(2)) has some factor p and we've marked off all multiples that would fail 2. so we should be good!

---

I tried to understand what the saving in runtime of sieving by p=10,000 vs sieving all primes and I got a counter intuitive result.

From

Mertens' 2nd theorem I believe the runtime to sieve number less than limit up to P is `limit * (0.577 + log(log(P))`

When I evaluate this with limit=2^65 and P=10,000 vs limit=2^65 and P=2^32.5

2^65 * (0.577 + log(log(10,000)) = 4.3e19

2^65 * (0.577 + log(log(2^32.5)) = 5.7e19

It says it's only 30% harder to sieve all the numbers (at which point we can skip the Fermat test completely).

Maybe this doesn't work in the real world because of cache behavior? Or because the sieve is segmented?

Or am I missing something else?