My idea was to stop doing stupid raytracing of spheres and planes and move forward to the real thing. I read a couple of papers on how real men do raytracing, and I quickly learnt about modern raytracers, with their kd-tree, BIH or BVH acceleration structures and their super fast tracing power, which was a couple of million rays per second (per core) on modern CPUs. That was the time when some people started exploring raytracing on the GPU and reached a pretty decent 10 to 50 million rays per second. However, I wanted to keep doing it on the CPU, as GPUs at the time had no more than one gigabyte of memory, which was completely insufficient for the type of 3D models I wanted to render. In fact I was quite motivated to be able to handle the medium and big size models I had at the time at work, which were in the order of 100 to 200 million triangles.
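The speed of those acceleration structures comes down to a tiny inner loop: instead of testing a ray against millions of triangles, the traversal repeatedly tests it against the bounding boxes of a tree's nodes and skips whole subtrees that the ray misses. A minimal sketch of the ray/AABB "slab" test at the heart of any BVH or kd-tree traversal (hypothetical names and layout, not the actual renderer's code):

```cpp
#include <algorithm>
#include <cassert>

struct Ray  { float ox, oy, oz; float ix, iy, iz; }; // origin and 1/direction
struct AABB { float mnx, mny, mnz, mxx, mxy, mxz; }; // axis-aligned box

// Returns true if the ray hits the box somewhere in [tmin, tmax].
// Each axis clips the ray's valid interval against a pair of "slabs";
// if the interval survives all three axes, the box is hit.
bool hitAABB(const Ray& r, const AABB& b, float tmin, float tmax)
{
    auto slab = [&](float o, float inv, float mn, float mx) {
        float t0 = (mn - o) * inv;
        float t1 = (mx - o) * inv;
        if (t0 > t1) std::swap(t0, t1);
        tmin = std::max(tmin, t0);
        tmax = std::min(tmax, t1);
    };
    slab(r.ox, r.ix, b.mnx, b.mxx);
    slab(r.oy, r.iy, b.mny, b.mxy);
    slab(r.oz, r.iz, b.mnz, b.mxz);
    return tmin <= tmax;
}
```

The test is a handful of multiplies and min/max operations with no branches in the slab itself, which is what makes traversing millions of nodes per second feasible on a single core.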
Following Wald's and others' work I quickly got results similar to theirs, and I could render these massive models at 20 frames per second on quad-core machines. Luckily, I also had access to higher performance computers with up to 32 cores under 64 bit Windows, so on those machines I could really push the screen resolution up. In fact I got perfectly linear scalability up to 64 cores, which I was very proud of, as that's not as trivial a thing to achieve as it might seem.
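One common way to get that kind of scaling (a sketch under my own assumptions, not necessarily what this renderer did) is to cut the frame into small tiles and let worker threads pull the next tile from a shared atomic counter, so no core sits idle while another finishes its statically assigned half of the screen:

```cpp
#include <algorithm>
#include <atomic>
#include <thread>
#include <vector>

// Hypothetical tile-based frame renderer: each thread grabs tiles
// from an atomic counter until none are left (dynamic load balancing).
std::vector<int> renderFrame(int width, int height, int tileSize, int numThreads)
{
    std::vector<int> framebuffer(width * height, 0);
    const int tilesX = (width  + tileSize - 1) / tileSize;
    const int tilesY = (height + tileSize - 1) / tileSize;
    std::atomic<int> nextTile{0};

    auto worker = [&]() {
        for (int t; (t = nextTile.fetch_add(1)) < tilesX * tilesY; ) {
            const int x0 = (t % tilesX) * tileSize;
            const int y0 = (t / tilesX) * tileSize;
            for (int y = y0; y < std::min(y0 + tileSize, height); y++)
            for (int x = x0; x < std::min(x0 + tileSize, width ); x++)
                framebuffer[y * width + x] += 1; // "shade" the pixel once
        }
    };

    std::vector<std::thread> pool;
    for (int i = 0; i < numThreads; i++) pool.emplace_back(worker);
    for (auto& th : pool) th.join();
    return framebuffer;
}
```

Since threads share nothing but the counter and write to disjoint tiles, there is no lock contention, and expensive tiles (lots of geometry) naturally get balanced against cheap ones (empty sky).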
I implemented my own API for writing shaders for the tracer (light and surface shaders) as well as custom primitives. In the end the project was pretty nice, but it had one big problem, which was shared with all the other high performance raytracers at the time (and still today, as far as I know): it was very fast for primary and shadow rays, but very slow for the other types of rays needed for Monte Carlo effects (ambient occlusion, global illumination, fuzzy reflections, depth of field, etc).
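The reason those rays are slow is that they are incoherent: where neighboring primary rays march through the scene almost in lockstep (and share cache lines and traversal work), Monte Carlo rays are scattered randomly over the hemisphere at each shading point, so each one takes its own path through the acceleration structure. A sketch of the usual cosine-weighted hemisphere sampling (Malley's method) that produces those divergent directions; the names here are illustrative, not the original shader API:

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>

struct Vec3 { float x, y, z; };

// Map two uniform random numbers in [0,1) to a cosine-weighted
// direction on the hemisphere around +z: sample a disk uniformly,
// then project up onto the hemisphere.
Vec3 cosineSampleHemisphere(float u1, float u2)
{
    const float r   = std::sqrt(u1);
    const float phi = 6.28318530718f * u2; // 2*pi*u2
    return { r * std::cos(phi),
             r * std::sin(phi),
             std::sqrt(std::max(0.0f, 1.0f - u1)) };
}
```

Averaging visibility along a few dozen of these directions per pixel gives the ambient occlusion in the image below, but each of those rays hits essentially random parts of memory, which is exactly what caches and ray packets hate.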
Ambient occlusion on the Atrium. Click to enlarge.