Show HN: I've made a Monte-Carlo raytracer for glTF scenes in WebGPU
This is a GPU "software" raytracer (i.e. using manual ray-scene intersections rather than RTX hardware) written using the WebGPU API that renders glTF scenes. It supports many materials and textures, material & normal mapping, and relies heavily on multiple importance sampling to speed up convergence.
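For readers unfamiliar with the term: multiple importance sampling combines light sampling and BSDF sampling so that whichever strategy is better suited to a given path segment dominates its weight. Below is a minimal sketch of the usual balance heuristic, in plain C for illustration only; it is not the project's actual shader code.

    /* Minimal sketch of MIS with the balance heuristic (Veach).
       Illustrative plain C, not the project's WGSL. */
    #include <stdio.h>

    /* Weight for a sample drawn from strategy A, given the pdf that
       strategy B would have assigned to the same sample. */
    static double mis_balance_weight(double pdf_a, double pdf_b)
    {
        return pdf_a / (pdf_a + pdf_b);
    }

    /* Contribution of one light sample to the pixel estimate.
       f = BSDF value times cosine term, radiance = emitted light. */
    static double light_sample_contribution(double radiance, double f,
                                            double pdf_light, double pdf_bsdf)
    {
        if (pdf_light <= 0.0)
            return 0.0;
        return mis_balance_weight(pdf_light, pdf_bsdf) * f * radiance / pdf_light;
    }

    int main(void)
    {
        /* Example numbers only: a light sample whose direction the BSDF
           would rarely have chosen gets nearly full weight. */
        printf("%f\n", light_sample_contribution(5.0, 0.3, 0.8, 0.05));
        return 0;
    }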
Very cool. I did a similar project with wgpu in Rust - https://github.com/bezdomniy/Rengin - nice to find projects like yours to see where I can improve!
> "GPU "software" raytracer"
> WebGPU
> this project is desktop-only
Boss, I am confused, boss.
I'm using WebGPU as a nice modern graphics API that is at the same time much more user-friendly than e.g. Vulkan. I'm using a desktop implementation of WebGPU called wgpu, via its C bindings, wgpu-native.
My browser doesn't support WebGPU properly yet, so I don't really care about running this thing in browser.
That's a fascinating approach.
And it makes me a bit sad about the state of WebGPU, though hopefully that'll be resolved soon... I'm also on Linux, impatiently waiting for WebGPU to be supported in my browser.
Do you have a link that runs in the browser?
Nope, this project is desktop-only
You should try building it with Emscripten. SDL2 is supported.
This is a complete side question, but it always astonishes me how "real" raytraced scenes can look in terms of lighting, while being too complex/slow for video games.
How far have we gotten in terms of training AI models on raytraced lighting, to simulate it but fast enough for video games? Training an AI not on rendered scenes from any particular viewpoint, but rather on how light and shadows would be "baked into" textures?
Because what raytracing excels at is the overall realism of diffuse light. And it seems like the kind of thing AI would be good at learning?
I've always thought, e.g. when looking at the shadows trees cast, that I couldn't care less whether each leaf shape in the shadow is accurate or entirely hallucinated. What matters seems to be the overall light diffusion combined with correct nearby shadow shapes for objects. Which seems like something AI would excel at?
This was a recent presentation from SIGGRAPH 2024 that covered using neural nets to store baked (not dynamic!) lighting https://advances.realtimerendering.com/s2024/#neural_light_g....
Even though it's static lighting, you can already see a ton of the challenges they faced. In the end they got a fairly usable solution that improved on their existing baking tools, but it took what seems like months of experimenting without clear linear progress. They could just as easily have stalled out and been stuck with models that didn't work.
And that was just static lighting, not even realtime dynamic lighting. ML is going to need a lot of advancements before it can predict lighting wholesale, faster and more easily than tracing rays.
On the other hand, ML is really, really good at replacing all the mediocre handwritten heuristics 3D rendering relies on. For lighting, denoising low-signal (0.5-1 rays per pixel) output is a big area of research[0], since handwritten heuristics tend to struggle with so little data, as are lighting caches[1], which have to adapt to a wide variety of situations that again make handwritten heuristics struggle.
[0]: https://gpuopen.com/learn/neural_supersampling_and_denoising..., and the references it lists
[1]: https://research.nvidia.com/publication/2021-06_real-time-ne...
I think there will certainly be an AI 3D render engine at some point. But currently AI is used in 3D render engines to assist with denoising. https://docs.blender.org/manual/en/2.92/render/layers/denois...
At any reasonable quality, AI is even more expensive than raytracing. A simple intuition for this is that you can easily run a raytracer on consumer hardware, even if at low FPS, whereas you need a beefy setup to run most AI models, and they still take a while.
While some very large models may need beefy hardware, there are multiple forms of deep learning used for similar purposes:
Nvidia's DLSS is a neural network that upscales images so that games can be rendered quickly at a lower resolution and then upscaled to the display resolution in less total time than rendering natively at the display resolution.
Nvidia's DLDSR downscales a greater-than-native resolution image faster than typical downscaling algorithms used in DSR.
Nvidia's RTX HDR is a post-processing filter that takes an sRGB image and converts it to HDR.
So, it is very likely that a model that converts rasterized images to raytraced versions is possible, and fast. The most likely roadblock is the lack of a quality dataset for training such a model. Not all games have ray tracing, and even fewer have quality implementations.
To be clear, DLSS is a very different beast from your typical AI upscaler: it uses the principle of temporal reuse, where real samples from previous frames are combined with samples from the current frame in order to converge towards a higher resolution over time. It's not guessing new samples out of thin air, just guessing whether old samples are still usable, which is why DLSS is so fast and accurate compared to general-purpose AI upscalers, and why you can't use DLSS on standalone images or videos.
To add to this, DLSS 2 functions exactly the same as a non-ML temporal upscaler does: it blends pixels from the previous frame with pixels from the current frame.
The ML part of DLSS is that the blend weights are determined by a neural net, rather than handwritten heuristics.
DLSS 1 _did_ try to use neural networks to predict the new (upscaled) pixels outright, which went really poorly for a variety of reasons I don't feel like getting into, which is why they abandoned that approach.
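For concreteness, here is a minimal sketch of the temporal accumulation described above. This is the generic technique, not NVIDIA's implementation; the only place ML enters is in choosing `alpha`.

    /* Minimal sketch of temporal accumulation, not NVIDIA's implementation.
       `alpha` is how much to trust the reprojected history pixel; a
       handwritten heuristic (e.g. neighborhood clamping) or, in DLSS 2's
       case, a neural net decides its value. */
    typedef struct { float r, g, b; } color_t;

    static color_t temporal_blend(color_t history, color_t current, float alpha)
    {
        color_t out = {
            alpha * history.r + (1.0f - alpha) * current.r,
            alpha * history.g + (1.0f - alpha) * current.g,
            alpha * history.b + (1.0f - alpha) * current.b,
        };
        return out;
    }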
> So, it is very likely that a model that converts rasterized images to raytraced versions is possible, and fast.
How would this even work and not just be a DLSS derivative?
The magic of ray tracing is the ability to render light sources and reflections that are not directly visible in the frame. So where is the information coming from that the algorithm would use to place and draw the lights, shadows, reflections, etc?
I'm not asking to be snarky. I can usually "get there from here" when it comes to theoretical technology, but I can't work out how a raster image would contain enough data to allow for accurate ray tracing to be applied for objects whose effects are only included due to ray tracing.
I'm not convinced. We have "hyper" and "lightning" diffusion models that run 1-4 steps and are pretty quick on consumer hardware. I really have no idea which would be quicker with some optimizations and hardware tailored for the use-case.
The hard part is keeping everything coherent over time in a dynamic scene with a dynamic camera. Hallucinating vaguely plausible lighting may be adequate for a still image, but not so much in a game if you hallucinate shadows or reflections of off-screen objects that aren't really there, or "forget" that off-screen objects exist, or invent light sources that make no sense in context.
The main benefit of raytracing in games is that it has accurate global knowledge of the scene beyond what's directly in front of the camera, as opposed to earlier approximations which tried to work with only what the camera sees. Img2img diffusion is the ultimate form of the latter approach in that it tries to infer everything from what the camera sees, and guesses the rest.
Right, but I'm not actually suggesting we use diffusion. At least, not the same models we're using now. We need to incorporate a few sample rays at least so that it 'knows' what's actually off-screen, and then we just give it lots of training data of partially rendered images and fully rendered images so that it learns how to fill in the gaps. It shouldn't hallucinate very much if we do that. I don't know how to solve for temporal coherence though -- I guess we might want to train on videos instead of still images.
Also, that new Google paper where it generates entire games from a single image has up to 60 seconds of 'memory' I think they said, so I don't think the "forgetting" is actually that big of a problem, since we can refresh the memory with a properly rendered image at least that often.
I'm just spitballing here though; I think Unreal 5.4 or 5.5 has already put this into practice with its new lighting system.
> We need to incorporate a few sample rays at least so that it 'knows' what's actually off-screen, and then we just give it lots of training data of partially rendered images and fully rendered images so that it learns how to fill in the gaps.
That's already a thing, there's ML-driven denoisers which take a rough raytraced image and do their best to infer what the fully converged image would look like based on their training data. For example in the offline rendering world there's Nvidia's OptiX denoiser and Intel's OIDN, and in the realtime world there's Nvidia's DLSS Ray Reconstruction which uses an ML model to do both upscaling and denoising at the same time.
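For a sense of how these slot into a renderer, here is roughly what driving Intel's OIDN looks like with its C API (1.x-style names; newer versions have renamed some calls, so treat this as an outline rather than authoritative usage):

    #include <OpenImageDenoise/oidn.h>
    #include <stdbool.h>
    #include <stddef.h>
    #include <stdio.h>

    /* color/albedo/normal/output are width*height float3 buffers from the
       path tracer; albedo and normal are optional auxiliary guide images. */
    void denoise(float *color, float *albedo, float *normal, float *output,
                 size_t width, size_t height)
    {
        OIDNDevice device = oidnNewDevice(OIDN_DEVICE_TYPE_DEFAULT);
        oidnCommitDevice(device);

        OIDNFilter filter = oidnNewFilter(device, "RT"); /* generic path-tracing denoiser */
        oidnSetSharedFilterImage(filter, "color",  color,  OIDN_FORMAT_FLOAT3, width, height, 0, 0, 0);
        oidnSetSharedFilterImage(filter, "albedo", albedo, OIDN_FORMAT_FLOAT3, width, height, 0, 0, 0);
        oidnSetSharedFilterImage(filter, "normal", normal, OIDN_FORMAT_FLOAT3, width, height, 0, 0, 0);
        oidnSetSharedFilterImage(filter, "output", output, OIDN_FORMAT_FLOAT3, width, height, 0, 0, 0);
        oidnSetFilter1b(filter, "hdr", true);
        oidnCommitFilter(filter);
        oidnExecuteFilter(filter);

        const char *msg;
        if (oidnGetDeviceError(device, &msg) != OIDN_ERROR_NONE)
            fprintf(stderr, "OIDN error: %s\n", msg);

        oidnReleaseFilter(filter);
        oidnReleaseDevice(device);
    }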
https://developer.nvidia.com/optix-denoiser
https://www.openimagedenoise.org
Yeah, but that has something to do with
1) commercial hardware pipelines having been improved for decades at handling 3D polygons, and
2) graphical AI models being trained to understand natural language in addition to rendering.
I can imagine a new breed of specialized generative graphical AI that entirely skips language and is trained on stock 3D objects as input, which could potentially perform much better.
The current approach seems to be ray tracing limited/feasible number of samples and upsampling/denoising the result using neural networks.
It's still hard to do in realtime. You need so much GPU memory that, at least today, a second GPU must be used. The question is which gets there quicker: hard-calculated simulation or AI post-processing? Or maybe a combination?
This is an interesting idea, but please, no more AI graphics generation in video games. Games don't get optimized anymore because devs rely on AI upscaling and frame generation to reach playable framerates, and it makes the games look bad and play badly.
No, it's because the hardware is not fast enough. Performance optimization is a large part of engine development. It happens at Epic as well.
No, it's because the software they write has higher requirements than computers of the time can provide. They bite off more than the hardware can chew, and they already know exactly how much the hardware can chew, yet they do it anyway. You write software for the hardware we have NOW. "The hardware is not fast enough" is never an excuse. The hardware was there first; you write software for it. You don't write software for nonexistent hardware and then complain that current hardware isn't fast enough. The hardware is fine (it always is); it's the software that's too heavy. If you don't have enough compute power to render a particular effect, then maybe don't render that effect and factor technical constraints into your art style.
I agree, that's true.
See: https://research.nvidia.com/labs/rtr/tag/neural-rendering/
Specifically this one, which seems to tackle what you mentioned: https://research.nvidia.com/labs/rtr/publication/hadadan2023...
It's a mega-kernel, so you'll get poor occupancy past the first bounce. A better strategy is to shoot, sort, and repeat, which then also allows you to squeeze in an adaptive sampler in the middle.
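For context, "shoot, sort, and repeat" describes a wavefront path tracer. Here is a hedged sketch of the host-side loop; the dispatch_* kernels are hypothetical placeholders, not anything from this project.

    /* Hedged sketch of the "shoot, sort, repeat" (wavefront) outer loop.
       The dispatch_* functions stand in for hypothetical GPU kernel
       launches; they are placeholders, not this project's API. */
    int  dispatch_generate_camera_rays(void);      /* fills the ray queue, one ray per pixel */
    void dispatch_intersect(int ray_count);        /* ray-scene intersection for the whole queue */
    void dispatch_sort_by_material(int ray_count); /* group hits so a wavefront shades one material */
    int  dispatch_shade_and_spawn(int ray_count);  /* shade hits, push bounce rays, return new count */

    void trace_frame(int max_bounces)
    {
        int ray_count = dispatch_generate_camera_rays();

        for (int bounce = 0; bounce < max_bounces && ray_count > 0; ++bounce) {
            dispatch_intersect(ray_count);
            dispatch_sort_by_material(ray_count);
            /* An adaptive sampler could drop converged pixels here,
               shrinking the queue before the (expensive) shading pass. */
            ray_count = dispatch_shade_and_spawn(ray_count);
        }
    }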
> // No idea where negative values come from :(
I don't know, but:
> newRay.origin += sign(dot(newRay.direction, geometryNormal)) * geometryNormal * 1e-4;
The new origin should be along the reflected ray, not along the direction of the normal. This line basically adds the normal (with a sign) to the origin (intersection point), which seems odd.
Poor man's way to find where the negatives come from is to max(0, ...) stuff until you find it.
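To make the two options concrete, here is a hedged sketch in plain C (not the project's WGSL) of both offset strategies being discussed:

    /* Illustrative comparison of the two ray-origin offset strategies. */
    typedef struct { float x, y, z; } vec3;

    static vec3 v_add(vec3 a, vec3 b)    { vec3 r = { a.x + b.x, a.y + b.y, a.z + b.z }; return r; }
    static vec3 v_scale(vec3 v, float s) { vec3 r = { v.x * s, v.y * s, v.z * s }; return r; }
    static float v_dot(vec3 a, vec3 b)   { return a.x * b.x + a.y * b.y + a.z * b.z; }

    /* What the quoted WGSL does: push the origin off the surface along the
       geometric normal, flipped to whichever side the new ray leaves from.
       Robust against self-intersection even when the ray grazes the surface. */
    static vec3 offset_along_normal(vec3 hit, vec3 new_dir, vec3 geom_normal, float eps)
    {
        float side = v_dot(new_dir, geom_normal) >= 0.0f ? 1.0f : -1.0f;
        return v_add(hit, v_scale(geom_normal, side * eps));
    }

    /* The suggested alternative: step forward along the new ray itself.
       Simpler, but a near-tangent ray barely clears the surface, so
       self-intersections become more likely. */
    static vec3 offset_along_ray(vec3 hit, vec3 new_dir, float eps)
    {
        return v_add(hit, v_scale(new_dir, eps));
    }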
> A better strategy is to shoot, sort, and repeat
Do we have a good sorting strategy whose cost is amortized yet? Meister 2020 (https://meistdan.github.io/publications/raysorting/paper.pdf) shows that the hard part is actually hiding the cost of the sorting.
> squeeze in an adaptive sampler in the middle
Can you expand on that? How does that work? I only know of adaptive sampling in screen space, where you shoot more or fewer rays at certain pixels based on their estimated variance so far.
After reading this paper a bit more, it seems that it focuses on simple scenes and simple materials only, which is a bit unfortunate. This is exactly where ray reordering overhead is going to be the most problematic.
They also do talk about the potential of ray reordering for complex scenes and complex materials in the paper (because reordering helps with shading divergence since "all" reordered rays are pretty much going to hit the same material).
So maybe ray reordering isn't dead just yet. Probably would have to try that at some point...
> It's a mega-kernel, so you'll get poor occupancy past the first bounce
Sure! If you look into the to-do list, there's a "wavefront path tracer" entry :)
> new origin should be along the reflected ray
I've found that doing it the way I'm doing it works better for preventing self-intersections. Might be worth investigating, though.
It probably works better when the reflected ray is almost tangent to the surface. But that should be an epsilon case.
WebGPU projects that don't provide browser examples are kind of strange; in that case you might as well use Vulkan or whatever.
WebGPU is a way nicer HAL if you're not an experienced graphics engineer. So even if you only target desktops, it's a valid choice.
On the web, WebGPU is only supported by Chrome-based browser engines at this point, and a lot of software developers use Firefox (and don't really like encouraging a browser monoculture), so it doesn't make a ton of sense for some people to target browser-based WebGPU right now.
It's not as much about experience as it is about trade-offs. I've worked a lot with Vulkan and it's an incredible API, but when you're working alone and you don't have the goal of squeezing 250% performance out of your GPU on dozens of different GPU architectures, your performance becomes pretty much independent of a specific graphics API (unless your API doesn't support some stuff like multi-draw-indirect, etc).
The answer is a middleware engine: they all have much nicer tooling available, without the constraints of a browser sandboxing design built on the minimum common denominator of 2017 graphics APIs.
See my answer to artemonster above.