JavaBill: Does that mean a 1/2 gigaflop GPU actually is a few hundred shaders that all add up to 1/2 gigaflop, as opposed to 1/2 gigaflop being available in one stream? I guess I had been picturing it all in one stream to do a massive compute-intensive task.
No, each shader is itself a stream processor. "Stream processor" is just another fancy name for a shader. So a few hundred shaders is a few hundred potential compute "streams" that could be running simultaneously. Generally speaking, you would run these "streams" to compute how light reflects off an object across hundreds of sample points.
Often there are special textures called "bump maps" which basically skew the way light reflects off the object at these surface points to give the illusion that there's depth to a completely flat surface. For example, a tile floor in Half-Life 2 is actually a completely flat, or sometimes slightly curved, surface. The bump map skews how light bounces off it at certain points, and then you have the illusion that the tiles have depth because the edges of the tiles reflect light differently than the center of the tile. It's one of the many ways games "cheat" reality, but it's still far from perfect, as light reflected off these tiles does not go on to illuminate other objects as would happen in real life and as a real ray tracer would render it.
The stream processors make a "pass" over the bump-mapped floor texture and can compute exactly how much lighting to render at each point across the floor to make the tiles look like they have depth that they really do not. However, they don't go so far as to trace the rays of light completely as they bounce around, because the hardware is just not fast enough for that.
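To make the per-point lighting pass a bit more concrete, here is a rough sketch in plain Python. It is only an illustration, not how any particular game or GPU does it: it assumes a simple Lambertian (N dot L) diffuse model, and the names `shade_row`, `bump_row`, and `light` are made up for the example. Each entry in the hypothetical bump-map row tilts the normal of an otherwise flat floor, which is all it takes to make the tile edges catch the light differently from the tile center.

```python
import math


def normalize(v):
    """Scale a 3-component vector to unit length."""
    length = math.sqrt(sum(c * c for c in v))
    return tuple(c / length for c in v)


def diffuse(normal, light_dir):
    """Lambertian diffuse term: max(0, N . L) on unit vectors."""
    n = normalize(normal)
    l = normalize(light_dir)
    return max(0.0, sum(a * b for a, b in zip(n, l)))


def shade_row(bump_offsets, light_dir):
    """Shade one row of sample points on a flat floor.

    The geometric normal is straight up, (0, 0, 1). Each bump-map
    entry tilts that normal along x; the geometry itself never moves.
    On a GPU, each sample point could be handled by its own stream.
    """
    return [diffuse((dx, 0.0, 1.0), light_dir) for dx in bump_offsets]


# Hypothetical bump-map row: tilted edges, flat tile center.
bump_row = [-0.5, 0.0, 0.0, 0.0, 0.5]
light = (1.0, 0.0, 1.0)  # light from the +x side, 45 degrees up

intensities = shade_row(bump_row, light)
```

The edge tilted away from the light comes out darker than the flat center, and the edge tilted toward it comes out brighter, even though every sample sits on the same flat plane.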
The general rule in computer graphics is that the less "cheating" is done, the more realistic things look, and also the more computationally expensive they become, but realism is what people ultimately want.
Games still "cheat" like crazy to get that extra bit of quality with as little processing as possible, and that's likely to be true well into the future. It's a "fool the eye" technique, but in most cases, if you look closely enough, it just won't look realistic, especially if you know what you're looking for.
JavaBill:
While that rearranges my thinking a bit, it still seems to me that the GPU would be a great place to farm out a compute-intensive task while I (as the CPU) go off and do something else entirely, and then come back to the GPU later to get the results. I was thinking that was what the nVidia CUDA architecture (oversimplified) was all about, maybe not.
That's exactly what it's for, and more. OpenCL (not to be confused with OpenGL) does the same thing. But the major selling point of these architectures is that you end up with a whole pile of raw data that can then be reprocessed / regurgitated, instead of the rendered frame you would traditionally get from a fixed pipeline, which would be wiped every pass / frame. In other words, think of newer graphics cards as letting data take "shortcuts" back into recomputation with other data, instead of ending up at the end of the pipeline with nowhere to go except getting "flushed" (real term, by the way) from the buffer / down the toilet once the card is done with it.
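The "farm it out, go do something else, come back for the results" pattern looks roughly like this. To be clear, this is not CUDA or OpenCL code: a Python `ThreadPoolExecutor` is standing in for the GPU purely to show the shape of the workflow, and `heavy_kernel` and `second_pass` are hypothetical names. The part to notice is that the first result comes back as raw data and gets fed into another pass rather than being thrown away.

```python
from concurrent.futures import ThreadPoolExecutor


def heavy_kernel(values):
    """Stand-in for the compute-intensive work the GPU would do."""
    return [v * v for v in values]


def second_pass(values):
    """Stand-in for reprocessing the raw output of the first pass."""
    return sum(values)


# The executor plays the role of the "device" here (an assumption
# made for illustration; a real GPU is driven through CUDA or OpenCL).
with ThreadPoolExecutor(max_workers=1) as device:
    # Hand the task off and get a handle back immediately.
    pending = device.submit(heavy_kernel, range(1000))

    # The "CPU" is free to do something else in the meantime.
    cpu_side_work = "did other work"

    # Come back later for the raw data...
    raw = pending.result()

    # ...and send it through again instead of flushing it.
    total = device.submit(second_pass, raw).result()
```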
To use an analogy: graphics cards of yesteryear with the fixed pipeline are just like an assembly line at a car manufacturing plant. Data starts at the beginning, and when it gets to the end you have your rendered frame.
On graphics cards of today and of the future, data may get halfway through the assembly line and then, depending on its result, go back, rebuild its own assembly line, and send itself through it again. Now take each assembly line and multiply that by the number of shaders / stream processors, and then consider that the data in each assembly line can be combined with data in other assembly lines after they've reached certain points in their processing. The combined data may also go back, reconfigure the assembly line to process itself, and go through it again.
That gives you a better idea of how complex these graphics cards are becoming, but most of the really nasty stuff is handled by the architecture (CUDA / OpenCL). As long as your task is parallel, you let CUDA and OpenCL worry about the details of the assembly lines (each shader's workflow); that's what they're there for.
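Here is a toy version of the "many assembly lines whose results get combined" idea, again in plain Python and run sequentially, so it only simulates what the hardware would do in parallel. The lane count, the `kernel` function, and the helper names are all made up for the sketch. Each simulated lane runs the same small kernel over its share of the data, and the per-lane partial results are then combined pairwise until one value is left.

```python
def kernel(x):
    """Hypothetical per-element work done inside one lane."""
    return x * 2 + 1


def run_lanes(data, lanes):
    """Deal the data out across `lanes` simulated stream processors
    and apply the same kernel to every element in each lane."""
    chunks = [data[i::lanes] for i in range(lanes)]
    return [[kernel(x) for x in chunk] for chunk in chunks]


def tree_reduce(partials):
    """Combine per-lane partial sums pairwise until one remains."""
    values = [sum(p) for p in partials]
    while len(values) > 1:
        paired = [values[i] + values[i + 1]
                  for i in range(0, len(values) - 1, 2)]
        if len(values) % 2:
            # Odd one out carries over to the next round unchanged.
            paired.append(values[-1])
        values = paired
    return values[0]


data = list(range(16))
result = tree_reduce(run_lanes(data, lanes=4))
```

The point of writing it this way is that nothing in `kernel` cares which lane it runs in or in what order, which is what makes the task "parallel" in the sense above. Deciding how the lanes are actually scheduled is the part you would leave to CUDA or OpenCL.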
Onboard RAID vs. 3Ware RAID
I never recommend people run RAID-5 with onboard chipsets.