About the candidate games for this kind of stress test... release date and graphics quality are not accurate indicators of the workload. In general we can say a game with nice graphics is going to have a big workload, but as usual in programming there are dozens of different ways to achieve something: one of them is "the good one" (the kind of solution everybody agrees on because it achieves the goal in the simplest and most efficient way), there are a couple of "acceptable" solutions, and dozens of "bad ideas"
Sometimes an average game generates a huge workload because the game developers implemented a few of those "bad ideas", lol
In the case of Naughty Dog, their game engine is demanding but that is not caused by "bad ideas". They released some educational documents explaining how they split the threads across the PS3 SPUs in Uncharted (using an SPU dedicated to detecting collisions of polygonal objects, etc...)... what happens is that they are "squeezing" the CELL very well, because they use the CELL SPUs for graphics tasks
And Gran Turismo... well... it is a Sony studio, so we can assume they had direct help from Sony engineers too, and they also knew very well how to squeeze the PS3 hardware

*In The Last of Us... the most demanding task is when the game engine renders water (there are 2 spots in the game with water that trigger huge temperatures). Personally I consider that a flaw in their PS3 game engine... but the game engine from Grand Theft Auto seems to suffer from the same problem too. Let's say... the RSX and the PS3 game engines were at the borderline of being able to simulate realistic water, and game devs tried to achieve the best quality at the cost of generating huge workloads
-------
The tokins have a dual function: their total capacitance drops over time, but they also clean up the interference/noise on the power rail. If you piggy-back tantalums you increase the total capacitance, but you are not removing the noise (the tokins do that, which is why I'm suggesting not to remove the tokins yet)
Also, an old PSU could generate noise above an acceptable level (the tokins are not going to be able to clean it). You can temporarily use a PC ATX PSU to test this, there are a couple of tutorials in the forum. If this test solves the problem then the culprit is the PSU (not the tokins)
Anyway... it is a CECHB/COK-001 and contains PS2 hardware components, so... while playing PS2 games (without netemu) a lot of the workload is offloaded to the PS2 EE/GS, and I guess the EE/GS is not taking power from the tokins
PS1 is emulated on the CELL but I guess it is not so demanding
The error code log works as a "loop": once it is filled with 32 errors it starts overwriting them in order, oldest first... but you can identify them by their timestamps
Basically, if you run a stress test that ends in a crash, at the next reboot you should dump the error codes and look for an error with today's timestamp
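To illustrate how that log behaves, here is a rough Python sketch of a 32-slot log that overwrites the oldest entry when full, plus a filter by date. This is only a model of the behaviour described above, not the real log format: the function names, the (timestamp, code) layout and the "OLD-n"/"NEW" codes are all made up for the example.

```python
from collections import deque
from datetime import datetime, timedelta

LOG_SLOTS = 32  # log size mentioned above


def make_log():
    # A deque with maxlen drops the oldest entry automatically once it is full,
    # which gives the same "oldest first" overwrite behaviour.
    return deque(maxlen=LOG_SLOTS)


def record(log, code, when):
    # Hypothetical entry layout: (timestamp, error code)
    log.append((when, code))


def errors_on(log, day):
    # Keep only the entries whose timestamp falls on the given date
    return [(when, code) for when, code in log if when.date() == day]


# Example: 40 older errors (placeholder codes), then one on the crash day
log = make_log()
start = datetime(2020, 1, 1, 12, 0)
for i in range(40):
    record(log, f"OLD-{i}", start + timedelta(days=i))

crash_time = datetime(2021, 6, 1, 20, 30)
record(log, "NEW", crash_time)

print(len(log))  # 32, the 9 oldest entries were overwritten
print(errors_on(log, crash_time.date()))
```

The point is that even after the old entries get overwritten, the error from the stress test is still easy to find because it is the only one carrying the date of the test.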
Eventually your tokins are going to throw in the towel and the PS3 will not boot (and you will not be able to dump the error codes with the app). If this happens and you fix it by piggy-backing a tantalum, that would be more proof that the tokins have a problem (and yeah, it would be about time to remove them, lol)