PS3 (Research/Experimental) - NEC/TOKIN Capacitors Replacement - YLOD

The current consensus is that 3034 means either a bad CELL chip or no power getting to it.
Well, I would say that it's the latter. This console was working perfectly fine for a few power cycles after replacing one NEC/TOKIN. After cooling and sitting for a few days, maybe a week, it YLOD'd during a PS2 game. My guess is that the heat from the cap replacement temporarily restored the BGA connection, which let me boot during that power cycle and kept it running as long as it was powered on. When it cooled, the MB relaxed, and after a few power cycles it went back into its previous broken position and YLOD'd on me. I doubt the PS2 game had anything to do with it. That narrative fits the evidence. My oscilloscope measurements indicate the caps are working. Also, the error codes associated with bad NEC/TOKINs are not showing, so that's good evidence as well.

Now I'm rethinking the whole "CPU is fine" thing. They do run hot, but as @squeept pointed out, they don't usually heat up and cool down as much as the RSX does, which is why the RSX tends to die more often. I was only going to reflow the RSX before. But is this 3034 pointing to a BGA fault on the CPU?
I'm curious about your Southbridge error, though -- have you checked your fuses over there?
You misread that error, it's a 4002 (RSX data Error). The SB is fine.

I want to learn to reball, but now I'm pretty invested in PS3 #2. I would feel pretty bad if I screwed it up like I did with PS3 #1. However, it's dead until I do something, and practice is how you learn. So I will probably attempt a reflow next. I don't want to try a reball just yet. If the reflow doesn't work, then I'll try to lift the chips. It might be a while until I work up the courage to try again. In the meantime, I just received PS3 #3 (CECHA01 - doesn't read discs). It came in last night and I haven't tested it yet. My focus will probably shift to that one while I work up the courage to attempt a reflow on PS3 #2.
 
You misread that error, it's a 4002 (RSX data Error). The SB is fine.
Ah, my bad! Today's been a very busy day for a Sunday; I've been running in and out all day.

My oscilloscope measurements indicate the caps are working.
And that's one of the pieces of evidence I've been looking for. Yeah, if your caps are good, then power to them is good, and we can surmise all that's left "to be bad" is the RSX itself, as far as that error goes.
 
So the last two digits, 34, indicate that the CELL and RSX can't communicate properly.

You usually see bad solder joints on the RSX causing the CELL to log the 3034 error in this situation.

The resistance test across the CELL and RSX via the GND and VCC points of the NEC/TOKINs can give a good indication of the state of each chip.

So, CELL and RSX at 3.0 ohms or above are in good shape; anything less means the chip is on its way out!

Bad NEC/TOKINs can only be tested when removed from the board: 900 uF or less means a bad NEC/TOKIN.
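For anyone logging readings from several boards, the two rules of thumb above could be written down as a quick check. This is only a sketch of the thresholds as stated in this post (3.0 ohms across VCC/GND, 900 uF out of circuit); the function names are made up for illustration, and the thresholds are one poster's experience rather than a spec.

```python
# Thresholds as quoted in the post; treat them as a rule of thumb, not a spec.
MIN_HEALTHY_OHMS = 3.0   # CELL/RSX resistance measured across VCC and GND
BAD_TOKIN_MAX_UF = 900.0  # NEC/TOKIN capacitance measured out of circuit


def rail_looks_healthy(ohms: float) -> bool:
    """Return True when the reading is at or above the 3.0 ohm rule of thumb."""
    return ohms >= MIN_HEALTHY_OHMS


def tokin_looks_bad(capacitance_uf: float) -> bool:
    """Return True when a removed NEC/TOKIN reads 900 uF or less."""
    return capacitance_uf <= BAD_TOKIN_MAX_UF


if __name__ == "__main__":
    # Hypothetical example readings, not measurements from any specific board.
    for chip, ohms in [("CELL", 3.1), ("RSX", 2.7)]:
        verdict = "looks OK" if rail_looks_healthy(ohms) else "may be on its way out"
        print(f"{chip}: {ohms} ohms, {verdict}")
```

A passing reading here only says the number is above the threshold; it does not prove the chip is good.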

Well done for getting this far! - the next scary part is reflow or removal - not for the faint hearted!

 
Yes, at the time I was trying to understand the logic behind all of this.

If anyone wants to improve the guide and add bits about Windows install etc., please create a git pull request to add these changes!

I only use Linux, so I can't really help with Windows stuff!

eepcsum is actually straightforward: take the expected result, flip the 4 digits, and enter that (it's an endianness thing).
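If "flip the 4 digits" means swapping the two bytes of a 16-bit value (which is how I read the endianness remark, but that is an assumption), the flip could be sketched like this. The helper name and the example value are made up for illustration and are not taken from the guide.

```python
import string


def swap_checksum_bytes(hex_digits: str) -> str:
    """Swap the two bytes of a 4-hex-digit value, e.g. 'AABB' becomes 'BBAA'."""
    if len(hex_digits) != 4 or any(c not in string.hexdigits for c in hex_digits):
        raise ValueError("expected exactly 4 hex digits")
    # First byte is digits 0-1, second byte is digits 2-3; emit them reversed.
    return hex_digits[2:] + hex_digits[:2]


if __name__ == "__main__":
    expected = "1A2B"  # hypothetical value, not a real checksum
    print(swap_checksum_bytes(expected))
```

Check the result against what the error message actually asks for before writing anything to the syscon.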

Exactly, you're not supposed to ground the diag point until after you enabled diag mode. You can't get into low-level diag until you've both enabled the flag and grounded the pin, in that order. Not at all convoluted :)

I understand your reluctance with the eep checksum stuff! One thing that confuses people with the guide is that the values are written for COK002, which are slightly different from a COK001 (which is what I own). When I made my changes, I accidentally used the COK002 values... doh! It took a few extra steps to get the checksum corrected, but I was able to get it going. Something to keep in mind is that the error message will tell you what needs to be changed in order to fix the checksum -- it's understanding the message that gets tricky.

The reason you need to do this whole song and dance is that once you enable the flag, the syscon's checksum is now different and it refuses to boot. Once you've successfully enabled diag mode and grounded diag, you actually need to set the flag back so the checksum is good and the PS3 boots normally.

I've been wanting to put a video together showing this whole process, but my dead PS3 is stored away while I work on repairing other stuff. I've been itching to get back into things, so once I'm done with my current batch of hardware, I'll see if I can do a video.
 
Thank you so much for this in-depth tutorial of how to replace these capacitors. I've ordered 32 of the new capacitors that you suggested. (I'm working on an original backwards compatible PS3 btw) do you think it's a good idea to just go ahead and replace all the capacitors right off the bat? Also what gauge of wire do you suggest? 20 gauge? And if I do the capacitors individually as you suggested could you explain the electrical tape part more? Thank you
 
So the last two digits, 34, indicate that the CELL and RSX can't communicate properly.

You usually see bad solder joints on the RSX causing the CELL to log the 3034 error in this situation.

The resistance test across the CELL and RSX via the GND and VCC points of the NEC/TOKINs can give a good indication of the state of each chip.

So, CELL and RSX at 3.0 ohms or above are in good shape; anything less means the chip is on its way out!

Bad NEC/TOKINs can only be tested when removed from the board: 900 uF or less means a bad NEC/TOKIN.

Well done for getting this far! - the next scary part is reflow or removal - not for the faint hearted!
2.8 and 2.9 ohms respectively, but they were about 3.0 ohms before the NEC/TOKIN replacement. There could be a little bit of flux residue still on there. I've cleaned it thoroughly, but there's no way to get it all off without soaking in an ultrasonic bath full of 99% IPA.

Okay then, the RSX is still prime suspect. I'll plan on re-flowing it then. I did ohm test the RSX I lifted off PS3#1, and it still seems good, but I lost a couple of components off the topside I'd need to replace. Does anyone know of a schematic that could identify what components these are? I'm hoping to fix it back up, in case I ever need a replacement.
WExYt3o.jpg
 
The resistance test across the CELL and RSX via the GND and VCC points of the NEC/TOKINs can give a good indication of the state of each chip.
I guess I understand this in principle, but not in practice -- or maybe I'm testing the wrong points. If you're testing GND and VCC across the tokins, then I assumed that all you're testing is the tokins...
 
No, there is some internal stuff going on inside the chips that results in that resistance. On PS3 #1 I tested this resistance after I removed the CPU and RSX. The tantalum array was still as I left it. The reading jumps around like the meter is calculating capacitance/resistance across a bunch of components before stabilizing on nonsense. Without the chips it doesn't read the same, so it definitely is reading something inside the chips.
 
No, there is some internal stuff going on inside the chips that results in that resistance. On PS3 #1 I tested this resistance after I removed the CPU and RSX. The tantalum array was still as I left it. The reading jumps around like the meter is calculating capacitance/resistance across a bunch of components before stabilizing on nonsense. Without the chips it doesn't read the same, so it definitely is reading something inside the chips.
Where exactly are you testing, though? That's what I want to be clear about.
 
+/GND on the bypass caps (Tokin or tantalum). After replacement, or before, the resistance should be greater than 2.5 ohms in my experience. What @db260179 is saying is that anything less than 3 ohms can give you an idea of the chip's health. I hadn't considered this, but it might make sense. I noticed that the resistance decreased each additional time I worked on the motherboard (PS3 #1, the one I killed). Cleaning flux off THOROUGHLY helps, but each time it got smaller.

Now if you work the chips off the motherboard you can Ohm test them. I'd have to look that up again, as I'm not remembering which pads they are.
 
If I recall correctly from a lengthy discussion 10 some years ago on BGAmods about ohm testing the chip when out of circuit:

Ohm test fail necessarily means the chip is dead.
Ohm test pass does not necessarily mean the chip is alive.
Just something to keep in mind.

I'll plan on re-flowing it then. I did ohm test the RSX I lifted off PS3#1, and it still seems good, but I lost a couple of components off the topside I'd need to replace. Does anyone know of a schematic that could identify what components these are? I'm hoping to fix it back up, in case I ever need a replacement.

I no longer reflow anything because of the GPU on the PS3 and the APU on the PS4. Early models of the PS4 had oxidized pads from the factory. My guess is a bad batch / mix of argon on a production run, or whatever inert gas they use these days. If my other guesses on the PS3 are right, the cracks have been present for so long while the consoles continued to work that they've oxidized naturally. Since most of the time the defect appears to propagate at the actual pad on the GPU, the oxidation will stop it from wetting unless it is cleaned off manually.

If you're in the USA I can send you a dead GPU for the $3 it will cost to ship and you can cannibalize what you need. I don't have an LCR bridge so I can't get those (assuming) tiny readings accurate enough.

Bad NEC/TOKINs can only be tested when removed from the board: 900 uF or less means a bad NEC/TOKIN.

All of the bad ones I found tested within spec when out of circuit. They needed to be under load to see a failure.
 
...Since most of the time, the defect appears to propagate at the actual pad on the GPU, the oxidation will stop it from wetting unless it is cleaned off manually.
Flux reduces (opposite of oxidize) metal oxides and restores the solder enough for it to flow. At least enough for you to lift the chip and clean it off. I was hoping it would be enough to re-wet to the pads too. If it isn't, then a reball would be necessary.

It is better to do the reball, so there's just one thermal strike, but there's more opportunity for things to go wrong. Leaded solder does provide better elasticity and longer BGA performance, so there's an argument against reflowing. That and cleaning all the flux under the chip is a PITA after a reflow.
 
PS3 #2 - Part 5: Measuring
(...continued from Part 4 here.)
I finally got around to getting the CPU measurements. I think I said before that I was measuring the CPU, but it was the RSX; I was confused. Anyway, I cut away some of the RF shield to expose just enough room to get my probe in. The RF shield cuts down on interference, so I wanted to keep it mostly intact.

Test Bench:
PS3_Test_Bench.jpg

CPU_Probe.jpg
CPU_YLOD1.png
CPU_YLOD2.png
CPU_YLOD_Rise.png
CPU_Plateau_normal.png
CPU_Plateau_Hires.png
CPU_Plateau_Hires_Zoomed.png

I found that the CPU voltage fluctuated in discrete steps. This must be the voltage regulators switching the voltage according to some internal boot process. Perhaps these correspond to those boot checks in the SYSCON log. I lined them up in this image just to see if the number of events matches the general power sequence in the SYSCON log and...
Capture3.JPG

CPU_YLOD_SYSCON_Overlay.png
As you can see, they line up pretty well. After 104, it appears the CPU power sequence is done and it moves on to a Southbridge command (southbridge transmission mode?). Anyway, this is where all hell broke loose. Communication between the CPU and RSX ("bit training") fails. There is "AfterBEOn2()", which sounds like the CPU passed its checks. If 104 -> 304 is the first time the system tries to open CPU communication with the RSX, then that's where you'd first learn of an RSX problem. That could be the 3034 error code. Then there's a Powseq Fail detected at state 304 -> 700, which must be the 4002 RSX error.
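Counting the discrete voltage steps by eye works, but if the capture is exported as a list of samples, the steps could be located with something like the sketch below and then compared against the number of events in the SYSCON log. The function name, the 0.05 V threshold, and the sample values are all assumptions for illustration; a real capture has noise and edges that span several samples, so it would need filtering first.

```python
def find_steps(samples, min_jump=0.05):
    """Return the indices where consecutive samples differ by at least min_jump volts."""
    steps = []
    for i in range(1, len(samples)):
        if abs(samples[i] - samples[i - 1]) >= min_jump:
            steps.append(i)
    return steps


if __name__ == "__main__":
    # Hypothetical, idealized trace: flat plateaus with instant transitions.
    trace = [0.0, 0.0, 0.8, 0.8, 1.0, 1.0, 1.2, 1.2]
    indices = find_steps(trace)
    print(f"{len(indices)} steps at sample indices {indices}")
```

If the step count matches the count of power-sequence states in the log, that supports the overlay; if not, the alignment is probably coincidental.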

Here are both of them together:
RSX&BE2.png
RSX&BE.png


If the above narrative is plausible, then 104 is the area I should expect the RSX and CPU to be communicating, or at least they should be running and have stable power. Reading code is not my specialty! I'm really just guessing at what these commands might mean. However, I was thinking that that last plateau before the YLOD is probably the most important place to zoom into and have a closer look. So that's what these are:
Interesting_Signal_Normal.png

If you don't use HiRes Acquire mode, then the signal below is obscured...
Interesting_Signal_HiRes.png
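For anyone wondering why HiRes makes such a difference: on scopes I'm familiar with, that mode averages groups of adjacent oversampled points (boxcar averaging), which knocks down random noise at the cost of bandwidth. The sketch below shows the idea on a plain list of samples; the function name and group width are illustrative assumptions, not what any particular scope does internally.

```python
def boxcar_average(samples, width):
    """Average non-overlapping groups of `width` samples; a short tail is dropped."""
    if width < 1:
        raise ValueError("width must be at least 1")
    return [
        sum(samples[i:i + width]) / width
        for i in range(0, len(samples) - width + 1, width)
    ]


if __name__ == "__main__":
    # Hypothetical noisy samples around a 1.0 V plateau.
    noisy = [1.05, 0.95, 1.02, 0.98, 1.04, 0.96]
    print(boxcar_average(noisy, 2))
```

The trade-off is that anything faster than the averaging window gets smeared out too, so a clean HiRes trace can hide genuinely fast spikes.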

Here are some measurements, for what it's worth.
Interesting_Signal_HiRes_measured.png


I'm not sure if that's the rectified sine wave. Maybe that's the VRM noise I was looking for? It's missing on the RSX (Blue). I'm not sure what I've found here actually. It's interesting. Perhaps I need to try adding more capacitance on the CPU?

Continued in part 6 here...
 
Hi All,

I've been trying to replace the Tokins on my failing CECH-2001A (Slim), but I've been hitting roadblocks and I need your expertise.
I never had any problems with it; in fact, I played quite a lot in March and it never crashed or overheated.
It was always in a well-ventilated area since I bought it new.
Two weeks ago I started it (it had been off and unplugged since March), but after a few minutes on the XMB, it shut itself off with 3 beeps.
Every time I tried to restart it, it would give me the 3 beeps immediately, with no yellow light, only red.

I found this thread and read through most of it, as I wanted to find the ESR value before ordering the caps.
Since I have a DYN-001 with 4 E108 Tokins, I ordered 16 x 330 uF 6.3 V caps with an 800 mOhm ESR value (https://www.digikey.ca/en/products/...E800/399-11976-1-ND/5267679?itemSeq=343762032)
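As a sanity check on a parts order like this, the combined values for a group of identical caps wired in parallel can be estimated: capacitance adds up, and ESR divides by the number of caps. The sketch below assumes 4 caps per Tokin position (16 caps across 4 Tokins) and uses the 330 uF and 800 mOhm figures exactly as stated in this post; it ignores wiring resistance and tolerance, and the function name is made up for illustration.

```python
def parallel_bank(cap_uf, esr_mohm, count):
    """Return (total capacitance in uF, combined ESR in mOhm) for identical caps in parallel."""
    if count < 1:
        raise ValueError("count must be at least 1")
    total_cap_uf = cap_uf * count      # parallel capacitances add
    combined_esr = esr_mohm / count    # identical parallel resistances divide
    return total_cap_uf, combined_esr


if __name__ == "__main__":
    # Figures as stated in the post; 4 caps per Tokin position is an assumption.
    cap, esr = parallel_bank(330.0, 800.0, 4)
    print(f"Per Tokin position: {cap} uF, {esr} mOhm ESR")
```

Comparing those two numbers against the rating of the part being replaced is a quick way to see whether the substitute is in the same ballpark.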

I cleaned the Tokins without scratching the board or removing parts and soldered the replacements in place. Jumper wire is gauge 14 stranded. I did the same setup for both sides; there are no shorts and everything seems clean. The setup is slim enough that I can put the RF shield back without a problem.
2wTcYHb.jpg

*Flakes were removed prior to reassembly*

After reassembly, I could play a few games for 15-20 minutes (Saints Row IV, Shadow of the Colossus). I tried to load The Last of Us, but the PS3 crashed (freeze, then 3 beeps) after the start screen.
I thought it was overheating, so I monitored the temps with webMAN MOD while idle and playing; both RSX and CELL were always below 63C.

Now it sometimes starts (XMB) for a few minutes, but most of the time it gives me an instant YLOD, then 3 beeps.

From your experience, could this problem be attributed to a faulty cap, bad soldering, or a bad jumper cable? Could another chip be overheating, since the problem seems to have a cooldown period (although quite random)?
 
Those yellow caps are crap. Try replacing them with something better

 
@RIP-Felix

Not sure it helps anything, but here's the reballed working CECHA01. One image per post so I don't mix things up, sorry mods. I knocked it into high-res mode, and I compromised and cut an extra ground clip and turned it into a pin. This is from the opposite ends of a bypass cap under the CPU:

XM2TsNJ.jpg


Just looks like I killed all the noise. So, at least from the diagnosing bad caps side, no new information (stable voltage good, clear big waveform bad), but if I can remember how to trigger and capture the startup, I might have something interesting for you to use in tandem with the SYSCON stuff that may prove fruitful for diagnosing other issues. Lemme grab the GPU, then I'll dick with the triggered capture for a little bit. I can't do them side by side, unfortunately, since I'm holding my pins in place.

edit: having trouble getting GPU and I have to poop. I'll give it a few more shots, but since I'm almost certain it's just going to look like that too, I don't care that much. If there's anything else I can look at let me know. I can put off reassembly for a few hours, and it takes me a few more hours to clean everything, but I want to have it ready to start stress testing tonight then you'll have to wait for the next one.
 
Sorry that took a bit. Had to refill with snacks and stare at this trainwreck of an election for a bit.

Nice high res cleaned up working CECHA01 GPU:
AjE0ywK.jpg


This looks more like expected, so maybe I botched the CPU capture. Whatevs.
 