PS3 (Research/Experimental) - NEC/TOKIN Capacitors Replacement - YLOD

@RIP-Felix Here's what's giving me fits: I ran GT6 with 2 each missing on each chip before. So, if the bad caps aren't dragging the whole thing down, then it should be booting up just fine now. I've still got my money on it booting after a removal but before a replacement at some point.
 
@squeept You must be referring to the "non-BC" console you tested on page 31, the one in the youtube videos that got nuked. G and H are the only non-BC models with a 90nm RSX, and all of the non-BC models have 65nm Cell-BEs. That means any non-BC model has a lower TDP, and thus less switching noise. If it was a J model onward, the RSX is 65nm or 40nm and doesn't even have a switching VRM. Also, the tokins went from 1200uF a pop to 1000uF, which indicates the lower TDP produced less noise. It's certainly possible that the console you tested by removing Tokins could endure more noise because it produced less to begin with, and that the 4x tokins were overkill for its TDP.

I doubt that's the case for BC models. I think their tokins are even more necessary for stability. The question is how far out of spec can they be? We'll see. I just removed C6229 and decided to just add 2x 18AWG solid core jumpers to the + rails to solve the current problem mentioned before. It's a lazy solution and doesn't look great, but once it's hidden inside the console I won't care as long as it works. I haven't tested it yet, but if it boots (I doubt it...we'll see) then there is at least 2700uF worth of Tantalum + whatever is left in the remaining Tokin. The next 2 TaPol arrays will be 4x270uF = 1080uF each. So we should be able to get an estimate of the capacitance needed to make it stable following this methodology.

I'll test tomorrow. Tonight I think I'll down a few cold ones and watch a BB game to unwind...
 
I'm gonna have to go re-read this whole thread at some point just to see what the hell I was talking about, but if that was what I'm remembering, then where do I find the facepalm emoticon for myself? That obviously doesn't apply to what is essentially a different system with a shared part. I just assumed that since I focus on BC, that's what my memory would have been from. I know at some point I did some kind of GT6 missing caps experiment....

Another very good reason I just up and sent the board off. Someone else with the right equipment getting their hands on a verified bad set of caps. Repeat some experiments with all that we've learned in 175 grueling pages so that it's not just me arguing. If I can't find that I did that already on an A01, my backordered caps just showed up. Almost every time I list one at this point, some ass is asking if I replaced them, so then I can just say yes for once and be done with it!

almost totally unrelated edit: if you were watching the Lakers.... F%$# LeBron. I'm from a shithole in Ohio and I used to live right by one of his favorite restaurants, and their fucking limos blocked the goddamn street all the time. They just stand in the street and talk, and there's no way around it and you can't do anything. Politely ask, get out and politely ask, honk, brights, yell, inch up until you're touching them: dirty looks, refuse to move, go back to ignoring you.
 
PS3 #7 - Part 9
(3rd TOKIN removed, C6229 - Side A, CPU side, next to the one that was damaged in shipping)
...continued from part 8 here.

This tokin looked fine. No apparent damage. EDIT: I thought it worth mentioning now, before I forget, that the resistance +/GND has increased from 2.3Ohms to 2.6Ohms. That could indicate that the tokins were shorting to some small degree. It could have been flux residue on the tokin rails too, IDK for sure. I just thought I'd mention it...
C6229 10x.jpg
C6229 40x (left).jpg
C6229 40x (right).jpg
After removal I decided to just add 18AWG solid core bridge wires to the previous TaPol arrays. I was feeling a bit concerned about too much current passing through the resistor legs I used to attach them. 18AWG can carry 2.3 Amps per conductor, so with 4 of them I can conduct 9.2A + 1.152A (4 resistor legs). About 10.35A should be plenty of headroom. Here are the changes:
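For anyone who wants to re-run the current budget with different wire counts, here is a minimal sketch of the same arithmetic. The 2.3 A per conductor and 1.152 A resistor-leg figures are the ones quoted in the post; the function and constant names are made up for illustration, and the sum assumes the current splits evenly across the parallel paths, which real wiring will only approximate.

```python
# Rough current budget for the TaPol array jumpers.
# Figures are the ones quoted in the post; names are illustrative only.
AWG18_AMPS_PER_WIRE = 2.3          # per-conductor figure quoted for 18AWG
RESISTOR_LEGS_TOTAL_AMPS = 1.152   # total quoted for the 4 resistor legs


def jumper_budget(n_wires,
                  amps_per_wire=AWG18_AMPS_PER_WIRE,
                  legs_total=RESISTOR_LEGS_TOTAL_AMPS):
    """Total amps the wires plus resistor legs could carry in parallel.

    Assumes ideal, even current sharing between all paths.
    """
    return n_wires * amps_per_wire + legs_total


total = jumper_budget(4)
print(f"{total:.3f} A")
```

Swapping `n_wires` for 2 shows what the budget would be with only the two jumpers on one array.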
C6232 Jumper.jpg
C6231 Jumper.jpg

Now onto the results:
SYSCON, 3 RSX TOKIN removed (C6231, C6232 & C6229).PNG
First, the 80 1002 is the only error now. YLOD is back to Non-Instant (1-10s). In this case it's pretty consistently a 2.5s YLOD. This is enough time for the startup sequence to finish, hence consistent 80 1002 errors, but not enough time for the clock to update the timestamp. These results are consistent with my PS3#2 and PS3#4 which had Non-Instant YLODs that resulted in SYSCON errors without timestamps. Unlike them, however, this console is completing the startup sequence, which can be seen by the voltage drop onto the last plateau. This only happened on the working consoles I've tested (which didn't have BGA defects).
O-Scope, 2.7s YLOD EVENT.png
O-Scope, Startup Sequence (2.7s YLOD).png

Noise is noticeably higher in the startup again. No spoiled apple yet. There is one more RSX tokin left to remove, but so far this data supports the "gradual decrease in capacitance leads to YLOD" hypothesis. Sorry @squeept!

Now here is where things get a bit confusing...

Well, I figured out why the bad CPU waveform disappeared. It was because I wasn't looking at the CPU on the second to last voltage plateau in the last measurement. When I removed the 2nd tokin the noise was so bad the YLOD occurred in 300ms, and there wasn't enough of the startup sequence to trigger on the same voltage plateau. Then when I installed the 2nd TaPol array, I left the triggering the same for the last round of O-scope measurements, which was NOT the same location as the first set of images. So it depends on when in the YLOD event you are looking at the voltage. I have been trying to trigger on the same second to last voltage plateau to keep these types of errors from happening, but I didn't notice it on the last set of images. This time I figured it out when I saw that the CPU looked fine...
O-Scope, RSX large peaks (DC Coupling & 1x probe).png
O-Scope, CPU Good Tokin Noise (DC Coupling & 1x probe).png

Then I zoomed out to check the YLOD event time. After I got the images I wanted, I decided to zoom back in on the correct plateau again (double check)....
O-Scope, RSX large peaks Target Plateou (DC Coupling & 1x probe).png
O-Scope, CPU Bad Tokin Noise (DC Coupling & 1x probe).png

Notice how the CPU waveform is back to the bad one? Well, it was there all along. I was just looking earlier in the startup sequence, because I was zoomed in earlier for the 300ms YLOD and forgot to recenter around the target plateau. So I bet it was there last time too. So scratch that bit about the bad RSX tokins causing the CPU tokins to look bad. This evidence contradicts that hypothesis. Actually, it makes more sense to me. The RSX and CPU tokins are electrically isolated, so I thought it was a bit of a stretch that they would cross-talk that much. Anyway, that mystery is solved. It was just an error on my part. I should have made sure I was centered in the same place. I'll try to do so from now on.

Now how did the noise change? Well, the RSX has gone up 62mVpp (from 85mVpp to 147mVpp). So the threshold for a stable startup sequence is somewhere around 150mVpp. Less than that and the console would get to the "80" PWR ON state. More than that and it would sometimes crash during the startup phase "10" or trigger a power failure "09" 3004 even before that. For reference, my BGA defects would always trigger a "40" 3034, which is later in the power-up sequence. I'm working up a spreadsheet to keep these handy. I really think this is the tool that will help us correlate the length of the YLOD with the startup sequence step where it is likely occurring. My results so far are starting to support @ElGris's anecdotal reports about longer YLODs being associated with bad tokins.

Moreover, I now believe the SYSCON is the best first step anyone can take to diagnose their PS3. If the tokins are bad, we now know the codes associated with the RSX (1002 and 3004). A 3004 could be other things, but fuses and voltages can be checked with a multimeter to rule them out; if those check out, it must be the Tokins. A 1002 is the smoking gun. That's a TOKIN! The timestamp, step number, and length of YLOD are also important clues. I feel like we could diagnose a bad tokin with just that information at this point. No oscilloscope needed, though if you have one it makes the diagnosis conclusive.
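To make the "step number + code + timestamp" idea concrete, here is a rough Python sketch that splits one errlog line into those three pieces. The line layout is taken from the SYSCON output posted in this thread. Two things are assumptions rather than documented facts: that the 8-digit code is laid out as a0 / step / error (which matches how codes like "80 1002" and "09 3004" are quoted here), and that the clock counts seconds from 2000-01-01 (which reproduces the dates the SYSCON prints next to it). The function name and the hint table are hypothetical, and the hints only repeat the interpretations given in this thread.

```python
import re
from datetime import datetime, timedelta

# Assumption: the SYSCON clock counts seconds from 2000-01-01 00:00:00.
# This reproduces the dates printed beside the clock values in the log.
EPOCH = datetime(2000, 1, 1)
EMPTY = 0xFFFFFFFF  # value seen in unused slots and for missing timestamps

LINE_RE = re.compile(
    r"err_code:0x([0-9a-fA-F]{8}),\s*clock:0x([0-9a-fA-F]{8})"
)

# Interpretations as given in this thread, not an official code table.
HINTS = {
    "1001": "CPU tokin (per this thread)",
    "1002": "RSX tokin (per this thread)",
    "3004": "power failure; rule out fuses/voltages first (per this thread)",
}


def decode_errlog_line(line):
    """Return (step, error, timestamp or None), or None for an empty slot."""
    m = LINE_RE.search(line)
    if m is None:
        return None
    code = int(m.group(1), 16)
    clock = int(m.group(2), 16)
    if code == EMPTY:
        return None
    step = f"{(code >> 16) & 0xFF:02X}"   # e.g. "80"
    error = f"{code & 0xFFFF:04X}"        # e.g. "1002"
    when = None if clock == EMPTY else EPOCH + timedelta(seconds=clock)
    return step, error, when


sample = "ofst[ 92]:err_code:0xa0801002, clock:0x0b48c29d  2005/12/31 04:16:29"
step, error, when = decode_errlog_line(sample)
print(step, error, when, HINTS.get(error, "unknown"))
```

A line whose clock is 0xffffffff comes back with `None` for the timestamp, which is the "error without timestamp" case discussed above.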

Continued in Part 10 here...
 
Ok guys, hold on a second.
Those tests were indeed valid. No need to cast doubts on them, even if they were on slightly different system.

Why? Because some time ago I also went ahead and tried, on 90nm system (c models)
The results were the same. Console was able to boot just fine, with 2 tokins simply missing on the CELL side, and with 2 tokins missing on the RSX side.
Granted, I did not stress test with a demanding game or anything. I was happy enough to see the boot.

So that's at least half the capacitance and twice the ESR, and the machine can still boot.
That's where I got the 40/50/60% numbers ballpark when I started conjecturing a few posts earlier. (So not completely out of my rear)
I still stand by what I said though... Maybe the "bad apple" thing is being misinterpreted... Maybe when 1 tokin starts degrading... It progressively drags the other 3 to the same level. In parallel.

So I will quote my other post. Because sadly I would still have liked to see the console boot without removing dodgy tokins. Without seeing that, the apple thing is still there laughing.
Oh well, I guess the apple test will be a bit less rigorous now.
Not the end of the world. And, since the 1002 is still there laughing, maybe the test can still have some value.

The competing hypothesis I'm talking about, is that the whole tokin array is degraded below a certain threshold and the noise is just too much now. Not necessarily just 1 of the tokins causing all the problems. (Especially since they weren't shorting, even after taking that violent hit squishing the layers)

So, in order to balance things again, 1 tantalum in parallel would probably no longer be enough to bring the array back over this minimum threshold. The squished tokin was still pulling some weight, so now I'd expect the machine to require at least 2 or 3 more tantalums. Because you have the oscilloscope, you should be able to see what each of the additions does.


The point of the test is to see if the noise can get better (or good enough to run) "without" removing any suspicious tokins. This is competing with the notion that there's a "bad tokin" that absolutely needs to go.
Nice work.
Now, what you describe is looking very very similar to the L model I posted about a couple times.
I bet if you put another of those tantalum modules on top, the machine will boot.
I initially suggested 4+1. You tried 3+1 and it's almost working. What about 3+2?
It's more or less what I was talking about.

Let's say you now replaced 1 dodgy tokin with a working substitute. Ok. That's now a team of 3 dodgy tokins and 1 new replacement. But that 1 "good" module is almost, but still not quite, able to bring the whole array over the boot threshold.
For the sake of argument, let's assume the whole group of old tokins was working at, say, ~40% of their full health capacity. And the system needs at least 60% to boot. You removed 1 dodgy (now ~30%) and replaced it with 1 good (now ~50%?)
So you can still try adding a little bit more, without removing. The thing is almost working.
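The back-of-envelope numbers above can be played with in a few lines of Python. This is only a toy model under stated assumptions: every slot is treated as 1/4 of the nominal array, every old tokin is taken to be at the same hypothetical 40% health, a replacement module counts as one full slot, and 60% is the hypothetical boot threshold from the post. None of these are measured values, and the function name is made up.

```python
# Toy model of the "array health" argument. All numbers are the
# hypothetical ones from the post, not measurements.
BOOT_THRESHOLD = 0.60   # hypothetical minimum fraction of nominal needed
OLD_HEALTH = 0.40       # hypothetical health of each remaining old tokin


def array_health(old_tokins, new_units, old_health=OLD_HEALTH, slots=4):
    """Fraction of nominal array capacitance.

    Each unit is assumed to be rated at 1/slots of the nominal total;
    old tokins contribute old_health of that, new units contribute all of it.
    """
    per_slot = 1.0 / slots
    return old_tokins * per_slot * old_health + new_units * per_slot


for label, old, new in [("4+0", 4, 0), ("3+0", 3, 0), ("3+1", 3, 1),
                        ("3+2", 3, 2), ("4+1", 4, 1)]:
    h = array_health(old, new)
    verdict = "over threshold" if h >= BOOT_THRESHOLD else "under threshold"
    print(f"{label}: {h:.0%} ({verdict})")
```

With these inputs 3+1 lands a little under the threshold and 3+2 clears it, which is the shape of the argument being made; change `OLD_HEALTH` to see how sensitive that conclusion is to the guess.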

Of course these are just tests. As you mentioned, there are a number of reasons why this is not advised and is not going to perform like a proper replacement, instead of mixing like madmen. But the findings can be valuable.
 
Maybe it would be easier to conceptualize if I post the results table with each result:
Result Table.PNG

Each TaPol array is either 4 or 5 caps (the first 2 arrays were five, the next 2 are four). @Pacorretaco 2-2 had 5100uF, which is 300uF above the nominal value. In reality the TOKINS were degraded, so the real difference is probably larger. That is approximately equivalent to having added a 330uF Tantalum in parallel with the 4 defective tokins to begin with. I know this methodology is confusing, but the results you're talking about are in there. And yes, that was the most stable attempt yet. The YLOD usually wouldn't happen, or it would occur randomly after a bit (10s to a few minutes). I probably should have done more extensive testing before moving on, to see just how stable it was. The next one should be better, since it's got 3 known good arrays and one dodgy tokin, instead of 2 & 2. I'll hook up a HDD and the BD next time and try to trigger a YLOD if it doesn't come easy.

The total capacitance assumes the TOKINs are at 100% health, which of course these are not, but assuming they are gives us conservative estimates. Basically we use that total and the smaller total capacitance results to calculate the percentage the capacitance has to fall by to cause a certain failure. This will become more accurate on the back end of the testing especially, because we know the TaPols are good. So for example, we can take the 4-0 and 3-0 results and compare the behavior to the 0-3 & 0-4 results. The total capacitance should be nearly the same, but if we get wildly different results then we have an idea of how much capacitance the tokin array must have lost. The results in between should give us an idea of the behavior range. 2-1 was always supposed to be the worst result in this testing method, because it would have the lowest true capacitance.

The drawback to doing it this way is that we can't know for sure what the true capacitance is, but the benefit is that we can infer it from the behavior. Also, we can't drop the total capacitance more than 2-1. But since there was no way to measure the tokins' capacitance in place, this was the best way I could come up with to infer it. Convoluted, I know, but it's easier to conceptualize using the table above.
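For readers without the table image, the nominal totals can be regenerated with a short Python sketch. It uses the values given in the thread: 1200uF per Tokin on this board, and TaPol arrays of 5x270uF for the first two and 4x270uF for the next two. The "tokins-arrays" labels follow the naming used above; the order of configurations in the loop is an assumption about the test sequence, and the names are illustrative.

```python
# Nominal capacitance per configuration, assuming Tokins at 100% health
# (which they are not; that is the conservative assumption described above).
TOKIN_UF = 1200                                    # nominal per Tokin
ARRAY_UF = [5 * 270, 5 * 270, 4 * 270, 4 * 270]    # arrays in install order


def nominal_total(tokins_left, arrays_added):
    """Nominal uF for an 'N tokins - M TaPol arrays' configuration."""
    return tokins_left * TOKIN_UF + sum(ARRAY_UF[:arrays_added])


# Assumed test order: remove one Tokin, add one array, and repeat.
configs = [(4, 0), (3, 0), (3, 1), (2, 1), (2, 2),
           (1, 2), (1, 3), (0, 3), (0, 4)]

baseline = nominal_total(4, 0)
for tokins, arrays in configs:
    total = nominal_total(tokins, arrays)
    print(f"{tokins}-{arrays}: {total} uF ({total - baseline:+d} vs 4-0)")
```

The 2-2 row comes out at 5100uF, 300uF over the 4-0 baseline, matching the figure quoted above. The true totals are lower wherever degraded Tokins are still installed.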

I think that Noise and length of YLOD are correlated well, but I'll do a statistical analysis of that when I get the last 3 data points. We can sort out the math and infer the tokin's true capacitance once testing is done.

I hope that all makes sense?
 
PS3 #7 - Part 10
(3 TaPol Arrays added to RSX: C6231 & C6232 ea. replaced with 5x & C6229 with 4x ETPSF270M6E)
...continued from part 9 here.
SYSCON, 3 TaPol Arrays on RSX (C6231, C6232 & C6229 are ea. 5x ETPSF270M6E).PNG

I played around with different commands in the SYSCON to see what other information I might find. Some commands didn't work, like "tmp", and others were a bit confusing. So I copied the output below in case you want to peruse it, but the pictures of the normal errlog and lasterrlog are above as usual.
Code:
Microsoft Windows [Version 10.0.18363.1316]
(c) 2019 Microsoft Corporation. All rights reserved.

C:\WINDOWS\system32>CD C:\Users\HTPC\Desktop\PS3\SYSCON

C:\Users\HTPC\Desktop\PS3\SYSCON>python ps3_syscon_uart_script.py COM4 CXRF
>$ AUTH
Auth successful
>$ lasterrlog
lasterrlog
Last Error Code:0xa0801002, Time:0x0b48c29d  2005/12/31 04:16:29
[mullion]$
>$ errlog
errlog
ofst[ 96]:err_code:0xffffffff, clock:0xffffffff
ofst[100]:err_code:0xa0093004, clock:0xffffffff
ofst[104]:err_code:0xa0093004, clock:0xffffffff
ofst[108]:err_code:0xa0093004, clock:0xffffffff
ofst[112]:err_code:0xa0093004, clock:0xffffffff
ofst[116]:err_code:0xa0093004, clock:0xffffffff
ofst[120]:err_code:0xa0093004, clock:0xffffffff
ofst[124]:err_code:0xa0093004, clock:0xffffffff
ofst[  0]:err_code:0xa0801002, clock:0xffffffff
ofst[  4]:err_code:0xa0093004, clock:0xffffffff
ofst[  8]:err_code:0xa0093004, clock:0xffffffff
ofst[ 12]:err_code:0xa0093004, clock:0xffffffff
ofst[ 16]:err_code:0xa0093004, clock:0xffffffff
ofst[ 20]:err_code:0xa0093004, clock:0xffffffff
ofst[ 24]:err_code:0xa0093004, clock:0xffffffff
ofst[ 28]:err_code:0xa0093004, clock:0xffffffff
ofst[ 32]:err_code:0xa0093004, clock:0xffffffff
ofst[ 36]:err_code:0xa0801002, clock:0x0b48878c  2005/12/31 00:04:28
ofst[ 40]:err_code:0xa0801002, clock:0x0b4887fa  2005/12/31 00:06:18
ofst[ 44]:err_code:0xa0801002, clock:0x0b488820  2005/12/31 00:06:56
ofst[ 48]:err_code:0xa0801002, clock:0x0b48883b  2005/12/31 00:07:23
ofst[ 52]:err_code:0xa0801002, clock:0xffffffff
ofst[ 56]:err_code:0xa0801002, clock:0xffffffff
ofst[ 60]:err_code:0xa0801002, clock:0xffffffff
ofst[ 64]:err_code:0xa0801002, clock:0xffffffff
ofst[ 68]:err_code:0xa0801002, clock:0xffffffff
ofst[ 72]:err_code:0xa0801002, clock:0xffffffff
ofst[ 76]:err_code:0xa0801002, clock:0xffffffff
ofst[ 80]:err_code:0xa0801002, clock:0xffffffff
ofst[ 84]:err_code:0xa0801001, clock:0x0b488a85  2005/12/31 00:17:09
ofst[ 88]:err_code:0xa0801002, clock:0x0b488ac6  2005/12/31 00:18:14
ofst[ 92]:err_code:0xa0801002, clock:0x0b48c29d  2005/12/31 04:16:29
[mullion]$
>$ disp_err
disp_err
CheckStop:     None
PLLUnlock:     0
RSX Int:       None
PowerSeq:      ff
[mullion]$
>$ bringup
bringup
[SSM] state: 0000 -> 0101
Bringup Mode #0 (0xFF)
[SSM] ssmCb_OnStartingBePowOn() called.
[SSM] Bringup mode : syspm_stat=00000000/00000000
[POWSEQ] PowerSeq_Setup called.
[SSM] state: 0101 -> 0201
[POWSEQ] AV Backend Setup
[SSM] state: 0201 -> 0102
[SSM] state: 0102 -> 0202
[SSM] state: 0202 -> 0103
[SSM] state: 0103 -> 0203
[SSM] ssmCb_BeforeBeOn() called.
[SSM] state: 0203 -> 0104
Psbd_SbTransMode_Half:0x20e2
>$ disp_err
[SSM] state: 0104 -> 0204
[SSM] state: 0204 -> 0105
[SSM] state: 0105 -> 0400
(PowerOn State)
[SERV NVS] READ CMD

Boot Loader SE Version 1.5.0 (Build ID: 1798,18531, Build Data: 2007-01-10_12:09:26)
Copyright(C) 2006 Sony Computer Entertainment Inc.All Rights Reserved.
[SERV SETCFG] XDR (CH0,CH1) ASSERT
[SERV SETCFG] XDR (CH0,CH1) DEASSERT
[INFO]: Connecting to Debug Device (SB UART)
[SERV NVS] READ CMD
[SERV NVS] READ CMD
[SERV NVS] READ CMD
[SERV NVS] READ CMD
[SERV NVS] READ CMD
[SERV NVS] READ CMD
[SERV NVS] READ CMD
[SERV NVS] READ CMD
[SERV NVS] READ CMD
[SERV NVS] READ CMD
[SERV NVS] READ CMD
[SERV NVS] READ CMD
[SERV NVS] READ CMD
[SERV NVS] READ CMD
[SERV NVS] READ CMD
[SERV NVS] READ CMD
[SERV NVS] READ CMD
[SERV NVS] READ CMD
[SERV NVS] READ CMD
[SERV NVS] READ CMD
[SERV NVS] READ CMD
[SERV NVS] READ CMD
[SERV NVS] READ CMD
[SERV NVS] READ CMD
[SERV THERM] NOTIFY_MODE CMD
[SERV NOTIF] CONTROL_LED
[SERV NOTIF] RING_BUZZER
[SERV NOTIF] CONTROL_LED
[SERV NVS] READ CMD
[SERV NVS] READ CMD
[SERV NVS] READ CMD
[SERV NVS] READ CMD
[SERV NVS] READ CMD
[SERV NVS] READ CMD
[SERV NVS] READ CMD
[SERV NVS] READ CMD
[SERV NVS] READ CMD
[SERV NVS] READ CMD
[SERV NVS] READ CMD
[SERV NVS] READ CMD
[SERV NVS] READ CMD
[SERV NVS] READ CMD
[SERV NVS] READ CMD
disp_err
CheckStop:     None
PLLUnlock:     0
RSX Int:       None
PowerSeq:      ff
[mullion]$
>$ lasterrlog
lasterrlog
Last Error Code:0xa0801002, Time:0x0b48c29d  2005/12/31 04:16:29
[mullion]$
>$ errlog
errlog
ofst[ 96]:err_code:0xffffffff, clock:0xffffffff
ofst[100]:err_code:0xa0093004, clock:0xffffffff
ofst[104]:err_code:0xa0093004, clock:0xffffffff
ofst[108]:err_code:0xa0093004, clock:0xffffffff
ofst[112]:err_code:0xa0093004, clock:0xffffffff
ofst[116]:err_code:0xa0093004, clock:0xffffffff
ofst[120]:err_code:0xa0093004, clock:0xffffffff
ofst[124]:err_code:0xa0093004, clock:0xffffffff
ofst[  0]:err_code:0xa0801002, clock:0xffffffff
ofst[  4]:err_code:0xa0093004, clock:0xffffffff
ofst[  8]:err_code:0xa0093004, clock:0xffffffff
ofst[ 12]:err_code:0xa0093004, clock:0xffffffff
ofst[ 16]:err_code:0xa0093004, clock:0xffffffff
ofst[ 20]:err_code:0xa0093004, clock:0xffffffff
ofst[ 24]:err_code:0xa0093004, clock:0xffffffff
ofst[ 28]:err_code:0xa0093004, clock:0xffffffff
ofst[ 32]:err_code:0xa0093004, clock:0xffffffff
ofst[ 36]:err_code:0xa0801002, clock:0x0b48878c  2005/12/31 00:04:28
ofst[ 40]:err_code:0xa0801002, clock:0x0b4887fa  2005/12/31 00:06:18
ofst[ 44]:err_code:0xa0801002, clock:0x0b488820  2005/12/31 00:06:56
ofst[ 48]:err_code:0xa0801002, clock:0x0b48883b  2005/12/31 00:07:23
ofst[ 52]:err_code:0xa0801002, clock:0xffffffff
ofst[ 56]:err_code:0xa0801002, clock:0xffffffff
ofst[ 60]:err_code:0xa0801002, clock:0xffffffff
ofst[ 64]:err_code:0xa0801002, clock:0xffffffff
ofst[ 68]:err_code:0xa0801002, clock:0xffffffff
ofst[ 72]:err_code:0xa0801002, clock:0xffffffff
ofst[ 76]:err_code:0xa0801002, clock:0xffffffff
ofst[ 80]:err_code:0xa0801002, clock:0xffffffff
ofst[ 84]:err_code:0xa0801001, clock:0x0b488a85  2005/12/31 00:17:09
ofst[ 88]:err_code:0xa0801002, clock:0x0b488ac6  2005/12/31 00:18:14
ofst[ 92]:err_code:0xa0801002, clock:0x0b48c29d  2005/12/31 04:16:29
[mullion]$
>$ geterrlog
geterrlog
*** Invalid Argument ***
[mullion]$
>$ powerstate
powerstate
ATA Power          : ON
PCI Power          : OFF
RSX Power          : ON
XDR Power          : ON
Eurus Power        : ON
SB Power           : ON
RSX Thermal Sensor : AVAILABLE
BE Thermal Sensor  : AVAILABLE
[mullion]$
>$ tmp
tmp
*** Invalid Argument ***
[mullion]$
>$ temp
temp
*** Unknown Command ***
[mullion]$
>$ tsensor
tsensor
*** Invalid Argument ***
[mullion]$
>$ shutdown
shutdown
[SSM] state: 0400 -> 0500
[POWSEQ] AV Backend Letup
[SSM] ssmCb_AfterBeOn() called.
[SSM] Shutdown mode ... req_wake_src = 000000F4, ctxt=00/00
[SSM] Shutdown mode : syspm_stat=00000000/00000000
[POWSEQ] PowerSeq_Letup called.
[SSM] state: 0500 -> 0000
(PowerOff State)
>$
O-Scope, Startup Sequence (No YLOD).png
O-Scope, RSX target plateau (DC Coupling & 1x probe).png
O-Scope, CPU Tokin Noise target plateau (DC Coupling & 1x probe).png

Notice that the bad CPU waveform is missing from the target plateau (2nd from the last one). I did manage to find it again, but it was at the first plateau after the voltage rise...
O-Scope, RSX not target plateau (DC Coupling & 1x probe).png
O-Scope, CPU Tokin Noise not target plateau (DC Coupling & 1x probe).png
Added the next array. 4x 270uF this time. Didn't take a picture this time...just more of the same. Here are the highlights:
  • No YLOD, I had 2 SYSCON errors.
    • 80 1001. This is the first CPU Tokin error. Only got it once, but I haven't stress tested yet.
    • 80 1002. The RSX tokin errors it was having consistently before have disappeared so far. The reason I got this error is because my oscilloscope probe fell over and shorted the +/GND. The console YLOD'd immediately, giving this 80 1002. Otherwise, while o-scoping with the console on and stable in the menu, I didn't get any errors besides that one 80 1001.
  • CPU noise disappeared again. This time I made sure I was zoomed in on the second to last voltage plateau so that I am looking in the same spot as before, but this time it wasn't there. I did find a place where I could see the bad waveform (earlier), but while idling it would sometimes be visible and other times not. It really seems to be teetering on the edge of being bad. That's interesting, because it means a console with an intermittent YLOD might evade O-scope diagnosis of bad CPU tokins! You might really have to hunt for it if it's not immediately obvious (stress test or look around at different points in the startup sequence).
  • The console never experienced a YLOD except when I shorted +/GND on accident when the probe fell. The console appears to be stable in light use. So now I need to try stress testing it...
I have to say, I'm feeling apprehensive about stress testing with that inductor's casing broken in half like that. @squeept, do you think that will overheat and blow out? They are cooled by thermal pads, but that one only has half the casing left to make contact with the pad. The rest of the metal plates are exposed directly to the air. I suppose that the best time for something to blow out is when I have it open and not enclosed in a flammable plastic shell.

The other thing that has me concerned is that the console is heating up rather quickly! After about five minutes of testing the PSU was Hot AF and the fan kicked up to the second step. I don't remember the default temperature curve off the top of my head. Wasn't the second step when the CPU/RSX reached 85C+? Anyway, the PSU seems hotter than I remember for a working console. I'm afraid the increased noise is causing the PSU to work too hard and the rest of the console to heat up more than it should. And that's my dilemma...

...I know better than to try and stress test a console that's on the verge of overheating as is. It feels stupid to try and push my luck. I already have the idle tokin noise measurement at this point. I could test to see if I can get it to YLOD or become unstable and throw more 1001 errors. Maybe I even should, but the stakes are higher now that I know it's working. It was easier when I had nothing to lose. I'll leave it in this state for a while, so we can decide how to proceed.

What do you guys think?
  1. Should I push it to the limits now?
  2. What's a good level in GT6 that's accessible from the beginning of the game?
EDIT:
...before moving on I went ahead and installed the BluRay drive and wifi/bluetooth module. First, the BluRay drive won't communicate. The blue LED doesn't illuminate at startup like it's supposed to. I get three beeps when I press the eject button, no LED. On bootup in the menu, it's giving that error you get when the BD ribbon cable is detached. Not sure yet if @squeept sent me the wrong BD daughterboard, or the ribbon cable ports need cleaning, or the cable itself is damaged (I have more I can try later), or maybe the BD daughterboard was damaged in shipping (I didn't see any damage though). So I'll have to get that figured out before I can proceed to stress testing. Also, while testing the BD drive and controller pairing, the console did YLOD on one restart. That was the second 80 1002. Otherwise it seems stable. I also played around with some SYSCON commands and updated the above post with those results.

Continued in part 11 here...
 
@RIP-Felix I don't think the coils should actually produce much heat of their own, maybe they just use those as extra heatsinks since they stick up and have a nice flat area to mate with the shielding. I'd be more concerned that it will develop a really bad whine since it has more spots making contact that could resonate and the rest of the intact casing likely isn't rock solid anymore.

If you don't have the whole case on, it will get a lot hotter everywhere. The fan is no longer pulling fresh air across everything, so the aluminum shielding isn't getting any active cooling. The fan pulls cold air from the desk, blows it on the main heatsinks, and that's it. I only leave them running like that long enough for some diagnostics or a single race of GT6 to see if it dies.

I'd be 100% sure I sent the right board if you hadn't asked, but now I'm only 98% sure.... I expect you'll find another component knocked off somewhere.
 
I tried another ribbon cable and daughterboard. It's neither. It's like the BD isn't getting power. I just disassembled and checked continuity of all the fuses. They're fine. So I'll have to look up the BD circuit in the schematic and check the regulators, mosfets and caps. Oh goodie... troubleshooting! Just what I always wanted.
 
@RIP-Felix I woke up in the middle of the night with a realization that any stability testing will be kind of inconclusive since the CPU is still going to be unstable even when the GPU is 100% replaced caps. We should have been going back and forth in removal.
 
I think it will be okay to just focus on the RSX before moving onto the CPU. There may be a little cross-talk; I am experiencing random CPU tokin errors, but they are far less frequent than the RSX ones. I want to finish the RSX tests, then I'll move onto the CPU. We should give some thought to that process, since the CPU seems to be right on the edge as it is now. I'm not sure there is much more to be learned from it. Since it's teetering on the edge, we already know that the threshold for the CPU errors seems to be about 40-60mVpp (so my guess of 50mV was about right. I did read that from some authoritative source, so I guess they were right). Anyway, I'm not sure I want to go through the trouble of doing a 1-by-1 removal process again. Maybe I should try removing the tokins 1-by-1 until they're all off, then adding the Tantalum arrays 1-by-1 until the console boots?

Oh BTW, I figured out the BluRay problem. I'm an idiot, that's the problem! After checking the circuit and verifying under the microscope that everything looked undamaged, I figured it was fine, since @squeept was testing it in game before sending it to me. So if nothing was damaged, I figured it can't be the MB. So I changed the PSU and...that wasn't it either. LOL! So that's when I thought, hey, maybe turn it on and check voltages at the connectors. Yup, that's when the 5V and 12V are sent to the BD connector. Okay, so it is getting power. What gives?

My sanity, that's what! Apparently I'm too dumb to realize that in order for the BD to work there needs to be a game in there! So I put one in and guess what happened? It works fine. Yeah, apparently I'm just that stupid. I assumed that the error code I saw on the dashboard was the BD ribbon, didn't check the code, and pressed the eject button expecting to see blue LEDs when there was no disc in there...COLOSSAL facepalm moment...lol! Apparently, I have been staring at YLOD consoles so long I forgot how a working console behaves! I'll have to get used to it I guess.

Moving on... Well the console is stable in menu and got into GT6 okay. I was entering a name to create a profile when it experienced a YLOD.
SYSCON, YLOD making name in GT6.PNG
Since the last SYSCON errors I reported, the console has had 2 more: a CPU tokin and an RSX tokin. The CPU tokin error occurred sometime during all the times I was resetting the console trying to find a problem that wasn't there to begin with. The 1002 (RSX tokin) is what caused the YLOD in GT6. So I can now call this test and move on. The 1 tokin + 3 TaPol array configuration is stable in menu and unstable in game (random YLOD). I added a column for YLOD type in the table.
Results Table, 1-3 test (Random YLOD).PNG
 
Well, I'm glad you're dumb. I was really starting to feel bad. The board showed up smashed after all the anticipation, and then if I sent the wrong PCB and caused even more headaches...
 
So I got all the equipment like u recommend. Going to attempt a reball. How do I send u a private msg?

Still sounds like solder cracks to me...or a dead GPU.
  1. What are the resistance readings between POS and GND rails on the CPU and RSX? This should be about 2.5 Ohms or higher. A smaller resistance indicates a dying chip or bad solder job. I found that cleaning off the flux more thoroughly increased this resistance somewhat, but it got smaller each time I applied hot air near the GPU. Each time you reflow, reball, use hot air to remove tokins or apply heat to install tantalums, the smaller this resistance gets - until the chip is cooked to death. At that point you need a new CPU/RSX.
  2. You said you attempted a reflow, but it may not have actually flowed. What equipment did you use? These details are pretty important for a successful result. I use an IR preheater set to whatever temperature is necessary to raise the motherboard to 150C, then a hot air gun and 45x45mm SMD square nozzle to bring the chip up to 240-250C. These temperatures are measured using 2 thermocouples: one taped directly next to the chip and the other under the chip on the backside of the motherboard. Even then there's a lag time between when the thermocouple tells me the temp is up and when the solder balls actually go molten. It takes experience to get them to actually flow. You can nudge the chip slightly to see if it moves, then you'll know. The equipment is not fancy (it's quite cheap actually) and I have to manually adjust the temperatures to simulate a profile, but it works with some experience. I also use BGA rework flux, which requires about 90 seconds above 180C to activate. It needs to be fully cleaned off after use or it could corrode the board. Also, a reflow could make the solder go molten, but there's a good chance that the old solder and/or dirty pads won't wet. So you'll just get cold solder joints that break during cool down. For these reasons and #1 above, I recommend that you remove the chip and reball. If you go through all the trouble, you may as well go all the way.
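
For anyone logging their readings, point 1 above can be sketched as a quick check. This is only a rough sketch: the 2.5 Ohm floor is the rule of thumb from this post, the function names are made up, and the example readings are invented for illustration, not measurements from this thread.

```python
# Hypothetical helper for point 1 above. The 2.5 ohm floor is the rule of
# thumb quoted in this post, not a datasheet value.
MIN_RAIL_RESISTANCE_OHMS = 2.5


def rail_looks_healthy(resistance_ohms, floor=MIN_RAIL_RESISTANCE_OHMS):
    """Return True if a POS-to-GND reading is at or above the floor."""
    return resistance_ohms >= floor


def check_rails(readings):
    """Map each rail name to 'ok' or 'suspect' based on its reading in ohms."""
    return {
        name: ("ok" if rail_looks_healthy(ohms) else "suspect")
        for name, ohms in readings.items()
    }


# Invented example readings (ohms), for illustration only.
example_readings = {"CPU": 3.1, "RSX": 1.8}
print(check_rails(example_readings))
```

A "suspect" result only says the reading is under the rule-of-thumb floor. It does not tell you whether the cause is the chip, leftover flux, or the solder job.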
 
@Kleon1876 I will pass down the knowledge of the ancients as it was passed down to me: get yourself a pile of scrap boards off of eBay to ruin before you try on something you care about. It's like woodworking. You're going to destroy everything you touch for awhile.
 
I think the lift went successful
 

Attachments

  • 61B33D12-41E5-452C-9762-09145D95EEB2.jpeg
  • 122DC62C-6664-48E6-B191-5AB4E7BEEFCC.jpeg
PS3 #7 - Part 11
(All RSX NEC/TOKINs removed, 3 TaPol caps installed)
...continued from part 10 here.

You guys get the drill at this point:
C6230 (Before).jpg
C6230 (Delid).jpg

The nick on the tokin was a slip of my dental pick, not a defect.
C6230 (left).jpg
C6230 (right).jpg
C6230 (After).jpg
Back to 80 1002 and 2.8s YLOD. No 1002's at all.
SYSCON, all RSX TOKINs removed.PNG
O-Scope, 2.8s YLOD EVENT.png
O-Scope, Startup Sequence (2.8s YLOD).png

I decided to try the measure all function again and was pleasantly surprised to find how accurate it was when using probes in 1x. The Vpp match my cursor measurements below. I think I'll be using this from now on to get a maximum Vpp measurement as that's really what I need from these measurements.
O-Scope, RSX & CPU Meauure All (DC Coupling & 1x probe).png

Also note that there are 3 distinct peaks now. Largest (100mV)...
O-Scope, RSX largest peaks (DC Coupling & 1x probe).png

large (66mV)...
O-Scope, RSX large peaks (DC Coupling & 1x probe).png

...and Small (39mV)...
O-Scope, RSX small peaks (DC Coupling & 1x probe).png

I like this image because it shows all three in one pic:
O-Scope, RSX largest & Regular peaks (DC Coupling & 1x probe).png

Most of the time I did not see the CPU noise as small as it is in the above picture. That was the only capture with the good CPU waveform. Most had the bad waveform. Here is the above image zoomed out a bit to give you an idea of how often the largest peaks are mixed in with the large/small ones:
O-Scope, RSX largest & Regular peaks zoomed out (DC Coupling & 1x probe).png
I estimate the largest peaks are there about 1/3rd of the time. Regardless, that's the peak voltage, so that's what to use for calculating. I'm kinda wondering now if I have been missing larger peaks because I've been zoomed in too much in previous measurements. That could be a source of error, but there's nothing to do about it now. I did try to capture anything I thought was noteworthy, so I'd like to think that if I had seen larger voltage spikes that occurred this often, I would have documented them.
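
To make the "use the peak voltage" point concrete, here is a small sketch using the three RSX peak sizes from the captures above. Big caveat: the 40-60mVpp band is the rough CPU error threshold discussed earlier in the thread, so applying it to the RSX rail here is purely an assumption for illustration, and the function names are made up.

```python
# RSX peak classes read off the scope captures in this post (mVpp).
rsx_peaks_mv = {"largest": 100, "large": 66, "small": 39}

# ASSUMPTION: this band is the rough CPU error threshold guessed earlier in
# the thread. Nobody has shown that it also applies to the RSX rail.
BAND_LOW_MV = 40
BAND_HIGH_MV = 60


def worst_case_vpp(peaks_mv):
    """Return the largest observed peak-to-peak value, however rare it is."""
    return max(peaks_mv.values())


def classify(vpp_mv, low=BAND_LOW_MV, high=BAND_HIGH_MV):
    """Place a Vpp reading below, inside, or above the assumed band."""
    if vpp_mv < low:
        return "below band"
    if vpp_mv <= high:
        return "inside band"
    return "above band"


worst = worst_case_vpp(rsx_peaks_mv)
print(worst, classify(worst))
for name, vpp in rsx_peaks_mv.items():
    print(name, vpp, classify(vpp))
```

The worst case is taken as the maximum rather than an average because the largest peaks only show up about a third of the time, and a single excursion is presumably enough to trip an error.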
O-Scope, CPU Bad waveform (DC Coupling & 1x probe).png
O-Scope, CPU Bad waveform 2 (DC Coupling & 1x probe).png
Results Table, 0-3 test (2.8s YLOD).PNG

I tried another power supply just to see if it would look different and it had the same results. I think it's interesting that even with 3780uF (True) the RSX is still too noisy to boot. Either the CPU tokins are screwing with it or BC consoles really are that sensitive to noise!
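
For reference, a sketch of how a 3780uF total could add up, since capacitors in parallel simply sum. The 2700uF and 4x270uF figures come from an earlier post in this thread. Whether the "(True)" number was built exactly this way (rather than from measured values) is an assumption on my part, and the names below are made up.

```python
def parallel_capacitance_uf(caps_uf):
    """Capacitors in parallel add: C_total = C1 + C2 + ... (all in uF)."""
    return sum(caps_uf)


# One TaPol array of four 270uF caps, as described earlier in the thread.
array_4x270_uf = parallel_capacitance_uf([270] * 4)

# ASSUMPTION: the tantalum already on the board totals the 2700uF quoted in
# the earlier post. Its exact make-up is not restated here.
earlier_tantalum_uf = 2700

total_uf = parallel_capacitance_uf([earlier_tantalum_uf, array_4x270_uf])
print(array_4x270_uf, total_uf)
```

These are nominal values only. Real parts have tolerance, so the actual capacitance on the board may differ.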

Continued in part 12 here...
 
I think the lift went successful
That's not easy, but you're not in the clear yet. Where I last tripped up was in getting the damn solder balls to adhere to the pads without trying to coalesce with their neighbors. They took some...convincing. Sadly, PS3#2 ended up with BGA bridging after the lead reflow. I haven't tried again yet, but maybe you will inspire me to give it another go.

Good luck to you sir!
 
almost totally unrelated edit: if you were watching the Lakers.... F%$# LeBron. I'm from a shithole in Ohio and I used to live right by one of his favorite restaurants, and their fucking limos blocked the goddamn street all the time. They just stand in the street and talk, and there's no way around it and you can't do anything. Politely ask, get out and politely ask, honk, brights, yell, inch up until you're touching them: dirty looks, refuse to move, go back to ignoring you.
Nah, haven't been a LeBron fan since "THE DECISION." If you're going to rip the hearts out of your hometown, don't do it on national TV in an egocentric 75min special. I'm not even from Ohio and it hurt me to see that from afar. Tons of talent, zero tact!
 
I mean, it's not giving a clear sawtooth anymore, so I wonder if those two error codes just aren't that different from each other. We don't know the mechanism of what triggers these codes, so they may not really tell you which set of caps is screwing up, because the CELL waveform is still distinctly the bad one now.

I'm also still really pondering how physical damage caused this, because it had to be under load to fail before it left here.
 