PS3 (Research/Experimental) - NEC/TOKIN Capacitors Replacement - YLOD

RIP-Felix, can you please compare and add a few Cell VDD power line values? That wasn't inspected before, and we need to explain the low resistance on the Cell VDD line; it's probably the same situation as with the RSX.
@Computer Booter you may be interested in this too. Here you go Victor. This is from that COK-001 I just tested the other day during booter's livestream. It was described as not displaying, but it came up fine for me. I guess they didn't know how to plug in an HDMI cable...LOL! It seems to be working...
COK-001_MB_Ohm_Test_points.jpg


Cell/RSX VDDC are ~2 ohms (pretty low, but it's not dead yet). I haven't done any stress testing though. Anyway, these are the main voltage lines. Ohm test results are quite comparable between consoles. I did the same for a couple other boards and got similar results. I think this will be a viable method to identify bad RSX/CELL. But I'd need to see a known bad RSX/CELL to be sure. Maybe you or @Computer Booter could ohm test a few boards that you suspect have dead RSX or CPU?
 
Hello, I'm looking to buy the PS3 Tantalizer - Beta Release (v0.3b), but first I would like to know whether it works without problems.
Yeah, it works.

I am refining the design to work out a manufacturing issue that caused one user's board to be made improperly. That has only happened once out of many orders, and OSH Park refunded the order. So you can buy with confidence. They just shipped me the test batch of my next beta iteration (v0.4b). This one will have V-scoring to address this issue (hopefully). It should be arriving soon.
 
Can anyone tell me whether a PS3 works perfectly after reballing, or are there problems?
I want to send another PS3 for reballing. Is it worth it?
I don't care about the cost of the reballing.

You should check its resistance. If it's too low (under 2 ohms), reballing won't help. If it's over 2 ohms, reballing will probably help it last for some time (6 months to 2 years), but I don't believe it's a long-term fix.
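That rule of thumb can be written down as a tiny helper. This is only a sketch of the advice in this post: the function name `reball_outlook` is made up for illustration, and treating a reading of exactly 2 ohms as "over" is an assumption, since the post doesn't say which way the boundary goes.

```python
def reball_outlook(vddc_ohms, threshold_ohms=2.0):
    """Apply the forum rule of thumb to a VDDC resistance reading in ohms.

    Assumption: a reading exactly at the threshold counts as "over".
    """
    if vddc_ohms < threshold_ohms:
        return "too low: reballing is unlikely to help"
    return "reballing may help for a while (roughly 6 months to 2 years)"


# Example readings (hypothetical values, not measurements from this thread)
for reading in (0.8, 2.0, 3.5):
    print(reading, "->", reball_outlook(reading))
```

It is no substitute for stress testing, and the threshold itself is a forum estimate rather than a measured limit.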
 
Hiya Felix & Friends, here are my CXRF logs for:- becount, bringup, powerstate and errlog.

(PS3 = SEM-001)
Code:
Users\Bob_Top\Desktop\PS3\SYSCON>python ps3_syscon_uart_script.py COM3 CXRF
>$ AUTH
Auth successful
>$ becount
becount
Bringup : 1862 times
Shutdown: 1421 times
Power-on: 84day 06hour 42min 04sec
[mullion]$

>$ bringup
bringup
[SSM] state: 0000 -> 0101
Bringup Mode #0 (0xFF)
[SSM] ssmCb_OnStartingBePowOn() called.
[SSM] First Boot.
[SSM] Bringup mode : syspm_stat=00000000/00000000
[POWSEQ] PowerSeq_Setup called.
[SSM] state: 0101 -> 0201
[POWSEQ] AV Backend Setup
[SSM] state: 0201 -> 0102
[SSM] state: 0102 -> 0202
[SSM] state: 0202 -> 0103
[SSM] state: 0103 -> 0203
[SSM] ssmCb_BeforeBeOn() called.
[SSM] state: 0203 -> 0104
Psbd_SbTransMode_Half:0x20e7

>$ powerstate
[POWERSEQ] Error : BitTraining BE:RRAC:RX0:GLOBAL1:RX_STATUS
[SSM] state: 0104 -> 0304
[SSM] ssmCb_AfterBeOn2() called.
[SSM] PowSeq Fail : Detected !
[SSM] state: 0304 -> 0700
[POWSEQ] AV Backend Letup
[SSM] Shutdown mode : syspm_stat=00000000/00000000
[ERROR]: 0xa0404401
[ERROR]: 0xa0403034
[POWSEQ] PowerSeq_Letup called.
[SSM] state: 0700 -> 0600
(PowerOff State) (Fatal)
powerstate
ATA Power          : OFF
PCI Power          : OFF
RSX Power          : OFF
XDR Power          : OFF
Eurus Power        : OFF
SB Power           : OFF
RSX Thermal Sensor : UNAVAILABLE
BE Thermal Sensor  : UNAVAILABLE
[mullion]$

>$ errlog
errlog
ofst[ 80]:err_code:0xffffffff, clock:0x0b4f401f  2006/01/05 02:25:35
ofst[ 84]:err_code:0xa0403034, clock:0x0b4f401f  2006/01/05 02:25:35
ofst[ 88]:err_code:0xa0404401, clock:0x0b4f42b3  2006/01/05 02:36:35
ofst[ 92]:err_code:0xa0403034, clock:0x0b4f42b3  2006/01/05 02:36:35
ofst[ 96]:err_code:0xa0404401, clock:0x0b4f46d7  2006/01/05 02:54:15
ofst[100]:err_code:0xa0403034, clock:0x0b4f46d7  2006/01/05 02:54:15
ofst[104]:err_code:0xa0404401, clock:0x0b4f4942  2006/01/05 03:04:34
ofst[108]:err_code:0xa0403034, clock:0x0b4f4942  2006/01/05 03:04:34
ofst[112]:err_code:0xa0404401, clock:0x0b4f5024  2006/01/05 03:33:56
ofst[116]:err_code:0xa0403034, clock:0x0b4f5024  2006/01/05 03:33:56
ofst[120]:err_code:0xa0404401, clock:0x0b4fd071  2006/01/05 12:41:21
ofst[124]:err_code:0xa0403034, clock:0x0b4fd071  2006/01/05 12:41:21
ofst[  0]:err_code:0xa0404401, clock:0x0b4fe23b  2006/01/05 13:57:15
ofst[  4]:err_code:0xa0403034, clock:0x0b4fe23b  2006/01/05 13:57:15
ofst[  8]:err_code:0xa0801701, clock:0x0b4886d5  2005/12/31 00:01:25
ofst[ 12]:err_code:0xa0801601, clock:0x0b4886d5  2005/12/31 00:01:25
ofst[ 16]:err_code:0xa0404422, clock:0x0b488709  2005/12/31 00:02:17
ofst[ 20]:err_code:0xa0403034, clock:0x0b488709  2005/12/31 00:02:17
ofst[ 24]:err_code:0xa0404422, clock:0x0b488765  2005/12/31 00:03:49
ofst[ 28]:err_code:0xa0403034, clock:0x0b488765  2005/12/31 00:03:49
ofst[ 32]:err_code:0xa0404422, clock:0x0b4ee813  2006/01/04 20:09:55
ofst[ 36]:err_code:0xa0403034, clock:0x0b4ee813  2006/01/04 20:09:55
ofst[ 40]:err_code:0xa0404401, clock:0x0b616809  2006/01/18 20:56:41
ofst[ 44]:err_code:0xa0403034, clock:0x0b616809  2006/01/18 20:56:41
ofst[ 48]:err_code:0xa0093003, clock:0xffffffff
ofst[ 52]:err_code:0xa0093003, clock:0xffffffff
ofst[ 56]:err_code:0xa0093003, clock:0xffffffff
ofst[ 60]:err_code:0xa0093003, clock:0xffffffff
ofst[ 64]:err_code:0xa0093003, clock:0xffffffff
ofst[ 68]:err_code:0xa0093003, clock:0xffffffff
ofst[ 72]:err_code:0xa0404401, clock:0xffffffff
ofst[ 76]:err_code:0xa0403034, clock:0xffffffff
[mullion]$
>$

Notes: The dates are old because I never set the date/time after I got this PS3. The same goes for each time I removed the battery: I never reset the date/time. The battery was also removed at the time of running this test.

Error code 3034 comes up the most, alongside 4401 and 4422. I just can't tell if it's a problem with the BE CELL or with the RSX. In the "powerstate" output, the BitTraining error comes up for BE rather than RSX as in other logs. Could the CPU be the problem? Weird, because on a SEM-001 the CPU is 65nm and the RSX is 90nm. I was thinking before of reballing the RSX, but now I'm thinking of starting with the BE CELL. I will think about it.
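To back up "comes up the most" with a count, something like the sketch below could tally the codes. It assumes the last four hex digits of `err_code` are the code people quote in this thread (3034, 4401, ...); `tally_codes` is a hypothetical helper, and `ERRLOG` holds only a few lines copied from the log above.

```python
import re
from collections import Counter

# A few lines copied from the errlog above; paste the full log in practice.
ERRLOG = """\
ofst[ 84]:err_code:0xa0403034, clock:0x0b4f401f  2006/01/05 02:25:35
ofst[ 88]:err_code:0xa0404401, clock:0x0b4f42b3  2006/01/05 02:36:35
ofst[ 92]:err_code:0xa0403034, clock:0x0b4f42b3  2006/01/05 02:36:35
ofst[ 16]:err_code:0xa0404422, clock:0x0b488709  2005/12/31 00:02:17
ofst[ 20]:err_code:0xa0403034, clock:0x0b488709  2005/12/31 00:02:17
ofst[ 48]:err_code:0xa0093003, clock:0xffffffff
"""

CODE_RE = re.compile(r"err_code:0x([0-9a-fA-F]{8})")


def tally_codes(text):
    """Count entries by the last four hex digits of err_code.

    Empty slots (err_code 0xffffffff) are skipped.
    """
    codes = (m.group(1).lower() for m in CODE_RE.finditer(text))
    return Counter(c[-4:] for c in codes if c != "ffffffff")


for code, count in tally_codes(ERRLOG).most_common():
    print(code, count)
```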

I have another SEM-001 to test and also a DIA-001, will aim to post those logs / results here, all being well soon.
 
Guys, come on. The moment you see 3034, it's 99% RSX. The data errors won't even matter. It will show a bittraining error on BE's side because it failed to communicate with RSX. It doesn't matter which one shows up in there.

Honestly, this is just wishful thinking. There should be a sticky thread where it's explained what 3034 means. I don't mean to be harsh, but posting these logs will not help when the verdict is already clear, reballing CPU will not cure this and Felix isn't going to magically fix this either. I'm sorry, but that's just how it is with these boards. There's rarely a simple solution.
 
That's even better then; I was hoping it was the RSX and not the CELL.

It's just when we read the SysCon Error page on PSDevWiki... this is what it says for error code 3034:-

"3034 = CELL / RSX Communication Error
This is the most common error seen in early Phat model PS3s with the hottest 90nm RSX and CELL processors. It is the hallmark of a BGA defect (such as a cracked solder ball). It is by no means limited to the early models, however. These errors have been seen in every model of PS3 with varying frequency. The most reliable consoles appear to be those with a CPU/GPU of smaller manufacturing process, such as the Super Slim (SS) models (42xx and later) which have a 45nm CELL and 28nm RSX. The least reliable are the PS2 Backwards Compatible A-E Models, which have 90nm RSX/CELL.

The root cause is mechanical fatigue due to thermal cycling. The materials used to construct the motherboard and processors have different properties. For example, the coefficient of thermal expansion for FR4 Fiberglass used in the Motherboard and Processor Substrate is different than that of the copper BGA pads, which is different than that of the Lead-Free solder used to join them. This means they will expand and contract at different rates as the chip heats up and cools down, which applies shearing force to the BGA. Over many thermal cycles this deforms the solder balls and causes a defect (such as a solder crack, torn trace, or the ball may pull away from the pad).

3034 is triggered when the voltage or data lines connecting the CPU/GPU are broken. There is often a data error (4XXX) that also appears, but not always. The most common cause is a BGA defect on the RSX, which usually requires a reball/reflow to repair. Something about the RSX construction or workload causes it to fail more frequently, but the CPU can fail too. However, it's not always a BGA defect. The bumps on either chip can fail, Flex IO traces (the data lines that connect the CPU/GPU) can be broken/scratched, or accumulated damage from wear and tear (electromigration) can also cause this error. The true percentage of consoles with BGA defects that can be fixed with a reball/reflow is unknown. However, there is evidence to suggest that the underfill used to reinforce the CPU/GPU die and RSX RAM bumps was not as effective when the PS3 was manufactured. This could explain many of the consoles whose reball fails prematurely afterwards.

If a reflow/reball of both the CPU/GPU fails, then the chip is beyond repair and needs to be replaced. The RSX can be replaced with the same model without modification. It can be replaced with a different model using a modchip that injects the correct RSX ID during boot. This has been nicknamed a "Frankenstein Mod." Since they are married to each other, the CPU can only be replaced if also replacing the chipset (NAND/NOR and SYSCON Chips). Since the CPU can't as easily be replaced, a dead CPU is usually considered unrepairable."

Suggesting it could be either RSX or CELL.
 
Well, perhaps Felix was being a bit optimistic there hinting a rare CPU fault... I would love to see the statistics of 3034 being related to CPU.

Edit: the statistics are here: https://www.psx-place.com/threads/f...nd-error-reporting.30100/page-119#post-320933

According to this, there's actually a chance that your CPU has a bad BGA... But I'd still be surprised. I don't know how that conclusion was reached, but I'd love to be wrong haha.
 
I guess on a SEM-001, with a 65nm CELL and a 90nm RSX, it's more likely to be the RSX due to overheating issues with the larger die. At this point I would aim to start by reballing the RSX to see what happens, then move to the CELL if that doesn't stop the YLOD.
 
Perhaps I should expect that people will not read further than the first sentence. Had you read further, you would have seen this...

"The most common cause is a BGA defect on the RSX, which usually requires a reball/reflow to repair. Something about the RSX construction or workload causes it to fail more frequently, but the CPU can fail too."

Maybe I should have led with that.
 
Yep, my 3034 errors did come with error 4401 at the same timestamp, suggesting the RSX is the problem. So I just put the mobo in sunlight to look at the BGA of both the RSX and the CELL. The RSX BGA balls are a very dirty grey, squashed and oddly shaped, whereas the CELL BGA balls are shiny and all nicely rounded. I think it most probably is the RSX.
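Here is a rough way to check that pairing across a whole log instead of by eye. It is a sketch only: `companions_of` is a hypothetical helper, `SAMPLE` is a handful of lines copied from my errlog, and it assumes entries that share a `clock` value were logged together.

```python
import re
from collections import defaultdict

# Lines copied from the errlog posted earlier in the thread.
SAMPLE = """\
ofst[ 88]:err_code:0xa0404401, clock:0x0b4f42b3  2006/01/05 02:36:35
ofst[ 92]:err_code:0xa0403034, clock:0x0b4f42b3  2006/01/05 02:36:35
ofst[  8]:err_code:0xa0801701, clock:0x0b4886d5  2005/12/31 00:01:25
ofst[ 12]:err_code:0xa0801601, clock:0x0b4886d5  2005/12/31 00:01:25
ofst[ 16]:err_code:0xa0404422, clock:0x0b488709  2005/12/31 00:02:17
ofst[ 20]:err_code:0xa0403034, clock:0x0b488709  2005/12/31 00:02:17
"""

ENTRY_RE = re.compile(r"err_code:0x([0-9a-f]{8}), clock:0x([0-9a-f]{8})")


def companions_of(text, code="3034"):
    """Map each clock value where `code` was logged to the other codes
    logged at that same clock value."""
    by_clock = defaultdict(list)
    for err, clock in ENTRY_RE.findall(text.lower()):
        by_clock[clock].append(err[-4:])
    return {
        clock: [c for c in codes if c != code]
        for clock, codes in by_clock.items()
        if code in codes
    }


print(companions_of(SAMPLE))
```

One caveat: entries with clock `0xffffffff` all land in the same group, so they are best filtered out before reading anything into the pairs.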

I think people prefer changing the NEC/Tokins first because we can screw it up and there's generally no real damage done that can't be fixed. However, a screw-up with a reball/reflow can kill the CPU/GPU for good.

Most people here love their PS3s and don't want its blood on their hands. :bitsbubba: Although my PS3s have taken plenty of my blood, sweat and tears lol.

EDIT ...would an X-ray of the BGAs show anything? I have a friend who's a surgeon and said I could X-ray my PS3 mobos at their clinic, but I wasn't sure if fractured solder balls (BGA) would show up on an X-ray.
 
...CPU will not cure this and Felix isn't going to magically fix this either. I'm sorry, but that's just how it is with these boards.
I'd agree with this statement. Most of the time, it's RSX related. The FlexIO Bit Calibration process involves both processors. That's the reason I say it can also be the CELL. However, I only made that point to impress upon people that the RSX is not the only way this error can occur. It is rare for the CELL BGA to be bad.

And yeah, I am by no means an oracle. Just some random dude trying to learn and help fix these cool consoles.
 
Guys, come on. The moment you see 3034, it's 99% RSX. The data errors won't even matter. It will show a bittraining error on BE's side because it failed to communicate with RSX. It doesn't matter which one shows up in there.

Honestly, this is just wishful thinking. There should be a sticky thread where it's explained what 3034 means. I don't mean to be harsh, but posting these logs will not help when the verdict is already clear, reballing CPU will not cure this and Felix isn't going to magically fix this either. I'm sorry, but that's just how it is with these boards. There's rarely a simple solution.

I don't disagree as far as diagnostics goes, and reball so far is the only fix! I wish people would stop jumping to recaps as the first fix, because most of the time it's wasted effort... It cracks me up because it's far more effort (and cost) to remove the Tokins and install new caps than it is to buy a cheap reader, solder two wires and get the codes out. You don't even need internal mode for that!

I also think there's definitely more to know, and the more information the better... It might not help with the fix, but it might help with prevention.
 
I'd only add that a reball isn't even really a conclusive "fix". It's still a shot in the dark hoping that BGA is the cause for the error, when in reality nobody can be sure of that. Or let's say the console was dropped and you really do have a BGA problem: you reball it and it works for a while, but sooner or later you could still end up with a bumpgate fault. So how many months/years of operation is enough to consider reballing a solid fix? 1, 2, 3 years?
 
I'm sure you guys already know about this PDF, supposedly from Sony, marked "SCE CONFIDENTIAL - Error Log code" from 2007.

https://github.com/db260179/ps3syscon/blob/master/Syscon error log codes.pdf

For error code 3034 it says:- BE Error (IC1001)

View attachment 36240

The pdf wasn't meant to be taken literally. That's just an interpretation of how syscon sees the errors. In other words, these components aren't necessarily causes for the errors. They can just be associated with reporting it, so you need to dig deeper.

CPU detects a problem on one of the communication lines with RSX during a bittraining sequence, syscon assigns a code 3034 to it and shows that it came from a CPU. But neither CPU nor syscon understand why the communication error happened. They just show the end result of it.
 
Hello,
I have recently inherited a Fat PS3 with YLOD. The console was in good condition with the warranty sticker still applied, so I'm pretty confident nobody has messed with it (except myself :tranquillity:)
I believe the console belongs to the Warm-Start category, according to the FAQ.
This is the behavior I observe:
- from completely cold, the console shows YLOD between 3 and 12 secs approx.
- during a period of about 3 minutes of repeatedly trying to switch the console on, the time elapsed to YLOD becomes progressively longer, until the console becomes stable and stays powered on.
- after this time, all in all the console seems to be working fine. I also tried to play GT5 for 30 mins with no visible issue.
If I switch the console off then on when warm, the console will boot fine first attempt.
Confident of the stability, I went for jailbreak and extracted syscon logs with PS3 Advanced tools.
From the logs, I can see error 1002 (RSX VRAM Power Fail) as the only error.
I read on this forum that 1002 is usually a sign of bad Tokins, but I also read that for Warm-Start YLOD consoles, the Tokins are usually not the culprits.
Any suggestion on how I could take the investigation further?
Logs below, and thanks in advance to all the experts in this thread!
Code:
Firmware Version: 4.88 (build 50731)
Platform ID: CokE10
Product Code: 00 87
Product Sub Code: 00 07
Hardware Config: 4E00FFFF0107BCBF
Syscon Fimware Version: 0E69.0001000400040002 (EEPROM: 0001000400040002)

Bringup Count: 1726, Shutdown Count: 1639
Runtime: 162 Days, 7 Hours, 24 Minutes, 52 Seconds

Error Log
01: A0801002  Wed Feb 16 01:58:24 2022
02: A0801002  Wed Feb 16 01:57:37 2022
03: A0801002  Wed Feb 16 01:57:02 2022
04: A0801002  Wed Feb 16 01:56:42 2022
05: A0801002  Wed Feb 16 01:56:34 2022
06: A0801002  Wed Feb 16 01:56:25 2022
07: A0801002  Wed Feb 16 01:56:06 2022
08: A0801002  Wed Feb 16 01:55:45 2022
09: A0801002  Wed Feb 16 01:55:31 2022
10: A0801002  Mon Feb 14 18:12:41 2022
11: A0801002  Mon Feb 14 18:12:33 2022
12: A0801002  Mon Feb 14 18:11:51 2022
13: A0801002  Mon Feb 14 18:11:43 2022
14: A0801002  Mon Feb 14 18:11:23 2022
15: A0801002  Mon Feb 14 18:11:15 2022
16: A0801002  Mon Feb 14 18:11:07 2022
17: A0801002  Mon Feb 14 18:10:50 2022
18: A0801002  Mon Feb 14 18:10:32 2022
19: A0801002  Mon Feb 14 18:10:15 2022
20: A0801002  Mon Feb 14 18:10:07 2022
21: A0801002  Mon Feb 14 09:12:57 2022
22: A0801002  Mon Feb 14 09:12:39 2022
23: A0801002  Mon Feb 14 09:12:33 2022
24: A0801002  Mon Feb 14 09:12:14 2022
25: A0801002  Mon Feb 14 09:12:06 2022
26: A0801002  Mon Feb 14 09:11:48 2022
27: A0801002  Mon Feb 14 09:11:31 2022
28: A0801002  Mon Feb 14 09:11:13 2022
29: A0801002  Mon Feb 14 09:10:55 2022
30: A0801002  Sun Feb 13 22:48:59 2022
31: A0801002  Sun Feb 13 20:52:27 2022
32: FFFFFFFF  Sun Feb 13 20:52:07 2022
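To see whether the log reflects the "progressively longer" behaviour described above, the gaps between consecutive entries can be computed. This is a rough sketch, not a diagnostic: `gaps_seconds` is a hypothetical helper, `LOG` holds entries 01 to 09 from the log above, and each gap also includes however long it took to press the power button again.

```python
import re
from datetime import datetime

# Entries 01-09 from the log above (newest first, as the tool prints them).
LOG = """\
01: A0801002  Wed Feb 16 01:58:24 2022
02: A0801002  Wed Feb 16 01:57:37 2022
03: A0801002  Wed Feb 16 01:57:02 2022
04: A0801002  Wed Feb 16 01:56:42 2022
05: A0801002  Wed Feb 16 01:56:34 2022
06: A0801002  Wed Feb 16 01:56:25 2022
07: A0801002  Wed Feb 16 01:56:06 2022
08: A0801002  Wed Feb 16 01:55:45 2022
09: A0801002  Wed Feb 16 01:55:31 2022
"""

ENTRY_RE = re.compile(r"^\d+:\s+([0-9A-F]{8})\s+(.+)$", re.MULTILINE)


def gaps_seconds(text):
    """Seconds between consecutive logged errors, oldest to newest."""
    times = sorted(
        datetime.strptime(stamp.strip(), "%a %b %d %H:%M:%S %Y")
        for _code, stamp in ENTRY_RE.findall(text)
    )
    return [int((b - a).total_seconds()) for a, b in zip(times, times[1:])]


print(gaps_seconds(LOG))
```

For this session the gaps come out as 14, 21, 19, 9, 8, 20, 35 and 47 seconds, so the last few attempts do run longer before failing, which loosely matches the warm-up behaviour.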
 
The pdf wasn't meant to be taken literally. That's just an interpretation of how syscon sees the errors. In other words, these components aren't necessarily causes for the errors. They can just be associated with reporting it, so you need to dig deeper.

CPU detects a problem on one of the communication lines with RSX during a bittraining sequence, syscon assigns a code 3034 to it and shows that it came from a CPU. But neither CPU nor syscon understand why the communication error happened. They just show the end result of it.

Oh I see, OK, right. Thanks for clearing that up for me. :cool2:
 
It's still a shot in the dark hoping that BGA is the cause for the error, when in reality nobody can be sure of that.
Yes! That's my main "issue" (that's probably too strong a word) with the state of things. Yeah, it's a likely "fix", but hardly conclusive and, as someone else already pointed out, expensive, risky and of limited "repeat".

I just had an idea while I typed this reply -- it seems that all the traces for FlexIO are exposed on the outer layers of the board. They're small, but not impossibly small. Maybe we can expose them (by removing a bit of the mask) and 'scope them? That might tell us, for one thing, the direction of the 3034/bit training errors (as we were just discussing above) -- is it Cell -> RSX, or the other way around? I'm going to try it with my little pocket scope, and it might be good enough to see some traffic, but we might need a real 'scope to get meaningful data.
 