PS3 #11
(Tampered Tantalum Terror!)
Here we have PS3 #11, a $60 eBay reject described as, "YLOD. NEC Tokin swap has been done but it didn't solve the issue." I bought it as a challenge for myself. I felt there was a decent chance they had attempted a tantalum install when the real issue was a BGA/bump defect requiring a reball. That would be fine by me, since I bought it to Frankenstein mod it with a 40nm RSX.
However, the first step is to diagnose. I'm recording these details to show you all my process.
Step 1: Inspection
Notes:
- Ports look fine, no missing, bent, or shorting pins. Normal corrosion, nothing horrendous.
- Case was not sealed (as described), but they didn't seat the lower shell correctly when they assembled it.
- The rubber foot that conceals the security screw was present! That's always nice to have, since it's often missing. Speaking of the security screw, it's missing, and so is the retention clip.
I have decided to always inspect the case and ports first. I learned this lesson the hard way when I got trolled by a bad HDMI port: I went way down the troubleshooting path, wasting time on fuses and voltages, when the problem was the port all along.
This serves to rule out bad connections. You don't want to plug an HDMI cord into a port with shorting pins! That's a good way to blow a fuse.
I'm not going to do any power tests yet. I want to preserve the errlog, and each failed test adds a new entry that pushes the oldest errors out of the log. So I will not do a power test until this thing is on my test bench, and only after the errlog has already been dumped!
So let's continue on with the teardown!
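One way to picture why power tests cost history: the errlog can be modeled as a fixed-size ring buffer. The dump later in this post has 32 slots (offsets 0 to 124 in steps of 4), so this minimal Python sketch assumes 32 entries. It is a simplified model for illustration, not SYSCON's actual firmware logic, and `simulate_errlog` is a hypothetical name.

```python
from collections import deque

# Simplified model (assumption): the errlog behaves like a 32-slot ring
# buffer, inferred from the offsets 0..124 (step 4) in the dump in this post.
ERRLOG_SLOTS = 32


def simulate_errlog(existing, new_errors, slots=ERRLOG_SLOTS):
    """Return the log contents after appending new_errors to existing entries.

    Once the buffer is full, each new entry pushes out the oldest one.
    """
    log = deque(existing, maxlen=slots)
    for err in new_errors:
        log.append(err)
    return list(log)


if __name__ == "__main__":
    history = [f"old_{i}" for i in range(ERRLOG_SLOTS)]  # a full log
    after = simulate_errlog(history, ["0xa0403034"] * 3)  # three failed power tests
    print(after[:2])  # ['old_3', 'old_4']: old_0..old_2 are gone
    print(len(after))  # 32
```

With a full log, three failed power-ons are enough to lose the three oldest entries, which is why dumping first matters.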
Step 2: Teardown
Missing screws...
Missing top cover retention clip and security screw.
Wifi antenna tape broken...
Wifi ribbon cable smashed under the RF shield instead of routed up to the card (this thing was reassembled quickly, without care for these details)...
BR ribbon cable not seated correctly...
IHSes swam from too much thermal paste and from not pressing them down before installing the MB...
Motherboard is damaged more than the description let on.
They did indeed attempt a tantalum install. I measured resistance, and surprisingly enough the RSX VDDC is at 3.1 ohms. Perfectly fine! So the tantalums are not shorting. These are the cheap, high-ESR AVX tantalum caps, which are not appropriate for this mod. However, they aren't hurting anything at the moment, so I don't need to remove them yet.
More concerning is that the CPU/RSX have been delidded (poorly). There are numerous scrapes on the CPU that came dangerously close to wiping out traces... and may have. It appears they covered the scrapes with glue/mask. It's possible someone attempted a microsoldering trace repair and then covered it with UV mask; I can't tell without seeing under those blobs. A few blobs came off during cleaning, revealing the scrapes underneath, and I got some closeups of them. The damage I can see appears to have spared the traces, so if there are any broken traces, they're under the blobs.
As disconcerting as that is, let's continue...
Step 3: SYSCON
I threw it on the test bench and hooked up the SYSCON. I went for internal access so I could record the errlog with timestamps, becount, and bringup...
Code:
C:\Users\HTPC\Desktop\PS3\SYSCON>python ps3_syscon_uart_script.py COM5 CXRF
>$ AUTH
Auth successful
>$ becount
becount
Bringup : 1651 times
Shutdown: 760 times
Power-on: 80day 21hour 09min 20sec
[mullion]$
>$ errlog
errlog
ofst[ 88]:err_code:0xffffffff, clock:0x19c7ffda 2013/09/15 05:19:22
ofst[ 92]:err_code:0xa0801001, clock:0x1f1faa94 2016/07/18 14:56:20
ofst[ 96]:err_code:0xa0801001, clock:0x1f1fafd3 2016/07/18 15:18:43
ofst[100]:err_code:0xa0801001, clock:0x1f28ec2b 2016/07/25 15:26:35
ofst[104]:err_code:0xa0801004, clock:0x1f2aa9a2 2016/07/26 23:07:14
ofst[108]:err_code:0xa0801004, clock:0x1f304b20 2016/07/31 05:37:36
ofst[112]:err_code:0xa0801004, clock:0x1f304b9e 2016/07/31 05:39:42
ofst[116]:err_code:0xa0801004, clock:0x1f30b496 2016/07/31 13:07:34
ofst[120]:err_code:0xa0801004, clock:0x1f31f876 2016/08/01 12:09:26
ofst[124]:err_code:0xa0801001, clock:0x1f3bbc98 2016/08/08 21:56:40
ofst[ 0]:err_code:0xa0801001, clock:0x1f3bbe47 2016/08/08 22:03:51
ofst[ 4]:err_code:0xa0801004, clock:0x1f3c707a 2016/08/09 10:44:10
ofst[ 8]:err_code:0xa0801004, clock:0x1f3f27b0 2016/08/11 12:10:24
ofst[ 12]:err_code:0xa0801001, clock:0x1f498da9 2016/08/19 09:28:09
ofst[ 16]:err_code:0xa0801001, clock:0x1f499817 2016/08/19 10:12:39
ofst[ 20]:err_code:0xa0801001, clock:0x1f64e912 2016/09/09 03:29:22
ofst[ 24]:err_code:0xa0801001, clock:0x1f8221ab 2016/10/01 07:26:35
ofst[ 28]:err_code:0xa0801004, clock:0x1f824af9 2016/10/01 10:22:49
ofst[ 32]:err_code:0xa0801004, clock:0x1f824e4e 2016/10/01 10:37:02
ofst[ 36]:err_code:0xa0801001, clock:0x1f89fb4d 2016/10/07 06:21:01
ofst[ 40]:err_code:0xa0801001, clock:0x23972377 2018/12/02 23:12:55
ofst[ 44]:err_code:0xa0801001, clock:0x23972485 2018/12/02 23:17:25
ofst[ 48]:err_code:0xa0801001, clock:0x239725d5 2018/12/02 23:23:01
ofst[ 52]:err_code:0xa0801001, clock:0x23972708 2018/12/02 23:28:08
ofst[ 56]:err_code:0xa0801001, clock:0x23973722 2018/12/03 00:36:50
ofst[ 60]:err_code:0xa0801001, clock:0x23979c1e 2018/12/03 07:47:42
ofst[ 64]:err_code:0xa0801001, clock:0x24bacd8b 2019/07/12 04:48:11
ofst[ 68]:err_code:0xa0801001, clock:0x24bbf40f 2019/07/13 01:44:47
ofst[ 72]:err_code:0xa0801001, clock:0x2948916c 2021/12/12 10:54:36
ofst[ 76]:err_code:0xa0801001, clock:0x0b497301 2005/12/31 16:49:05
ofst[ 80]:err_code:0xa0403034, clock:0xffffffff
ofst[ 84]:err_code:0xa0403034, clock:0xffffffff
[mullion]$
>$ clearerrlog
clearerrlog
ERRLOG CLEARED
[mullion]$
>$ bringup
bringup
[SSM] state: 0000 -> 0101
Bringup Mode #0 (0xFF)
[SSM] ssmCb_OnStartingBePowOn() called.
[SSM] First Boot.
[SSM] Bringup mode : syspm_stat=00000000/00000000
[POWSEQ] PowerSeq_Setup called.
[SSM] state: 0101 -> 0201
[POWSEQ] AV Backend Setup
[SSM] state: 0201 -> 0102
[SSM] state: 0102 -> 0202
[SSM] state: 0202 -> 0103
[SSM] state: 0103 -> 0203
[SSM] ssmCb_BeforeBeOn() called.
[SSM] state: 0203 -> 0104
Psbd_SbTransMode_Half:0x20e2
>$
[POWERSEQ] Error : BitTraining RSX:RRAC:RX0:GLOBAL1:RX_STATUS
[SSM] state: 0104 -> 0304
[SSM] ssmCb_AfterBeOn2() called.
[SSM] PowSeq Fail : Detected !
[SSM] state: 0304 -> 0700
[POWSEQ] AV Backend Letup
[SSM] Shutdown mode : syspm_stat=00000000/00000000
[ERROR]: 0xa0403034
[POWSEQ] PowerSeq_Letup called.
[SSM] state: 0700 -> 0600
(PowerOff State) (Fatal)
[mullion]$
>$
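The errlog lines above are regular enough to parse. The clock values also appear to be plain seconds since 2000-01-01 00:00:00: 0x19c7ffda works out to 2013/09/15 05:19:22 and 0x0b497301 to 2005/12/31 16:49:05, matching the dates SYSCON printed. That epoch is inferred from this log, not from any documentation. A sketch assuming the exact line format shown; `parse_errlog` and `clock_to_datetime` are hypothetical names, and entries whose clock is 0xffffffff (the two 3034 entries) are treated as having no timestamp.

```python
import re
from collections import Counter
from datetime import datetime, timedelta

# Epoch inferred from this dump (assumption): 0x19c7ffda -> 2013/09/15 05:19:22.
SYSCON_EPOCH = datetime(2000, 1, 1)
NO_CLOCK = 0xFFFFFFFF

# Matches lines such as:
#   ofst[ 88]:err_code:0xffffffff, clock:0x19c7ffda 2013/09/15 05:19:22
#   ofst[ 80]:err_code:0xa0403034, clock:0xffffffff
ERRLOG_LINE = re.compile(
    r"ofst\[\s*(\d+)\]:err_code:0x([0-9a-fA-F]{8}), clock:0x([0-9a-fA-F]{8})"
)


def clock_to_datetime(clock):
    """Convert a raw clock value to a datetime, or None if it was never set."""
    if clock == NO_CLOCK:
        return None
    return SYSCON_EPOCH + timedelta(seconds=clock)


def parse_errlog(text):
    """Return a list of (offset, err_code, datetime_or_None) tuples."""
    entries = []
    for match in ERRLOG_LINE.finditer(text):
        offset = int(match.group(1))
        code = int(match.group(2), 16)
        clock = int(match.group(3), 16)
        entries.append((offset, code, clock_to_datetime(clock)))
    return entries


if __name__ == "__main__":
    sample = """\
ofst[ 88]:err_code:0xffffffff, clock:0x19c7ffda 2013/09/15 05:19:22
ofst[ 72]:err_code:0xa0801001, clock:0x2948916c 2021/12/12 10:54:36
ofst[ 80]:err_code:0xa0403034, clock:0xffffffff
"""
    entries = parse_errlog(sample)
    for offset, code, when in entries:
        print(f"{offset:3d} 0x{code:08x} {when}")
    print(Counter(f"0x{code:08x}" for _, code, _ in entries))
```

Pasting the whole dump into `sample` gives a per-code tally, which makes a long run of 1001s easy to spot.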
We currently have a 3034 (by itself) and the bringup shows a BitTraining RSX:RRAC:RX0:GLOBAL1:RX_STATUS error.
The errlog shows a long history of 1001 errors leading up to the 3034. To me, that indicates the BGA was getting close to breaking: as the solder cracks propagated, they changed the impedance of the FlexIO during operation. That causes the calibration to wander and fall out of regulation, triggering a CPU VRM error (1001) while the console was in the power-on state (80), hence A0801001. It's also possible that the CPU NEC/TOKINs were not adequately filtering noise and need to be replaced. I can test that theory with an oscilloscope later.
There were a few 1004 errors in there too. These often happen when the console shuts down abruptly, such as from an AC/DC power failure, an unstable PSU, or an unstable VRM. They can also occur when you're testing and flipping power on/off at the rocker switch on the back. I'm not concerned by them; a 1004 is just an unexpected shutdown. The 1001 errors are more significant.
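The decoding described above (A0 prefix, then the power-state byte, then the four-digit error) is a simple bit-field split of the 32-bit code. This sketch only labels state 0x80 as "power on", because that is the one state identified in this post; any other state byte is printed raw rather than guessed. `decode_err_code` and `describe` are hypothetical helper names.

```python
# Only the state identified in this post; other values are left unlabeled.
KNOWN_STATES = {0x80: "power on"}


def decode_err_code(code):
    """Split a 32-bit SYSCON error code into (prefix, state, error) fields."""
    prefix = (code >> 24) & 0xFF  # e.g. 0xA0
    state = (code >> 16) & 0xFF   # e.g. 0x80 = power on
    error = code & 0xFFFF         # e.g. 0x1001
    return prefix, state, error


def describe(code):
    """Render a code as 'prefix | state | error' using known state labels."""
    prefix, state, error = decode_err_code(code)
    label = KNOWN_STATES.get(state, f"state 0x{state:02X}")
    return f"{prefix:02X} | {label} | {error:04X}"


if __name__ == "__main__":
    print(describe(0xA0801001))  # A0 | power on | 1001
    print(describe(0xA0403034))  # A0 | state 0x40 | 3034
```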
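To see how the 1001 and 1004 entries are spread over the years in this dump, the dated errlog lines can be bucketed by year and by the low 16 bits of the code. A minimal sketch that reads the year from the date SYSCON prints; entries with a 0xffffffff clock have no printed date and are skipped. `errors_per_year` is a hypothetical name.

```python
import re
from collections import Counter

# Only lines that carry a printed date match; 0xffffffff clocks have none.
DATED_LINE = re.compile(
    r"err_code:0x([0-9a-fA-F]{8}), clock:0x[0-9a-fA-F]{8} (\d{4})/"
)


def errors_per_year(errlog_text, low_codes=(0x1001, 0x1004)):
    """Count entries per (year, low 16 bits of code) for the given codes."""
    counts = Counter()
    for code_hex, year in DATED_LINE.findall(errlog_text):
        low = int(code_hex, 16) & 0xFFFF
        if low in low_codes:
            counts[(int(year), f"{low:04X}")] += 1
    return counts


if __name__ == "__main__":
    sample = """\
ofst[ 36]:err_code:0xa0801001, clock:0x1f89fb4d 2016/10/07 06:21:01
ofst[ 28]:err_code:0xa0801004, clock:0x1f824af9 2016/10/01 10:22:49
ofst[ 40]:err_code:0xa0801001, clock:0x23972377 2018/12/02 23:12:55
ofst[ 80]:err_code:0xa0403034, clock:0xffffffff
"""
    for (year, code), n in sorted(errors_per_year(sample).items()):
        print(year, code, n)
```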
However, the immediate problem is the 3034. Now that I have recorded all the error codes that were still in the log, I can clear it and test. I got 3034 with the same BitTraining error. Every time!
You might be asking yourself why I'm excited by that result. It's because it could have been much worse! I was about ready to write this console off as having a dead CPU, and that's game over! If I had seen a 3032 or 3013, that would indicate an issue with the CPU; we've seen those errors on CPUs killed by failed delids. So a lone 3034 was welcome compared to what I was expecting to see! There might be hope for this MB!
Now, the BitTraining error points to RX0, and there is a blob of glue/mask on the CPU right above that FlexIO port, so that does concern me! I tried a bunch of pressure tests on both the CPU and RSX to see if I could confirm a BGA defect. No matter where I pressed, nothing changed: 3034 every time. That doesn't rule out a BGA defect, but I was hoping it would boot so I could rule out a broken CPU trace under that blob. If there is a broken trace there, pressure testing wouldn't make a difference, so that's still a possibility.
Okay, that's where I'm at with this console. I'm debating whether to reflow the RSX just to see if it works. If it does, I'll Frankenstein this MB. If it doesn't, I'll need to inspect those globs of glue. I really wish I could rule out the CPU, for peace of mind, but if there are repaired traces under the globs, I don't want to disturb them. The error is consistent with a BGA defect, usually on the RSX, so I think that's where I should start.
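The triage reasoning here boils down to a lookup: 3032 or 3013 would point at the CPU, while a lone 3034 leaves room for a BGA defect. This hypothetical helper encodes only the interpretations given in this post; it is not a complete error-code reference, and a "possible BGA defect" verdict does not rule out a broken CPU trace.

```python
# Interpretations as stated in this post (not an exhaustive reference).
NOTES = {
    0x1001: "CPU VRM error; a long history of these is the warning sign",
    0x1004: "unexpected shutdown; not significant on its own",
    0x3034: "seen here with an RSX BitTraining error; consistent with a BGA defect",
    0x3032: "indicates a CPU issue (seen on CPUs killed by failed delids)",
    0x3013: "indicates a CPU issue (seen on CPUs killed by failed delids)",
}

CPU_CODES = {0x3032, 0x3013}


def triage(codes):
    """Return a short verdict for a set of observed low-16-bit error codes."""
    if codes & CPU_CODES:
        return "likely CPU damage"
    if 0x3034 in codes:
        return "possible BGA defect; worth pursuing"
    return "inconclusive"


if __name__ == "__main__":
    seen = {0x3034}
    print(triage(seen))  # possible BGA defect; worth pursuing
    for code in sorted(seen):
        print(f"{code:04X}: {NOTES.get(code, 'no note')}")
```

Checking the CPU codes first mirrors the priority in the text: if either of them shows up, a reflow is unlikely to help.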
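When repeating bringup under different pressure points, it helps to reduce each transcript to the BitTraining target and the final error code so runs can be compared at a glance. A minimal sketch, assuming the exact `[POWERSEQ] Error : BitTraining ...` and `[ERROR]: 0x...` line formats from the bringup output earlier in this post; `summarize_bringup` is a hypothetical name.

```python
import re

BITTRAINING = re.compile(r"\[POWERSEQ\] Error : BitTraining (\S+)")
FINAL_ERROR = re.compile(r"\[ERROR\]: 0x([0-9a-fA-F]{8})")


def summarize_bringup(transcript):
    """Return (bittraining_path_parts, error_code) from a bringup transcript.

    Either element is None if the corresponding line is absent.
    """
    bt = BITTRAINING.search(transcript)
    err = FINAL_ERROR.search(transcript)
    parts = bt.group(1).split(":") if bt else None
    code = int(err.group(1), 16) if err else None
    return parts, code


if __name__ == "__main__":
    log = """\
[POWERSEQ] Error : BitTraining RSX:RRAC:RX0:GLOBAL1:RX_STATUS
[SSM] PowSeq Fail : Detected !
[ERROR]: 0xa0403034
"""
    parts, code = summarize_bringup(log)
    print(parts)      # ['RSX', 'RRAC', 'RX0', 'GLOBAL1', 'RX_STATUS']
    print(hex(code))  # 0xa0403034
```

If a pressure point ever changed the RX lane or the error code, it would show up as a different tuple.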
Next up:
I did probe around all the voltage lines measuring resistance and couldn't find any that seemed out of the ordinary. However, I didn't record the measurements, so I'll do that before I proceed. Same with measuring voltages and ripple on the oscilloscope: I want those recorded before I attempt a reflow. I also need measurements from a working COK-001 to compare against, so that'll be in the next update. It's too much for now.
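Keeping the readings in a structured form makes the comparison against a working COK-001 straightforward. A minimal sketch, assuming readings are stored as rail-name to ohms dictionaries; the only value taken from this post is the 3.1 ohm RSX VDDC reading, the 20% tolerance is an arbitrary placeholder rather than a spec, and the reference is left empty because no COK-001 readings have been taken yet. `compare_rails` is a hypothetical name.

```python
def compare_rails(measured, reference, tolerance=0.20):
    """Compare measured rail resistances (ohms) against a reference board.

    Returns {rail: status}. The tolerance is a placeholder fraction, not a spec.
    """
    report = {}
    for rail, value in measured.items():
        ref = reference.get(rail)
        if ref is None:
            report[rail] = "no reference yet"
        elif abs(value - ref) <= tolerance * ref:
            report[rail] = "within tolerance"
        else:
            report[rail] = f"off: {value} vs {ref} ohms"
    return report


if __name__ == "__main__":
    measured = {"RSX VDDC": 3.1}  # reading from the teardown notes
    reference = {}  # to be filled in from a working COK-001
    print(compare_rails(measured, reference))  # {'RSX VDDC': 'no reference yet'}
```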
Continued on Frankenstein thread...