These photos are even worse than the one before. I cannot see the traces well. I could see them much better in the first picture... Try to clean off the paste and take another picture that is similar to the first one where copper traces can be seen properly. You need to determine if they are broken or not. Now that I had another look, they do seem damaged.
I forget not everybody realizes the tricks you've shared here and there. So if you don't mind, I'll just copy what you said before.
For the cookie boards (is it the same for others?), you first need to send the command "w 7202 2" to the syscon to activate the SB UART port. Then you need to solder the RXD wire to the correct point.
After that you can connect to the South Bridge UART with Putty or any other tool. Settings are 115200 baud, 8N1. It will start printing the text automatically.
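For anyone scripting this instead of using Putty, here is a minimal Linux-only sketch of the same 115200 8N1 settings using Python's standard termios module. The function names and the device path are placeholders of mine, not something from the posts above, so adjust them to your own adapter.

```python
import os
import termios


def make_8n1_attrs(attrs, speed=termios.B115200):
    """Return a copy of a termios attribute list set to raw 8N1 at `speed`."""
    iflag, oflag, cflag, lflag, _ispeed, _ospeed, cc = attrs
    # Raw input: no break handling, CR/NL translation, bit stripping or XON/XOFF.
    iflag &= ~(termios.IGNBRK | termios.BRKINT | termios.PARMRK | termios.ISTRIP
               | termios.INLCR | termios.IGNCR | termios.ICRNL | termios.IXON)
    # Raw output and no line editing or echo.
    oflag &= ~termios.OPOST
    lflag &= ~(termios.ECHO | termios.ECHONL | termios.ICANON
               | termios.ISIG | termios.IEXTEN)
    # 8 data bits, no parity, 1 stop bit; ignore modem lines, enable receiver.
    cflag &= ~(termios.CSIZE | termios.PARENB | termios.CSTOPB)
    cflag |= termios.CS8 | termios.CLOCAL | termios.CREAD
    return [iflag, oflag, cflag, lflag, speed, speed, list(cc)]


def open_sb_uart(path="/dev/ttyUSB1"):
    """Open the adapter (placeholder path) and apply 115200 8N1."""
    fd = os.open(path, os.O_RDWR | os.O_NOCTTY)
    termios.tcsetattr(fd, termios.TCSANOW, make_8n1_attrs(termios.tcgetattr(fd)))
    return fd
```

After `fd = open_sb_uart()`, a plain `os.read(fd, 256)` loop would show whatever the port prints.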
Hi, I just want to make sure I did it right. Here's what I did:
Get two 3.3V USB-TTL adapters: the first one, /dev/ttyUSB0, connected to the SYSCON with 4 wires as we all did here; the second one, /dev/ttyUSB1, connected to SB_RxD and SB_TxD, with one common GND. Then start a serial program on my Linux machine like:
Finally auth and bringup. This should let the second serial port, /dev/ttyUSB1, print out southbridge messages, right?
I used a YLOD COK-002 board for testing. When it stopped at 3034 errors (RSX problem), the SB serial port only printed 1 or 2 garbage characters. Here's my SB serial port output:
Code:
picocom -b 115200 /dev/ttyUSB1
picocom v2.2
port is : /dev/ttyUSB1
flowcontrol : none
baudrate is : 115200
parity is : none
databits are : 8
stopbits are : 1
escape is : C-a
local echo is : no
noinit is : no
noreset is : no
nolock is : no
send_cmd is : sz -vv
receive_cmd is : rz -vv -E
imap is :
omap is :
emap is : crcrlf,delbs,
Type [C-a] [C-h] to see available commands
Terminal ready
��
Thanks for using picocom
I soldered the SB serial wire like this:
What could have gone wrong? Does the empty serial port mean the SB hasn't started to work yet (because of the RSX error)? Thanks a lot!
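When comparing captures between boards, a rough way to tell "silent", "a couple of garbage bytes" and "real text" apart is to look at the share of printable bytes. This is only a sketch: the function names and the 0.9 threshold are arbitrary choices of mine, and it does not diagnose anything by itself.

```python
def printable_ratio(data: bytes) -> float:
    """Fraction of bytes that are printable ASCII or tab/CR/LF."""
    if not data:
        return 0.0
    ok = sum(1 for b in data if 32 <= b <= 126 or b in (9, 10, 13))
    return ok / len(data)


def describe_capture(data: bytes, min_ratio: float = 0.9) -> str:
    """Label a capture as 'silent', 'text' or 'garbage' (threshold is a guess)."""
    if not data:
        return "silent"
    return "text" if printable_ratio(data) >= min_ratio else "garbage"
```

A one or two byte capture of non-printable values would come back as "garbage", which matches what picocom showed above.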
Hi, I tried the bringup-shutdown loop multiple times. The only difference is that the syscon serial is spitting out some more thermal errors, but the SB UART is either silent or outputs one or two garbage characters. The syscon UART log looks like:
You have an RSX failure. BGA or bumpgate, either way it's something about the RSX. Either replace it or reball it. I'm not sure why you are trying to read the SB log. I don't think it will print anything, because your system is stuck in a PowSeq fail. The SB log prints messages only if you don't get stuck in bittraining or any other POWSEQ errors.
Thanks guys. That makes sense. I did it to a RSX BGA failed COK-002 only to test it before I try the GLOD one. Now the boot sequence is a bit clearer to me, so yeah if all the steps and the soldering were right, I'd do it to that GLOD COK-002.
I would also say that in some cases GLOD can also be caused by an RSX error. I had a board like that. There were no errors in the syscon log. The SB log was printing that the XMB was loaded and there were no issues with the HDMI chip. But the RSX wasn't setting the resolution, so it was GLOD. Quite an unusual fault it was.
Yes, that is my "special glod", where the XMB is on but there is no display on any port. With the RSX exchanged, it works fine.
Only the SB UART port helps us understand this kind of problem.
Hello,
I have a non-working fat CECHC03 with a COK-002 motherboard.
Before it was first opened, it gave the GLOD.
After a thorough cleaning and replacement of the thermal paste: instant YLOD.
Diagnosis then gave these codes: A0232102, A0403034, A0902120, A0404002.
I read that they call for reballing.
I reflowed with a hot air gun.
After that, diagnosis gave only this code: A0232102
Again instant YLOD.
In my opinion, something happened during cleaning (with a vacuum cleaner), because it changed from GLOD to YLOD without my doing anything else.
Can it be worked out where this code A0232102 comes from? I will be grateful for any help.
You are not the first to have reported a 23 2102 (Fatal RSX Error) in connection with 3034. I have it in my notes that the two can be related, for example after a failed reflow/reball. There are a lot of errors that can be caused by a bad RSX solder connection.
Your reflow method is very important for achieving optimal results. You must control temperatures and follow strict cleaning and reflow procedures.
Reflows suck because they can't remove the oxidation on pads. So the solder won't adhere to them even if it goes molten. So the best you can hope for is a console that works for a few months, if at all.
The only viable substitute for a reball is to replace the RSX with a more reliable model, because even if you do achieve a good reball, bumpgate will troll you sooner or later. @DeadEnd has finally convinced me of that. Having said that, it should last at least a year after a successful reball.
What I can tell you is that step #23 is earlier in the power on sequence than #40. 23 is when the console checks for the GPU/SB/CPU. Before, you were passing that check and moving on to bittraining at step #40. So your reflow did more damage than good.
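To make the step comparison concrete, here is a small sketch that splits the codes from the post above into step and error parts. The layout (a literal "A0", then two digits of step, then four digits of error) is my assumption, based only on how A0232102 is read as "23 2102" in this thread. Treating the step as a hexadecimal number for ordering is also an assumption, and the function names are hypothetical.

```python
from typing import NamedTuple


class SysconError(NamedTuple):
    step: str   # power on sequence step, e.g. "23"
    code: str   # error number, e.g. "2102"


def parse_error(raw: str) -> SysconError:
    """Split an 8-character code such as 'A0232102' (assumed layout)."""
    raw = raw.strip().upper()
    if len(raw) != 8 or not raw.startswith("A0"):
        raise ValueError(f"unexpected error code format: {raw!r}")
    return SysconError(step=raw[2:4], code=raw[4:8])


def earliest_step(codes):
    """Return the entry with the lowest step number (compared as hex)."""
    return min((parse_error(c) for c in codes), key=lambda e: int(e.step, 16))


codes = ["A0232102", "A0403034", "A0902120", "A0404002"]
first = earliest_step(codes)
```

With the four codes from the post, `first` is the step 23 / 2102 entry, which is the point being made: that check comes before the step 40 bittraining errors.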
Not necessarily more damage. Notice, he had this error before as well. His reflow has eliminated 3034 and all the rest, so technically it wasn't entirely fruitless.
I must ask if the error logs were cleared after reflowing? Maybe the errors are the same and they just couldn't fit into the log. The output of the bringup command could also be interesting to see. But I guess if it doesn't get to bittraining, then it might not show anything new.
I was/am assuming that when a #23 fatal error is encountered, the system does not proceed to bittraining. @amxcs can you post the error logs? Did you record the timestamps so we can see if those 232102's happened at the same time as the 3034/4002's? If they did, then my assumption that 232102 will YLOD then and there without proceeding is wrong.
OTOH, if these 3034/4002's were previous errors to the 232102's, then the 3034/4002's were/are probably still there, but the POS doesn't get that far. So maybe you didn't do more damage. You just didn't get anywhere with the reflow.
Power Control Topology - Part 2 (Power Good & Voltage Regulation)
Wikipedia said:
The Power Good signal (power-good) is a signal provided by a computer power supply to indicate to the motherboard that all of the voltages are within specification and that the system may proceed to boot and operate
The PlayStation 3 Power Supply Unit (PSU) outputs 2 voltages from which all other system voltages are derived - 12V_MAIN and 5V_EVER. I made the following flowchart of a COK-001 to illustrate.
12V_MAIN is the one from which most other system voltages are derived (5V, 3.3v, 1.8v, 1.7, 1.2, 1.0 etc). The CPU/GPU/SB, are all powered from this 12V_MAIN. But each one of them has a chain of chips before the correct voltage can get to them. The purpose of these chips is to provide stable voltage and control of when that voltage is delivered.
For example, the CPU's 1.0V_BE_VDDC is produced by its Voltage Regulation Module (VRM). It includes IC6103, a buck controller which drives IC6104, IC6105, IC6106 buck converters. That controller is enabled by the syscon chip at the appropriate time to turn on the CPU. IC6107 is involved in the coordination of this effort. So too are the NEC/TOKIN bulk filtering capacitors. And also the array of MLCC bypass caps. Collectively they constitute the CPU's Voltage Regulation Module.
The RSX has its own VRM, and so does the SB. Most subsystems have an IC controller that enables their voltage at the appropriate time in the Power On Sequence. The voltage flowchart I made above shows the general manner in which these subsystems receive their voltages. Not every IC in the PS3 is listed, just the major ones involved in producing the system voltages marked by a square box. Also, that image was based off the COK-001 Service Manual. It does not apply to all models, but can be used as a general guide for them. Obviously, the PS2 Hardware section will not be included in non-backward compatible models.
How does this relate to power good? Well, each system voltage in a box in that image has a controller IC that is monitoring its output voltage. If that voltage is within regulation, it will report power good to the SYSCON ("system control"). If any one of them falls out of regulation and reports no power good, the syscon will refuse to boot.
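The logic in that last paragraph boils down to an AND over all the power good signals. A toy sketch of it follows; the function name is made up, and the real syscon firmware is of course not structured like this.

```python
def syscon_allows_boot(power_good):
    """power_good maps a rail name to True (in regulation) or False.

    Returns (ok_to_boot, list_of_failed_rails).
    """
    failed = [rail for rail, ok in power_good.items() if not ok]
    return (not failed, failed)


# Illustrative states only, using rail names from the glossary below.
rails = {
    "+1.0V_BE_VDDC": True,
    "+1.2V_RSX_VDDC": False,   # pretend the RSX controller reports no power good
}
ok, failed = syscon_allows_boot(rails)
```

Here `ok` is False and `failed` names the RSX core rail, which is the situation where you would go and probe that rail's test pad.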
The voltage flowchart above combined with the MB Side B "Jumper Lead" Test Pad locations below should make diagnosing and troubleshooting easier.
I made this one from scratch, with high enough resolution to read the Labels and to still upload to the forum. I was really pushing the limit here. This will make it easier for you to search the service manual when you find a voltage that's missing. When I say this took a long time to make, it's an understatement.
Speaking of time consuming projects. Here's more fruit from that effort.
ATA = Hard Drive Interface
BC = PS2 Bridge Chip
BEI = Redwood Broadband Engine Interface Controller (AKA RC)
CE = Control Enable
CELL BE = CPU ("Cell Broadband Engine")
CG = Clock Generator
CLK = Clock
CONT = Control
CTL = Control
EE = Emotion Engine (PS2 CPU)
EIB = Element Interconnect Bus
EN = Enable
ESW = Ethernet Switch or Switchable
FB = Feedback
FlexIO = Redwood Rambus FlexIO CPU/GPU Interface
GS = Graphics Synthesizer (PS2 GPU)
MC2 = ? (VDDIO → BE_SPI, CHKSTP, JTAG, TBEN, & P_L_BYPASS)
MIC = Yellowstone Memory Interface Controller (AKA YC)
PCI = PS2 Hardware Interface
PLL = Phase-Locked Loop
PPE = Power Processor Element (main dual threaded CPU)
PWRGD = Power Good
RC = Redwood Broadband Engine Interface Controller (AKA BEI)
RRAC = Redwood Rambus FlexIO CPU/GPU Interface Voltage (VDDR)
RST = Reset
RSX = GPU ("Reality Synthesizer")
SB = South Bridge
SPE = Synergistic Processing Element (8x, 1 disabled for redundancy)
STBY = Standby
SW = Switch
VDD = Positive Field-Effect Transistor (FET) Voltage
VDDA = Positive FET Voltage Supply for Analog Subsystems
X = Rambus XDRAM Memory Subsystem
XDR = Yellowstone XDRAM™ System Memory (Y0_XDR0, Y0_XDR1, Y1_XDR0, & Y1_XDR1)
XGC = XDRAM Clock Generator
XIO = Yellowstone Rambus CPU/XDR Memory Interface ("Extreme Data Rate IO")
YC = Yellowstone Memory Interface Controller (AKA MIC)
YRAC = Yellowstone Rambus CPU/XDR Memory Interface Voltage
+1.0V_BE_VDDC = Cell BE Processor Core (PPE/SPEs)
+1.5V_BE_THERMAL_VDDA = CPU Thermal Power?
+1.6V_BE_VDDA = CPU ADC Voltage for PLL & Thermal
+1.5V_BE_YC_VDDA = CPU/XDR Yellowstone XIO Controller ADC Interface
+1.5V_BE_RC_VDDA = SB/CPU Redwood Rambus FlexIO Controller ADC Interface
+1.2V_RSX_VDDR = RSX Redwood Rambus FlexIO Core
+1.2V_RSX_VDDC = RSX Processor Core
+1.5V_RSX_RC_VDDA = RSX Redwood Rambus FlexIO Controller ADC Interface
+1.5V_RSX_VDDIO = VDDP_VO (Voltage for Picture, Video Out to DVE/HDMI)
+1.8V_RSX_PLL_VDD = RSX Phase-Locked Loop
+1.8V_RSX_FBVDDQ = RSX DRAM
Analog Voltage for the core PLL of IC5004, which is an ICS9214 Clock Generator used to support the Rambus XDR memory subsystem and Redwood logic interface.
Do not take the above as gospel. If you notice anything that needs updating, let me know and I'll edit it as necessary.
Getting back to Power good:
Notice the GPU Buck Controller (IC6201). It's located at B5. This controller is what controls power to the RSX. I have labeled the jumper locations for Enable, which is the signal the SYSCON sends to power on the RSX. I also labeled Power Good (PWRGD), which is the signal the controller sends back to the SYSCON about the power regulation. If it's bad, /RSX_POW_FAIL goes low and the syscon throws an error. Which error? 3004 or 1002. Which one is still a mystery. I hypothesize it depends upon the step number of the Power On Sequence at which power good went low. The hypothesis is that earlier step numbers = 3004 and later step numbers = 1002. However, the earliest reported 1002 was at step 06, and the earliest 3004 was at 09. So that is inconsistent with my hypothesis. The 06 1002 report could be inaccurate, but reports are all I have to go off of.
The last pad I labeled on the RSX VRM Controller is the voltage feedback/drop jumper. This is what the controller is monitoring to decide if power is good or not. We should be able to probe this point to see the voltage drop across RSX_VDDC. If it drops too much, it triggers undervoltage lockout and the controller will send no power good (PWRGD low). The syscon will error. The CPU has the same arrangement, and other controllers around the board have a similar function.
Since preventing voltage drops on the CPU/GPU is important for system stability, the NEC/TOKINs are a concern. If they can't sustain the voltage for long enough under load, the voltage will fall out of regulation. One way to prevent this is to replace failing/aged TOKINs with new tantalum capacitors. Everyone is already familiar with that. But there is another way: adjusting the power good voltage dropout threshold, so that there is a wider range the voltage can fall before triggering an error.
Adjusting Power Good Threshold
Vid pins VID0-5 on the buck controller form a 6-digit code corresponding to the Vout No load setpoint. Power Good Vmin and Vmax thresholds are relative to that set point. With the stock COK-00X voltage divider values (15K and 20k), Vmin = -163mV. Vmax is always +100mV. The Vout voltage cannot deviate more than that. If it does power good goes low and the SYSCON will error.
In some official SONY refurbished consoles, new resistor values (27K and 10K) change Vmin to -400mV. So the low voltage threshold is now more than twice as far below the setpoint, allowing much more voltage ripple before it triggers an error. My hypothesis is SONY did this to reduce the frequency of 1001 and 1002 errors. That would explain why they did it to both buck controllers (CPU and GPU). A sort of admission of guilt that they either set it too aggressively or were compensating for bad NEC/TOKINs without replacing them.
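To show what the wider window means in numbers, here is a small sketch of the power good check using the thresholds quoted above (-163mV stock, -400mV modded, +100mV in both cases). The 1200mV setpoint and the 950mV dip are made-up illustration values, not measurements, and treating the limits as inclusive is my assumption.

```python
STOCK_VMIN_MV = -163   # stock COK-00X divider (15K and 20K)
MOD_VMIN_MV = -400     # refurbished divider (27K and 10K)
VMAX_MV = 100          # upper limit, same in both cases


def power_good(vout_mv, setpoint_mv, vmin_offset_mv=STOCK_VMIN_MV,
               vmax_offset_mv=VMAX_MV):
    """True while Vout stays inside the window around the VID setpoint."""
    deviation = vout_mv - setpoint_mv
    return vmin_offset_mv <= deviation <= vmax_offset_mv


# Hypothetical example: a 250mV dip below a 1200mV setpoint.
stock_ok = power_good(950, 1200)                               # False
mod_ok = power_good(950, 1200, vmin_offset_mv=MOD_VMIN_MV)     # True
```

The same dip that would drop PWRGD with the stock divider stays inside the window with the modded values. Whether the system is actually stable at that voltage is a separate question.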
Power good low voltage threshold is there to prevent system instability, but if SONY decided it was okay to loosen it, then perhaps we can follow suit. It could be particularly important because we're seeing a lot of unexplained 1001's recently. Currently, 1002's are assumed to be bad NEC/TOKIN's. Replacing aged bulk filter capacitors certainly works, but just because changing them fixes the error doesn't mean that's the only way to skin a cat.
I have intentionally held off recommending this mod to people, because I'm only freshly aware of its potential. Before I conclude this is okay to do, I would like to know what the VID at idle is, so we can compare actual voltage measurements with the low voltage threshold. On a console with 1001/1002 errors, Vout should drop more than that before it experiences a YLOD. That means I need oscilloscope measurements of Vout on both the CPU/GPU and PWRGD. Then, by replicating Sony's mod, I would like to see if the error goes away AND the system is stable (requires stress testing)! If so, then we have discovered an easier method of resolving these errors.
Replacing those resistors is fine micro-soldering work that pretty much requires a microscope. And I need oscilloscope measurements of the voltage drop and PWRGD (Cell BE for 1001 and RSX for 1002). So it's not too easy or cheap.
@feng_ye your software side is fine, otherwise you won't be able to see "Connecting to Debug Device (SB UART)."
This is either a missing resistor between the RSX side and the AV IC/HDMI IC, or that special GLOD. Whether you try exchanging the AV IC or HDMI IC is your choice; otherwise focus on that area and compare it. I'm not sure what to say from here, but I am sure there are no software errors on that unit.
Also check that AV and Hdmi area parts very well.
You can also exchange both but start with AV ic, testing after each exchange.
No need to exchange the HDMI IC. You can just check it by running commands like "hdmi chstat 0", "hdmi ports" and "hdmi redid". I think it should be done while it's connected to a TV/monitor.