RIP-Felix
Senior Member
I decided to lead with the DATA you most wanted to see. As of January 2022, 114 users have reported over 250 consoles' worth of error codes to this thread and the NEC/TOKIN thread. I painstakingly collated all of the reported error codes and noted each console's history/progression on the forum: what was wrong with it and how it was resolved, or, if it wasn't resolved, my best guess at a diagnosis. A lot of the time the issue is unknown, so I created a category for that.
When I say "painstakingly," I mean I literally burnt my eyes up on it. I have an optometrist appointment Monday to get them checked out. I suspect eye strain, perhaps an uncorrected vision problem. Whatever the case, it's a lot more unnerving than I imagined. Double vision will make you panic! I don't recommend it, take breaks!!! Why I did this to myself can only be described as a masochistic obsession. Nay, an Ahab level vendetta against the YLOD and misinformation making things worse. I have a problem. I need to just let it go, but I don't want to.
In the meantime, please enjoy the fruits of my tribulation...
The following SYSCON Error Code Matrix is the same DATA as above, but viewed in a way that makes it easier to see which errors group at earlier step numbers. This is important because of Power On Sequencing. Knowing what the console is doing at each step number (top row) allows you to deduce what the error code (leftmost column) means. The number highlighted in Red is the number of consoles reported to have exhibited that error (out of the 250 total consoles we have data for)...
I will elaborate on the meaning of these errors and the step numbers at which they occur in "Power ON Topology Part 3" at a later date. But here is an example of how important the step number is in diagnosing.
Errors such as 1001 (CELL VRM Power Issue) and 1002 (RSX VRM Power Issue) can occur at many step numbers. If one occurs when the console is idle (Step #80), you know that nothing was wrong during Power On Sequence Testing (POST). For example, when NEC/TOKIN Proadlizers are beginning to fail, ripple/noise under load triggers these errors (A0801001 / A0801002). As the NEC/TOKINs get worse they can interfere with the Power On Sequencing (POS). Important initialization steps and checks are performed during the POS. If excessive voltage ripple/noise causes an interruption during any one of these steps, you can get a 1001/1002 code with a step number earlier than 80.
You can better visualize this phenomenon using the second graph (Error Code Matrix). This matrix reveals other errors which exhibit the same behavior. 1004, 1200, 1301, 1802, 2024, 2030, 2031, 2033, 2101, 2120, 2124, and 2131 show the same ability to occur at multiple step numbers. This is significant and provides insight into what's causing the error. It's kind of a breakthrough for us!
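For anyone who wants to build the same kind of matrix from their own list of codes, here is a minimal sketch. It assumes the code layout seen throughout this post (for example A0801001 = step 80, error 1001, and A0A02031 = step A0, error 2031). The function names and the sample list are made up for illustration and are not the real data set.

```python
from collections import Counter, defaultdict

def split_code(code):
    """Split an 8-hex-digit SYSCON code such as 'A0801001' into (step, error)."""
    code = code.strip().upper()
    if code.startswith("0X"):
        code = code[2:]
    if len(code) != 8:
        raise ValueError(f"expected 8 hex digits, got {code!r}")
    int(code, 16)  # raises ValueError if the string is not valid hex
    return code[2:4], code[4:]

def build_matrix(codes):
    """Return {error: Counter({step: count})} for a list of reported codes."""
    matrix = defaultdict(Counter)
    for code in codes:
        step, error = split_code(code)
        matrix[error][step] += 1
    return matrix

def multi_step_errors(matrix):
    """List the errors that were seen at more than one step number."""
    return sorted(e for e, steps in matrix.items() if len(steps) > 1)

# Made-up sample reports, only to show the mechanics.
reports = ["A0801001", "A0201001", "A0801002", "A0A02031", "A0802031", "A0403034"]
matrix = build_matrix(reports)
print(multi_step_errors(matrix))
```

Errors that show up in `multi_step_errors` are the ones where the step number carries most of the diagnostic information.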
For example, Step number A0 = Immediately after SYSCON Reset. Errors seen occurring at that step number are 2030, 2031, 2033, 2124, & 2131. When the power rocker is flipped on, IC6004 receives +5V_EVER directly from the PSU. It produces /SYSCON_RST automatically. An error immediately after SYSCON reset will beep the moment you flip that rocker or plug in the console!
- IC6005/6 are powered by +5V_EVER and produce +3.3V_EVER and 1.8V_EVER respectively. These are the voltages that power the SYSCON chip. If either of those ICs goes bad, the SYSCON cannot do anything. You won't even get a standby LED. It'll look like the PSU is dead, when it isn't.
- SYSCON Reset serves as enable for IC6009, which forms +3.3V_THERMAL for the CPU, RSX, and SB Thermal Monitors. If they are bad at this step you get errors 2030, 2031, & 2033 respectively at step number A0.
- If IC6009 is bad, it will probably cause 2030 because the CELL Thermal monitor is the first in line to be checked.
If these errors occur later (step #80) you don't need to worry about any of this! When errors occur at step numbers earlier than 80, you have to factor in what the console is doing at that step# to narrow down the possibilities.
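The reasoning above boils down to reading the step number and the error code together. A rough sketch of that lookup, using only the labels mentioned so far in this post (the dictionaries and the `describe` helper are my own illustration, not a complete table):

```python
# Labels taken from the text above; anything not listed is reported as unknown.
ERROR_LABELS = {
    "1001": "CELL VRM power issue",
    "1002": "RSX VRM power issue",
    "2030": "CELL thermal monitor",
    "2031": "RSX thermal monitor",
    "2033": "SB thermal monitor",
}
STEP_LABELS = {
    "A0": "immediately after SYSCON reset",
    "80": "console idle, power-on sequence already passed",
}

def describe(code):
    """Turn a code such as 'A0A02031' into a short human-readable hint."""
    code = code.strip().upper()
    if len(code) != 8:
        raise ValueError(f"expected 8 hex digits, got {code!r}")
    step, error = code[2:4], code[4:]
    step_text = STEP_LABELS.get(step, "unknown step")
    error_text = ERROR_LABELS.get(error, "unknown error")
    return f"step {step} ({step_text}), error {error} ({error_text})"

print(describe("A0A02031"))
print(describe("A0801001"))
```

The same error code gets a very different reading depending on the step, which is the whole point of looking at both halves.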
The most poignant example is with 2120/3013 error combinations. The step number is everything. See the error combos section below.
I have a lot more, but I'll save it for "Power ON Topology Part 3."
Common Error combinations:
3034/4xxx
3034/4401
8x Consoles total:
- 2x consoles had the same BitTraining error (BE:RRAC:RX0:GLOBAL1:RX_STATUS). Others didn't post bringup.
- 1x had 1001's leading up to this 3034/4401.
- 1x had delid damage to CPU traces.
- 1x was probed with an oscilloscope and confirmed to have bad CPU tokins, but the BGA defect was the immediate issue. He reballed the RSX, and it didn't fix the console. The data error changed to 4411 though.
3034/4402
11x Consoles total:
- 2x RSX:RRAC:RX0:GLOBAL1:RX_STATUS
- 1x Reflowed RSX and was last reported working.
- 1x "RSX Nec/tokins are 3.0 Ohms, while CELL are 14.0 Ohms." That resistance on Cell is very high. Could be an open fault. Attempted a reflow that did not change the 40 3034 but the associated data error changed to 40 4401 with BitTraining BE:RRAC:RX0:GLOBAL1:RX_STATUS. It's unlikely he actually reflowed the solder.
- 1x was reballed (both RSX & CELL) to GLOD. No errors. RSX was moved to a known working board and did the same. The issue is suspected to be dead RSX VRAM, but that is not related to the original 3034/4402. CPU/GPU reball fixed it, the RSX was just dead.
- 1x @vyktomvmpay25 Reballed both CPU/RSX to a GLOD. Concluded it was bad RSX VRAM. Replaced RSX to fix. Posted the resistance measurements, which showed that VDDQ (VRAM) was okay. PLL_VDD should be reading in the Mega Ohms for a 40nm RSX (off the board). So that's probably bad. VDDC is a bit low. VDDR shouldn't be 640K, but that could be a typo. 0.640k would be about right.
- RSX_PLL_VDD = 10.41kΩ
- RSX_VDDC = 1.6 – 1.8Ω
- FBVDDQ = 200Ω
- VDDIO = 938kΩ
- VDDR = 640kΩ
- YC_RC_VDDA = 2,940kΩ
- YC_RC_VDDIO = 9.28 kΩ
- @vyktomvmpay25 Reballed both CPU/RSX to a GLOD. Again concluded it was RSX VRAM, but this time didn't post resistance measurements. He's calling this situation a "Special GLOD."
- @vyktomvmpay25 Reballed RSX and replaced tokins (it had 1001/1002 also). It became GLOD again. Didn't go further.
- BitTraining RSX:RRAC:RX0:GLOBAL1:RX_STATUS
- 1x Had melted plastic around vents due to the hair dryer trick (probably).
3034/4411
Not much known (Potentially Dead RSX)
3x Consoles total:
- 1x GLOD Noted, no resolution.
3034/4412
9x Consoles total:
- 1x RSX:RRAC:RX1:GLOBAL1:RX_STATUS
- 1x RSX:RRAC:RX0:GLOBAL1:RX_STATUS
- 1x Reflowed RSX, worked.
- 1x Pressure test didn't work (inconclusive)
- 1x had a long string of A0802203's followed by a long string of A0801802's leading up to the 3034/4412.
- 1x had a string of A0203010 leading up to the 3034/4412.
3034/4421
9x Consoles total:
- 1x Successfully reflowed/reballed
- 1x showed A0801200s leading up to 801701 / 801601 & 8014FF, then 3034/4421 thereafter.
- 1x RSX:RRAC:RX0:GLOBAL1:RX_STATUS. Reflowed, reported success.
- 1x occurred 1yr after rsx delid (perhaps not gluing IHS back on led to BGA defect. Lasted 1 year?)
3034/4422
5x Consoles total:
- 1x Sealed. Previous errors were A0801601, A0801701, A0801802, A0801004, A08014FF. Reballed to A0801701. Replaced CXM4024R no change. Reball CELL yielded A0801001, A08014FF. Diagnosed RSX – bump / die failure.
- 1x Heat test confirmed RSX BGA defect.
3034/4432
7x Consoles total:
- 3x BitTraining RSX:RRAC:RX3:GLOBAL1:RX_STATUS
- 1x Reflowed to a working state. Previously attempted replacing tokins, which didn't help. YLOD returned 2 months later, despite aggressive fan curves, with 801701, 801601, 801802 because PWR was on at the time the BGA/Bump failed, then 403034/404402 thereafter.
- 1x Pressure test didn't work (inconclusive). Reflow worked.
2120/3013
Here is a shortlist of the issues preceding A0202120/A0213013 errors
- Failed Reflow
- Replacing Tokins on a console previously experiencing A0403034
- Reballing RSX (only)
- Bad F6302 and C6320
- Visible trace Damage on CPU caused A0213013 by itself
- 0.0V at JL6354-JL6361 (DC/DC converters for AV backend)
- On page 13 of the SYSCON thread @chiefhunnablunts accidentally overwrote the eeprom at address 3961 from FF to 00. It resulted in A0202120/A0213013 error. When he changed it back, the error disappeared. Why would changing the bit there cause errors related to YC_RC_VDDIO? If it's related to SYSCON/CELL communication over SPI, then that explains it. It's also possible there was a precarious BGA defect that responded to mounting pressure between tests, IDK.
- On page 179 of the Tokin thread @TwelveAtNight fixed A0202120/A0203013 errors by replacing F6302 and C6320.
- On page 25 of the SYSCON thread @patricksouza472 fixed A0202120/A0213013 by replacing F6302 and C6320. Note, this console produced 10x 2120 to every 1x 3013. @Aran3a noted the same thing on page 45, but we never thought to try these SMDs. @moptop219 on page 218 of the tokins thread noted the same. There's a good chance this is what's wrong with these consoles.
- On page 25 of the SYSCON thread @db260179 fixed A0002120 by replacing TH2501.
- On page 21 of the SYSCON thread @nyislander had an A0213013. "he checked voltages and found 0.0V at JL6354-JL6361 pts in the schematic and confirmed +3.3V_MISC & +5V_MISC are present." Lack of voltage on the DC/DC converters downstream of IC6301 suggests there could have been blown fuses (F6301/2).
Hypothesis
3013 = CPU side of YC_RC_VDDIO
Evidence (strong): Correlative with few confounding variables
- @Kleon1876 on page 36 of the SYSCON thread had A0213013 after CPU trace damage from a failed delid attempt.
- @poot36 on page 106 of the SYSCON thread had A0202120/A0213013 when his CPU interposer was cracked in half by a failed delid attempt.
- On page 21 of the SYSCON thread @nyislander had a A0213013. Confirmed lack of voltage on the DC/DC converters downstream of IC6301, which includes +1.2V_YC_RC_VDDIO.
Evidence (weak): Anecdotal and Associative, with many confounding variables.
- @Bbowes9 on page 82 of the SYSCON thread had an A0313032 caused by knocking R5167 during a failed delid attempt. This console was working before the delid attempt. R5167 is the +1.2V_YC_RC_VDDIO reference voltage for the CPU's Redwood FlexIO ADC differential reference clock pair (BE_RC_REFCLK_P). The voltage was not knocked out altogether; it was selectively knocked out on a specific reference clock after IC5004. He replaced the resistor and got A0402101 / A0403034 because RSX TX1 was shorted to ground by a nicked trace during the RSX delid (incredible luck). TX is the transmit line, so the CPU will note the error because it sees the short (BitTraining BE:RRAC:BX0:BX:FLEXIO_ID). He messed with the nick and the error changed to A0313031. This shows that issues with the BE side of the clock generator's +1.2V_YC_RC_VDDIO reference voltage do not cause A0202120 and register in step number 31 (when the SYSCON checks clocks). So step numbers 20/21 pass their checks (Initialize CPU, RSX, & AV backend). The voltage was good up to at least the DC/DC converters.
- @feng_ye on page 103 of the SYSCON thread had a GLOD A0802120 in which the HDMI transmitter was not being set up correctly. This was after probing and fixing numerous blown fuses. So we can be sure they were good at this point in the repair. He pressed on the corner of the RSX above VDDIO (He may have been referring to +1.5V_RSX_VDDIO instead of +1.2V_YC_RC_VDDIO. I'm still not sure how or if the two are related.) The HDMI transmitter reset correctly, the 2120 disappeared, and the console booted. This confirmed a BGA defect affecting those balls can cause 2120 errors and a "Special" GLOD.
- @MicrowaveEgg on page 90 of the SYSCON thread posted an error log showing a history of A0801001 leading up to an A0403034. Then it started giving A0231002/A0902120. GLOD after a BGA defect. Strange, but it's been known to happen, depending on how the balls reconnect thermomechanically from whatever repair attempt was made. He said, "PS3 was never opened." So there wasn't a reflow/reball. Later he said he "recapped" and got a bunch of A0202120's, followed by one A0213013, and a bunch of A0231002's. My guess is there is both a BGA defect and a potential fuse issue. The timeline of events is suspect too; IDK how trustworthy his recollection of the error log and/or method of dumping it is. I marked this one in the BGA/Bumps category, but it's not as straightforward as other consoles.
- F6302 and downstream SMDs are involved in the formation of 1.2V_YC_RC_VDDIO. Also, BGA/Bump defects can affect both the GPU/CPU FlexIO pads for YC_RC_VDDIO. We see lots of 2120 errors related to BGA/Bump defects that are resolved by RSX reballs.
- @squeept posted at or around page 20 of the SYSCON thread about a console with A0A02031 A0202120/A0213013. It was not sealed. Flux everywhere, severe warp, likely heatgun. Shorts present near encoder chip and a bad choke. Did not attempt to fix.
2120/3013 errors are possible in BGA/bump defects and fuse/SMDs. Distinguishing them from one another comes down to the step number at which they occur.
- 80/90 = BGA/Bump defects and possibly voltage ripple/noise.
- 20/21 = Fuses/SMDs (possibly a BGA, but less likely).
- 00 = fuse.
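The rule of thumb above can be written down as a small lookup. This is only a sketch of the heuristic as stated in this post; the function name is hypothetical and the result is a starting point for probing, not a diagnosis.

```python
def triage_2120_3013(code):
    """Suggest where to look first for a 2120/3013 error, based on its step number."""
    code = code.strip().upper()
    if len(code) != 8:
        raise ValueError(f"expected 8 hex digits, got {code!r}")
    step, error = code[2:4], code[4:]
    if error not in ("2120", "3013"):
        return "not a 2120/3013 error"
    if step in ("80", "90"):
        return "BGA/bump defect, possibly voltage ripple/noise"
    if step in ("20", "21"):
        return "fuses/SMDs (possibly a BGA defect, but less likely)"
    if step == "00":
        return "fuse"
    return "no rule of thumb for this step"

for sample in ("A0202120", "A0213013", "A0902120", "A0002120"):
    print(sample, "->", triage_2120_3013(sample))
```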
3010
6x consoles
- @Barteg, pg207, Tokin thread. A0203010 errors lead up to 3034/4412 (RSX RRAC BitTraining error = +1.2V_RSX_VDDR).
- @marciolsf, pg1 of "fun with syscon" thread. A0203010 while live probing pins 19-21 of IC6103. That's the CPU Buck Controller Gate pins, which send PWM to coordinate 3 Buck converters (3-phase). Bridging these pins would cause the timing to fail and the Voltage feedback error compensation would freak out causing no PWR Good. Not sure if the SYSCON can tell if this was due to the Buck converters or not.
- @hrist, pg77, Syscon thread. A0313031 & A0902120 after performing the "Eraser mod" (placing pressure on Underside hole of CPU to increase contact pressure with HS). Step# 31 = CPU initialization. Reflowed to errors A0202120/A0213010. Suspect balls didn't actually flow.
- @Bosstom, Pg53, SYSCON thread. Had A0801002 errors fixed by replacing Side B tokins (both CPU/RSX). Console failed again 6 months later. He replaced Side A this time, which led to A0203010.
1802
This post is a particularly useful one to read.
7x consoles
Step# 20 is when the RSX is first initialized. So if it's not responding then it's borked! 1802 is the error the SYSCON will return when there is no RSX installed at all! @squeept showed that by testing a console without one.
However, it's not that simple. I don't understand the specifics. The DC/DC regulators that supply power to the RSX are checked during step numbers 00-10. So there isn't anything wrong with them. Somewhere after the DC/DC converters and whatever voltage reference/signal the SYSCON monitors is where the 20 1802 can happen.
It's still unclear if the issue is on the RSX itself, or the motherboard. Various users have assumed the issue was a dead RSX, but @vyktormvmpay25's results suggest the MB may also be to blame. No one specifically said they repaired an 1802 by replacing the RSX.
What we need are Ohm tests of all the RSX voltages and the voltage readings upon bringup. This would narrow the fault down to the voltage line involved. Then that line needs to be checked thoroughly for shorts, blown fuses, and ripple voltage/noise. If all is good, the RSX should be replaced with a known good one (ohm tested on all 7 voltages). And the old one should be cleaned and ohm tested to figure out which voltage was bad. That would confirm if this error is a dead RSX and which voltage causes it.
- @chiefhunnablunts on page 14 of the SYSCON thread had A0201802 / A0A02031. There was an RSX thermal monitor error immediately after SYSCON Reset (Step# A0) followed by an RSX Initialization error when the RSX is first initialized in POST. The RSX isn't responding at all...period...nada! And that thermal error is bad news too. He concluded it was a dead RSX and moved on. I would have liked to see him try replacing the thermal monitor and probe around the SYSCON reset circuit.
- @squeept, pg23. Previously reworked. "EVERY chip heatgunned, warped." Current error was A0233020. Previous errors were A0801301, A0902120. Absolutely covered in flux. Can't even see if there is any delamination from heatgun. No attempt made. Diagnosis = Heat gunned to death! Later he removed the RSX to see what would happen, "I could smell some magic smoke pretty much immediately, but I didn't see any fireworks. Then I still got a 2 second YLOD. I got errors A0A02031 and A0201802 at once. I ohm tested across the TOKIN after and got a dead short." @Byteman said, "Turning on PS3 without RSX will definitely blow your TOKINs. They will be shorted." I agree with this, it's similar to what happens when you don't use bridge wires - the VRM tries to supply max voltage (3.3v) and it overloads the tokins' 2.5v rating (they act as a fuse in case of an RSX short or open line scenario).
- @moptop219, pg218 (tokin thread). Long string of A0802203's followed by a long string of A0801802's. The latest error was A0403034/A0404412. So a BGA/Bump defect is currently hiding another issue. Or the defect was affecting whichever pads cause 2203 and 1802 errors. Then it progressed to causing the traditional 3034/4xxx. IDK.
- @squeept, pg 20. Sealed. Had A0403034 / A0404422, but previous errors in log showed A0801601, A0801701, A0801802, A0801004, A08014FF. Reballed to A0801701. Replaced CXM4024R no change. Reballed CELL yielded A0801001, A08014FF. Diagnosed RSX – bump / die failure.
- @CodeKiller, pg2. Had A0403034 / A0404432. Reflowed to a working state. Previously attempted replacing tokins, which didn't help. YLOD returned 2 months later, despite aggressive fan curves, with 801701, 801601, 801802 because PWR was on at the time the BGA/Bump failed, then 403034/404402 thereafter[pg13].
- @vyktormvmpay25, pg75. A0611802 "slim dropped on floor unit while it was working. rsx was missing pads. clean errors, same 1802 error only after adding new rsx." 1802, 14ff and 1701. He swapped the RSX to another motherboard for the customer. This one suggests that 1802 errors can be tied to the MB not just the RSX. But victor's wording/translation is a bit confusing. So maybe I'm misunderstanding him.
- @Kibillcat, pg75. CELL BE die was chipped. GLOD, Be PLL unlock. "I would constantly receive A0611802 and A0801301 errors. Error 1301 would happen more often. But on one point I received HDMI error." He managed to record the bringup of this 1802 error...
Code:
bringup
[SSM] state: 0000 -> 0101 Bringup Mode #0 (0xFF)
[SSM] ssmCb_OnStartingBePowOn() called.
[SSM] Bringup mode : syspm_stat=00000000/00000000
[POWSEQ] PowerSeq_Setup called.
[SSM] state: 0101 -> 0201
[POWSEQ] AV Backend Setup
[SSM] state: 0201 -> 0102
[SSM] state: 0102 -> 0202
[SSM] state: 0202 -> 0103
[SSM] state: 0103 -> 0203
[SSM] ssmCb_BeforeBeOn() called.
[SSM] state: 0203 -> 0104 Psbd_SbTransMode_Half:0x20e2
>$
[SSM] state: 0104 -> 0204
[SSM] state: 0204 -> 0105
[SSM] nonfatalreq delayed.
[SSM] state: 0105 -> 0400 (PowerOn State)
[SSM] RSX Interrupt : Detected ! RSX SY_IES register (0x0008) = 0x4000000
[SSM] state: 0400 -> 0700
[POWSEQ] AV Backend Letup
[SSM] ssmCb_AfterBeOn() called.
[SSM] Shutdown mode : syspm_stat=00000000/00000000
[ERROR]: 0xa0611802
[POWSEQ] PowerSeq_Letup called.
[SSM] state: 0700 -> 0600 (PowerOff State) (Fatal)
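If you capture a bringup log like the one above, a few lines of Python can pull out the state transitions and the final error so logs from different consoles can be compared. This is a sketch that assumes the line formats shown in that log (`[SSM] state: XXXX -> YYYY` and `[ERROR]: 0x...`); the helper name is hypothetical and the sample is a shortened excerpt.

```python
import re

# Patterns assume the log text format shown in the bringup capture above.
STATE_RE = re.compile(r"\[SSM\] state: ([0-9A-Fa-f]{4}) -> ([0-9A-Fa-f]{4})")
ERROR_RE = re.compile(r"\[ERROR\]: 0x([0-9A-Fa-f]{8})")

def summarize_bringup(log_text):
    """Return (list of (from_state, to_state), list of error codes) from a log."""
    transitions = STATE_RE.findall(log_text)
    errors = [e.upper() for e in ERROR_RE.findall(log_text)]
    return transitions, errors

# Shortened excerpt of the log above; works on one long line or many lines.
sample = (
    "[SSM] state: 0105 -> 0400 (PowerOn State) "
    "[SSM] RSX Interrupt : Detected ! "
    "RSX SY_IES register (0x0008) = 0x4000000 "
    "[SSM] state: 0400 -> 0700 "
    "[ERROR]: 0xa0611802 "
    "[SSM] state: 0700 -> 0600 (PowerOff State) (Fatal)"
)
transitions, errors = summarize_bringup(sample)
print("last state before shutdown path:", transitions[0][1])
print("errors:", errors)
```

Noting the last state reached before the jump to 0700 is a quick way to see how far the power-on sequence got.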
1701 / 1601 or 14FF
1701
ATTENTION is an active-high output flag sent by the CPU to the SYSCON. During initialization & configuration it is used to request an operation by the SYSCON. When ATTN goes high, the SYSCON reads the SPI Status Register to determine the cause of the Attention signal. It remains high until software resets the condition that caused it.
Attention is used during Power On Reset (POR)...
- To load CPU VID voltage from the VRM internal registers.
- To Write configuration-ring data (Important CPU Config settings that should only be modified at boot, otherwise errors can occur).
- To calibrate the FlexIO interface (BitTraining).
After POR, the CPU raises the Attention signal (1701) in response to:
- Unresolved Checkstop errors (14FF)
- Livelock Detection (1601)
- PLL Unlock Condition (1301)
- BGA/Bump Defect that occurs while the Console was On (Step# 80). Subsequent attempts to power on the console would result in 3034/4xxx errors.
- @poopskoop on pg27 of this thread had a console with a BGA/Bump defect (A0403034). He heatgunned it to a 3-5 second YLOD. It would error with the following log…
Code:
Attention BE : Detected !
[SSM] BE Attention signal is detected !!
[SSM] state: 0400 -> 0700
[POWSEQ] AV Backend Letup
[SSM] ssmCb_AfterBeOn() called.
[SSM] Shutdown mode : syspm_stat=00000000/00000000
[ERROR]: 0xa0801701
- Note, IBM's Hardware Initialization Guide says that "When the Cell BE processor drives the ATTENTION signal active, the reason for the attention is stored in the rd_spi_status register that is accessible from the SPI interface[...]If the rd_spi_status register reads as all zeros or all ones, then the data is not being correctly." I think that's what syspm_stat=00000000/00000000 is referring to.
- He posted a pic of his RSX Ohm tests. VDDQ might have been a little low. I'm not sure if VRAM was on its way out or not, or if that would cause this error. Bad RSX VRAM is known to cause a GLOD. And he did describe the console as a GLOD, even though he said it would shut off after 3-5 seconds. Regardless, that's long enough for the POR sequence to have completed. Attention should be driven low after BitTraining and stay there!
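The "all zeros or all ones" check from the IBM quote is easy to automate when scanning logs. The sketch below assumes, as guessed above, that the `syspm_stat=` fields in the log are where that status shows up; that link is not confirmed, and both helper names are hypothetical.

```python
import re

def status_looks_unread(hex_value):
    """True if a hex string is all zeros or all ones (every digit 0, or every digit F)."""
    digits = hex_value.strip().lower()
    if digits.startswith("0x"):
        digits = digits[2:]
    if not digits or any(c not in "0123456789abcdef" for c in digits):
        raise ValueError(f"not a hex value: {hex_value!r}")
    return set(digits) == {"0"} or set(digits) == {"f"}

def syspm_stat_fields(line):
    """Extract the two hex fields from a 'syspm_stat=AAAAAAAA/BBBBBBBB' log line."""
    match = re.search(r"syspm_stat=([0-9A-Fa-f]+)/([0-9A-Fa-f]+)", line)
    return match.groups() if match else None

line = "[SSM] Shutdown mode : syspm_stat=00000000/00000000"
fields = syspm_stat_fields(line)
print(fields, [status_looks_unread(f) for f in fields])
```

A status that is all zeros or all ones would point toward the read itself failing rather than toward a specific fault reported by the CPU.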
The configuration ring is a series of bits that must be loaded into the CPU over SPI during the POR sequence. These bits configure the Cell BE processor before starting. They should only be set during POR. If they are changed after starting, it can result in faulty operation. The hardware must reset before the CPU's config-ring can be used again. After the POR the attention signal is driven low and is supposed to stay there! If there is a Checkstop error (14FF), Livelock Detection (1601), or PLL Unlock (1301) the CPU enters a fault condition and raises the Attention signal (1701). One common way this happens is when a solder connection breaks while the system is on.
The following is taken from IBM's Hardware Initialization Guide, CMOS SOI 65 nm Cell Broadband Engine.
The guide goes DEEP into the details of how the CELL BE Processor operates and is a wealth of knowledge, but it's quite advanced. If you can wrap your head around it though, it does offer a debugging guide in case of a 1701 error.
IBM said: To summarize, check the following conditions:
- That the rd_spi_status register matches the expected value.
- That VDD and VCS are adjusted to the correct voltage indicated by the VID according to the Cell Broadband Engine Datasheet.
- That the configuration ring is loaded with the partial good information from the rd_partial_good register.
- That the following configuration-ring information is correct
- The SPI address is correct.
- The SPI simple write sequence is correct.
- A '1' start bit prefixes the data.
- The configuration data length matches the specified length for this Cell BE processor version
1601
- CELL CPU is deadlocked and cannot proceed. Some "kind" of error occurred, preventing the process from completing. Basically this means the console froze and rebooted. It's the PS3 equivalent of the Blue Screen of Death (BSOD) you may already be familiar with. In the PS3 this is often preceded by graphical artifacting.
14FF
IBM said: A checkstop occurs when - usually a processor but sometimes a cache, memory, or I/O bus controller - determines that something is in an "impossible" state. An error occurs that cannot be isolated to a particular bus transfer in progress, or a processor detects no progress being made...Checkstops are inherently hardware phenomena. They do not necessarily indicate a solid failure of a component, so diagnostics will rarely determine that a problem exists.
It's similar to the LiveLock situation in 1601 errors but not the same. It's not surprising it can also occur at the time of a BGA defect.
We've seen this error a lot. The common story between them was that the console was on at the time the YLOD occurred. All subsequent attempts to start the console resulted in a GLOD with subsequent 1601/1701 errors, or a YLOD within 2 seconds. SYSCON errors usually show one A0801601/A0801701 occurring at the same timestamp, followed thereafter by 3034/4xxx errors for all subsequent attempts to PWR it on. Or it'll GLOD and throw more 1701/1601's. I think this means there is a precarious BGA defect teetering on the edge of breaking and it'll soon switch to 3034/4xxx like the others. But that's a guess.
Complicating the issue is the fact that sometimes people will get a 1301, 1401, 14FF, or 1802 also. I'm not sure what to make of it. Perhaps it just has to do with where the BGA is failing and it's involving those subsystems briefly before it too fully breaks...IDK.
Here are a few examples:
- @leral, pg79. Attempted a reflow, but it froze in XMB. Afterwards it would either GLOD or YLOD with 1701/1601.
- @Shawn Shakir, pg218 (Tokin thread). Oldest errors are 2x A0802022's and one A0801001. There was a 1701/1601 followed by a bunch of 3034's.
- @squeept, pg 21. RSX reflowed previously. Beat up, filthy. Previous errors were A0801701, A0801601, A0801001. Noticed delamination on GPU when cleaning after reball. Not sure if present before due to mess. A0403034 remained after reball. Diagnosis = Heat gunned to death!
- @squeept, pg23. Previously reworked. "Whole board heatgunned." Previous errors were A0801601, A0801701, A08014FF. "Discovered extensive delamination after ultrasonic bath cleaned the globs of flux off, stopped work." About the 14FF...
- @andrewscott87, pg103. GLOD, video reset worked. Then it started artifacting and froze. Then back to GLOD. He can repeat that cycle. Sometimes there was an A0801001 in the log, probably from flipping PWR off. 1701/14FF always come in a pair (occur at the same timestamp). He did not have any 1601 errors. Instead he had 14FF = CheckStop error.
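The pattern in these examples (a 1701 paired with 1601 or 14FF at the same timestamp, followed later by 3034) can be spotted mechanically in an error log. A minimal sketch, assuming the log is available as (timestamp, code) pairs sorted oldest first; that input shape and the function name are my own assumptions, and the sample entries are made up.

```python
def attention_then_3034(entries):
    """True if a 1701 paired with 1601/14FF is later followed by a 3034.

    entries: list of (timestamp, code) tuples, oldest first.
    """
    # Group the 4-digit error parts by timestamp to find paired errors.
    by_time = {}
    for ts, code in entries:
        by_time.setdefault(ts, set()).add(code.upper()[4:8])

    saw_pair = False
    for ts, code in entries:
        errs = by_time[ts]
        if "1701" in errs and errs & {"1601", "14FF"}:
            saw_pair = True
        elif saw_pair and code.upper()[4:8] == "3034":
            return True
    return False

# Made-up log: attention pair while the console was on, then 3034/4412 on the next boot.
log = [(1, "A0801701"), (1, "A0801601"), (2, "A0403034"), (2, "A0404412")]
print(attention_then_3034(log))
```

A positive match fits the "BGA/bump failed while the console was on" story described above, but as noted it remains a guess rather than a confirmed cause.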
I will edit this post to make improvements to the formatting and grammar. I will likely add new information about the error combos as I learn more. There's just so much to unpack for me to get it all in on the first go. So this is just the start.
