PS3 Frankenstein PHAT PS3: CECHA with 40nm RSX

marciolsf · Jun 28, 2022

I was reading the write-up on the new lv0ldr exploit (https://github.com/MikeM64/Exploit-Writeups/tree/main/PS3/lv0ldr-spi-mitm), out of curiosity more than anything, when I ran into this paragraph:

As an aside with the SPI bus disconnected, the boot sequence of the PS3 can be stepped through by controlling when syscon gets interrupt requests from Cell. You can take as long as you want to boot in this manner.

This could be really useful to understand the boot sequence better...

RIP-Felix · Jun 30, 2022

How does that work tho? Like I could stop the SYSCON after each step number? Or each SSM state? How would I control this? And what diagnostics can I perform to see what exactly the SYSCON did during each step?

I mean, I could probe voltages with a multimeter, but what logs can I access to glean specifics?

RIP-Felix · Jun 30, 2022

Teensy was placed in peripheral mode to silently sniff the SPI bus during boot up. Once the appropriate packet header was seen the Teensy disconnected the SPI bus from Syscon and became a controller itself. It then writes the crafted NVS read response packet to prime the LS with both the controlled return address and the stage 1 shellcode.

I wonder if that's exactly how the ORBIS modchip works. The only difference being the controller used and packet (RSX_ID) sent.
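
A minimal sketch of the sniff-then-take-over logic from the quote above, written as a plain Python state machine rather than Teensy firmware. The `Interposer` class, the trigger bytes and the response bytes are all placeholders made up for illustration; the real packet header and crafted payload are in the write-up and are not reproduced here.

```python
from enum import Enum, auto


class Mode(Enum):
    SNIFF = auto()       # peripheral mode: listen only, drive nothing
    CONTROLLER = auto()  # bus taken over: the crafted response has been sent


class Interposer:
    """Watches a byte stream and switches roles once a header is seen."""

    def __init__(self, trigger: bytes, response: bytes):
        self.trigger = trigger      # hypothetical header to wait for
        self.response = response    # hypothetical crafted reply
        self.mode = Mode.SNIFF
        self.window = bytearray()   # last len(trigger) sniffed bytes

    def on_byte(self, byte: int) -> bytes:
        """Feed one sniffed byte; return the bytes to drive on the bus."""
        if self.mode is Mode.CONTROLLER:
            return b""              # already took over, nothing more to do
        self.window.append(byte)
        # Keep only the most recent len(trigger) bytes.
        del self.window[:-len(self.trigger)]
        if bytes(self.window) == self.trigger:
            self.mode = Mode.CONTROLLER
            return self.response
        return b""


# Placeholder values, not the real header or payload.
mitm = Interposer(trigger=bytes([0xAA, 0x55, 0x01]), response=b"CRAFTED")
outputs = [mitm.on_byte(b) for b in [0x00, 0xAA, 0x55, 0x01, 0x02]]
```

Whether the ORBIS chip follows the same pattern is still only a guess; the sketch just shows how little logic the "wait for header, then answer" idea needs.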

RIP-Felix · Jun 30, 2022

Just finished reading the exploit. I think this and the ensuing knowledge gleaned from the cell configuration ring will be game changing. I wonder if remarrying the cell will soon be possible. Or even replacing it with a newer one, if a suitable PCB adapter can be made. Neat read, even if most of it was over my head.

marciolsf · Jun 30, 2022

My first thought was that it would be helpful along the lines of that series of scope reads you did a while ago, where you identified which parts of the boot process mapped to different power states, or when you and booter were running sabotage tests.

A slightly fancier idea would be to change FlexIO calibration values on the fly via software, instead of hardware (if they're there at all -- maybe it's all done via hardware? I kinda don't think so, though).

DeadEnd · Jun 30, 2022

Alright, so before any new stuff is discovered, I've released another crazy video... Maybe it's too much/too long, but at least it's something. It's got chapters so you can jump to the part you like.

M4j0r · Jul 1, 2022

RIP-Felix said:
I wonder if remarrying the cell will soon be possible.

The exploit doesn't circumvent the power on reset secure boot sequence, it starts after lv0ldr has been authenticated.
Early prototype Syscon firmwares mention a flag "force non secure boot" but there's no code related to that. Maybe it's part of the config ring or part of some earlier communication (like the clock settings which aren't documented in the HIG).

RIP-Felix · Jul 1, 2022

M4j0r said:
The exploit doesn't circumvent the power on reset secure boot sequence, it starts after lv0ldr has been authenticated.
Early prototype Syscon firmwares mention a flag "force non secure boot" but there's no code related to that. Maybe it's part of the config ring or part of some earlier communication (like the clock settings which aren't documented in the HIG).

The interrupts he was talking about allow us to pause the SYSCON steps. I assumed he meant the Power On Sequence, which includes POR. I was curious if that could be a tool to allow us to take control of the HW before the secure boot is set up. And use the same method of SPI sniffing to intercept the command to enable it. IDK if there are checks later that require it to be enabled, but could they not be intercepted too?

I wish I understood this all better, so I didn't come off as an ignoramus.

squeept · Jul 1, 2022

Had a 65nm CECHA01 swap returned under warranty for YLOD. Will tear down and report someday when I have the time, just wanted to share for now.

M4j0r · Jul 1, 2022

RIP-Felix said:
I was curious if that could be a tool to allow us to take control of the HW before the secure boot is set up.

Yes, you can do that both from the SPI or the JTAG port, but that depends on the fuse settings inside the pervasive logic. The chips need to be "unlocked" in order to debug (/step through) the actual secure part but not even the JTAG port is enabled.

RIP-Felix · Jul 1, 2022

squeept said:
Had a 65nm CECHA01 swap returned under warranty for YLOD. Will tear down and report someday when I have the time, just wanted to share for now.

Hmm... that is interesting. Let's hope it's just the TOKINs, not the underfill on 65nm RSXs, which does fall under the bumpgate era of Nvidia chipsets.

One assumption I've been making is that the underfill used on the 65nm is the same defective underfill used on the 90nm RSX - that Nvidia hadn't fixed it before the 65nm had already been manufactured for SONY PS3's.

@Computer Booter, @Workz_777, @PostalDude__, @Pacorretaco and I have had an ongoing debate about how much more reliable the bumps on 65nm RSXs are. I say that because they produce less heat, and a smaller Delta T for each mini-cycle, they should last longer than the 90nm, but not as long as the 40nm (which are unaffected by defective underfill). @Pacorretaco points out that the fan curves for those models have been adjusted such that the 65nm runs as hot as or hotter than the 90nm, so if it were only about heat that would undermine my argument. I contend that maximum temp and Delta T are not the same thing and the mini-cycles make up the difference.

Currently there is no app to auto-log/graph temps (and/or output to CSV for later analysis). I think this could be added to webMAN (pretty easily). It might even be a good way to get started learning to code. But for now, we don't have great data to back up either hypothesis. Just anecdotal reports of "Phats" not lasting as long as slims. Who knows which Phats "they" are referring to: all of them, including the 90nm launch models, or just the later models with 65nm. There is one slim model with the 65nm too, so who knows how this has skewed the reports. It's more of a 'I have seen lots of launch models with YLOD, fewer later model phats, and even fewer slims' kind of thing.
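
To make the max temp versus Delta T distinction concrete, here is a rough sketch of what could be done with such a log once it exists. It assumes a CSV with a temperature column called `rsx_c`, which is a made-up name and not something webMAN produces today. It counts swings between turning points and ignores wiggles below a threshold; the 2 C default is arbitrary.

```python
import csv
import io


def load_column(csv_text, column="rsx_c"):
    """Read one numeric column from CSV text (column name is an assumption)."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return [float(row[column]) for row in reader]


def thermal_swings(temps_c, min_swing_c=2.0):
    """Return the size of each swing between successive turning points.

    Wiggles smaller than min_swing_c are treated as noise and ignored.
    """
    swings = []
    if not temps_c:
        return swings
    anchor = temps_c[0]    # last confirmed turning point
    extreme = temps_c[0]   # furthest value reached since the anchor
    direction = 0          # +1 rising, -1 falling, 0 not yet known
    for t in temps_c[1:]:
        if direction == 0:
            if abs(t - anchor) >= min_swing_c:
                direction = 1 if t > anchor else -1
                extreme = t
        elif (t - extreme) * direction >= 0:
            extreme = t    # still moving the same way
        elif abs(t - extreme) >= min_swing_c:
            # Reversed by more than the threshold: close out the swing.
            swings.append(abs(extreme - anchor))
            anchor, extreme, direction = extreme, t, -direction
    if direction != 0:
        swings.append(abs(extreme - anchor))  # swing still in progress
    return swings


# Two made-up traces that peak at the same temperature.
steady = [70, 70, 70, 70, 70]
cycling = [60, 70, 60, 70, 60]
```

Both traces peak at 70 C, but `thermal_swings(steady)` is empty while `thermal_swings(cycling)` reports four 10 C swings, which is the difference the fan-curve argument does not capture.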

Basically, it would be nice to get real data from a solid source. Someone that has paperwork on the console's OG condition, like becount before leaving the shop and after relapsing. If it was a 3034, then the time between gives us a real number we can use to evaluate how long a reball lasts. Eventually, we might see a trend where 90nm reballs fail in, say, 2 years on average, 65nm fail in 3 years, and 40nm haven't yet. That would be a much more useful measure.

DeadEnd · Jul 1, 2022

RIP-Felix said:
One assumption I've been making is that the underfill used on the 65nm is the same defective underfill used on the 90nm RSX - that Nvidia hadn't fixed it before the 65nm had already been manufactured for SONY PS3's.

I think we'd need more data for this theory. Did he say where he pulled the chip from? How much use it had and how many reflow cycles it went through?

RIP-Felix · Jul 1, 2022

M4j0r said:
Yes, you can do that both from the SPI or the JTAG port, but that depends on the fuse settings inside the pervasive logic. The chips need to be "unlocked" in order to debug (/step through) the actual secure part but not even the JTAG port is enabled.

Are you referring to POR Phase 1, where BE_ATTENTION is first driven active and then the PLL loads data from internal fuses? Do you mean the CELL_ID is one of those impossible to change fuse values? Not something that is written in the Config-Ring data?

Then what does @MikeM64 mean by this?

MikeM64 said:
As an aside with the SPI bus disconnected, the boot sequence of the PS3 can be stepped through by controlling when syscon gets interrupt requests from Cell. You can take as long as you want to boot in this manner...
...Code execution has now been achieved on lv0ldr during boot up. The time code execution occurs is before the config ring is read via the undocumented SPU channels. There is the possibility of toying with config ring settings once we understand how the config ring is sent to the PS3 Cell.

It sounds like the exploit works during the initialization phases of POR (when the config-ring data is written). It is done over SPI. So while it's not currently known how to "toy" with the config-ring data, and see what it contains, it might be possible in the future. Not that there is anything particularly useful for remarrying a cell in there, IDK.

I guess that's my question. Is the Cell_ID hard coded as unchangeable fuses, or something we can set arbitrarily? If not in the cell itself, then in the SYSCON and NAND where they expect it to match? Perhaps this exploit isn't useful for that...IDK.

Forgive me if I don't understand this, I am trying to learn. It's just a steep curve. I only just started scratching the surface.

The reason I'm curious is because it sounded applicable to my project, attempting to reverse engineer Power On Sequence Testing that occurs before the Firmware Sequence. I'm coming at this from a more diagnostic approach, to repair consoles. Seeing if there is any more information the SYSCON errorlogs can tell us by analyzing "when" in the POS it occurred. If I can halt the SYSCON after each step number or SSM state, I can analyse the effect. Like a voltage was enabled, which enabled a clock generator. Then continue to the next step and see what else comes online. When it happens all at once I get lost and have to make guesses about the order.

Knowing what exactly is happening when, can give insight about what might have gone wrong. It seemed like perhaps this "Pause Button" might make the POS more accessible. That I might be able to use it to nail down my hypotheses about what's happening during each SSM state or Step Number (for example).
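
A rough sketch of the bookkeeping this would involve, assuming the boot can actually be paused at each step and that some set of rails/clocks can be checked by hand at every pause. The step labels and signal names below are placeholders, not real PS3 net names or SSM states.

```python
def changes_per_step(snapshots):
    """Given ordered (step_label, active_signals) pairs, report what changed.

    Returns a list of (step_label, came_up, went_down) tuples.
    """
    previous = set()
    report = []
    for label, active in snapshots:
        active = set(active)
        came_up = sorted(active - previous)     # newly enabled at this step
        went_down = sorted(previous - active)   # disabled since last step
        report.append((label, came_up, went_down))
        previous = active
    return report


# Placeholder observations, one entry per pause.
observed = [
    ("step 1", {"rail_A"}),
    ("step 2", {"rail_A", "clock_B"}),
    ("step 3", {"clock_B", "rail_C"}),
]
report = changes_per_step(observed)
```

Each entry then says what came online (or dropped out) at that step, instead of having to guess the order when everything happens at once.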

On a separate note, I would like to see what errorcode the SYSCON generated from this...

MikeM64 said:
When #SB_INT is asserted, this notifies Syscon that a packet is ready to be read from SPI space. If syscon isn't connected to the SPI bus, it'll get garbage data when it tries to read and panic. This means that in order for Syscon to not turn off the PS3 every time Cell writes a packet the interrupt lines must also be cut.

I'm curious if 1701 (BE_ATTENTION) was caused by a checkstop error (14FF) or, more likely, a livelock detection (1601). Or perhaps a 3034/4xxx data error. Too bad he didn't include the errorcode in that write-up. It's just a curiosity. We tend to see the 1701/1601 happen with BGA/bump defects that cause a YLOD when the console was on and running code, and 3034/4xxx during bit training, because it now can't make POST.

squeept · Jul 1, 2022

@DeadEnd @RIP-Felix TOKINs were preemptively swapped out. If you want to scour the syscon thread for my sheet on it, it has a start date of 3/24/22. Donor has 650 days uptime, and 4.0 ohms on VDDC. The COK-001 board was a virgin. I didn't record model numbers for the donor. I should be able to get to it next week some time for real details.

M4j0r · Jul 1, 2022

RIP-Felix said:
Are you referring to POR Phase 1, where BE_ATTENTION is first driven active and then the PLL loads data from internal fuses? Do you mean the CELL_ID is one of those impossible to change fuse values? Not something that is written in the Config-Ring data?

Yes, it's part of the fuses. The CID/eCID are (read-only) accessible through the SPI registers in the pervasive logic (page 103 in the 90nm 1.5 HIG, x'0004' through x'000B').

It sounds like the exploit works during the initialization phases of POR (when the config-ring data is written). It is done over SPI.

The exploit happens while lv0ldr is running, ~between "End of the POR sequence" and "System reset interrupt".

The reason I'm curious is because it sounded applicable to my project, attempting to reverse engineer Power On Sequence Testing that occurs before the Firmware Sequence. I'm coming at this from a more diagnostic approach, to repair consoles. Seeing if there is any more information the SYSCON errorlogs can tell us by analyzing "when" in the POS it occurred. If I can halt the SYSCON after each step number or SSM state, I can analyse the effect. Like a voltage was enabled, which enabled a clock generator. Then continue to the next step and see what else comes online. When it happens all at once I get lost and have to make guesses about the order.

Yes, you can do that.

RIP-Felix · Jul 1, 2022

DeadEnd said:
I think we'd need more data for this theory. Did he say where he pulled the chip from ? How much use it had and how many reflow cycles it went through?

I just went back to read the dates more closely.

Bumpgate affected chipsets primarily from 2006-2008. The RSX is based on the G70/G71 architecture used in Nvidia 7800GTX GPUs. Almost all GeForce 7000 series are affected, meaning the earliest RSXs (90nm), in PS3 models with SKU release dates between 8/2006 - 10/2007, are definitely affected. Nvidia started producing "fixed" revisions for "some" affected chips in the summer of 2008. The 65nm RSX went into PS3 models with a SKU release from 8/2008 - 9/2009. This is a much closer call. These SKU dates are for the PS3 model release, not the date of manufacture for the particular console, nor for the RSX that went into it. The RSX manufacture date was certainly earlier.

So the question is, when did Nvidia actually manufacture the 65nm RSX's that went into the different PS3 models? Did they produce them all at once or revise the 65nm RSX mid run, producing "fixed" versions of the 65nm RSX for later phat models or the 20xx slim?

To answer this question the dates are important, because there were 2 models of 65nm RSX. The question is why? Why would Nvidia release another model revision 65nm RSX? And why would SONY release another MB SKU only 2 months after the DIA-002? In order to put this new RSX on it?

CXD2982 (Only found in DIA-002, SKU released 8/2008)
CXD2991 (Starting with VER-001 10/2008, ending with DYN-001 9/2009)

This article and this one too suggest that all 65nm and 55nm chipsets were affected.

Charlie Demerjian said:
The defective parts appear to make up the entire line-up of Nvidia parts on 65nm and 55nm processes, no exceptions. The question is not whether or not these parts are defective, it is simply the failure rates of each line, with field reports on specific parts hitting up to 40 per cent early life failures.

The first was published in late August 2008 and the second (with more details about the specific failures) was published in September. The timeline fits with this theory. So that lends credence to the idea that the CXD2982 was defective. And the cat was out of the bag by September. The 65nm RSX was defective for sure, and SONY would no doubt have been informed and working closely with Nvidia to rush a MB revision. The idea that the 2991 was a hasty fix still needs to be confirmed, but it makes sense there would be a new revision if they realized the manufacturing error back in August (after the DIA-002 had already hit store shelves). It is feasible that 2 months might be enough time to get them into PS3s. So an October SKU makes sense.

It's not unusual for there to be RSX model revisions. The 40nm had 3 revisions, but there were SKU releases of around a year between. Not 2 months!

CXD5300 (Starting with SUR-001 9/2009, ending with JSD-001 7/2010) = 10 months
CXD5301 (Only on KTE-001 6/2011) = 15 months
CXD5302 (Only on MSX-001 9/2012) = Kinda weird, they were released at the same time as the 28nm? IDK why you would produce a new 40nm RSX at the same time as the 28nm RSX.
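
The gaps quoted above are plain month differences between SKU release dates, so they are easy to recheck. The sketch below uses only the dates already listed in this post; reading the "15 months" as the gap from KTE-001 (6/2011) to MSX-001 (9/2012) is my interpretation of the list.

```python
def months_between(start, end):
    """Whole months from start to end, each given as (year, month)."""
    (start_year, start_month), (end_year, end_month) = start, end
    return (end_year - start_year) * 12 + (end_month - start_month)


# SKU release dates as listed in the post, written as (year, month).
gap_65nm = months_between((2008, 8), (2008, 10))        # DIA-002 -> VER-001
gap_5300 = months_between((2009, 9), (2010, 7))         # SUR-001 -> JSD-001
gap_5301 = months_between((2011, 6), (2012, 9))         # KTE-001 -> MSX-001

print(gap_65nm, gap_5300, gap_5301)  # prints: 2 10 15
```

So the 65nm revision gap is 2 months, against 10 and 15 months for the 40nm spans.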

The 40nm was released from 3/2010 to at least 2014. So it wasn't affected. So 25xx model slims and later are all outside the Bumpgate window. The first revision 65nm RSX (CXD2982) was in PS3 MB SKUs at the tail end of Bumpgate (J & K models). They were probably produced by Nvidia during Bumpgate, with bad underfill. It's possible that they updated the design to the CXD2991 with good underfill and SONY released L-20xx models with good 65nm RSXs. However, it's also possible this was a half-fix, because some chipsets still had bad bump packaging materials, even though the underfill was more appropriate. These chips last longer, but are still prone to premature failure. So it's unknown if the 65nm RSX is still affected by this less severe defect.

Nvidia seemed to finally understand the issue clearly by 2010. While chipsets between late 2008 and 2010 may not be as severely affected by the Bumpgate fiasco, they may still be defective to a lesser extent. Meaning that the entire lineup of 65nm may be defective to some extent. We would need to have the underfill chemically tested and the bumps examined under electron microscopy to see exactly how they were constructed. Only then would we be able to know the full extent to which the various RSXs were affected.

Again, it all depends upon the timing of when Nvidia actually manufactured the RSXs that made it into each PS3 model. The CXD2982 is almost certainly affected by the same manufacturing defects as the 90nm (albeit more reliable simply because it doesn't produce as much heat). The CXD2991 may have good underfill, but whether or not its bumps and package materials were free from defects is still an open question. They were manufactured during a time when Nvidia was still struggling to figure out the full extent of the issue. And since they still didn't know how to make a reliable chip, can we trust that this revision isn't defective as well, albeit to a lesser extent?

sandungas · Jul 2, 2022

RIP-Felix said:
Just finished reading the exploit. I think this and the ensuing knowledge gleaned from the cell configuration ring will be game changing. I wonder if remarrying the cell will soon be possible. Or even replacing it with a newer one, if a suitable PCB adapter can be made. Neat read, even if most of it was over my head.

For curiosity's sake... I was talking about the different CELL pad layouts before, but today I made a comparison that is going to clarify some of the speculation

The point is... there are up to 4 CELL pad layouts, but 3 of them match in the 41x41 number of pads on the periphery (they differ in the missing pads at the center "hole" though)... so we were wondering if soldering a different CELL revision onto the pad layout of another CELL revision would affect ONLY the pads at the center
https://www.psdevwiki.com/ps3/CELL_BE#Alternative_listing
CELL 90nm (41x41)-(19x19)-84 = 1681-361-84 = 1236 pads layout
CELL 65nm (41x41)-(15x17)-84 = 1681-255-84 = 1342 pads layout
CELL 45nm (41x41)-(17x17)-84 = 1681-289-84 = 1308 pads layout
CELL 45nm (42x42)-(22x18)-9 = 1764-396-9 = 1359 pads layout
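
The four totals above can be rechecked with a few lines of Python. The formula is the one used in the list (full grid, minus the center hole, minus the other missing pads); the function name is just for illustration.

```python
def pad_count(grid_w, grid_h, hole_w, hole_h, other_missing):
    """Pads = full grid - center hole - other missing pads."""
    return grid_w * grid_h - hole_w * hole_h - other_missing


# (label, grid, hole, other missing pads) exactly as listed above.
layouts = [
    ("CELL 90nm", (41, 41), (19, 19), 84),
    ("CELL 65nm", (41, 41), (15, 17), 84),
    ("CELL 45nm", (41, 41), (17, 17), 84),
    ("CELL 45nm", (42, 42), (22, 18), 9),
]

counts = [pad_count(*grid, *hole, missing) for _, grid, hole, missing in layouts]
print(counts)  # prints: [1236, 1342, 1308, 1359]
```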

On the wiki there is an image of the CELL 90nm pad layout, and the pinout, from the COK-001 or COK-002 manuals
https://www.psdevwiki.com/ps3/CXD2964GB

And I just started a page with the pinout of the CELL 65nm, from the SEM-001 manual, and I made this image following the same rotation for comparison purposes
https://www.psdevwiki.com/ps3/CXD2981GB

Long story short... I marked the CELL pads dedicated to syscon connections (8 in total, including the SPI channel) and their locations don't match
In other words... if we solder a 65nm CELL onto the pad layout of a 90nm CELL, these 8 pads dedicated to syscon are not going to be correctly connected. To get a better understanding of how many pads have been moved, I would need to continue these drawings, painting more pads of the 65nm layout in colors to compare them with the 90nm layout. But anyway... for now it seems the only way would be to use some kind of intermediate "CELL adapter" board
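
Once both pinouts are typed in, the comparison itself could be automated instead of painted by hand. This is only a sketch: it assumes each layout is a mapping of signal name to pad coordinate, and the signal names and coordinates below are placeholders, not values from the manuals.

```python
def compare_layouts(layout_a, layout_b):
    """Compare two {signal: pad} mappings.

    Returns (moved, only_in_a, only_in_b), where moved maps each shared
    signal whose pad differs to its (pad_in_a, pad_in_b) pair.
    """
    shared = layout_a.keys() & layout_b.keys()
    moved = {
        signal: (layout_a[signal], layout_b[signal])
        for signal in sorted(shared)
        if layout_a[signal] != layout_b[signal]
    }
    only_in_a = sorted(layout_a.keys() - layout_b.keys())
    only_in_b = sorted(layout_b.keys() - layout_a.keys())
    return moved, only_in_a, only_in_b


# Placeholder data only; the real entries would come from the wiki pinouts.
cell_90nm = {"SIG_1": "A1", "SIG_2": "B7", "SIG_3": "C3"}
cell_65nm = {"SIG_1": "A1", "SIG_2": "D9", "SIG_4": "E5"}

moved, only_90nm, only_65nm = compare_layouts(cell_90nm, cell_65nm)
```

The `moved` dictionary is then the list of signals an adapter board would have to reroute, and the two "only in" lists flag signals that exist in one revision but not the other.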

RIP-Felix · Jul 2, 2022

sandungas said:
...but anyway... for now it seems the only way would be to use some kind of intermediate "CELL adapter" board

I was thinking of a flex cable that would be soldered directly to the BGA, with appropriate pads where the new cell needs them moved to. And breakout pads routed to the edge for probing and diagnostics.

@vyktormvmpay25 and I have toyed with the idea a bit. It seems that many of the pads would not need to be completely moved, really only a few signals. So we might not need a full sandwiched adapter. Perhaps only a small one on one side, or wherever the signal pads differ. This could be accomplished with a flex PCB that's very thin. Thin enough that it could sit between the MB and the CELL and be soldered on top of during the reflow. The challenge is knowing exactly what the pinout is and identifying what needs to be moved. Then there is going to be some clever designing that needs to be done. Cuz if we're talking about high speed differential pairs or data lines, there's impedance matching and serpentine routing to be considered. Not an insurmountable challenge.

This all depends upon the pinout being mapped! Technically it would be the same with the 28nm RSX.

DeadEnd · Jul 2, 2022

sandungas said:
For curiosity's sake... I was talking about the different CELL pad layouts before, but today I made a comparison that is going to clarify some of the speculation

The point is... there are up to 4 CELL pad layouts, but 3 of them match in the 41x41 number of pads on the periphery (they differ in the missing pads at the center "hole" though)... so we were wondering if soldering a different CELL revision onto the pad layout of another CELL revision would affect ONLY the pads at the center
https://www.psdevwiki.com/ps3/CELL_BE#Alternative_listing
CELL 90nm (41x41)-(19x19)-84 = 1681-361-84 = 1236 pads layout
CELL 65nm (41x41)-(15x17)-84 = 1681-255-84 = 1342 pads layout
CELL 45nm (41x41)-(17x17)-84 = 1681-289-84 = 1308 pads layout
CELL 45nm (42x42)-(22x18)-9 = 1764-396-9 = 1359 pads layout

On the wiki there is an image of the CELL 90nm pad layout, and the pinout, from the COK-001 or COK-002 manuals
https://www.psdevwiki.com/ps3/CXD2964GB

And I just started a page with the pinout of the CELL 65nm, from the SEM-001 manual, and I made this image following the same rotation for comparison purposes
https://www.psdevwiki.com/ps3/CXD2981GB

Long story short... I marked the CELL pads dedicated to syscon connections (8 in total, including the SPI channel) and their locations don't match
In other words... if we solder a 65nm CELL onto the pad layout of a 90nm CELL, these 8 pads dedicated to syscon are not going to be correctly connected. To get a better understanding of how many pads have been moved, I would need to continue these drawings, painting more pads of the 65nm layout in colors to compare them with the 90nm layout. But anyway... for now it seems the only way would be to use some kind of intermediate "CELL adapter" board

If you find all the pads that don't match and mark them, I'm sure Victor or anyone else could build the interposer. He's already made a PCB for RSX.

DeadEnd · Jul 2, 2022

RIP-Felix said:
I was thinking of a flex cable that would be soldered directly to the BGA, with appropriate pads where the new cell needs them moved to. And breakout pads routed to the edge for probing and diagnostics.

@vyktormvmpay25 and I have toyed with the idea a bit. It seems that many of the pads would not need to be completely moved, really only a few signals. So we might not need a full sandwiched adapter. Perhaps only a small one on one side, or wherever the signal pads differ. This could be accomplished with a flex PCB that's very thin. Thin enough that it could sit between the MB and the CELL and be soldered on top of during the reflow. The challenge is knowing exactly what the pinout is and identifying what needs to be moved. Then there is going to be some clever designing that needs to be done. Cuz if we're talking about high speed differential pairs or data lines, there's impedance matching and serpentine routing to be considered. Not an insurmountable challenge.

This all depends upon the pinout being mapped! Technically it would be the same with the 28nm RSX.

So you are suddenly interested. I thought you all were against the idea lol
