PS3 Fault finding YLOD with the SYSCON - First steps and Error reporting

What resistance do you see on the CPU? For me it is usually simplest to reball the CPU: desolder it and measure the resistance once it has cooled, off the board. 3 ohms is always fine.

The spreadsheet is in my signature now. CELL reads 3.0 ohms while on the board. I guess I can yank it off in a bit and check, but everything suggests I'm wasting my time.

Entered start data from another CECHA01 in the sheet just now. I'm already guessing this is another dead RSX, but I'm gonna change out CXM4024R in a little bit then dry for reball tomorrow.
 
I believe that when the board was heated to reball the RSX, another connection under the CPU was lost, on the side between the two chips. It's your choice; you don't necessarily have to do it. I know they are a pain.
I have now seen your list of tests.
I will probably make a video of the DIA-001 this week. It has been on the table for diagnostic testing and all I see is A0213013. I will reball both. It is an untouched board.
 
Holy shit guys, syscon rules.

5 second YLOD with A0801701 after repeated A0802022 was indeed CXM4024R. Fired right up after I swapped it out. I'll try to give it a little mini stress test before I reball the RSX (I'm moving all my warranties to 1 year, and I won't trust a 90nm system to last that long without a leaded RSX) so that just in case I kill it during rework, we still got pretty good verification of the results.

I'm putting the CECHE01 back in the queue to reball the CELL for science, might even change out that same chip just cause screw it. Definitely want the results of that now even if it's still dead.
 
Ya know.... I might start not reballing the RSX by default now....

This is eye opening data when there is a complete analysis. The system will be fully warrantied no matter what, so I'll just let the spreadsheet tell me if I need to start reballing every RSX again.
 
So some GLODs can actually be caused by a faulty CXM4024R... It's an audio/video driver? But what can cause it to fail?
 
So I was preparing the COK-002 board for reballing and left it in the oven at around 100 degrees for a bit over two hours to get rid of moisture. Out of curiosity I decided to test whether the machine would start after that, and it did. It had errors 4432 and 3034 before. The RSX reads 2.4 ohms before and after. So if the solder balls are lead-free, how could 100 degrees possibly have fixed them? Or was 100 degrees enough to slightly bend the board and reconnect the solder? Could it be that in this case the problem is caused by the NEC/TOKINs?
 
To be honest, this forum needs one thread with reball ideas and suggestions, as it would help many people understand more about BGA ICs. After the new year I will open one with videos where I can share my ideas and work.
There were some forums years ago, but they disappeared, or at least they were intended to sell a specific machine, I think.
As I understood it in college, an RLC group is a filter/limiter for current/voltage. The inductance between the two caps inside the NECs is there for a reason.
Some BGAs will work if they are in good condition; some will not if the filter (the NECs) lets them cross those limits.
 
@DeadEnd This one was 5 second YLOD. No idea what caused it, active components just like to explode sometimes.

And yes, it sounds like the board / package warped a little and it moved a ball / bump a little. When I had a verified set of bad TOKIN, I heated them while staring at the oscilloscope and there was no change. Same with working sets. One experiment is surely not conclusive, but at best heating gives no actionable information since it is impossible to isolate those areas from each other.

Remember, cracks in the BGA or bumps are MICRONS in size. Just a few degrees of heating or cooling can cause that kind of shift. Just as important, if the crack is formed but still making mechanical contact, the amount of pressure can muck with the resistance of the joint. This is especially important when talking about data transmission in the GHz range.
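
To put rough numbers on how a few degrees can move things by microns, here is a minimal sketch of linear thermal expansion, dL = alpha * L * dT. The expansion coefficients are generic ballpark figures (FR4 board material versus bare silicon) and the 20 mm span is an assumption picked for illustration; none of these are measured PS3 values.

```python
# Assumed, illustrative values only (not measured on a PS3 board).
ALPHA_BOARD_PPM = 16.0   # ballpark in-plane expansion of FR4, ppm per deg C
ALPHA_DIE_PPM = 2.6      # ballpark expansion of silicon, ppm per deg C
SPAN_MM = 20.0           # hypothetical distance across the joint array


def expansion_um(alpha_ppm_per_c, length_mm, delta_t_c):
    """Linear expansion in micrometres: dL = alpha * L * dT."""
    return alpha_ppm_per_c * 1e-6 * (length_mm * 1e3) * delta_t_c


def mismatch_um(delta_t_c):
    """Difference in growth between board and die over the same span."""
    board = expansion_um(ALPHA_BOARD_PPM, SPAN_MM, delta_t_c)
    die = expansion_um(ALPHA_DIE_PPM, SPAN_MM, delta_t_c)
    return board - die


if __name__ == "__main__":
    for delta_t in (5, 20, 60):
        print("dT = %d C -> mismatch of about %.2f um" % (delta_t, mismatch_um(delta_t)))
```

With these assumed values even a small temperature change gives a mismatch on the order of a micron, which is the same scale as the cracks being discussed.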

@vyktormvmpay25 Couldn't say. I'm unfamiliar with troubleshooting that chip / area of the board, so I just skipped straight to changing it out.
 
I know the famous Panasonic chip on the SS can cause GLOD/blackouts, so it wouldn't be a surprise if something similar happens with that IC. But it would be nice to know whether that GLOD was stable, or whether the console turns off after a minute or so.

Also, I know this is pretty obvious, but did you test the secure mode sequence on that E, @squeept?

When you damage a trace on CELL you get a nice GLOD, in the best cases, and the PS3 is dead, with no beeps at all when trying to go into secure mode. Wouldn't it be the same if the CELL were dead?
 
Hehe, our friend @squeept is fixing PS3s in full gear, and people are having trouble keeping track. The spreadsheet is already jam-packed with info. Great job.

I think he meant the GLOD is still GLOD for now, queued up for CELL reball, mostly for science. And if I'm reading the spreadsheet correctly, "beeps:yes" means that when you hold down the power button it beeps the normal sequences.
The other random IC fault was yet another console, A model with longish YLOD.
So it's different consoles we're talking about at once.

Btw what do you mean exactly with the "fan ramps up" in the spreadsheet?
You mean if you leave the GLOD on for a while, the fan changes speeds accordingly as heat rises?
Or simply that the fan is spinning.

Later models introduced a "fan test mode". I just happened to get a slim 2000 today with ~1s YLOD. Fan test passes, so the PSU is ruled out, I guess.
Sadly SYSCON is only for K models and older, right?
 
I think he meant the GLOD is still GLOD for now, queued up for CELL reball, mostly for science. And if I'm reading the spreadsheet correctly, "beeps:yes" means that when you hold down the power button it beeps the normal sequences.
The other random IC fault was yet another console, A model with longish YLOD.
So it's different consoles we're talking about at once.

Btw what do you mean exactly with the "fan ramps up" in the spreadsheet?
You mean if you leave the GLOD on for a while, the fan changes speeds accordingly as heat rises?
Or simply that the fan is spinning.

Yes. Yes. And yes, just tracking that the fan spins up at all, though there is room to note if it goes straight to jet engine crazy fast mode.

I'm cleaning and re-assembling the {5 second YLOD A0801701 A0802022 CECHA01 fixed by replacing CXM4024R} system right now to put in to stress testing. Once I have that done, my clamp will be free so I can get back to work on the CECHE01 with GLOD. I'll start with CXM4024R there, too, just to see. Then I'll reball CELL for the F%$# of it.

I'll update the spreadsheet by this evening.
 
As I understood it in college, an RLC group is a filter/limiter for current/voltage. The inductance between the two caps inside the NECs is there for a reason.
Yeah, your instincts are right. This is a second stage RLC filter.
The inductors are an array of 0.33 uH parts coming off the DC-DC switching voltage regulators, one for each. They're in parallel with each other, 0.001 ohm resistors, and the TOKINs. The TOKINs do have some internal resistance, but I don't think it's significant. Anyway, the purpose of the RLC is to tune the filter to the frequency and bandwidth of the switching noise generated by the voltage regulators. If left unchecked, that noise would have consequences for the processors and memory, as the ripple and noise would steadily kill them (along with artifacting, random freezing, and related shenanigans).

Anyway, an elaborate second stage RLC filter like this very efficiently removes switching noise - and does so without much resistance. So the processors are free to chug current as fast as they like without burning the filter up. That said, it does have a 105C temp limit for the components' rated lifespan, but the cooling solution should keep that in check if the TIC is good and the console isn't full of dust - both of which we know go wrong in the real world. So they can and do fail, specifically the NEC/TOKINs. But does this failure occur before a BGA defect? That depends on how well designed the board and cooling solution are, the manufacturing process of the processors, thermal cycles, delta T, and much more. So who knows?!

Back on topic. Designing a second stage RLC filter is an iterative process (trial and error), because parasitic inductance, capacitance, and resistance in the board and components cannot be predicted accurately (real world vs. theory). This is why the 4800uF capacitance is actually an important value not to increase or decrease significantly. The general advice that adding capacitance is fine is false in this case. It would shift the resonant frequency of the filter and make it less effective - detune it. Only a proadlizer has as good a frequency response curve as the TOKINs had. So the best replacement for a TOKIN is a new proadlizer. Tantalum caps are the next best option and they should work fine, so long as you match the capacitance and ESR of the array they're replacing.
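
As a rough illustration of the detuning point, here is a minimal sketch using the ideal LC formula f = 1 / (2 * pi * sqrt(L * C)). It assumes a single 0.33 uH inductor working against the full 4800 uF and ignores the parasitics, ESR, and paralleled phases mentioned above, so the absolute number is only indicative; the part worth looking at is how the frequency moves when the capacitance changes.

```python
import math

# Values quoted in the post; treating them as one ideal L and one ideal C
# is a simplifying assumption, not how the real board behaves.
INDUCTANCE_H = 0.33e-6
CAPACITANCE_F = 4800e-6


def resonant_frequency_hz(inductance_h, capacitance_f):
    """Ideal LC resonant (corner) frequency: f = 1 / (2*pi*sqrt(L*C))."""
    return 1.0 / (2.0 * math.pi * math.sqrt(inductance_h * capacitance_f))


def detune_ratio(capacitance_scale):
    """How far the frequency moves if the capacitance is scaled by a factor."""
    original = resonant_frequency_hz(INDUCTANCE_H, CAPACITANCE_F)
    changed = resonant_frequency_hz(INDUCTANCE_H, CAPACITANCE_F * capacitance_scale)
    return changed / original


if __name__ == "__main__":
    print("ideal f0: %.0f Hz" % resonant_frequency_hz(INDUCTANCE_H, CAPACITANCE_F))
    for scale in (0.5, 1.5, 2.0):
        print("C x %.1f -> f0 x %.3f" % (scale, detune_ratio(scale)))
```

Because the frequency scales with 1/sqrt(C), doubling the capacitance pulls the ideal frequency down by about 29 percent, which is the kind of shift the "just add more capacitance" advice overlooks.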

I spent all summer studying this circuit, so I feel confident this is correct. However, I have not been able to prove it, since all my consoles have had BGA defects and I haven't had a successful reball yet. I keep botching it. I did measure the noise with an oscilloscope and my tantalum array outperforms the worn tokins on the board I compared them to (in theory), but I can't speak to reliability or stress testing because I need a working console and I'm not about to rip the tokins off a working console...lol!
 
I have managed to test this debug script on Win7 32-bit with Python 3.4/2.7. Now I'm struggling on Win7 64-bit with Python 3.4: it won't auth, and AUTH gives various different errors pointing at lines in serial. I will see if I can get it working. I know these are quite buggy on Windows, but they should work. Did anyone get this working on Windows 7 64-bit and would like to share info/Python versions/serial configuration?
 

Make sure you are using Python 2.7.18 https://www.python.org/downloads/release/python-2718/

Here is what really worked for me:

I uninstalled every installation of Python I had. Then I installed Python 2.7.18 fresh, checking the box to add it to the PATH during installation. Then I opened a CMD prompt and ran "pip install pycryptodome", which installed successfully. It informed me of a pip update and gave me a command to run it, which I did successfully. Then I ran "pip install pyserial", which completed successfully. After that, I was finally able to change the directory and run the script successfully. It greeted me with >$ instead of the previous error messages. So that was anticlimactic, but it's progress. I'll take it!
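
Before running the script it can help to confirm which interpreter you are actually launching and whether the two packages are visible to it. The following is a minimal sketch of such a check, written to run on both Python 2.7 and 3.x. The function name and report layout are made up for illustration; the only facts it relies on are that pyserial imports as `serial` and pycryptodome imports as `Crypto`.

```python
import importlib
import struct
import sys

# pip package name -> module name it installs
REQUIRED_MODULES = {"pyserial": "serial", "pycryptodome": "Crypto"}


def check_environment():
    """Report interpreter version, bitness, and any missing packages."""
    report = {
        "python": "%d.%d.%d" % tuple(sys.version_info[:3]),
        "bits": struct.calcsize("P") * 8,  # pointer size: 32 or 64
        "missing": [],
    }
    for package, module in sorted(REQUIRED_MODULES.items()):
        try:
            importlib.import_module(module)
        except ImportError:
            report["missing"].append(package)
    return report


if __name__ == "__main__":
    info = check_environment()
    print("Python %s, %d-bit" % (info["python"], info["bits"]))
    if info["missing"]:
        print("Missing, try: pip install " + " ".join(info["missing"]))
    else:
        print("pyserial and pycryptodome both import fine")
```

If the version or bitness printed is not the one you just installed, an older installation is probably still first on the PATH.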
 
Win7 x64 here - same issues as above, same solution as above. You have to use 32 bit 2.7.18.

CECHE01 CELL reballed, errors went back to some of the previous ones. I'm calling it a dead GPU. Each time it gets heated, the errors swap around some. I'm gonna venture to guess that any time the errlog is filled with a dozen different random errors, you have a chipset failure. A failed BGA connection should only go back and forth between the same one or two errors? Just guessing.
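
That guess can be written down as a simple rule of thumb so it can be checked against the spreadsheet as more boards come in. This is only a sketch of the heuristic as stated: the thresholds ("one or two" versus "a dozen") come from the wording above, the function name and labels are made up, and nothing here is a verified diagnostic.

```python
def classify_errlog(codes, max_distinct_for_bga=2, min_distinct_for_chip=12):
    """Apply the guessed rule of thumb to a list of errlog codes.

    Few distinct codes repeating   -> possibly a single bad BGA connection.
    A dozen or more distinct codes -> possibly a failed chip.
    Both thresholds are assumptions taken from the post's wording.
    """
    distinct = set(codes)
    if not distinct:
        return "no errors logged"
    if len(distinct) <= max_distinct_for_bga:
        return "possible BGA connection fault"
    if len(distinct) >= min_distinct_for_chip:
        return "possible chipset failure"
    return "inconclusive"


if __name__ == "__main__":
    # Codes quoted earlier in the thread, used here only as sample input.
    sample = ["A0802022", "A0802022", "A0801701", "A0802022"]
    print(classify_errlog(sample))
```

Anything between the two thresholds is deliberately left as "inconclusive", since the post only speculates about the two extremes.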
 
@RIP-Felix, since you are knowledgeable on the electrical side of things, could you have a look here and perhaps explain what the two resistors in the MOSFET driver circuit would cause?
I have been following that thread and looked over your post. I'm a hobbyist, not an EE. I've taken physics, so we studied a little of the EE that overlaps, but it's not much. I started watching EE videos in sequence and promptly decided it would be easier to look up the relevant videos, just the ones for the circuit I was trying to understand, instead of laying a foundation in every principle by which components work. Seriously, they must be trying to scare would-be EEs off! It was just a fun summer project while I had some down time...

Continued on the Frankenstein Fat PS3 thread...
 
Sorry, I feel like I'm spamming this thread, but I'm just so damn excited.

Got a CECHH01 in the books, and I'll be adding the preliminary analysis of another CECHA01 shortly. Working on making things a little clearer in there. I also know now that I need a section dedicated just to old syscon errors, which makes the empty square in the middle mighty convenient.

I can't even begin to describe how helpful having my little sheet printed out and kept with each board is proving to be. I'll work on getting the layout for that perfect and then I'll post it for anyone to use.
 
