10
\$\begingroup\$

EDIT: VERY INTERESTING TEST RESULTS

I just tried inducing the thermal failure using an inverted "can of air" (likely difluoroethane). Even when it's super cold (possibly below -40C), it does not fail! Then when I breathe warm (wet) air on it and it gets some condensation, it DOES fail! More testing to come, but unless I'm missing something, this points strongly toward some sort of moisture issue!?

--original question--

I've been working on a project using an AD7124-8 SPI ADC. I'm getting very strange behavior where the part works as expected on the bench, but when cold (below about 15C) it stops functioning, stops returning data, or returns invalid data.

I've tried quite a bit to get to the root cause, with little success. On the first board revision I suspected it might have been poor power and/or a lack of proper decoupling, but the issue has persisted through a redesign. Another theory was that manufacturing issues (too high a reflow temperature?) were causing the die to crack, and thermal stress was causing intermittent failures. The most recent boards were assembled with leaded solder paste and a hot air gun set to 280C. 280C is listed as the max reflow temp in the datasheet.

This issue occurred on 2 of 2 boards of an earlier revision, and now a third board of a new revision. On the earlier revision, I attempted reflow of the part several times to get it fixed, with no apparent effect.

I would appreciate other suggestions about what sort of issues to look at to get to the bottom of this, and am happy to provide any additional details that would help!

Most recent layout:

close-up of board layout

Schematic and Layout (PDF), ADC detail:

Detail taken from schematic

Hardware Repo

Software Repo

\$\endgroup\$
11
  • 1
    \$\begingroup\$ Please check the supply voltages at or close to the pins using a scope. Anything strange there when cold? Also, I've sometimes had issues with reference voltages or locally decoupled voltages (REFIN1/2, REGCAPA/D). In case the behavior is very similar on three different boards using two different revisions, you may have a design or device issue and not a soldering problem. It often pays to question one's own design, but over the years, I have run into several issues within actual ADCs. Maybe you can include measurement results in your question. This may become interesting. \$\endgroup\$
    – zebonaut
    Commented Apr 27, 2022 at 6:51
  • \$\begingroup\$ Have you tried cooling just parts of the board with cooling spray? Just the capacitors or just the IC? \$\endgroup\$
    – kruemi
    Commented Apr 27, 2022 at 7:46
  • \$\begingroup\$ Also, your AGND seems to be split... The pour under the chip connects to two legs of the device but not to any other AGND trace. Have you run a DRC? \$\endgroup\$
    – kruemi
    Commented Apr 27, 2022 at 7:49
  • 1
    \$\begingroup\$ Whenever different temperatures result in problems, suspect the soldering. Other than that, some signals in your schematics say solenoid. Do the solenoids relate to the ADC data in some way? Because they are typically very sensitive to temperature changes. \$\endgroup\$
    – Lundin
    Commented Apr 27, 2022 at 10:00
  • 1
    \$\begingroup\$ Can you describe exactly how you test your design at cold temperatures? Do you use a thermal chamber, or just go outside, or...? The fact that this happens consistently with two batches with a redesign in between suggests that the issue may very well not be the cold, but something else that you inadvertently do when you cool down your design. \$\endgroup\$ Commented Apr 27, 2022 at 20:29

5 Answers

9
\$\begingroup\$

Given you have systematic failure across two different batches made in different ways, I think it is less likely to be a manufacturing issue. Still possible though.

Two other possible causes came to mind:

  1. Condensation: is the air humid, so that condensation forms on the circuit when it is at 15C? That's not a very low temperature, so it would be a bit of a surprise.
  2. Calibration: I haven't been through the datasheet or source code to see, but is it possible that something limits the temperature range over which the calibration / self-calibration is trusted and so if the temperature changes by too much either the IC or the source code returns an error?

Then I looked at the full schematic :-) In particular the reference voltage used. The circuit choice is strange to say the least. MY MISTAKE The 10V reference is made by dividing down an LDO output. ADP7142 is specified up to an input voltage of 40V but as configured (by R69 and R72) the output should be 41.7V! What input voltage are you giving this part (VBATT)? You are out of specification and the ADP7142 is probably shutting down. END MY MISTAKE The ADC has diagnostic detection which includes checking for a valid reference (>0.7V) which may be disappearing. There's a testpoint you can use to confirm this.

You appear to be providing the ADC with a 10V reference and the ADC is running at AVdd = 3.3V (3.6V max). The reference shouldn't exceed the AVdd.

MY MISTAKE Even if ADP7142 was capable of >40V operation, or you changed the resistors to fix the 41.7V issue, END MY MISTAKE the scheme for this reference is far from ideal. There's no capacitance on it. Also there are numerous error sources that mean the accuracy is probably of the order of +/-3% (presuming 1% resistors, schematic doesn't specify). Your 24-bit ADC is using a reference that is 3% accurate. You could happily use an MCU's 12-bit ADC for that.
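To put the 3% figure next to the converter resolutions mentioned above, the relative error can be expressed as an equivalent number of bits. This is only a rough sketch under the assumption that the reference error translates directly into absolute measurement error; the function names are hypothetical, and the comparison does not apply to a ratiometric measurement where the reference error cancels.

```cpp
#include <cassert>
#include <cmath>

// Number of bits whose LSB equals the given relative error: log2(1 / error).
double equivalentBits(double relativeError) {
    assert(relativeError > 0.0 && relativeError < 1.0);
    return std::log2(1.0 / relativeError);
}

// Size of one LSB as a fraction of full scale for an n-bit converter (2^-bits).
double lsbFraction(int bits) {
    assert(bits > 0);
    return std::ldexp(1.0, -bits);
}
```

Calling `equivalentBits(0.03)` gives a little over 5 bits, while `lsbFraction(12)` is 1/4096, or about 0.024% of full scale. In absolute terms, a 3% reference is therefore far coarser than even a 12-bit converter's resolution.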

MY MISTAKE Aside from the >40V issue, END MY MISTAKE I'd recommend you look carefully at your references as well as considering what accuracy you need. None of the references looks accurate to better than 1% (haven't checked). You may be able to get a sensible part that has 10V and 2.5V outputs. Otherwise there are cheap 10V and 2.5V regs (0.1% LM4040) and you could divide down and buffer to make the 1.65V reference likely with 0.2% accuracy using 0.1% resistors.

\$\endgroup\$
7
  • 7
    \$\begingroup\$ "You appear to be providing the ADC with a 10V reference and the ADC is running at AVdd = 3.3V (3.6V max). The reference shouldn't exceed the AVdd." Ouch well... problem found, I would say. ADC tend to get very strange behavior once you've blown 'em up. \$\endgroup\$
    – Lundin
    Commented Apr 27, 2022 at 10:02
  • \$\begingroup\$ Thanks! I think you may have misread the resistor values. U17 powers VBR at 10V (with a 12V input). The reference voltage is 1/4 of that, nominally 2.5V. I've measured all of these rails to confirm they're working as intended. \$\endgroup\$
    – Tim Vrakas
    Commented Apr 27, 2022 at 18:09
  • \$\begingroup\$ Hi Tim. It's not the resistors; it's that the ADP7142 datasheet majors on the 5V reference version as opposed to the adjustable version with a 1.2V reference. That changes the output of ADP7142 to 1.2*(1+11/1.5) = 10V. Could it be the soft start of ADP7142, which takes 1ms? If you take a sample in that time the reference will not be valid. You must wait >1ms after EN goes high (i.e. power up). \$\endgroup\$ Commented Apr 27, 2022 at 19:27
  • \$\begingroup\$ Could it be thermal expansion of a connector meaning it makes/breaks a connection? Also, have you checked the contents of the error register? I wonder if you are accidentally using the internal reference and it's having issues due to the lack of capacitor on the REFOUT pin. I was looking to see if thermal capacitance variation is a possible cause, so you could look more at that. \$\endgroup\$ Commented Apr 27, 2022 at 19:50
  • \$\begingroup\$ Thanks for all the input. For this application (measuring a load cell) I don’t actually care about the precision of the 10v rail, as it’s used as the excitation and reference voltage. I calibrate out the imprecise resistors that divide 10v -> 2.5v. The load cells have a few % bias before calibration anyway. \$\endgroup\$
    – Tim Vrakas
    Commented Apr 28, 2022 at 10:24
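The regulator arithmetic discussed in the comments above can be checked in a few lines. This is a minimal sketch: the function and parameter names are hypothetical, the 1.2V and 5V reference values and the 11 / 1.5 resistor ratio are taken from the comments, and the comments do not say which of R69/R72 is the upper resistor, so the code only uses the ratio.

```cpp
#include <cassert>

// Adjustable regulator output: Vout = Vref * (1 + rTop / rBottom).
// rTop and rBottom only need to share a unit, since just their ratio matters.
double regulatorOutputVolts(double vrefVolts, double rTop, double rBottom) {
    assert(rBottom > 0.0);
    return vrefVolts * (1.0 + rTop / rBottom);
}

// Per the asker's comment, the ADC reference is one quarter of the regulator output.
double adcReferenceVolts(double regulatorVolts) {
    return regulatorVolts / 4.0;
}
```

With a 1.2V internal reference, `regulatorOutputVolts(1.2, 11.0, 1.5)` comes out at about 10V and `adcReferenceVolts` of that at about 2.5V, which is within the 3.3V AVdd. Plugging in 5V instead gives roughly 41.7V, which is where the retracted figure in the answer came from.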
6
\$\begingroup\$

TL;DR: The ADC is on an analog ground plane that is connected to the digital ground plane by a single segment 10 mil trace, about 1.3" away from the ADC, away from digital signals straddling the plane split. And then the ADC has a DGND trace going to the digital plane. And the analog circuitry has no power plane. It's all traces and capacitors...

And then the digital side... has no power planes, just ground planes. The power goes around on 25 mil traces...

And the ground planes are not stitched, other than a half-hearted effort done at the AGND plane border...

Yeah, no. Even if this somehow, magically, will begin to "work", it'll be failing in the field. This is an EMC nightmare, a susceptibility nightmare... Just no. It's a tour-de-force in what not to do. Amazing.


It is unlikely that the manufacturing has anything to do with the problems. Unless fake parts were used, it almost never is the problem in such designs - not with the symptoms you observe, not when the design has never worked before, and not when you don't have a bunch of working designs behind you already. And even if you got bad boards and fake parts, the layout itself makes it a no-go.

As seen on the sample of the layout you posted: the AGND/DGND split is likely the root of all the trouble. When you do a design, start with one ground plane, solid and uninterrupted, with only vias making holes in it. Split the planes only if/when needed. Prefer to keep unwanted signals local by routing and minimizing loop areas rather than plane splitting. It's OK to have local impedance-reducing planes under switching regulators and such, but in most cases ADCs work just fine with one analog+digital plane as long as the ADC's package design is sensible, and as long as the board cleanly separates the analog and digital circuitry on opposite sides of the ADC.

This board has so much empty room that there's no need for split planes. Just the physical separation will ensure that digital return paths won't pollute analog signals - assuming the routing of those is done appropriately.

DGND is the return path for all of the digital signals going to/from the ADC. But you run them over the AGND plane, and DGND is just a trace... This has approximately zero chance of working properly - the digital signals are corrupted badly, if you only measured them. You're extremely lucky that this problem has shown up now, rather than later. It wouldn't take much - just a slightly different routing for some traces - for the design to appear to "work" - except the eye patterns on all the digital signals would be despair-inducing, and the design would be an EMC nightmare. This thing must radiate quite well, and - conversely - it's also susceptible to digital noise nearby.


I've reviewed the Gerbers... it's worse than I thought. I'm amazed this works at any temperature...

The ground plane splits must go. All of them. Start with a single solid ground plane. And before you add any splits, have it manufactured and working at least once with a solid plane. Then understand exactly the return current paths for all the signals that will cross the split. And then remember that nothing digital that doesn't have carefully controlled edge slew rates can cross any splits whatsoever.

Ask someone with experience in signal integrity to explain this, since it'll be much faster hands-on than over the internet. Writing up the do's and dont's of this design would take me a solid day of work. I'm amazed that such a project at such a respectable institution would be essentially free-running without guidance from people who have some industry experience doing such layouts. This should have never made it to production as-is.


Ground plane splits are for the experts. Do not attempt if you're a novice without solid grounding in signal integrity - splits will always make your life a million times harder.

Do not blindly follow datasheet advice in this matter: a lot of the "advice" requires solid understanding of where it comes from. The advice is often more a reminder to professionals about what they should not forget or at least consider, rather than something to just take on faith.


Since you're at Drofnats, I imagine someone at EE will have an EM modelling setup that can extract the lumped model for any of the signal traces on the board, based on the gerber layout and board stackup. I highly suggest you get this for a couple of digital signals going between ADC and MCU, stick it into spice - even CircuitLab on this site will be enough - then dump some suitably fast pulses into this transmission line model and see what happens. It'll be wondrously amusing and educational, and it's basically a must see if you haven't gotten a TDR and some bare boards to play with (and someone to explain how to "ping" signal traces accurately using a TDR).


If you want more help with this, make sure there are PDFs of the schematics in the repo. Not everyone uses Altium.

\$\endgroup\$
5
  • \$\begingroup\$ Signal from SPI bus while operating properly: SCLK: ibb.co/ySSJ8kM MOSI: ibb.co/M92RGp2 MISO: ibb.co/THtdbYX These waveforms don't look like they have significant ringing to me. I do appreciate the note about the DGND return not being near the signal lines; that could certainly be improved! \$\endgroup\$
    – Tim Vrakas
    Commented Apr 28, 2022 at 8:08
  • \$\begingroup\$ Awesome. Now probe DGND vs AGND at the pins of the ADC using a wideband differential probe :) The probe springs need to be 0.5” or less otherwise you’ll miss it. This isn’t about routing DGND trace next to SPI lines. A trace wouldn’t do at all. You’d need a “trace” probably 300-500 mil wide running below the SPI bus. \$\endgroup\$ Commented Apr 28, 2022 at 11:35
  • \$\begingroup\$ When probing DGND vs AGND, remove C77 and C79. After all, those are just decoupling capacitors for the supply and if anything, their removal should lower the current dumped into DGND and improve its quality. Well, as you’ll see, it won’t. And that’s your problem. The split planes are a fantasy in this case because C7x dump some of the SPI return current onto IOVCC as well. But then IOVCC is this little trace as well… \$\endgroup\$ Commented Apr 28, 2022 at 11:43
  • \$\begingroup\$ @TimVrakas I'm sorry you've spent so much work on this, but that layout has to be fundamentally redone at least as far as ground and power distribution goes. Feel free to contact me privately, I can probably fly out to Drofnats for a weekend and help you guys out, but that will be a solid two days of work where you'll have to learn a lot in a short time. This design is not salvageable as-is, and you're seriously wasting your time trying to "make it work". Whatever "test results" you're getting are meaningless. You'll be fighting goblins on this for ever. I mean it. \$\endgroup\$ Commented Apr 28, 2022 at 20:55
  • \$\begingroup\$ –1 needlessly cranky/condescending \$\endgroup\$
    – Reid
    Commented May 5, 2022 at 22:08
5
\$\begingroup\$

A change of behaviour due to temperature variations may also be due to timing issues on the digital interface. I would suggest you review the implementation of the digital interface (clock speed, clock edge, polarities, ...) and also compare the signals when the device works vs. when it doesn't. It may shed some light.

[I know this is not a full answer - just don't have enough reputation to just put a comment]

\$\endgroup\$
1
  • 2
    \$\begingroup\$ Welcome :-) Since you are suggesting something as a possible answer, along with suggesting a troubleshooting approach, then that shouldn't be a comment anyway and it will stand as an answer (even if only a partial one). As you are new to this part of Stack Exchange, I recommend you look at "our" tour & help center. Also here is the SE FAQ. Thanks and again, welcome. \$\endgroup\$
    – SamGibson
    Commented Apr 27, 2022 at 17:02
4
\$\begingroup\$

It’s probably a bad solder joint but it could also be a damaged plastic package. You can find an entire application note on dealing with that particular package here.

Note that the typical reflow cycle only has the peak temperature held for 20-40 seconds. You didn’t use thermal vias so it should have soldered fairly quickly but without preheat and controlled cooling the 250 degree C change could easily cause fractured joints. The joints can last fewer than 1000 cycles of only 35 degrees C…

It’s also possible to damage plastic packages by heating if they have not been properly stored in a low-RH environment or baked adequately in lieu of that, but you’d probably see that. This particular package is MSL level 3, and you can see some handling requirements here

I suggest removing the chip, cleaning the pads, and attempting to re-solder it with plenty of flux and preheating the whole PCB to perhaps 100 degrees C, and allowing slow cooling.

\$\endgroup\$
3
  • 1
    \$\begingroup\$ Thanks, I'll try that! I also should have mentioned that this issue occurs on all 3 of 3 prototypes (2 of a first revision, and the first of the most recent). I tried reworking the chip several times on the first revision boards, with no change. \$\endgroup\$
    – Tim Vrakas
    Commented Apr 27, 2022 at 6:29
  • \$\begingroup\$ I doubt very much that manufacturing has anything to do with this. Looks like a complete red herring. Just think about it: if the manufacturing was a problem, lots of the customers would report issues, and the company would be out of business in short order. This design has never worked properly, and as such the design itself should be the first suspect. I also expect that the temperature is only coincidence as well: something fundamental must be messed up here. Nothing that works correctly otherwise will stop at 5-10C less, and especially not a simple ADC circuit like we see here. \$\endgroup\$ Commented Apr 28, 2022 at 2:46
  • \$\begingroup\$ @Kubahasn'tforgottenMonica it could certainly be so, and the added information after this answer was composed that it was not a single unit makes that much more likely. We had some very high end TI ADCs failing occasionally and some power supply changes fixed it (related to the filtering). An unanswered query from yours truly is likely still visible on the TI forums. \$\endgroup\$ Commented Apr 28, 2022 at 3:05
2
\$\begingroup\$

After several months dealing with this issue, the moment of realization came to me at 4 AM as I was drifting off to sleep.

The SYNC Line
The ADC has an external synchronization feature that allows multiple chips to be fed with a single synchronization clock, with a guarantee that samples will be taken at a precise interval after the SYNC line goes high. The datasheet does not include clear instructions that the line needs to be pulled up or down if unused.

The Dev Boards
There are two publicly available AnalogDevices dev boards: one for general use with a PMOD header and all pins broken out, and one in an application-specific setup for thermocouple measurements. On the former, the SYNC is wired to a breakout pin, with a pull-up. I didn’t think much of this. On the second however, the SYNC line is tied directly to VCC.

That was strange enough that it sent me back into the datasheet:

image from the datasheet describing SYNC functionality

Reading this set off the alarm bells. In particular “held in a reset state” was a potentially bad thing. But wait; how had I connected the SYNC pin on my board?

wired directly to the MCU

It’s wired directly to the MCU, without any pull-up or pull-downs. OK, that should be fine. But what are we doing with it in the MCU?

code where nothing is being done with the SYNC line. Nothing.

The SYNC line was floating. When the ADC temperature was higher, some internal parasitic was changing in the chip and pulling the SYNC line a little higher, causing the ADC to function properly. When the temperature was lower, it would drift down and cause a permanent reset of the modulator. The digital electronics would still be running perfectly, but the ADC itself was stuck in reset.

a trivial digitalWrite() call in the code

A trivial 2-line change to the code caused the issue to immediately be resolved, and it hasn’t been seen since:

Testing with an Ice Cube
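For readers without access to the screenshots, the fix described above amounts to configuring the SYNC pin as an output and driving it high at startup. The following is a minimal sketch, not the project's actual code: the pin number `ADC_SYNC_PIN` and the function name `releaseAdcSync` are hypothetical, and the small block of stand-ins for the Arduino core's `pinMode()` and `digitalWrite()` exists only so the example compiles on a PC.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

// Host-side stand-ins for the Arduino core (assumption for illustration only).
// On real hardware, delete this block and use the core's own definitions.
constexpr uint8_t INPUT = 0, OUTPUT = 1, LOW = 0, HIGH = 1;
struct FakePin { uint8_t mode = INPUT; uint8_t level = LOW; };
std::array<FakePin, 64> g_pins{};

void pinMode(uint8_t pin, uint8_t mode) {
    assert(pin < g_pins.size());
    g_pins[pin].mode = mode;
}

void digitalWrite(uint8_t pin, uint8_t level) {
    assert(pin < g_pins.size());
    g_pins[pin].level = level;
}

// Hypothetical pin number; the real assignment comes from the board schematic.
constexpr uint8_t ADC_SYNC_PIN = 7;

// Actively drive SYNC high so the line never floats and the ADC
// modulator is not left held in reset.
void releaseAdcSync() {
    pinMode(ADC_SYNC_PIN, OUTPUT);
    digitalWrite(ADC_SYNC_PIN, HIGH);
}
```

In an Arduino-style sketch, `releaseAdcSync()` would typically be called from `setup()` before the first SPI transaction with the ADC. A hardware pull-up on SYNC, as on the general-purpose dev board mentioned above, would additionally keep the line defined before the firmware starts running.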

I want to thank everyone who responded for your detailed examinations and thorough explorations. There are a number of sub-optimal design choices that came to light, which I may attempt to fix in future versions.

@Kuba hasn't forgotten Monica, we’ve named a PCB in honor of your many responses: 

PCB named "Drofnats"

\$\endgroup\$
0
