Here at EOI we have three main kinds of project. One is our internal technology development projects. Some of these fail, mostly because they tend to be insanely hard, but the ones that pay off give us important new capabilities.
The second is research projects with customers, trying to push technological limits in fields such as biochip DNA sequencing, nanoantennas for infrared detection, ultrahigh resolution optical microscopy, hypersonic lidar, and (closer to home) infrared remote controls for consumer electronics. Those ones are a bit sporty, but succeed more often than not.
The third and most usual is customer work aimed at product development. These projects almost always succeed. On the rare occasions when they don't, it's generally because of client imperatives such as cancelling an already-funded project, whether for internal reasons or because the external funding went away. (Of course we aren't perfect either: I've already posted a project from 20 years ago where the responsibility was more nearly 50:50. Still, our record is very good.)
One example of a tantalizing near-miss was a transcutaneous (noninvasive) sensor for blood glucose and alcohol, to replace finger pricks (ouch) and breathalyzers. It was really sad—folks have been working on that problem for 30 years, burning through mountains of cash, and mine is the only one I know of that actually worked reliably. Here's the story.
The founder called me out of the blue at 3 PM on Christmas Eve, 2012. He turned out to be a charming and intelligent fellow with a lot of drive, who was almost entirely self-taught and was practically supernatural at raising money. He wanted me to build him an instrument, because that's what I do. We eventually became good friends.
He'd patented the general principle, which avoided the individual physiological variations that usually bedevil those sorts of measurements. The idea was to use a hand cradle with a virtual pivot(*) holding a fibre bundle against the web of the first and second fingers. The location is perfect: there are two arteries very close to the surface, so you get to measure fresh blood instead of tissue fluid, and no one has hair, fat, or calluses there to get in the way. (The finger webs are also quite tender to the touch, so if you put a small-diameter pin there as well, you can prevent the user from pushing so hard that the arteries get squashed.)
He had some promising data that he took himself using a Perkin-Elmer FTIR (Fourier-Transform Infrared) spectrometer and his hand cradle. He made arrangements with some folks at USC to provide him with lab space and a bit of technical help doing that. (He was a very able guy—being self-taught has great strengths as well as some profound weaknesses.) The USC statistics folks worked with him to develop AI-based detection algorithms for alcohol and glucose, which did very well, but of course his FTIR cost $100k. So he called me.
The project was unusual in that I didn't have my arms around the whole measurement. I designed and built the gizmo, but the founder had his USC statistics colleagues use their AI chops to build the model and extract the blood solute data, so I never knew in detail how that was done. (It wasn't anything simple such as spectral differences or ratios.)
I did a photon budget, which is my term for a detailed feasibility calculation emphasizing stability and SNR. That's super important, because without calculating how good the measurement could be, you never really know how you're doing. A photon budget prevents you from wasting time on recreational impossibilities on the one hand, or turning a silk purse back into a sow's ear on the other.
In this case it looked as though we could get a very good measurement fairly simply, using a tungsten source and condenser, a custom-designed split bundle of about 20 fibres (TX + RX), a conventional Czerny-Turner monochromator, a single extended-InGaAs photodiode at room temperature, and a chopper wheel plus lock-in for detection. (That passes for simple for the IR business.) The proof-of-concept (POC) system took me about six weeks start to finish, including the photon budget, optical design, designing and building the electronics, assembling the optomechanics, and writing the software.
It was built on a 12 x 24-inch aluminum breadboard using a combination of hacked Microbench(**) parts, JB Weld (the poor man's machine shop), a chopper wheel, and a servo from an RC airplane for moving the grating. The servo had titanium gears, so it was pretty manly for its size, and the grating cradle was also built from toy airplane parts, all courtesy of servocity.com. The electronics were hand-wired in die cast aluminum stomp boxes, dead bug style, connected with BNC cables. The chopper was a commercial unit from Thor Labs, and the back end was a console-mode C++ program running on a second-hand laptop and communicating via a LabJack data acquisition brick. The LabJack also produced the pulse-width-modulated (PWM) signal to control the servo. (This was early on, when EOI was just me. Nowadays it would have one of our MCU-based products inside, and Simon would have done some nice firmware to make it take data and communicate over USB, among other things.)
It all worked great, and was very amusing to watch—an advanced clinical instrument built with JB Weld and toy parts. Wouldn't the FDA have loved that?
We did the preliminary acceptance test by having some friends over for drinks and measuring all of our blood spectra every 15 minutes or so. Qualitatively the data looked exactly as we hoped—nice repeatable curves with the right time dependence and no big physiological variations between subjects. We did some glucose work using a strip reader for comparison, but the strips have relatively poor accuracy, so we concentrated on the alcohol measurement for that part of the demo. (Quaffing a few cool ones is much more fun than sticking pins in your fingers, coincidentally.)
After the founder used the POC data to raise a bunch more money, we brought the proto and the Perkin-Elmer FTIR to a contract engineering (CE) house in Orange County, CA, that will remain nameless because they have this unfortunate tendency to sue everybody in sight. The founder kept me sort of distantly in the loop, but made a crucial mistake: he tried to save money by supervising the CE firm himself, when he didn't have the technical background.
The optomechanics needed redoing, obviously. The CE hired an external consultant to do most of that, and he did a very nice job overall. The folks doing the electronics, motion control, and software were a different story. They proceeded to fall into every pothole along the road, like a drunk. Ignoring both the photon budget and my working design, they proceeded to replace my front end with an ordinary op amp TIA, not realizing they were trashing the SNR by a factor of 30 (15 dB) in the process. (I managed to get that one fixed, and the guy responsible taken off the project. Unfortunately he wasn't the worst.)
They replaced my direct drive for the grating with a rubber belt drive, which did give nice smooth motion. I had initially suggested a sine bar, which is used in most Czerny-Turner monochromators on account of its high resolution and excellent repeatability, but they ignored that too. That put them in need of more encoder precision, so they added an encoder to the motor as well as the grating shaft, and did some trick to combine the two encoder readings. Of course this scheme rapidly lost all accuracy as the belt squirmed around while moving, so that the calibration wouldn't sit still. (A metal taut-band drive would probably have worked.) Even the encoder on the grating shaft drifted like mad.
I went out to California to try to get to the bottom of some of this stuff. It was an uphill battle, because I had no official position in their client's organization (i.e. I wasn't writing the cheques), but we did manage to solve that one. The encoder's output was a PWM signal, and the data was encoded as the duty cycle, i.e. the ratio of the pulse width to the period (like the RC servo only backwards). They were measuring the pulse width by itself, using a capture input of their MCU, and ignoring the period. That turned any drift in the encoder's PWM frequency into an apparent angular drift. Fortunately, once found it was easily fixed in software. When that was done, I hit the poor encoder with cold spray and a heat gun, vastly exceeding its specified operating temperature range, but couldn't get it to drift at all. Kudos to US Digital for building solid encoders, even those cheapish ones.
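Here's a minimal sketch of why the width-only reading drifts and the ratio doesn't. The function names, the 1 kHz carrier, and the simple 0-360 degree mapping are all made up for illustration; a real encoder's datasheet will specify its own PWM format and scaling.

```python
def angle_from_width_only(width_s, nominal_period_s):
    """Width-only decode: assumes the PWM period never changes (the bug)."""
    return 360.0 * width_s / nominal_period_s


def angle_from_duty_cycle(width_s, period_s):
    """Ratiometric decode: uses the width and the period measured on the same cycle."""
    return 360.0 * width_s / period_s


# Hypothetical encoder: nominal 1 kHz PWM carrier, shaft parked at 90 degrees.
NOMINAL_PERIOD_S = 1.0e-3
TRUE_ANGLE_DEG = 90.0

for drift in (0.0, 0.01, -0.02):  # fractional drift of the encoder's PWM period
    period = NOMINAL_PERIOD_S * (1.0 + drift)
    width = period * TRUE_ANGLE_DEG / 360.0  # the encoder holds the duty cycle, not the width
    print(
        f"period drift {drift:+.0%}: "
        f"width-only {angle_from_width_only(width, NOMINAL_PERIOD_S):.2f} deg, "
        f"duty-cycle {angle_from_duty_cycle(width, period):.2f} deg"
    )
```

In this toy model a 1% change in the PWM period shows up as a 1% error in the width-only angle, while the ratiometric reading doesn't move.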
The belt-drive system failed anyway, basically because the measurement was being done on the slope of the very strong IR absorption spectrum of water in the 1.4-1.7 μm range, so that small wavelength shifts caused much larger amplitude errors. That put a huge premium on wavelength accuracy. The wavelength range was narrow, meaning that the grating moved only a few degrees during a scan, so even 4096 steps/turn wasn't fine enough resolution. Once again, I told them to use the tried-and-true sine-bar drive, and once again they refused, insisting on using a worm-and-sector gear instead, with the encoder on the worm shaft to get more encoder lines per degree of grating tilt. This was another mistake.
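To put rough numbers on the resolution problem, here's a sketch using the standard grating equation for a monochromator with a fixed included angle, m*lambda = 2 d cos(K) sin(theta). The 600 grooves/mm ruling and the 15 degree half-angle K are assumptions picked for illustration, not the values from the actual instrument; only the 4096 steps/turn and the 1.4-1.7 μm range come from the story.

```python
import math

GROOVES_PER_MM = 600.0   # ASSUMED grating ruling, not from the real design
HALF_ANGLE_DEG = 15.0    # ASSUMED half of the fixed included angle (K)
STEPS_PER_TURN = 4096    # encoder resolution quoted in the text

D_NM = 1.0e6 / GROOVES_PER_MM  # groove spacing in nm
SCALE_NM = 2.0 * D_NM * math.cos(math.radians(HALF_ANGLE_DEG))


def grating_angle_rad(wavelength_nm, order=1):
    """Grating rotation from zero order: m*lambda = 2 d cos(K) sin(theta)."""
    return math.asin(order * wavelength_nm / SCALE_NM)


def nm_per_step(wavelength_nm, order=1):
    """Wavelength change produced by one encoder step at this wavelength."""
    theta = grating_angle_rad(wavelength_nm, order)
    dlambda_dtheta = SCALE_NM * math.cos(theta) / order  # nm per radian
    return dlambda_dtheta * 2.0 * math.pi / STEPS_PER_TURN


def scan_span_deg(lambda_min_nm, lambda_max_nm):
    """Total grating rotation needed to cover the wavelength range."""
    return math.degrees(
        grating_angle_rad(lambda_max_nm) - grating_angle_rad(lambda_min_nm)
    )


print(f"scan span: {scan_span_deg(1400.0, 1700.0):.2f} deg")
print(f"resolution at 1550 nm: {nm_per_step(1550.0):.2f} nm per encoder step")
```

With those assumed numbers the whole scan is only about 6 degrees of rotation, and one encoder step is worth roughly 4 nm. The same equation shows why the sine bar is attractive: wavelength is proportional to sin(theta), so a lead screw pushing a bar gives a wavelength reading that is linear in screw position.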
What's so bad about worm gears, you ask? Well, they're fine for some things, but high-precision angular motion is not one of them, for a number of reasons. Nearly all kinds of gears use rolling friction; the gear teeth are shaped so they mesh without sliding, like train wheels on a track. This minimizes wear. Worms are the exception; they work using sliding friction, which requires a lot of lubrication. Moving back and forth through a few-degree angular range makes the grease film thin out with time, as you'd expect, and that had a serious effect on our angular accuracy.
I calculated that in their design, with the very small radius of the sector gear and the tight wavelength-error budget, the maximum lubricant variation we could tolerate was about 70 nanometres, about the diameter of a small virus. Since they were nearly finished with the prototype build for the formal clinical trial, I told them to use dry molybdenum disulphide (MoS2) for lubricant instead of grease. Being a solid, that had some hope of working.
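For a feel of the scale, here's the small-angle arithmetic relating a change in film thickness at the tooth contact to grating tilt. It assumes the film change shifts the sector tangentially by the same amount, which ignores tooth geometry, and the sector radii are hypothetical, since the real one isn't given here.

```python
import math


def angular_error_urad(film_change_nm, sector_radius_mm):
    """Small-angle estimate: tangential shift divided by radius, in microradians."""
    return (film_change_nm * 1e-9) / (sector_radius_mm * 1e-3) * 1e6


for radius_mm in (5.0, 10.0, 25.0):  # HYPOTHETICAL sector radii
    urad = angular_error_urad(70.0, radius_mm)
    arcsec = math.degrees(urad * 1e-6) * 3600.0
    print(f"r = {radius_mm:4.1f} mm: {urad:5.1f} urad ({arcsec:.2f} arcsec)")
```

The point is just that a smaller sector radius turns the same film variation into a proportionally larger angular error.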
They straight-up refused again, saying they couldn't get MoS2, so I sent them a link to the exact SKU on fastenal.com, after verifying that their local Fastenal had it in stock. I even sent them Mapquest directions so they could find the store. (That was a bit sarcastic, which I regret, but I was getting pretty tired of their nonsense along about then.)
They proceeded to ship one unit with grease and three units unlubricated. When I complained about all the fiddling they were doing, with no calculations to guide it, one cheery lad smiled and said brightly, "That's engineering!" (He was one of the better ones.) They also took the POC proto apart to use bits of it in their test setup, so that they had no comparison data, and, oh, yes, they broke the $100k FTIR and didn't tell anyone.
The clinical trial had to be scrubbed when the units failed the acceptance test. I attended it, but since the USC folks weren't crunching the data in real time, the failure wasn't entirely apparent till later.
All along, I told the founder about the problems, and he told the CE. They did fix a few things, but mostly they simply said "yes" and meant "no". Since I wasn't supervising them, they didn't keep me in the loop with what they were doing to fix the problems.
By that point the CE had run through a year's time and most of a million bucks, and the founder had to pull the plug. Some months later, two units arrived on my bench, each attached to an expensive National Instruments A/D box because they hadn't been able to get their data acquisition system working. Along with the boxes came hundreds of megabytes of documents and software, and an urgent request for me to get to the bottom of it all. Turned out to be a real onion problem—you peel off one layer, cry, and peel off the next. Here are a couple of the layers:
Layer 1: The phase of the detected signal was wandering around by ±10 degrees or so. Since the measured signal goes as the cosine of the phase, this amounted to a couple of percent error, easily enough to destroy the measurement. The control code seemed to be an ordinary proportional-integral-derivative (PID) controller using an optointerrupter on the chopper wheel, which should have been fine. I built a strobe light using an HP 3325A frequency synthesizer driving an LED, so that I could stop the motion and see the loop dynamics. (This is a standard trick in motor control.) The controller was totally broken: regardless of the settings of P, I, and D, there was no way of making the phase sit still. A gentle continuous stream of canned air would move the phase, and it would never recover, i.e. there was no integral term in the control law, despite what the settings would have one believe.
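Two bits of that are easy to check numerically. The sketch below first works out the cosine error for a 10 degree phase excursion, and then runs a toy phase loop with and without an integral term under a constant disturbance (standing in for the canned air). The plant model, gains, and disturbance size are invented for illustration and have nothing to do with the CE's actual code or the real chopper.

```python
import math


def cosine_error(phase_deg):
    """Fractional signal loss when the lock-in phase is off by phase_deg."""
    return 1.0 - math.cos(math.radians(phase_deg))


def settle_phase_error(kp, ki, disturbance=-5.0, dt=0.001, steps=20000):
    """Toy chopper phase loop; returns the final phase error in degrees.

    ASSUMED plant: the phase error integrates the net rate offset,
    d(phase)/dt = control + disturbance (all in degrees and seconds).
    """
    phase = 0.0     # phase error, degrees
    integral = 0.0  # accumulated phase error, degree-seconds
    for _ in range(steps):
        integral += phase * dt
        control = -kp * phase - ki * integral
        phase += (control + disturbance) * dt
    return phase


print(f"error at 10 deg phase offset: {cosine_error(10.0):.2%}")
print(f"P only, steady disturbance:  {settle_phase_error(10.0, 0.0):+.3f} deg")
print(f"P + I, steady disturbance:   {settle_phase_error(10.0, 50.0):+.3f} deg")
```

In this model the proportional-only loop settles with a permanent offset of disturbance/kp, while adding the integral term pulls the phase error back to zero, which is the recovery that never happened on the real units.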
Layer 2: It turned out that they'd discarded my nice working analogue lock-in amplifier design (3 jellybean chips and some Rs and Cs) in favour of a digital lock-in, probably to allow them to re-use a previous design. They'd never built a lock-in before, and were trying to extract the (approximately trapezoidal) signal waveform by least-squares curve fitting to a sine wave, instead of multiplying by samples of the sine and averaging like normal people.
Everybody screws up their first digital lock-in(***), but I'd never seen one as bad as that. (For non signals-and-systems folks: least squares fitting works OK at high signal-to-noise ratio, but being nonlinear, it falls apart completely with noisy data. Multiply-and-average uses the linear orthogonality property of sines and cosines, and so works at any SNR. The fast Fourier transform works that way too.)
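For the curious, here's a bare-bones sketch of multiply-and-average on a trapezoidal chopped-light waveform buried in noise. Everything in it (the waveform shape, 100 samples per cycle, 2000 cycles, noise at 5x the signal height) is an assumption for illustration, and it leaves out all the analogue and A/D nastiness that makes real digital lock-ins hard.

```python
import math
import random


def trapezoid(frac, edge=0.15):
    """Unit-height chopped-light waveform; frac is the position within one cycle (0..1)."""
    if frac < edge:
        return frac / edge                 # rising edge
    if frac < 0.5:
        return 1.0                         # beam fully open
    if frac < 0.5 + edge:
        return 1.0 - (frac - 0.5) / edge   # falling edge
    return 0.0                             # beam blocked


def lockin(samples, samples_per_cycle):
    """Multiply by quadrature references and average.

    Returns (amplitude, phase_deg) of the fundamental. Assumes the record
    holds a whole number of cycles, so DC and harmonics average away.
    """
    n = len(samples)
    i_sum = 0.0
    q_sum = 0.0
    for k, x in enumerate(samples):
        arg = 2.0 * math.pi * k / samples_per_cycle
        i_sum += x * math.cos(arg)
        q_sum += x * math.sin(arg)
    i_avg = 2.0 * i_sum / n
    q_avg = 2.0 * q_sum / n
    return math.hypot(i_avg, q_avg), math.degrees(math.atan2(q_avg, i_avg))


SAMPLES_PER_CYCLE = 100
N_CYCLES = 2000
rng = random.Random(1)  # fixed seed so runs are repeatable

clean = [
    trapezoid((k % SAMPLES_PER_CYCLE) / SAMPLES_PER_CYCLE)
    for k in range(SAMPLES_PER_CYCLE * N_CYCLES)
]
noisy = [x + rng.gauss(0.0, 5.0) for x in clean]  # noise RMS is 5x the signal height

print("clean (amplitude, phase):", lockin(clean, SAMPLES_PER_CYCLE))
print("noisy (amplitude, phase):", lockin(noisy, SAMPLES_PER_CYCLE))
```

The noisy estimate of the fundamental lands close to the clean one even though the noise swamps the waveform sample by sample, and averaging over more cycles tightens it further.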
I didn't get to the last onion layer, because the founder ran out of both money and friends. He never did pay me for my last month's work. A pity—I would have made those boxes do good measurements eventually.
A few lessons learned:
Stay out of Orange County.
Seriously, having a sharp technical person supervising, with a formal specification, design reviews, sign-offs on hardware and software, unit tests, and so on would have prevented this disaster. Considering all the effort that was wasted, doing it right might not have been any more expensive, and the system would have worked.
Checking the CE's references would have been a smart move too. They claimed that their contracts mostly forbade them to tell who their customers were, and the founder fell for that one. Oh, and when checking references, if you get good ones be sure to ask the names of the CE's people that worked on those projects. They'll always give out their 'A' team's customers, but you may get the 'B' or 'C' team. In our case, the guy who did the lock-in and encoder work was their CTO, so presumably the 'B' team was worse.
So there you have the sad story of a great project that initially succeeded and nevertheless failed in the end. I'd like to have another whack at it one of these times, because it could help a lot of people.
Phil Hobbs
(*) One where the business end slides around on a concave curved surface, like the blade assembly on some razors, so that the pivot point is outside the mechanism like the focus of a lens.
(**) A cage system using plates held together with 6-mm centreless-ground stainless steel rods, similar to the Thor Labs 30-mm cage system.
(***) Digital lock-ins are difficult because they have to pull weak signals out of very strong background noise. You have to be totally paranoid about things like slew artifacts, settling time, input voltage sag during sampling, and anything getting in on the reference voltage. Some A/D converters can't slew their internal nodes fast enough to prevent pattern-dependent errors, and many op amps have trouble handling the charge injection that occurs during the A/D's sampling interval. (It's called "kickout".) The frequency domain is pretty brutal, and as far as I can tell, nobody ever learns that except by way of at least one failure. (You also have to use the right algorithm, and curve fitting is not it.)