CIRCUIT CELLAR • AUGUST 2018 • ISSUE #337

FPGA SOLUTIONS LOOK TOWARD AI PROCESSING

Product Focus: Tiny Embedded Boards | MCUs and Processors | Murphy's Laws in the DSP World (Part 2) | Managing Tricky FPGA Designs | Thermoelectric Cooling (Part 2) | Electronic Speed Control (Part 2) | Build an Audio Response Light Display | Signature Analyzer Uses NXP MCU | IoT Security (Part 4) | Filtering Pulsed Signals | The Future of IoT Cellular

THE TEAM
Publisher/President: KC Prescott
Editor-in-Chief: Jeff Child
Graphics: Grace Chen
Controller: Chuck Fellows
Advertising Coordinator: Nathaniel Black
Project Editors: Chris Coulston, Ken Davidson, David Tweed
Founder: Steve Ciarcia
Advertising Sales Rep.: Hugh Heinsohn
Columnists: Jeff Bachiochi (From the Bench), Bob Japenga (Embedded in Thin Slices), Robert Lacoste (The Darker Side), Ed Nisley (Above the Ground Plane), George Novacek (The Consummate Engineer), and Colin O'Flynn (Embedded Systems Essentials)

EDITOR'S LETTER
Can't Stop the Signal

There was a time when a standalone digital signal processing (DSP) chip was the primary technology choice for any kind of signal processing system. But over the past 15 years, DSP chips have become ever more relegated to niche and legacy applications. What's pushed them aside is not one thing. On the one hand, general-purpose microprocessors—even embedded ones—have become fast enough to meet the needs of many signal processing applications. Next, FPGAs have added extremely powerful fixed-point and floating-point DSP engines on chip that rival even what high-end DSP chips could once do. And finally, the idea of using graphics processing units (GPUs) as an alternative to general-purpose processors gained momentum over the past decade or more. This spawned the term GPGPU, meaning "general-purpose computing on graphics processing units," and usually the GPU itself in such a context is referred to as a GPGPU.

Recently, as I was researching this month's FPGA article, I did an informal survey of some of the FPGA vendors, asking their take on how today's FPGAs are positioned as a signal processing choice, as well as for computing in general. To be fair, asking FPGA vendors this question is kinda like asking a hammer what's the best tool for hammering a nail. That said, what they told me gave me some good insights on how FPGAs fit in amongst today's spectrum of processing alternatives. As one FPGA vendor told me, the main advantage that FPGAs have is that their hardware can be programmed to do almost anything, including processing and signal processing.
That flexibility can be translated into very high compute performance, because it enables engineers to build hardware-based compute platforms that can be very efficient for specific tasks. Meanwhile, general-purpose processors and GPUs will continue to be more flexible, but FPGAs can provide higher performance for a narrow set of pre-defined compute functions.

Another expert remarked that GPGPUs have the advantage that they allow users to design their systems using software programming constructs. For their part, FPGAs offer the highest performance and lowest power solution, but their increased flexibility means that there will be a need for some level of optimization at the hardware programming level. High-level synthesis (HLS) and OpenCL tools help bridge the gap between FPGAs and software programming constructs. Such tools are expected to improve over the next five years, and that will further boost the value proposition of using FPGAs.

Feedback from another FPGA vendor was that low latency and power efficiency are key factors. In defense applications, for example, massively parallelized and low-latency solutions are required in order to provide electronic warfare responses and machine learning analysis in real time. In such cases, FPGA solutions are unbeatable. In contrast, GPGPUs have brought some rapid development and scalability options to such systems, but at the cost of extremely poor power efficiency. GPUs lack good power efficiency on their own, and when many are used in parallel the issue gets magnified.

Yet another FPGA expert I talked to put the priority on FPGAs as the interface to the external environment. For some applications, FPGAs serve as a security buffer that allows only validated and authenticated communications to enter the system. This approach is sometimes used as a level of security protection for general-purpose processors or GPGPUs.
Today's leading high-end FPGA chips leverage programmable DSP precision, deterministic low latency and partial reconfiguration capabilities. FPGA-based system architectures are optimized for parallel processing, enabling them to process enormous amounts of data at power-efficient clock rates.

If I had posed the same question to a microprocessor company or a graphics chip vendor, there's no doubt that I'd receive different answers. That said, neither microprocessor vendors nor graphics chip companies have much stake in the game when it comes to signal processing system designs. That's not where their focus is. FPGA chip- and board-level companies, on the other hand, consider signal processing a major focus. Where that whole picture gets skewed is the fact that microprocessor heavyweight Intel acquired Altera, one of the top two FPGA companies, three years ago. Clearly FPGAs have a solid lock on signal processing and a bright future, and they continue to be an exciting technology to watch.

INSIDE ISSUE #337

COLUMNS
54 Embedded in Thin Slices: Internet of Things Security (Part 4), The Power of Checklists
58 The Darker Side: Pitfalls of Filtering Pulsed Signals, Waveform Woes
64 The Consummate Engineer: Thermoelectric Cooling (Part 2), The Test Results
68 From the Bench: Electronic Speed Control (Part 2), Building the Circuitry
79 Tech the Future: The Future of Cellular in the IoT, What 5G Means for the IoT's Road Ahead
74 Product News
78 Test Your EQ

FEATURES
6 Build an Audio Response Light Display, Modern LEDs in Action
12 Murphy's Laws in the DSP World (Part 2), The Next Three Laws
28 Signature Analyzer Uses NXP MCU, Scope-Free Tester
36 Managing FPGA Design Complexity, Easing IP Integration

SPECIAL FEATURE
41 FPGA Solutions Evolve to Meet AI Needs, Brainy System ICs

TECHNOLOGY SPOTLIGHT
46 MCUs and Processors Vie for Embedded Mindshare, Performance Push

PRODUCT FOCUS
50 Tiny Embedded Boards, Compact Computing
BUILD AN AUDIO RESPONSE LIGHT DISPLAY
Modern LEDs in Action

This project has its origin in the primitive "light organs" of the 1960s, in which each audio band had its own color that pulsed with the music. This article implements a light organ using today's technology.

By Devlin Gualtieri

One of my first electronic circuits was a light organ that modulated colored incandescent bulbs with audio signals within particular spectral bands. The bulb colors were blue, green, yellow and red. And these colors represented frequencies in octave-spaced bands, from the lowest (60 Hz) to the highest (7.5 kHz). While an octave spacing on a piano consists of a doubling of frequency, the frequency ratio between bands on my first light organ was five, which gave channel frequencies of 60, 300, 1,500 and 7,500 Hz.

Designing such a circuit was a chore using 1960s technology. The bandpass filters were implemented with audio transformers, for which the inductance of the secondary windings, coupled with a suitable capacitance, gave an LC bandpass filter. The transformers were important for another reason. They provided isolation between the audio signal source and the triacs used to drive the incandescent lights with the AC voltage they needed. I learned a bit of electronics designing this light organ, including the fact that the threshold voltage for the triacs changes as they heat up during normal operation.

Today, things are easier. We have bright light-emitting diodes (LEDs) to replace the incandescent light bulbs. Not only do the LEDs operate at low voltage, but they also have a longer lifetime—and there's a factor of at least five in energy savings as well. Although today's computing technology can be used to implement spectral filtering using software, a light organ can be made just as easily without a microcontroller or an SBC. This makes the circuit more accessible to those electronics hobbyists who don't have an established base of compilers and programming firmware.
My three-channel light organ circuit uses just six commonly available ICs and a few transistors. Whether or not you build this circuit, you can learn a few design tricks by studying it. I used a 4000-series CMOS logic chip as a linear amplifier to implement the bandpass filters, and a pulse modulation technique to amplitude modulate the high-intensity LED light modules. The circuitry also uses a field-effect transistor (FET) as a way to implement an automatic gain controller (AGC) to enhance the light show aesthetics.

THE CIRCUIT

Figure 1 and Figure 2 show the schematic diagrams for the audio response light display. Figure 1 shows the input amplifier and AGC circuit, a 1.5 kHz triangle wave generator used for the pulse width modulation, a voltage regulator and a bias voltage generator. The Q1 input amplifier—a single-transistor amplifier—feeds a secondary operational amplifier through a FET attenuator. The Q2 2N7000 enhancement-mode FET, which implements the AGC, reduces its resistance from essentially infinity to a few ohms when its gate-source voltage reaches about 3 V. The IC1b second stage of the triangle wave generator integrates a square wave produced by the voltage comparator of the IC1a first stage. Positive feedback on the comparator stage gives it a flip-flop action, and the pair produce a stable oscillation.

The amplified audio signal is filtered into three passbands by IC6. What's unusual about this IC is that it's digital, not analog. CMOS inverting logic circuits—such as the 4011 used here—can be biased into linear operation by a resistor connecting their input and output. In this operating mode, the amplifiers can have significant gain and high frequency response. In fact, the CD4011B that I used has too much gain, so it will oscillate without the addition of a small capacitance shunting its output to ground. Such a capacitance would cause a digital circuit to draw significant power, but that's not a problem in this linear configuration.
FIGURE 1: The input amplifier and AGC circuit, the triangle wave generator, the voltage regulator and the bias voltage generator that supplies half the 5 V supply to the IC amplifiers.

ABOUT THE AUTHOR
Devlin Gualtieri received his Ph.D. in Solid State Science and Technology from Syracuse University in 1974. He had a 30-year career in research and technology at a major aerospace company and is now retired. Dr. Gualtieri writes a science and technology blog. Links to his blog and his Amazon author's page are available on the Circuit Cellar article materials webpage.

Note the "B" designation on the 4011B. These are the ones commonly available today. The first CMOS logic circuits, called the A-series, had just as many FETs as needed to produce the required logic function. The B-series CMOS logic circuits ("B" is for "buffered") have cascaded inverters at the output of the typical A-series circuitry, so they have much more gain. If you use a 4011A for IC6, the 0.01 µF shunt capacitors should be eliminated.

The band-limited signals are then rectified and filtered by portions of IC3-IC5. The rectified voltages, summed at point "G," are fed back to the gate of the AGC FET to effect the gain control function. The AGC boosts the low-amplitude input signals (Figure 3). These voltages are applied to comparator circuits that compare them to the voltage of the triangle wave generator. Whenever a voltage is higher than the triangle wave, the appropriate LED driver transistor Q3-Q5 is turned on. In this manner, the LEDs are pulse width modulated with a signal that represents the audio intensity in the passband.

FIGURE 2: Bandpass filters, rectifiers, and the comparators that produce the pulse-width modulated signals to drive the LEDs.

The circuit uses several TLC2272 dual rail-to-rail op amps, but any generic rail-to-rail amplifier operable at 5 V will work. Some will even have the same pin-out. The TLC2272 is especially useful in hobby circuits. That's because it's available as a through-hole component and it functions over a power supply range of 4.4 V to 16 V.
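The pulse-width modulation scheme just described is easy to convince yourself of numerically: a comparator turns an LED driver on whenever the rectified audio level exceeds the triangle wave, so the duty cycle tracks the audio level. The sketch below (in Python, purely for illustration; only the 1.5 kHz triangle frequency comes from the article, the levels are made up) demonstrates the idea.

```python
# Sketch of the comparator PWM described above: the LED is on whenever the
# audio level exceeds a triangle wave, so duty cycle tracks the level.
# All values here are illustrative, not taken from the article's circuit.

def triangle(t, freq=1500.0):
    """Unit triangle wave (0..1) at `freq` Hz, like the 1.5 kHz generator."""
    phase = (t * freq) % 1.0
    return 2.0 * phase if phase < 0.5 else 2.0 * (1.0 - phase)

def duty_cycle(level, freq=1500.0, steps=10000):
    """Fraction of one triangle period for which the comparator is on."""
    period = 1.0 / freq
    on = sum(1 for i in range(steps)
             if level > triangle(i * period / steps, freq))
    return on / steps

# A level halfway up the triangle wave gives roughly a 50% duty cycle.
print(round(duty_cycle(0.5), 2))   # → 0.5
```

Because the duty cycle is linear in the comparator's input level, the perceived LED brightness follows the rectified audio amplitude directly, which is the whole point of the scheme.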
You can use its quad amplifier counterpart, the TLC2274, in place of the 4011, but you'll need to redo the circuit design for the bandpass filters. The passbands for this circuit—400 Hz, 900 Hz and 3.5 kHz—were generally dictated by the capacitance values available and which frequencies gave a pleasing display. While resistors are available in 1% tolerance, capacitance values generally are accurate only to within 5% to 10%, so each build will give slightly different frequencies. The filters were designed to give an overlap between the spectral bands to allow a smooth transition between the bands. The experimentally determined filter responses are shown in Figure 4.

SUPERBRIGHT LEDS

An obvious choice for the audio response LEDs are the "superbright" LEDs available in many colors from many sources. Since the voltage drop across a series connection of three of any of these LEDs is less than 12 V, an appropriate current-limiting resistance can be found to ensure high brightness at a safe power level. This was my first course of action, using two strings of three LEDs, or a total of six LEDs for each color channel. The light output was very disappointing, so this idea was scrapped.

As most people know, the first-generation LED bulbs for replacement of compact fluorescent and incandescent lighting had a high failure rate. This wasn't because of any failure of their high-powered white LEDs. Instead, it was usually failure of the inexpensive power-conversion electronics used to drive these from household mains voltage. I had a few of these failed bulbs on hand, so I attempted to harvest some of their white LED chips. While this is not a recommended operation, I was able to get a few white LED chips from these to function well enough for testing. The light output was much better, so I decided that white illuminating LEDs were best for this application.
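The current-limiting resistance for a series LED string, mentioned above, is a one-line calculation. The sketch below works it through for a hypothetical string; the 3.2 V forward drop and 20 mA target are typical superbright-LED datasheet values, not figures from the article.

```python
# Back-of-the-envelope sizing for the series LED strings described above:
# three LEDs in series drop less than the 12 V supply, so a simple resistor
# sets the current. Vf = 3.2 V and 20 mA are assumed datasheet-style values.

def series_resistor(v_supply, v_forward, n_leds, i_target):
    """Current-limiting resistance (ohms) for a string of n_leds."""
    v_drop = n_leds * v_forward
    if v_drop >= v_supply:
        raise ValueError("string drops more than the supply voltage")
    return (v_supply - v_drop) / i_target

r = series_resistor(v_supply=12.0, v_forward=3.2, n_leds=3, i_target=0.020)
print(round(r))       # → 120 ohms
print(0.020**2 * r)   # resistor dissipation in watts, well under 1/8 W
```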
My final solution was to use automotive interior lighting modules operable at 12 V and about 100 mA of current, as shown in Figure 5. These inexpensive modules contain multiple white LED chips and have an internal current-limiter, so they replaced the LED strings and current-limiting resistors shown in Figure 2. Because these produce white light, they need color filters. I found that colored permanent markers on plastic transparencies work quite well, as do bits and pieces of colored plastic items found in "dollar" stores.

The positive image for the PCB is available on the Circuit Cellar article materials webpage. I etch my own circuit boards, so they're always designed to be single-sided copper with as few jumpers on the component side as possible. Also, because surface-mount devices are difficult to work with, I always use through-hole components. The result is a larger circuit board than otherwise realized.

FIGURE 3: Experimentally determined transfer function of the 2N7000 FET AGC circuit. In this case, the output voltage was the voltage monitored at IC1.
FIGURE 4: Experimentally determined passband filter responses.
FIGURE 5: An array of automotive interior lamp modules with cables, as used in the audio unit.
FIGURE 6: Shown here is the printed circuit board for the audio response light display. The circuit board was designed as single-sided with just a few jumpers.
FIGURE 7: Rear panel connections for the audio-response light display. The electret microphone can be mechanically mated with a Bluetooth earpiece to provide wireless connectivity.
FIGURE 8: Pyramidal light diffuser used on the interior of the plastic cube in the audio response light display. The surface of this Plexiglas pyramid was coated with a matte finish spray, as was the cube.

For detailed article references and additional resources, see www.circuitcellar.com/article-materials
However, in the case of a device such as the audio response light display—which is large in itself—the large (6" x 4") circuit board is not a problem. A 12-V "wall-wart" module is used as a power supply. The finished circuit board, as mounted underneath the lighted cube, and the component layout are shown in Figure 6. A diagram of the board's component layout is available on the Circuit Cellar article materials webpage.

The best results are obtained by wiring directly to an audio source. However, the audio-response light display will function with an electret microphone, as shown in the schematic diagram (Figure 1). Such a microphone works well when the display is placed on or near a loudspeaker. It will also respond to room noises, so it's entertaining to typically loud children. The microphone also enables Bluetooth wireless operation, as it can be mechanically mated with an inexpensive Bluetooth earpiece. The rear connections for the audio response light display are shown in Figure 7.

OPTICAL ENGINEERING

Having the colored lights appear as if they were radiating from the surface of the cube is more difficult than it first appears. I coated the interior and exterior of the polystyrene cube with a matte finish spray (Krylon no. 1311, obtained from a craft store), but most of the light still appeared to emanate from the LED modules. I solved this problem by mounting an internal matte-sprayed Plexiglas pyramid over the LED modules to add an additional layer of diffusion (Figure 8). A frosted lamp globe might work well in place of a plastic cube.

MURPHY'S LAWS IN THE DSP WORLD (PART 2)
The Next Three Laws
Part 2 of this article series continues introducing "Murphy's Laws" as they apply to the world of digital signal processing (DSP).

By Michael Smith, Mai Tanaka and Ehsan Shahrabi Farahani

Sound wave pressures and water reservoir levels contain information that we need to manipulate with computers. Mobile phone conversations become clearer to understand if we can keep just the electromagnetic signals coming directly from the cell tower and remove any interfering signals that have bounced off nearby buildings. Developers must use sensors to transfer things from the analog world into a stream of sampled values that can be handled by computers. They need to move into the world of digital signal processing—DSP.

The workings of proposed algorithms can be explored using mathematical scripting languages such as Octave or MathWorks' MATLAB. The generated input and output results can then be used to validate the correctness—and/or limitations—of real-time implementations of these algorithms on hardware. However, working in this new digitized arena is not straightforward. Imagine: The developer has everything under control, with the new algorithm becoming the best thing since the invention of sliced bread. Then comes a minor change in a simulation parameter, and things suddenly are not making common sense. The step forward from the change has suddenly become two steps backward, and you worry, thinking: "Just how much else is broken? What else needs fixing?"

In Part 1 of this article series (Circuit Cellar 335, June 2018), we mentioned that there are special properties of the DSP world that must be taken into account when developing DSP code. While these properties are not introduced by use of a mathematical scripting language tool, our understanding of the properties can be influenced by the tool.
We suggested that faster progress could be made if a person understood how the six variants of Murphy's Laws, listed in Table 1, might impact algorithm development. Unfortunately, the earthquakes potentially generated by Murphy's Law do not stop at a DSP Richter ML_DSP Level VI. In this article, we show the greater damage that Murphy can inflict even when tackling a low-level practical DSP example.

PROPOSED DSP EXAMPLE

Suppose for the past year, you have been recording intermittent audio signals of a noise nuisance experienced in a local neighborhood. A lot of hassle would be removed if these noises could be matched against a suspected noise source, and the problem removed. You have been capturing data over the past month in several households around the neighborhood. After a month you recognize that down in the basement, no signals are being recorded above 200 Hz except for the occasional loud human voice and the impulsive bang, made up of all frequencies, as somebody slams a door. You also recognize that sampling signals at 44 kHz is generating 42 GB of data per household over a long weekend—0.5 terabytes of data over a month. That's a lot of data to store and process.

On the professional side, you decide that in the future you will adjust your recording equipment settings to sample at only 1 kHz, to reduce storage and processing time issues. This is about 5 times faster than the highest expected signal frequency, 200 Hz, and so avoids the impact of Murphy's Law DSP_ML-I (Table 1) discussed in Part 1 of this series. As this is a community problem with recordings needed in many homes, you have been kindly loaned a variety of recording equipment from your industrial "competitors." Other recordings are being made by householders on their own smartphones.
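The storage figures above are easy to sanity-check. The article doesn't state a sample width or channel count, so the sketch below (Python, for illustration) assumes 16-bit stereo recording and a 3-day "long weekend"; with those assumptions the numbers land close to the quoted 42 GB and 0.5 TB.

```python
# Sanity check of the storage figures quoted in the example above.
# Assumptions (not from the article): 16-bit samples, 2 channels,
# a "long weekend" of 3 days, a month of 30 days.

def recording_bytes(sample_rate_hz, bytes_per_sample, channels, seconds):
    return sample_rate_hz * bytes_per_sample * channels * seconds

GIB = 2**30
weekend = recording_bytes(44_100, 2, 2, 3 * 86_400) / GIB
month = recording_bytes(44_100, 2, 2, 30 * 86_400) / GIB

print(f"{weekend:.1f} GiB per long weekend")   # ≈ 42.6 GiB
print(f"{month / 1024:.2f} TiB per month")     # ≈ 0.42 TiB
```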
After adjusting the recording settings to a 1 kHz sampling rate, you find that the quality of the sound recordings from some of the loaned equipment has become terrible: very noisy, and with many unexpected frequencies now apparently present in the adjusted signal that were not in the analysis of the original recordings.

In addition, you decide to digitally down-sample all the legacy data already collected, for storage and compatibility reasons. The Nyquist sampling criterion that forms part of DSP_ML-I says that you will get invalid signals if you don't sample at least twice as fast as the highest frequency present in the signal. Can you take all the signals measured on previous days at the audio 44 kHz rate and then store every 44th sample to achieve a new 1 kHz sampled signal? Since the signals have some background traffic noise and occasionally voices, wouldn't it be better to develop a program to output the average of 44 samples to remove the higher frequency components? After coding these ideas, you check the quality of the processed signal in terms of the signal-to-noise ratio (SNR). The simplest SNR measure requires dividing the biggest signal amplitudes by the average noise level. The new SNR of the processed old signals seems better than the worst of the new 1 kHz audio recordings, but it is not as good as expected. So, what new DSP problems has Murphy introduced in your new recordings and data reductions during digital down-sampling procedures?

MURPHY'S DSP LAW VII

In our June Part 1 article, the frequency spectrum of a 4 Hz signal sampled at 8 Hz apparently showed the presence of a 12 Hz signal. This turned out to be a misinterpretation of the output of the fast Fourier transform algorithm fft() and was solved by remembering DSP_ML-V: Applying fftshift() at least once every day will help keep frequency confusion at bay!
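The two candidate reduction schemes posed in the example above, keep every 44th sample versus average each block of 44, can be sketched side by side. This is Python rather than the article's Octave, and it only shows what each scheme computes; neither one is a proper anti-aliasing filter, which is the point the rest of the article develops.

```python
# The two down-sampling candidates from the example above:
# (1) decimation: keep every 44th sample;
# (2) block averaging: replace each block of 44 samples with its mean.

def decimate(samples, factor=44):
    """Keep every `factor`-th sample, discarding the rest."""
    return samples[::factor]

def block_average(samples, factor=44):
    """Average each complete block of `factor` samples."""
    full = len(samples) - len(samples) % factor
    return [sum(samples[i:i + factor]) / factor
            for i in range(0, full, factor)]

data = list(range(88))      # stand-in for 88 recorded samples
print(decimate(data))       # → [0, 44]
print(block_average(data))  # → [21.5, 65.5]
```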
It turns out that down-sampling a signal off-line on a computer or through specialized hardware can introduce similar DSP frequency misinterpretation issues.

TABLE 1: The six DSP variants of Murphy's Laws introduced in Part 1.
DSP_ML-I: Things will get misleading when you don't remember that DSP signals consist of a finite number of sampled values.
DSP_ML-II: Any lines drawn to connect the DSP sample values are merely a visual convenience and are probably introducing a misinterpretation of the signal's true characteristics.
DSP_ML-III: When your simulation results don't pan out, you've probably calculated something just one sample point out!
DSP_ML-IV: During DSP, nothing is as real as it seems!
DSP_ML-V: Applying fftshift() at least once every day will help keep frequency confusion at bay!
DSP_ML-VI: By symmetry, anything that causes a problem during DSP frequency domain analysis is bound to be introducing a totally equivalent, but probably not immediately obvious, problem during DSP time domain analysis—and vice-versa!

Whether applying down-sampling through hardware or software, we need to be aware of the seventh DSP Murphy's Law:

DSP_ML-VII: As I was going up a DSP stair, I met a signal that should not be there! It should not be there again today! I wish that signal would go away!

In Listing 1, Investigate_DSP_ML_VII() reuses the methods of the TimeDomainSignal class developed in the first article; its function SimulateTripleSignal() develops a 1-second signal with an easy-to-find DC component and two other signals (Figure 1a). The frequency components of a 48 Hz cosine signal, shown in black in Figure 1b, all lie along the real frequency axis and are easily distinguishable from the components of an 86 Hz sine signal, shown in red, which lie along the imaginary frequency axis.
% Place in file "C:\DSP_ML\Investigate_DSP_ML_VII.m"
function Investigate_DSP_ML_VII( )
  clear classes;  % Encourage Octave/MATLAB accepting class changes
  tripleSignal = SimulateTripleSignal(0, 48, 86);
  DisplayTimeSignal(tripleSignal, 'DC, 48 Hz and 86 Hz', '-');
  FrequencySpectrum(tripleSignal, ...
      'Frequency Spectrum TripleSignal', '-');

  % Down-sampling introduces aliasing into high frequency signals
  downsampledTripleSignal = DownSampleSignal(tripleSignal);
  DisplayTimeSignal(downsampledTripleSignal, ...
      'Downsampled DC, 48 Hz and 86 Hz', '-');
  FrequencySpectrum(downsampledTripleSignal, ...
      'Frequency Spectrum down-sampled TripleSignal', '-');

  % Add noise to triple signal
  noisyTripleSignal = AddNoise(tripleSignal, 1.0);
  [frequency, freqData] = FrequencySpectrum(noisyTripleSignal, ...
      'Frequency Spectrum noisy TripleSignal', '-');
  CalculateDisplaySignalSNR(freqData, -36, -4);
  CalculateDisplaySignalSNR(freqData, 4, 36);

  % Down-sampling introduces aliasing of high frequency noise
  downsampledNoisyTripleSignal = DownSampleSignal(noisyTripleSignal);
  [frequency, freqDataDownSampled] = ...
      FrequencySpectrum(downsampledNoisyTripleSignal, ...
      'Frequency Spectrum down-sampled noisy TripleSignal', '-');
  CalculateDisplaySignalSNR(freqDataDownSampled, -36, -4);
  CalculateDisplaySignalSNR(freqDataDownSampled, 4, 36);
end

% Place in file "C:\DSP_ML\SimulateTripleSignal.m"
function tripleSignal = SimulateTripleSignal(freq1, freq2, freq3)
  fastSampling = 256;
  duration = 1;
  phase0 = 0;
  phase90 = 90;
  signalFreq1 = TimeDomainSignal(1.0, freq1, phase0, ...
      duration, fastSampling);
  signalFreq2Phase0 = TimeDomainSignal(1.0, freq2, phase0, ...
      duration, fastSampling);
  signalFreq3Phase90 = TimeDomainSignal(1.0, freq3, phase90, ...
      duration, fastSampling);
  tripleSignal = signalFreq1 + signalFreq2Phase0 ...
      + signalFreq3Phase90;
end

LISTING 1: Down-sampling a signal with large-intensity, high-frequency components will introduce obvious aliasing.
However, it must be remembered that the aliasing introduced by down-sampling even low-intensity, high-frequency background noise components will rapidly degrade a signal's signal-to-noise ratio, because of the wide bandwidth of the noise.

Down-sampling by a factor of 2 is achieved in the new class method DownSampleSignal(), given in Listing 2, by throwing away—decimating—every second point. As can be seen from Figure 1c, down-sampling in this simple fashion causes a drastic change in the time domain signal's appearance. Because we are no longer obeying the Nyquist criterion of sampling twice as fast as the highest frequency signal (DSP_ML-I), the frequency spectrum of the down-sampled signal, Figure 1d, is also distorted. The 86 Hz signal now appears as a signal around 42 Hz.

We can interpret the strange results in the down-sampled spectrum, Figure 1d, as follows. The original signal was sampled 256 times in one second. Applying the discrete Fourier transform will produce a frequency spectrum ranging from -128 Hz to nearly +128 Hz. The triple signal will have frequency components at 0 Hz, at +48 Hz and -48 Hz, and at +86 Hz and -86 Hz. After down-sampling, the effective sampling rate becomes 128 times per second, which will provide a frequency spectrum ranging from -64 Hz to nearly +64 Hz. The frequency spectrum of the DC and 48 Hz signals will be correctly displayed. However, the +86 Hz signal can't be displayed; instead its energy is morphed—aliased—into a false frequency signal in the -64 Hz to +64 Hz range. Instead of being displayed as a positive-amplitude imaginary component 22 Hz above the +64 Hz border, it will be aliased into a signal that is 22 Hz above the -64 Hz border—at -42 Hz. Similarly, the negative-amplitude component at -86 Hz will be aliased to become a component at +42 Hz.

A REAL RECORDED SIGNAL

Now imagine a real-life signal with a DC and only a 48 Hz signal.
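Whether a given component survives the simple decimation can be checked with the folding arithmetic just described. A small sketch (in Python rather than the article's Octave, purely for illustration):

```python
# The spectral folding described above, as a one-liner: after down-sampling,
# the new sampling rate is fs_new, and any component outside the range
# -fs_new/2 .. +fs_new/2 is wrapped (aliased) back into that range.

def alias(freq_hz, fs_new_hz):
    """Apparent frequency of `freq_hz` after resampling to fs_new_hz."""
    half = fs_new_hz / 2
    return (freq_hz + half) % fs_new_hz - half

# 256 Hz sampling decimated by 2 gives fs_new = 128 Hz, a ±64 Hz range.
print(alias(48, 128))    # → 48.0  (the 48 Hz signal displays correctly)
print(alias(86, 128))    # → -42.0 (the +86 Hz component folds to -42 Hz)
print(alias(-86, 128))   # → 42.0  (and -86 Hz folds to +42 Hz)
```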
FIGURE 1: (a) The time domain signal with DC, 48 Hz and 86 Hz components has (b) an easily understood frequency spectrum. Down-sampling this signal by simply discarding every second recorded point drastically changes both (c) the time domain signal and (d) the frequency spectrum.

LISTING 2: Down-sampling can be achieved by deleting sample values. However, this simple approach ignores DSP_ML-I and causes non-physical frequency components to appear.

% Place in file "C:\DSP_ML\@TimeDomainSignal\DownSampleSignal.m"
% NEW CLASS METHOD
function downsampledSignal = DownSampleSignal(signal)
  % Modify TimeDomainSignal instance
  downsampledSignal = signal;
  downsampledSignal.timeData = signal.timeData(1 : 2 : signal.numPts);
  downsampledSignal.timeSampled = ...
      signal.timeSampled(1 : 2 : signal.numPts);
  downsampledSignal.numPts = signal.numPts / 2;
  downsampledSignal.samplingFrequency = ...
      signal.samplingFrequency / 2;
end

Since there is no 86 Hz signal to alias, is it safe to down-sample in the simple way? Figure 2 shows how DSP_ML-VII answers that question very firmly. Now the triple signal has been made more real-life with added white noise, using the AddNoise() and CalculateDisplaySignalSNR() functions given in Listing 3. The SNR of the DC peak, DC_Amplitude / stdev(backgroundNoise), has been calculated at two locations at around 21.2. However, why has the SNR dropped to around 13.9 after down-sampling? The precise SNR value will change each time you run the script, because the noise values are re-generated as random values.

Before down-sampling there were 256 samples of the DC signal, giving frequency spectrum peaks with an intensity of 256. After down-sampling there are only 128 samples, leading to a frequency peak of half the intensity, 128. Common sense suggests that the SNR should be unchanged. There are now only 128 noise values, so their intensities should also be halved! This is where DSP_ML-VII comes into play. Yes, there are fewer noise samples, lowering their intensity by a factor of 2.
However, the noise frequencies in the frequency range -128 Hz to -64 Hz are aliased and added to the noise already present in the 0 Hz to +64 Hz range, and similarly the noise in the range +64 Hz to +128 Hz is aliased down into the -64 Hz to 0 Hz range. Standard deviations—the estimate of experimental signal variations—of random noise sequences add as the square root of the sum of the squared deviations:

σ = √(σ₁² + σ₂²)

which in this case increases the noise level by a factor of √2.

FIGURE 2 – (a) The SNR of the DC peak is around 21.2 before down-sampling. After down-sampling, (b) the SNR drops to around 13.9 when both the high frequency signal and the noise are aliased to lower frequencies.

LISTING 3 – Signals with white noise are easy to create with the randn() operator inside the AddNoise() method. The function CalculateDisplaySignalSNR() calculates the SNR of the DC peak using noise levels measured in a quieter region of the signal.

% Place in file "C:\DSP_ML\@TimeDomainSignal\AddNoise.m"
function noisySignal = ...                % NEW CLASS METHOD
    AddNoise(signal, noiseAmp)
  noisySignal = signal;
  centreAmplitude = 0;
  noisySignal.timeData = noisySignal.timeData ...
      + centreAmplitude + noiseAmp * randn(noisySignal.numPts, 1)';
end

% Place in file "C:\DSP_ML\CalculateDisplaySignalSNR.m"
function CalculateDisplaySignalSNR(freqData, leftFreq, rightFreq)
  markHeight = 0.4 * max(real(freqData));
  plot([leftFreq, leftFreq], [-markHeight, markHeight], 'b');
  plot([rightFreq, rightFreq], [-markHeight, markHeight], 'b');
  numPtsBy2 = length(freqData) / 2;
  leftIndex = leftFreq + numPtsBy2;
  rightIndex = rightFreq + numPtsBy2;
  stdDeviation = std(real(freqData(leftIndex:rightIndex)));
  SNR = max(real(freqData)) / stdDeviation;
  text(leftFreq + 1, -1.15 * markHeight, 'SNR');
  text(leftFreq + 1, -1.45 * markHeight, sprintf('%4.1f', SNR));
  pause(3);
end

The original SNR will drop to 21.2 / √2 ≈ 15, which is close to the value of 13.9 in Figure 2b calculated for the specific noise values generated in this study.
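Both effects above—the 86 Hz tone aliasing to 42 Hz, and the roughly √2 SNR loss when white noise folds into the remaining band—are easy to reproduce. Here is a NumPy sketch (a stand-alone illustration, not the article's Octave class code; the DC amplitude and bin choices are my own):

```python
import numpy as np

fs = 256                          # samples per second, 1 second of data
t = np.arange(fs) / fs
tone86 = np.cos(2 * np.pi * 86 * t)

# Decimate by keeping every second point: new rate 128 Hz, Nyquist 64 Hz
decimated = tone86[::2]
spectrum = np.abs(np.fft.fft(decimated))
freqs = np.fft.fftfreq(len(decimated), d=2.0 / fs)
peak_freq = abs(freqs[np.argmax(spectrum)])
print(peak_freq)                  # the 86 Hz tone now sits at 42 Hz

# SNR of a DC peak before and after decimation: averaged over many
# noise draws, the ratio approaches sqrt(2)
def dc_snr(v):
    F = np.fft.fft(v)
    peak = np.abs(F[0])                                    # DC bin
    floor = np.std(np.real(F[len(v) // 8 : len(v) // 2]))  # noise-only bins
    return peak / floor

rng = np.random.default_rng(0)
ratios = []
for _ in range(300):
    sig = 1.0 + rng.standard_normal(fs)                    # DC + white noise
    ratios.append(dc_snr(sig) / dc_snr(sig[::2]))
mean_ratio = np.mean(ratios)
print(mean_ratio)                 # close to sqrt(2), about 1.41
```

The single-run SNR wanders just as the article warns, which is why the sketch averages over many noise draws before comparing against √2.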
SIGNAL QUALITY LOSS

The first question to ask is: How do we—or perhaps how can we—stop this loss of SNR when down-sampling the legacy data values that have been measured over the past year? This is a genuine DSP issue associated with any down-sampling, not an artifact of using a visualization tool during a simulated study. This means the second question to ask is: When we adjust our equipment settings to sample at 1 kHz in future rather than 44 kHz, has our system's hardware designer overcome the noise and signal aliasing issues to retain the proper SNR?

What is causing a problem here is a variant of Murphy's Law VI:

DSP_ML-VI: By symmetry, any DSP frequency domain analysis problem has a non-obvious time domain equivalent.

The new hardware-related version can be expressed as:

DSP_ML-VIB: By symmetry, any DSP analysis problem you can simulate in software has a non-obvious hardware equivalent.

Data down-sampling to reduce storage requirements is a commonly needed DSP operation. We need to find out how to perform it, in software or hardware, without degrading the quality of the signals we want to analyze. The most obvious solution is: "high frequency noise components can't cause problems after down-sampling if we get rid of them before down-sampling!" This is the idea explored in the Investigate_DSP_ML_VIII() simulation study in Listing 4. To get Investigate_DSP_ML_VIII() to run, you will need to add the Listing 4 scripts MovingAverageFIR.m and CustomFIR.m and the new class method FIRFilterSignal() from Listing 5.

There is an extra step when running the test code, because the fir2() function to design FIR filters is not part of the standard Octave download. To fix this problem, you will need to install the signal package from Octave Forge.
To do this, you type and run "pkg load signal" in Octave's command window.

A moving average filter is just that—generating a new sample output point y[nΔT], with all high frequency components removed, by averaging the last P input values, x[(n-P+1)ΔT] through x[nΔT]. Then we move one time interval down the input signal array and do a P point average again. Computationally, this means that we calculate:

y[nΔT] = (1/P) (x[nΔT] + x[(n-1)ΔT] + ... + x[(n-(P-1))ΔT])

This has the same format as the operation of applying a finite duration impulse response (FIR) filter:

y[nΔT] = a₀x[nΔT] + a₁x[(n-1)ΔT] + ... + a_(P-1)x[(n-(P-1))ΔT]

where the FIR filter coefficients a₀ ... a_(P-1) are all equal to the constant 1/P.

FIR AND MURPHY'S DSP LAW VIII

FIR filters have some very nice DSP properties. First, they are easy to calculate—simply a sum of time domain data values multiplied by coefficient values. FIR operations are so common that single cycle "Multiply and Accumulate" instructions are standard in most processors for real-time FIR operations.

Another nice thing is that a developer can quickly check that the code for custom high-speed, long length FIR filters on a multi-core system is correct. All that is needed is to generate an impulsive input signal made up of zeros and one non-zero value: 0, 0, 0, 0, 0, 1, 0, 0, 0, ... and apply the FIR filter. The filter's output impulse response is guaranteed to be an ordered copy of the FIR coefficients: 0, 0, 0, 0, 0, a₀, a₁, ..., a_(P-1), 0, 0, ...

Even more interesting is the relationship between the impulse response and the frequency characteristics of the filtering operation. In the last lines of Listing 5, we generated the FIR filter's impulse response by overlaying the filter coefficients on top of
an array of zeros:

FIRImpulseResponse = zeros(signal.numPts, 1);
FIRImpulseResponse(1 : numFIRCoeffs) = filterCoeffs;

LISTING 4 – In the Investigate_DSP_ML_VIII() simulation study we explore the effectiveness of removing high frequency noise components using moving average and custom FIR filtering operations before down-sampling.

% Place in file "C:\DSP_ML\Investigate_DSP_ML_VIII.m"
function Investigate_DSP_ML_VIII( )
  clear classes;  % Encourage Octave/MATLAB accepting class changes
  tripleSignal = SimulateTripleSignal(0, 48, 86);
  noisyTripleSignal = AddNoise(tripleSignal, 1.0);

  % Show gain of Moving Average time domain FIR
  [frequency, freqData] = ...
      FrequencySpectrum(noisyTripleSignal, ...
          'Frequency Spectrum noisy TripleSignal');
  [~, filterGain] = ...
      FIRFilterSignal(noisyTripleSignal, 'Moving Average');
  plot(frequency, abs(filterGain) * max(real(freqData)), 'Linewidth', 3);
  legend('REAL()', 'IMAG()', 'MAG(MA FIR)', 'location', 'NorthEast');
  pause(3);

  % Show gain with a 10th order Custom time domain FIR
  [frequency, freqData] = ...
      FrequencySpectrum(noisyTripleSignal, ...
          strcat('Frequency Spectrum noisy tripleSignal', ...
                 ' - Custom 10th Order FIR'));
  [~, filterGain] = FIRFilterSignal(noisyTripleSignal, 'Custom');
  plot(frequency, abs(filterGain) * max(real(freqData)), 'Linewidth', 3);
  legend('REAL()', 'IMAG()', 'MAG(CUSTOM FIR)', 'location', 'NorthEast');
end

% Place in file "C:\DSP_ML\MovingAverageFIR.m"
function filterCoeffs = MovingAverageFIR(numCoeffs)
  filterCoeffs = ones(numCoeffs, 1) / numCoeffs;  % Gain adjust
end

% Place in file "C:\DSP_ML\CustomFIR.m"
function filterCoeffs = CustomFIR(numCoeffs)
  filterOrder = numCoeffs - 1;
  idealGain = [1.0 1.0 0.0 0.0];
  designFrequencies = [0.0, 51.2, 76.8, 128.0];
  normalizedFreq = designFrequencies / 128;
  filterCoeffs = fir2(filterOrder, normalizedFreq, idealGain);
end

LISTING 5 – The new class method FIRFilterSignal() allows the implementation of the 4th order MA or a 10th order custom FIR operation. The method returns both the filtered signal and the filter gain.
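The moving-average-as-FIR idea, and the impulse-response check described in the text, can be tried in a few lines of NumPy (a sketch, not the article's Octave code; the signal values are the ones quoted in the text):

```python
import numpy as np

P = 5
coeffs = np.ones(P) / P          # 4th order moving-average FIR, taps = 1/P

# Feeding in an impulse returns an ordered copy of the coefficients
impulse = np.zeros(16)
impulse[5] = 1.0
response = np.convolve(impulse, coeffs)[:len(impulse)]
print(response[5:5 + P])         # five copies of 0.2

# A slowly varying signal is passed almost unchanged...
slow = np.array([0.86, 0.96, 1.00, 0.96, 0.86])
print(np.sum(slow * coeffs))     # about 0.93

# ...while a signal like [-1, 0, 1, 0, -1] still leaks through
s3 = np.array([-1.0, 0.0, 1.0, 0.0, -1.0])
print(np.sum(s3 * coeffs))       # -0.2, non-zero: an unwanted pass band
```

The last line previews the MA filter's surprise pass bands discussed below: a mid-frequency pattern averages to a non-zero value even though faster and slower patterns cancel.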
If you take the DFT of the impulse response of an FIR filter, last line in Listing 5:

filterGain = fftshift( fft(FIRImpulseResponse) );

and remember DSP_ML-V about applying fftshift() at least once every day, then you will get the filter's frequency response. This can be superimposed upon the input signal's frequency characteristics, Figure 3a, to predict the output signal's frequency characteristics.

As might be expected, a new DSP idea means more things to worry about. May I introduce you to the eighth DSP Murphy's Law, which can be expressed in the form of a pun:

DSP_ML-VIII: When designing finite impulse response filters, be careful what you ask FIR!

The first filter we investigate in the Investigate_DSP_ML_VIII() study, Listing 4, is a 4th order moving average (MA) filter in the script MovingAverageFIR.m. Here, each of the 5 filter coefficients has the same value: 1/5. Common sense seems to suggest that summing up and averaging the last 5 values of the input signal should be sufficient to remove ALL quickly varying high frequency noise and signals, leaving just the more slowly changing low frequency signals.

However, this is not the case, as can be seen from the 4th order MA filter's response shown in blue in Figure 3a. Yes, low frequency signal components around 0 Hz are passed as expected, but there are also strong (unwanted) signal components passed for frequencies around 80 Hz, and more again near 120 Hz. This strange result can be understood by applying this MA filter, with its 5 equal coefficients, to a number of signals. Slowly varying signals, such as [0.86 0.96 1.00 0.96 0.86], will generate an average of 0.93, indicating they will be passed by the filter. The varying signal S1, with values [-0.80 0.30 1.00 0.30 -0.80], and the more rapidly varying signal S2, with values [0.30 -0.80 1.00 -0.80 0.30], will average out to zero.
However, a signal S3 = [-1.0, 0.0, 1.0, 0.0, -1.0], with a frequency between that of S1 and S2, will have a non-zero average passed by the filter. This behavior is seen in Figure 3a, with the moving average filter having many pass and stop bands. This is why, by chance, the 48 Hz component we want to keep appears with zero amplitude in this filter curve picture—meaning it will be removed from the filter output. By contrast, the high frequency components, the noise or the 86 Hz signal we wanted removed, will only be slightly reduced in amplitude after filtering. MA filters are not the right thing to apply to remove potential aliasing problems before down-sampling signals!

% Place in file "C:\DSP_ML\@TimeDomainSignal\FIRFilterSignal.m"
function [filteredSignal, filterGain] = ...    % NEW CLASS METHOD
    FIRFilterSignal(signal, whichFilter)
  if strcmp(whichFilter, 'Moving Average')
    filterCoeffs = MovingAverageFIR(5);    % 4th order MA filter
  else
    filterCoeffs = CustomFIR(11);          % 10th order custom filter
  end
  numFIRCoeffs = length(filterCoeffs);
  data = signal.timeData;
  filteredData = zeros(1, signal.numPts);
  for count = numFIRCoeffs : signal.numPts   % No output for initial points
    filteredValue = 0;
    for coeffs = 1 : numFIRCoeffs
      filteredValue = filteredValue + ...
          data(count - coeffs + 1) * filterCoeffs(coeffs);
    end
    filteredData(count) = filteredValue;
  end
  filteredSignal = signal;   % Generate a TimeDomainSignal instance
  filteredSignal.timeData = filteredData;

  % Frequency response = DFT(Time domain impulse response)
  FIRImpulseResponse = zeros(signal.numPts, 1);
  FIRImpulseResponse(1 : numFIRCoeffs) = filterCoeffs;
  filterGain = fftshift( fft(FIRImpulseResponse) );
end

MURPHY'S DSP LAW IX

There are a number of tools available to design the "good quality" FIR filters frequently needed during DSP analysis. In Listing 4's CustomFIR() design function we have made use of the built-in fir2() tool. To use this tool, we need to know the FIR filter order, numCoeffs - 1.
We also need to make design decisions and specify the filter gain:

idealGain = [1.0 1.0 0.0 0.0];

at a number of specific frequencies:

designFrequencies = [0.0, 51.2, 76.8, 128.0];

A gain of 1.0 at frequency 0 means all low frequencies will be passed. A gain of 0.0 at frequency 128 means all high frequencies will be blocked. An interesting property of digital filters is that their design does not depend on the actual range of frequencies that need to be filtered, but rather on the number of samples that cover that range of frequencies. This means that the fir2() tool must be given the frequency values normalized by samplingRate / 2, in this case 128:

normalizedFreq = designFrequencies / 128;

Unfortunately, after designing our FIR filter we quickly meet up with the ninth DSP Murphy's Law:

DSP_ML-IX: There are many advantages of digital filters over analog filters, but providing perfection is not one of them.

A custom 10th order FIR filter does a reasonable job of removing high frequency noise components but, as can be seen from Figure 3b, it would leave some of the undesired 86 Hz signal to be aliased down to 42 Hz after down-sampling. To reduce the residual level of the 86 Hz signal we need a significantly higher order filter.

In Listing 6, we investigate how the SNR of the down-sampled signal is impacted by application of the 10th order custom FIR filter, applied to the time domain signal, and by a new concept—applying the filtering operation directly in the frequency domain. Figure 4a shows the output frequency spectrum of the noisy three component signal after filtering by the 10th order custom FIR filter. The filtering has significantly reduced the high frequency noise components and left a greatly reduced 86 Hz signal intensity. Figure 4b shows the frequency spectrum after down-sampling the filtered signal. Removal of the high frequency noise components means that the initial and final SNR of the DC component remains above
22, just what we wanted! The filtering has significantly reduced the amplitude of the 86 Hz signal, now aliased down to 42 Hz, to just above the noise level. However, the shape of the edge of the frequency response of the 10th order FIR filter has also made an unwanted reduction in the amplitude of the 48 Hz signal we wanted to retain for examination.

The amplitude of the 48 Hz signal can be recovered by increasing the FIR filter length and choosing filter coefficient values to sharpen up the edge of the filter's frequency response. However, as should be expected, DSP_ML-VIII is again lurking in the wings and can be expected to try to thwart our efforts.

PROBLEM OF FILTER TRANSIENTS

It turns out that we can't code the filtering operation to be evenly applied to all the points of the incoming signal. The early part of the signal is not handled in the same way as the later part, as can be seen when we try to write the FIR code.

FIGURE 3 – The frequency responses of the moving average FIR, MAG(MOVING AVERAGE FIR), and the 10th order custom FIR, MAG(CUSTOM FIR), superimposed on the frequency spectrum of the noisy triple signal.

LISTING 6 – In the Investigate_DSP_ML_IX() simulation, we identify the hidden distortions that occur when applying digital filters in either the time or frequency domains.

% Place in file "C:\DSP_ML\Investigate_DSP_ML_IX.m"
function Investigate_DSP_ML_IX( )
  clear classes;  % Encourage Octave/MATLAB accepting class changes
  tripleSignal = SimulateTripleSignal(0, 48, 86);
  noisyTripleSignal = AddNoise(tripleSignal, 1.0);

  %% Apply Custom time domain FIR
  [filteredNoisyTripleSignal, filterGain] ...
      = FIRFilterSignal(noisyTripleSignal, 'Custom');
  [frequency, freqData] = ...
      FrequencySpectrum(filteredNoisyTripleSignal, ...
          'FIR filtered noisy tripleSignal');
  plot(frequency, abs(filterGain) * max(real(freqData)), 'Linewidth', 3);
  CalculateDisplaySignalSNR(freqData, -36, -4);
  CalculateDisplaySignalSNR(freqData, 4, 36);

  %% Down sample FIR filtered signal
  downsampledFilteredNoisyTripleSignal = ...
      DownSampleSignal(filteredNoisyTripleSignal);
  [~, freqNoisyDataDownSampled] = ...
      FrequencySpectrum(downsampledFilteredNoisyTripleSignal, ...
          'Down-sampled FIR filtered noisy TripleSignal');
  CalculateDisplaySignalSNR(freqNoisyDataDownSampled, -36, -4);
  CalculateDisplaySignalSNR(freqNoisyDataDownSampled, 4, 36);

  %% Apply frequency domain filter
  [filteredNoisyTripleSignal, filterGain] ...
      = DFTFilterSignal(noisyTripleSignal);
  [frequency, freqData] = ...
      FrequencySpectrum(filteredNoisyTripleSignal, ...
          'DFT filtered noisy tripleSignal');
  plot(frequency, abs(filterGain) * max(real(freqData)), 'Linewidth', 3);
  CalculateDisplaySignalSNR(freqData, -36, -4);
  CalculateDisplaySignalSNR(freqData, 4, 36);

  %% Down sample DFT filtered signal
  downsampledFilteredNoisyTripleSignal = ...
      DownSampleSignal(filteredNoisyTripleSignal);
  [~, freqNoisyDataDownSampled] = ...
      FrequencySpectrum(downsampledFilteredNoisyTripleSignal, ...
          'Down-sampled DFT filtered noisy TripleSignal');
  CalculateDisplaySignalSNR(freqNoisyDataDownSampled, -36, -4);
  CalculateDisplaySignalSNR(freqNoisyDataDownSampled, 4, 36);
end

FIGURE 4 – (a) shows the frequency spectrum of the FIR filtered noisy triple signal with DC, 48 Hz and 86 Hz components. (b) shows the frequency spectrum of the down-sampled FIR filtered noisy triple signal. Filtering before down-sampling has retained the SNR of the DC component and reduced the amplitude of the unwanted 86 Hz signal to just above the noise level. However, the filtering has also reduced the amplitude of the desired 48 Hz component.

Computing the FIR requires the mathematical operation:

y[nΔT] = a₀x[nΔT] + a₁x[(n-1)ΔT] + ... + a_(P-1)x[(n-(P-1))ΔT]

This sum needs to be calculated using a code loop.
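A minimal Python sketch of that loop (an illustration, not the article's Octave class method) makes the starting problem easy to see: the first P-1 outputs would need samples from before time zero, so they are simply left at zero.

```python
import numpy as np

def fir_filter(x, coeffs):
    """Direct-form FIR: y[n] = sum_k coeffs[k] * x[n-k].

    The first len(coeffs)-1 outputs would need samples from before
    time zero, so they are left at zero (the starting transient)."""
    P = len(coeffs)
    y = np.zeros(len(x))
    for n in range(P - 1, len(x)):        # no output for initial points
        acc = 0.0
        for k in range(P):
            acc += coeffs[k] * x[n - k]
        y[n] = acc
    return y

coeffs = np.ones(5) / 5                    # 4th order moving average
x = np.ones(20)                            # a DC input signal
y = fir_filter(x, coeffs)
print(y[:6])   # four zero outputs (the transient), then 1.0 steady state
```

Even for a pure DC input, the output is wrong for the first P-1 points; a longer filter makes this distorted region proportionally longer.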
To process the first input value, time zero, means that this sum needs to be performed with n = 0 and use input values at times -1ΔT, -2ΔT and so on, before we started measuring. Since we can't call on Dr. Who, the Time Lord, any time anybody needs to do filtering, we must find compromises to get the best filter output we can.

One way is to hope that, since we are measuring for a long time, we can ignore this starting problem and use only the valid outputs we can calculate. This we did in Listing 5 by performing the loop calculation using the limits:

for count = numFIRCoeffs : signal.numPts

and initializing the first P-1 filtered values that can't be evaluated to zero. This generates a distorted initial portion of the filter output known as the filter's starting transient. However, earlier we mentioned we needed to increase the number of filter taps P to improve the filter response, to remove filter distortion and to keep the 48 Hz signal we wanted (Figure 4). Doing this will increase the early signal's distortion as the FIR transient becomes longer. The transient distortions become much worse for the longer length FIR filters applied in real life, where the filter order may need P > 256 to get the correct filter response shape needed to process images of 1000 x 1000 pixels.

With audio signals, there is an out to remove the starting transient. Audio signals are continually coming in. Rather than just applying a P tap filter to the N points captured over the time KΔT -> (K+N-1)ΔT, we retain P older values captured earlier and process more values over a longer time period, (K-P+1)ΔT -> (K+N-1)ΔT. However, this longer calculation takes more time to perform, and that can be a problem, as discussed in the next section.

In real life, you don't capture one small section of data, filter it and then admire the spectrum! Yes, when testing our code, the filtered signal's spectrum is checked to see that the filter operation is the one expected.
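The "retain P older values" trick for streaming audio is essentially overlap processing. A NumPy sketch (the function name and block layout are my own, not the article's): each new block is filtered together with the last P-1 samples of the previous block, so every output in the new block is transient-free.

```python
import numpy as np

def filter_stream(blocks, coeffs):
    """Filter successive blocks, carrying P-1 history samples so the
    per-block startup transient disappears (after the very first block)."""
    P = len(coeffs)
    history = np.zeros(P - 1)
    for block in blocks:
        extended = np.concatenate([history, block])
        # 'valid' convolution yields exactly len(block) outputs,
        # each computed from fully available samples
        yield np.convolve(extended, coeffs, mode='valid')
        history = extended[-(P - 1):]

# Streaming in two blocks matches filtering the whole signal at once
rng = np.random.default_rng(1)
x = rng.standard_normal(32)
coeffs = np.array([0.5, 0.3, 0.2])
streamed = np.concatenate(list(filter_stream([x[:16], x[16:]], coeffs)))
whole = np.convolve(np.concatenate([np.zeros(2), x]), coeffs, mode='valid')
print(np.allclose(streamed, whole))
```

The extra P-1 samples per block are exactly the "longer calculation" cost the text mentions: the work per block grows with the filter length even though the output length does not.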
However, in real situations we continually capture data, filter it, and then perhaps play the data back through a speaker. The trouble is, for real time analysis, all that filtering must occur BEFORE the next signal sample comes in. Each Pth order filter operation requires on the order of 3P fetches of instructions, 2P memory fetches for the data and coefficients, and P multiplications and P additions during the FIR sum. This is a total of 7P processor operations. All calculations must complete within the next 1/44,000 s, otherwise the output signal will be distorted.

We will ignore the additional time needed for the 3P, or more, fetches of instructions, because many processors now have a Harvard rather than von Neumann architecture. This means their architecture pipelines things so that the fetching of instructions from program memory occurs in parallel with data memory fetches. However, we do need to worry about the 4P time to repeatedly fetch the new data and the filter coefficients, and to do all the math operations.

Suppose the FIR filter length P is large—300 is not uncommon—and you are trying to keep processor clock speeds down, either for product cost or power consumption reasons. Now the 4P FIR operations, and all the other things your processor has to do, may take longer than the time between input samples. Under these circumstances frequency domain filtering may provide the answer.

Historically this approach, multiple applications of the fft(), got a bum rap. This was because it requires more data and program memory storage than time domain filtering, and in the '80s it would cost $1,200 for a 1 MB memory card extension. Memory costs are no longer important today, although over-shooting the size of the available memory in a small product can have significant market considerations.

During frequency domain filtering you
must capture M data points and then attempt to filter all of the last M points before the next M points have been recorded. Frequency domain filtering involves transforming the M points captured into the frequency domain via an fft() algorithm. The amplitudes of the frequency components are directly manipulated to cause the filtering operation, and then the signal is returned back into the time domain with an ifft(). We need to show that all these transformations can be done in a time of less than 4P x M cycles to solve the time crunch of implementing real time Pth order filtering.

The literature shows that an M point fft() takes M log2(M) operations of the form A = B + C x D. Because A, B, C and D are all complex data values of the form x + jy, memory fetches, stores and additions take 2 cycles rather than 1. Complex multiplications, (a + jb) x (x + jy), involve 4 multiplications and 2 additions—6 cycles rather than 1. That roughly means that an M-point fft() will take on the order of 16 M log2(M) cycles, if we ignore some special fft() memory characteristics that will reduce this number by a factor of 2 or more. For frequency domain digital filtering we need 2 fft() operations and an additional 2M multiplications and
4M data memory fetches and stores to perform the filtering operation itself. If we decide to process blocks of data of size M = 1,024, then frequency domain filtering requires a total of 6M + 32 M log2(M) cycles, or around 330,000 cycles. This is faster than the 410,000 cycles needed for the time domain FIR filter for P = 100. The break-even point is somewhere in the range P = 60 to 80. In addition to the time savings, frequency domain filtering appears to be more flexible, as you can shape all 1,024 frequency components to be what you want.

FREQUENCY DOMAIN FILTERING

Listing 7, the code to perform digital filtering in the frequency domain, appears to be an algorithm that provides the exception to DSP_ML-IX: perfection is not possible with digital filters.

% Place in file "C:\DSP_ML\@TimeDomainSignal\DFTFilterSignal.m"
function [filteredSignal, filterGain] = ...    % NEW CLASS METHOD
    DFTFilterSignal(signal)
  data = signal.timeData;

  % Move to the frequency domain, remembering DSP_ML-V:
  % Applying fftshift() at least once every day will keep
  % frequency confusion at bay. Despite their names, fftshift()
  % and its inverse ifftshift() give identical results here.
  dataDFT = ifftshift(fft(fftshift(data)));

  % Design the perfectly sharp edged frequency domain filter
  dataLength = length(data);
  filterGain = ones(dataLength, 1);
  leftFreq = -64;
  rightFreq = 63;
  filterGain(1 : (leftFreq + dataLength / 2 - 1)) = 0;
  filterGain((rightFreq + dataLength / 2 + 1) : dataLength) = 0;

  % Apply the filter
  dataDFT = dataDFT .* filterGain';

  % Move back into the time domain and then
  % generate a TimeDomainSignal instance and fill it
  modifiedData = fftshift(ifft(ifftshift(dataDFT)));
  filteredSignal = signal;
  filteredSignal.timeData = modifiedData;
end

LISTING 7 – In apparent contradiction to DSP_ML-IX, a perfectly shaped, sharp edged digital filter can be designed for use in the frequency domain. All undesired high frequency noise and signal components can be removed without impacting the amplitudes of the wanted low frequency components.

FIGURE 5 – (a) Frequency domain digital filtering appears to be the route to remove all unwanted high frequency noise and signal components without impacting the wanted components. (b) After down-sampling, the original SNR of both the DC and 48 Hz components is retained.

Additional materials from the author are available at:
Mathworks | www.mathworks.com
GNU Octave | www.gnu.org/software/octave
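The Listing 7 approach—transform, zero the unwanted bins, transform back—translates directly to NumPy. This sketch (my own, not the article's Octave code) uses the article's 256-sample, 1-second record and a cutoff at the new ±64 Hz Nyquist limit:

```python
import numpy as np

fs = 256
n = 256
t = np.arange(n) / fs
# DC + 48 Hz (wanted) + 86 Hz (unwanted)
x = 1.0 + np.cos(2 * np.pi * 48 * t) + np.cos(2 * np.pi * 86 * t)

# Move to the frequency domain; bins run -128..127 Hz after fftshift
X = np.fft.fftshift(np.fft.fft(x))
freqs = np.fft.fftshift(np.fft.fftfreq(n, d=1.0 / fs))

# Perfectly sharp edged low-pass: keep only |f| < 64 Hz
gain = (np.abs(freqs) < 64).astype(float)
Xf = X * gain

# Back to the time domain (imaginary residue is only roundoff)
xf = np.real(np.fft.ifft(np.fft.ifftshift(Xf)))

spectrum = np.abs(np.fft.fft(xf)) / n * 2
print(spectrum[48], spectrum[86])   # 48 Hz amplitude kept, 86 Hz gone
```

Because the tones sit exactly on DFT bins, the 48 Hz component survives at full amplitude while the 86 Hz component is removed completely, which is exactly the "too good to be true" result the next paragraphs examine.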
First we apply a DFT, using the fft() operator, to all the time domain data—not having to worry about signal distortion from initial zero values caused by time domain filter transients. Note the double use of the advice from DSP_ML-V about applying fftshift(). Once in the frequency domain, we can design a perfectly shaped, sharp edged digital filter, the blue line in Figure 5a. All undesired high frequency noise and signal components can be removed without impacting the amplitudes of all the wanted low frequency components. The real and imaginary parts of the frequency filtered signal are returned to the time domain using an inverse DFT, combining another double use of the fftshift() operator with ifft().

After down-sampling, we appear to have a signal whose perfect spectrum appears in Figure 5b. The original SNR, around 22, of both the DC and 48 Hz components is retained. This points out the importance of avoiding aliasing when down-sampling the signals already stored on our computer. We also need to ensure that an adequate analog anti-aliasing filter is put in front of the analog to digital converter (ADC) before we sample the audio signal captured by a microphone, or the values from any sensor.

Then we ask the question: What are we going to do with this perfect signal? The answer is: Combine it with other sections of "perfectly" filtered signals from the hours of audio data we captured in community homes. Problem solved—and then DSP_ML-VI springs to mind (Table 1). The vice-versa part of DSP_ML-VI is about to come into play. Go back and look at Figure 4 in the June Part 1 article. (Figure 4 from Part 1 is also repeated for you on the Circuit Cellar article materials page for this article.) In that Figure we saw that a time domain sinusoidal burst showed significant distortions in its frequency domain signal.
Generating a burst is simply applying a sharp edged digital filter to a longer sinusoidal signal! Applying this sharp edged, time domain filter caused such distortion in the frequency domain analysis. Unfortunately, DSP_ML-VI then clearly hints that an equivalent sharp edged, frequency domain filter will be causing distortions in the final filtered time domain signal we want to further analyze or play back. Obviously, there are many more DSP Murphy's Law dragons still to slay!

USEFUL HINTS

This article is designed as a series of projects where you cut and paste the code from this pdf into your .m scripts. Alternatively, you can get all the code from the Circuit Cellar Article Code & Files webpage. Remember: Cutting and pasting code from pdfs into MATLAB and Octave .m scripts suffers from its own Murphy's Laws, especially with the occasional cut-and-paste hyphen and minus sign having the wrong format.

When all the files in this project have been added to those of the June Part 1 article, then the directory structure C:\DSP_ML should look like Figure 6, with all the class information in directory C:\DSP_ML\@TimeDomainSignal. Add the line "pkg load signal" to the startup.m script so that Octave will recognize functions, like fir2(), that are not automatically available.

In developing these articles, we have been struggling with the way that Octave seems to have problems with handling classes within a class directory such as C:\DSP_ML\@TimeDomainSignal under Windows. Adding the line:

clear classes;

at the start of each sub-project script SEEMS to have solved the problem of getting Octave to stop using old class scripts and instead use the updated ones where we have just corrected all the (unintentional) errors we made. However, we have not really found a solution to ensure that Octave automatically recognizes new methods placed in @TimeDomainSignal in the same way that MATLAB does. We have been using two approaches when developing and testing our code for these articles:

1.
Late at night we entered all the new scripts, Investigate_DSP_ML_*, and class methods associated with the next section of the project. Now close Octave and go to bed for a well justified sleep. Restarting Octave at the beginning of a new day, or any other time, solves all problems of recognizing new class methods.

2. WIDFI (When In Doubt, Fake It). Simply put all the class methods in the main directory C:\DSP_ML!

FIGURE 6 – When all the files in this project have been added to those of Part 1 (Circuit Cellar 335, June 2018), then the directory structure C:\DSP_ML should look like this.

If any reader knows a better solution to this Octave problem, please let us know and we will add it into the next article in the series. Thanks.

The authors thank summer research student Maaz Khurram for helping with the article.

ABOUT THE AUTHORS

Mike Smith (Mike.Smith@ucalgary.ca) is a professor in Computer Engineering at the University of Calgary, Canada. When not singing with Calgary's adult recreational choir Up2Something, Mike's main interests are in developing new biomedical engineering algorithms and moving them onto multi-core and multiple-processor embedded systems in a systematic and reliable fashion. Mike has recently become a convert to the application of agile methodologies in the embedded environment. Mike was one of the notable contributors interviewed in Circuit Cellar's 25th Anniversary issue.

Mai Tanaka (tanakarn@ucalgary.ca) is a graduate student in Electrical and Computer Engineering at the University of Calgary, Canada. Her research interests include the transfer and translation of information at the human-machine interface, such as speech processing.

Ehsan Shahrabi Farahani (ehsan.shehrabi.f@gmail.com) has completed his Masters in Electrical and Computer Engineering at the University of Calgary, Canada. His research interests include digital signal processing, image processing, biomedical signal processing, medical data analysis, medical imaging and speech processing.
Brian Millier

When I was a teenager starting out in electronics, I longed to have as much test equipment as possible. At that stage in life, I couldn't afford much beyond a multimeter. I remember seeing plans for a component tester in an electronics magazine. There weren't many hobby electronics magazines back in the '60s, so it was probably Popular Electronics. This tester would provide a "signature" of most passive/active components by placing a small AC voltage across the component and measuring the resulting current. My memory of the circuit is hazy after all these years, but it was trivial: a 6.3 V filament transformer, a current sensing resistor and a few other passive components. However, the catch was that it required an oscilloscope to display the resulting voltage vs. current plot—in other words, the component's signature. By the time I bought an oscilloscope about 10 years later, I had completely forgotten about this testing concept.

Today, test instruments are available that include a dedicated graphics display, instead of relying on an oscilloscope for display purposes. Having worked with Arm microcontrollers over the last few years, I realized that I could implement such a free-standing tester using, in large part, just the internal MCU peripherals. In this article I'll describe how the tester operates, and how I implemented it using a Teensy 3.5 development module (containing an NXP MK64FX512VMD12 MCU) and featuring an FT800-based intelligent 4.3" TFT touch-screen display.

BASIC THEORY OF OPERATION

To obtain a signature of a given component, you need to place a variable voltage across it and measure the resulting current through it, at each voltage level. In many cases, the component's normal operating mode will include both positive and negative voltages across it, so the tester must provide an AC voltage source. For most testing purposes you would use a sine wave voltage source, because most AC calculations are done using sine waves.
The value of this AC voltage source must be adjustable. I decided on six ranges between 0.5 V peak-peak and 20 V peak-peak. For measuring the voltage across the component, I used an instrumentation amplifier with three hardware gain ranges—plus three additional ranges based upon scaling in software. To monitor current, it's easiest to measure the voltage across a small-value resistor placed in the ground return path, and then convert that to current using Ohm's Law. Here too you need a range of current measurements. I chose to provide three hardware ranges—plus four additional ranges based on software scaling—topping out at 100 mA. You can't just place an AC voltage of any given value across a component and hope that the component will be able to handle that current without damage. You must place a resistor in series with the component to limit the current flow. That resistor may need to vary in value over several decades, depending on the component being tested. In my tester, I provide a switchable resistor bank with values covering a 1,000:1 range in decade steps. Figure 1 is a block diagram of the basic tester circuitry. The user interface, touch-screen display and SD card data storage are not shown here. The MK64FX512VMD12 MCU's 12-bit DAC A provides a sine wave signal that varies between 0 and 1.2 V over the full AC cycle. The programmable attenuator is an SPI pot device with 12-bit resolution. C1 is a coupling capacitor, which shifts the (attenuated) unipolar DAC A output signal into a bipolar AC signal. This AC signal is amplified by a factor of 21 by an LM675 power amplifier IC. DAC B, along with some passive components, provides a software-adjustable offset voltage adjustment. The LM675 amplifier is needed to provide enough drive current to handle the higher current ranges—up to 100 mA. Both the voltage and current are monitored using Texas Instruments (TI) instrumentation amplifier ICs. These contain input protection circuitry good to ±40 V.
The various gains needed for both amplifiers are set by 1% resistors, which are switched by miniature reed relays. The instrumentation amplifier output voltages, representing the voltage across and current through the component under test, are fed to the two 16-bit ADCs present in the NXP MK64FX512VMD12 Arm MCU. The sine wave signal generated by the MCU can be set for frequencies of 20, 50, 60, 100, 200 or 400 Hz.

FIGURE 1 – This block diagram shows the AC signal generation and voltage/current monitoring circuitry.

SIGNATURE ANALYSIS
The basic premise of signature analysis is that you obtain a signature of a component that is of questionable condition, and then compare it with a known-good component of the same value. Alternately, you can do the same comparison on a specific circuit node on two identical circuit boards/assemblies. To this end, my tester contains an A/B selector switch, which allows you to hook up two sets of probes—one for the component under test, the other for the reference (good) component. When you switch between A and B, one of the signatures will be displayed in pink and the other in yellow. Both are displayed simultaneously. The tester also allows you to enter a "project name," after which you can save multiple signatures to the onboard SD card storage. Once a set of known-good signatures for various nodes of a circuit board has been measured and stored, you can use the third "FILE" position of the selector switch to display these stored signatures in sequence. For each stored signature that you display, you can measure the same node on the board under test, to see how they compare. To do signature analysis, there are two reasons why you must have some idea what you are measuring: 1. You should set the sine wave amplitude to something less than the maximum voltage that the component can tolerate without being destroyed. 2.
You should set the current-limiting resistor value to something large enough to limit the current to less than what the component can tolerate without being destroyed.

FIGURE 2 – This is a close-up of the Teensy 3.5 development module used in the project. To the right is the built-in SD card socket, which I put to good use. Programming is done via the micro-USB connector to the far left.

For most components, several combinations of 1 and 2 will be suitable. Alternately, you can start out with low voltages and a high current-limiting resistance, and slowly increase the voltage and decrease the current-limit resistor until you can see some useful voltage vs. current plot on the screen.

SOFTWARE DETAILS
As mentioned earlier, this project uses the NXP MK64FX512VMD12 Arm MCU. This is a complex MCU with 144 pins and comes in an LQFP or BGA package. I can't handle or mount such a device on a PCB, so I've chosen to use a PJRC Teensy 3.5 development module for this project. This Arm MCU is found on the Teensy 3.5 module shown in Figure 2.

FIGURE 3 – This is the complete schematic diagram of the Signature Analyzer. Due to the on-board ADCs/DACs of the Arm MCU, the amount of additional analog circuitry needed is reduced.

I don't have experience with either NXP's C toolchain or other professional Arm toolchains, such as those provided by Keil and others. I have, however, done many projects using the Arduino IDE, which supports Atmel AVR/Arm MCUs, Espressif's ESP8266/ESP32 and some other Arm devices. Using the Teensyduino plug-in for Arduino allows you to write programs for all four of the Teensy development boards, including the Teensy 3.5 used in this project. There are two advantages to doing development this way: 1. The Arduino "Sketch" format is basically C++ code, with compiler pre-processors added to make it much easier for non-experts in C to get started. 2. There are probably more Arduino code libraries and peripheral drivers freely available online than for any other current MCU families.
Most of these Arduino libraries have been ported to the NXP Arm MCUs found on the Teensy development boards—in large part by the owner of the company that produces the Teensy modules. To compile the code needed for this project, you need only install the Arduino IDE from the Arduino software site, and then go to PJRC's website and download the Teensyduino plug-in. The links for both are provided on Circuit Cellar's article materials webpage. The Teensy 3.5 development board comes with a built-in serial boot-loader/programmer that operates via the board-mounted USB port. The Teensy Loader application comes bundled with the Teensyduino plug-in mentioned above. The Teensy programming scheme works very well and is quick. However, what you don't get with any of the Teensy Arm development boards is a hardware debugger of any sort (JTAG, SWD). This is a limitation of the Arduino IDE in general. Also, the SWD debug pins are not available on the Teensy 3.5 module. Both the Arduino IDE and the Teensyduino plug-in are available on all common operating systems: Windows, Mac OS X and Linux. Now that you are comfortable with the software requirements, let's take a closer look at the hardware aspect of the project.

ANALOG SIGNAL PATH
Refer to Figure 3 for the schematic diagram of the project. Figure 4 shows the complete circuit, less the display and other front panel switches, and so on.

FIGURE 4 – This shows all the circuitry, apart from the TFT display and a few user controls.

The sine wave excitation signal is generated by the Teensy 3.5's DAC A, and is a unipolar voltage in the range of 0 V to 1.2 V. To get the six different voltage ranges used, I pass this signal through a Microchip MCP41010 digital pot, which has a 10 kΩ nominal value and 12-bit resolution. The scaled unipolar sine wave is then passed through C3, which converts it into a bipolar sine wave signal. To provide enough current drive to handle the upper current range of 100 mA, I used an LM675 Power Operational Amplifier.
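The article doesn't reproduce the author's sine-generation code; a common approach, sketched here under that assumption, is a fixed lookup table of 12-bit DAC codes stepped by a timer whose rate sets the output frequency. The table length and the update-rate helper are my own illustrative choices.

```cpp
#include <cmath>
#include <vector>

// Hypothetical sketch: build a unipolar 12-bit sine table
// (codes 0..4095, centered at mid-scale) for a lookup-table DAC.
std::vector<int> makeSineTable(int length) {
    const double PI = 3.14159265358979323846;
    std::vector<int> table(length);
    for (int n = 0; n < length; ++n) {
        double s = std::sin(2.0 * PI * n / length);
        table[n] = static_cast<int>(2047.5 + 2047.5 * s);
    }
    return table;
}

// Timer update rate needed to sweep the whole table once per cycle.
double updateRateHz(double sineFreqHz, int tableLength) {
    return sineFreqHz * tableLength;
}
```

With a 256-entry table, the 20 Hz to 400 Hz output range corresponds to timer update rates of 5.12 kHz to 102.4 kHz, well within an Arm MCU's reach.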
The power rails for this amplifier are ±12 V, which allow it to provide the ±10 V needed for the highest voltage range. The digital pot taps are selected in software, in combination with the LM675's gain of 21, to produce the proper output voltages for the six voltage ranges. I used the Teensy's DAC B, along with R1, R4, R5 and a negative 2.5 V regulator, to implement a software offset-zeroing function. I didn't get too fancy with this. In my code is a subroutine ZeroOffset. When called, this routine ramps up DAC B from 0 to full-scale, while printing out that value to the USB serial port. While running ZeroOffset, I measure the LM675's output voltage, and when it is closest to zero, I check the DAC B value being displayed on the PC's serial monitor. I then use that value in Line 197 of my program:

analogWrite(A22, 1940); // empirically determined value to zero any offset from DAC B

FIGURE 5 – The finished unit, mounted in a Hammond extruded aluminum enclosure. The A and B ports can be seen at the back left side of the enclosure.

The properly scaled sine wave output of the LM675 is then passed through the current-limiting resistor array (R7-R10). Relays K1-K4 select which of the 100 Ω through 100 kΩ resistors are placed in the circuit. The sine wave signal then goes out to the component or circuit node under test. Switch SW3 selects between the "A" and "B" output terminals or cables, allowing you to compare the signatures of two (identical) components: a known-good standard and a questionable one. SW3's third "FILE" position will be discussed later in this article. To measure the current through the component, a low-value current-sense resistor is placed in the ground return path of the AC waveform signal. Because the Teensy's ADC has a full-scale resolution of 1.2 V, I chose a 1.2 Ω 1% resistor for this purpose (R18).
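The zeroing procedure described above is manual, but its core logic is just a ramp-and-compare. The sketch below is my own illustration of that idea, not the author's ZeroOffset routine: it steps a DAC code through its range and keeps the code whose measured output is closest to 0 V. The outputVolts callback stands in for reading the LM675 output, which the author did with a meter.

```cpp
#include <cmath>

// Hypothetical sketch of the offset-zeroing ramp (names invented):
// step the DAC code 0..maxCode and return the code whose measured
// amplifier output has the smallest magnitude.
int findZeroCode(double (*outputVolts)(int code), int maxCode) {
    int best = 0;
    double bestMag = std::fabs(outputVolts(0));
    for (int code = 1; code <= maxCode; ++code) {
        double mag = std::fabs(outputVolts(code));
        if (mag < bestMag) {
            bestMag = mag;
            best = code;
        }
    }
    return best;
}
```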
To measure the voltage across the component and the voltage dropped across R18, the current-sense resistor, I am using two TI instrumentation amplifiers: U2 and U3. These provide internal input protection circuitry up to ±40 V. I happened to have one INA128 and one INA121 in stock, so that's what I used, but either one could be used in both places.

ABOUT THE AUTHOR
Brian Millier runs Computer Interface Consultants. He was an instrumentation engineer in the Department of Chemistry at Dalhousie University (Halifax, NS, Canada) for 29 years.

In the case of U2, the voltage instrumentation amplifier, I needed to be able to handle a ±10 V peak-peak signal for the highest excitation voltage range. This voltage is beyond what the INA121 amplifier can handle when powered by the ±5 V rails. The instrumentation amplifiers have a differential input, so I used an 11:1 voltage divider on each of the differential inputs. This voltage divider is switched in/out by relays K5 and K6, depending on the voltage range being measured. Besides this 11:1 divider for the ±10 V peak-peak range, the gain of the voltage amplifier is varied by R17/K7, giving additional gains of 1 or 2. In the case of U3, the current-monitoring amplifier, there are three decade gain settings that are determined by R21, R22, R23 and relays K8 and K9. Beyond these hardware gain settings, some scaling is also done in software, to supplement the hardware-selected ranges. Both voltage and current instrumentation amplifiers have their output reference pin set to 1.0 V (from voltage reference D6 and trim-pot TMP3). Therefore, with a differential input voltage of zero, the amplifier output will be 1.0 V. But each amplifier feeds a 10-turn trim-pot (TMP1, TMP2), which attenuates its amplifier's output by a factor of 1.66, giving an output of 0.6 V. This is exactly half of the 1.2 V full-scale measurement range of the Teensy 3.5's ADCs. Therefore, the bipolar input signals are level-shifted to the 0-1.2 V unipolar ADC measurement range.
The outputs of both U2 and U3 can swing close to the ±5 V power rails. To protect the Teensy 3.5 against any negative input voltages, or any beyond its 3.3 V power source, I used a protection network made up of two 1N914 diodes and a 1 kΩ current-limit resistor at the output of both U2 and U3.

REED RELAYS
For all the analog switching, I decided to use tiny reed relays for several reasons. In some cases, the bipolar voltages were beyond that which could be handled by inexpensive analog switching ICs. Second, the current-carrying capability was sometimes greater than the analog switching ICs could handle. Third, analog switching ICs have some amount of series resistance, which would be a problem in cases where they were being used to select a gain-set resistor for the instrumentation amplifiers. The Coto 9007-5 relays I chose cost only $1.25 each, so the nine of them in this project were not a significant portion of the overall cost. These Coto relays need only 10 mA to drive their coils, but that is still more current than the Teensy 3.5's GPIO pins can handle. Therefore, I used a TI PCF8574AN I²C port expander to drive all the relays. These Coto relays have built-in snubber diodes, so no external diodes were needed to protect the PCF8574AN against inductive spikes. Note: there are two versions of the TI PCF8574. The "A" version responds to a different I²C address than the original version (and the Philips-brand version, I believe), so use the "A" version. For power, I needed ±12 V and ±5 V. The Teensy 3.5's Arm MCU operates at 3.3 V, but the Teensy module itself contains an on-board 3.3 V regulator for this purpose. The Teensy 3.3 V regulator has enough extra current capacity to also supply the TFT touchscreen display module, which runs on 3.3 V and draws about 150 mA. The +5 V rail is provided by a 7805 three-terminal regulator through R25, a 15 Ω, 2 W resistor to reduce the heat dissipation in the 7805.
Alternately, the 7805 could be heatsinked, and the resistor removed. The -5 V rail uses just a 7905 three-terminal regulator. While I was developing the project, I supplied power to it from a ±12 V bench power supply. When I was finished, I planned to use a small ±12 V switching power supply. I discovered a strange pricing anomaly in these modules. 12 V, single-supply modules, capable of supplying 1.25 A, cost about $8. In contrast, a ±12 V power supply—capable of supplying only 400 mA—costs a minimum of $35. The dual-output power supplies contain just a few additional parts to deliver the negative voltage source, so it's hard to explain the price discrepancy. Maybe the demand for ±12 V supplies is smaller. Luckily, I had room in the enclosure to mount two 12 V, single-supply modules, the second of which I hooked up to provide the negative 12 V that was needed.

FIGURE 6 – Screenshots of the display in operation.

THE USER INTERFACE
Figure 5 shows the completed unit. To provide an easily-read display, I decided on a 4.3" color TFT touch-screen display module. This is large enough to simultaneously display both a window for the signature plot and a parameter window. Numerous parameters can be varied: excitation voltage, excitation frequency, current-limiting resistor value and full-scale current measurement. The touch-screen capability is used within the parameter window. Touching this area switches the display from its default mode, where it displays the values of all parameters (Figure 6a), to a setting mode, where the user picks the parameter to adjust by selecting the appropriate button (Figure 6b). The rotary encoder is used to cycle through the various values for that parameter—the value is shown at the bottom right of the screen. The DONE button is pressed to return to the default "All Parameter Value" display. The true colors of the display are subdued in these photos. That's because the pictures were taken at an angle to eliminate glare.
For this project I used FTDI's FT800 intelligent TFT touch-screen display module, because I've had experience with it in past projects. These modules interface to the host MCU via SPI. They contain a powerful graphics engine chip that supports many high-level graphics operations. The combination of the SPI interface and this graphics engine allows for very fast graphics operations, with very little load on the host MCU. Also, the way that the FT800 graphics engine works eliminates the need for an SRAM screen buffer in the host MCU's memory space. The downside of the FT800 architecture is that it doesn't work at all like most TFT displays, in terms of how its display driver software operates.

For detailed article references and additional resources go to: www.circuitcellar.com/article-materials

RESOURCES
FTDI Chip | www.ftdichip.com
Mean Well | www.meanwell.com
MikroElektronika | www.mikroe.com
Microchip Technology | www.microchip.com
NXP Semiconductors | www.nxp.com
PJRC Store | www.pjrc.com
Texas Instruments | www.ti.com

Luckily, FTDI has written an Arduino-compatible library to run these displays. This library covers all the available text and graphics functions, and also handles the touchscreen at a relatively high level. This Arduino library works perfectly well with the Teensy 3.5 module and Teensyduino plug-in. This library folder, along with my program source code, is available on the Circuit Cellar article materials webpage. I used a MikroElektronika "ConnectEVE" FT800 display, which I had on hand. However, FTDI's own VM800B43 module could also be used, with a few circuit changes to accommodate its 5 V power supply requirement. Source information for these is on the article materials webpage as well.

OPERATION
You can use the unit in a basic mode, where you just connect the Port A test leads to a component and observe the voltage (X-axis) vs. current (Y-axis) characteristics.
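It's worth seeing numerically why the X-Y characteristic of a reactive part looks the way it does: for an ideal capacitor i = C·dv/dt, so the current peaks exactly where the voltage crosses zero. This sketch (values and names invented for illustration, not the instrument's code) computes one cycle of that relationship.

```cpp
#include <cmath>
#include <vector>

struct VI { double v, i; };  // one signature point: volts, amps

// Hypothetical sketch: ideal-capacitor signature, i = C * dv/dt.
std::vector<VI> capacitorSignature(double vPeak, double freqHz,
                                   double farads, int samples) {
    const double PI = 3.14159265358979323846;
    double w = 2.0 * PI * freqHz;  // angular frequency
    std::vector<VI> sig(samples);
    for (int n = 0; n < samples; ++n) {
        double t = n / (freqHz * samples);  // one full cycle
        sig[n].v = vPeak * std::sin(w * t);
        sig[n].i = farads * w * vPeak * std::cos(w * t);  // C * dv/dt
    }
    return sig;
}
```

Plotted as X-Y pairs, these points trace an ellipse, with zero-voltage points coinciding with maximum current: the 90-degree phase shift seen on the tester's screen.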
Although you are applying an AC voltage, you would consider the RED test lead to be the positive one, if you want the display to show positive voltage to the right of the center of the screen, and positive current in the upper half of the screen. Figure 6a was taken with a 0.1 µF capacitor attached. With an ideal capacitor (which this basically is), you would expect the voltage and current to be 90 degrees out of phase—that is, when the voltage is at its maximum, the current is zero, and vice-versa. The displayed ellipse demonstrates this relationship. Figure 6b shows the display with a common silicon diode attached. The applied voltage is shown as 2.0 V, but that is the peak-to-peak value, so you can see that at approximately positive 0.45 V, the diode starts to conduct current, as expected. No current flows when the voltage is negative—that is, when the diode is reverse biased. Figure 6c is the display with a resistor connected. As expected, the voltage vs. current function is linear for both polarities. If you want to compare two components, hook one up to Port A and the other to Port B. Then use the port selector switch to switch between A and B. You will get two traces—one in yellow, one in pink—with both simultaneously showing on the screen. The third mode of operation is the FILE mode. This mode compares one or more signatures on file to a component(s) attached to Port A. The file storage is provided by the Teensy's on-board SD card socket/SD card. In this mode, you first press the NEW PROJECT button. Then you enter a project file name using the on-screen keyboard, as shown in Figure 7. This filename must be 8 characters or less.

FTImpl.Cmd_Keys(0, (FT_DISPLAYHEIGHT*0.31), FT_DISPLAYWIDTH, (FT_DISPLAYHEIGHT*0.112), font, 0, "1234567890");
FTImpl.Cmd_Keys(0, (FT_DISPLAYHEIGHT*0.44), FT_DISPLAYWIDTH, (FT_DISPLAYHEIGHT*0.112), font, 0, "QWERTYUIOP");
FTImpl.Cmd_Keys((FT_DISPLAYWIDTH*0.042), (FT_DISPLAYHEIGHT*0.57), (FT_DISPLAYWIDTH*0.96), (FT_DISPLAYHEIGHT*0.112), font, 0, "ASDFGHJKL");
FTImpl.Cmd_Keys((FT_DISPLAYWIDTH*0.125), (FT_DISPLAYHEIGHT*0.70), (FT_DISPLAYWIDTH*0.73), (FT_DISPLAYHEIGHT*0.112), font, 0, "ZXCVBNM");
FTImpl.Tag(0x0d);
FTImpl.Cmd_Button(420, FT_DISPLAYHEIGHT - 82, 60, 30, 27, 0, "Enter");
FTImpl.Tag(0x08);

LISTING 1 – The keyboard is rendered using these few lines of code.

The filename is limited to eight characters because only the standard DOS 8.3 file format can be used with the Teensy 3.5's SD card library. With the project now defined, you connect components or circuit nodes to Port A, one at a time, with the selector switch in the A position. The front panel SAVE button is pressed to save that signature, and the encoder button is used to cycle through the various components/nodes that you want to store. When you want to compare those stored node signatures to those from another identical circuit board, you first press the front panel LOAD button and enter the project name that you previously defined. You use the rotary encoder to select which node or component number you want to display. Then you put your Port A test leads on the corresponding component or node of the board you are currently testing. Both the stored and the current signature will be displayed, in pink and yellow traces, respectively. Because of the geometry of the Teensy 3.5 and the Vector protoboard on which I mounted all the circuitry, it was not possible to position the SD card socket such that SD cards could be inserted or removed from the outside of the unit's enclosure. Any common SD card can hold thousands of signatures, so that is no big problem. However, I didn't get too fancy with the file operating system software, so there is no file directory display function available. Accordingly, you must remember or write down the filenames you have assigned to projects.
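Since the SD library accepts only DOS 8.3 names, a quick validity check before saving can prevent silently rejected filenames. This helper is my own addition, not from the article's code.

```cpp
#include <cctype>
#include <string>

// Hypothetical helper (not from the article's firmware): check that a
// name fits the DOS 8.3 format required by the Teensy SD library:
// up to 8 alphanumeric characters, an optional dot, then up to 3 more.
bool isValid83Name(const std::string& name) {
    std::string::size_type dot = name.find('.');
    std::string base = (dot == std::string::npos) ? name : name.substr(0, dot);
    std::string ext  = (dot == std::string::npos) ? ""   : name.substr(dot + 1);
    if (base.empty() || base.size() > 8 || ext.size() > 3) return false;
    for (char c : base + ext) {  // only one dot allowed, alphanumerics elsewhere
        if (!std::isalnum(static_cast<unsigned char>(c))) return false;
    }
    return true;
}
```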
I did add a piezo buzzer, which beeps if, while loading a project, you enter a filename that doesn't exist. As a point of interest, the ASCII touch-screen keyboard shown in Figure 7 did not require extensive coding on my part, thanks to the high-level routines available in the FT800 graphics engine. The keyboard itself is rendered using the few lines of code shown in Listing 1. Only 22 lines of code (Lines 1139-1160) are needed to detect the keys touched, display them at the top of the screen and return a string to the calling program.

CONCLUSIONS
In the past, I'd actually gotten pretty far along with a design for this project using an 8-bit Atmel AVR MCU. It required considerable external circuitry to implement the mixed-signal (ADC, DAC) circuitry and the other analog circuitry. The AVR MCU wasn't fast enough to handle the sine-wave generation and display functions, except at a low sine wave frequency. In the end, I abandoned the idea. The availability of the low-cost Teensy 3.5 with its powerful NXP Arm MCU made the project much more reasonable to build. The NXP MK64FX512VMD12 Arm MCU contains two 12-bit DACs and two fast 16-bit ADCs, which greatly simplified the external circuitry. Having the SD-card socket built-in was also a big plus. I enjoy building my own test equipment, and I hope that you enjoyed reading about it in this article.

Managing FPGA Design Complexity
By Bob Sgandurra, Pentek

FPGA technology has revamped the roles of both hardware and software engineers, as well as how dealing with on-chip IP adds new layers of complexity.

Over the past 35 years, there has been a constant progression of technologies for performing digital signal processing.
Some of these have taken the form of processors dedicated to the task of efficiently executing complex math in parallel, like the digital signal processors (DSPs) from Texas Instruments and Analog Devices, and other specialized processors from various manufacturers. Another path has been to exploit the specialized processing engines inside more general-purpose processors from companies like Intel and Motorola (now NXP), or to repurpose highly-parallelized processors like graphics processing units (GPUs) for DSP applications like radar. In each of these examples, it's been the software engineer's job to create programs or applications for a fixed-hardware architecture. This might be accomplished by programming on the "bare metal" and accessing internal registers and resources of the processor directly, or through the window of an operating system which manages the processor's resources.

FPGAs: A GAME CHANGER
This paradigm changed with the introduction of programmable logic devices, and specifically with the advent of FPGAs. An FPGA's logic is a mesh of gates and interconnects that have no function until a logic design is loaded into the array, connecting the gates to form circuits. Modern FPGAs can contain millions of logic gates and thousands of embedded DSP processors, allowing FPGA hardware designers to create extremely sophisticated and complex application-specific hardware functions. And this is where the job of the software engineer takes a turn. With fixed targets like Texas Instruments' DSPs or Intel's processors, the software engineer writes programs for a static and well-defined hardware architecture. However, with FPGAs, the functions and even the interfaces into the hardware can vary. The FPGA functions are determined by what logic design the FPGA engineer uses to configure the FPGA, and this can change with different iterations of the design.
FIGURE 1 – IP from hardware equipment manufacturers, processing components from IP vendors and custom IP all arrive with incompatible signal interfaces.

In addition, FPGA logic design and the software to control it are intimately tied together. This relationship, and the need to keep FPGA design changes and software changes in sync, are a reality that must be managed in the design environment of sophisticated FPGA-based systems. And let's not forget the FPGA design engineer. With the complexity of high-performance FPGAs, the task of the FPGA designer has also become increasingly more demanding. The logic design—sometimes called Intellectual Property or IP—is typically created using either the VHDL or Verilog hardware description languages. And while these languages are the cornerstone of FPGA design, new tools and design environments can improve design efficiencies, particularly when developing for very large FPGAs with millions of gates. And not only new tools, but some of the basic philosophy of how IP is defined is seeing a change.

THE AXI4 STANDARD
As the density of FPGA fabric becomes greater, the possibility of creating more and more sophisticated IP tends to increase. Often components of the IP design can come from multiple sources:

From the FPGA manufacturer: Much of the IP needed to create the overall dataflow through the FPGA fabric, and control for specific FPGA interfaces, is typically included in the libraries provided by the FPGA manufacturer. In addition, common peripheral resources like SDRAM are typically supported by manufacturer-provided tools to generate the required IP.

From equipment manufacturers: Often the FPGA is part of a development or deployable hardware solution manufactured by a company other than the FPGA manufacturer.
In many of these systems, additional hardware like analog-to-digital and digital-to-analog converters are part of the overall design, and IP to control these components should be provided by the hardware manufacturer.

From an IP vendor: Specialized processing functions can be purchased as IP from companies and individuals who target specific applications. These are often delivered as encrypted cores or "black boxes," where just the signals entering and leaving the processing block are exposed for the purchaser to connect to the rest of his IP.

Custom IP created by you: In most designs, the bulk of the IP is usually created by the engineer responsible for designing the system.

With IP coming from these different sources, an immediate challenge is making sure the different IP components have similar signal interfaces, enabling data to pass from one block to the next (Figure 1). The FPGA manufacturers have addressed this by standardizing on a common signal interface specification. Borrowed from Arm processor technology, both Xilinx and Altera are using the Advanced eXtensible Interface (AXI). Now in its second generation (AXI4), it is an open-standard, on-chip interconnect specification for the connection of functional

FIGURE 2 – Reference IP designs from FPGA manufacturers and custom IP created by the system designer connect through common AXI4 interfaces.

AXI4: For high-performance memory-mapped requirements. While providing high throughput, it does come with the cost of using more FPGA resources to implement. For many applications, the combination of AXI4-Lite and AXI4-Stream, described below, can provide similar performance at the cost of fewer FPGA resources.

AXI4-Lite: For simple, lower-throughput memory-mapped communication. For example, read and write access to status and control registers.
AXI4-Stream: Provides an interface for high-speed data streaming.

TABLE 1 – AXI4 must handle the different signal requirements of each type of processing and data-moving IP. It accomplishes this by providing three different types of interfaces.

For detailed article references and additional resources go to: www.circuitcellar.com/article-materials

RESOURCES
Pentek | www.pentek.com

blocks in system-on-chip designs. Now extended to all FPGA IP, it provides the common interface for IP from different sources to remain compatible, providing a level of plug-and-play functionality not previously possible (Figure 2). In addition to providing IP compatibility from different sources, IP reuse is enabled when a block from a previous design can get reused in a new design. This is possible when the interface signals are the same for both the old and new designs. Overall productivity is also improved when developers need to learn only a single interface protocol for IP. For AXI4 to be useful as a universal standard, it must be flexible enough to handle different signal requirements for different types of processing and data-moving IP. AXI4 accomplishes this by providing three different types of interfaces, as shown in Table 1.

ABOUT THE AUTHOR
Robert Sgandurra serves as director of product management for Pentek's DSP, data acquisition, digital receiver and software products, where he's responsible for product definition, educating and presenting technical seminars to systems engineers and Pentek's sales force on the latest product technologies. Prior to joining Pentek, his background included seven years in the medical electronics industry, where he designed and managed projects for ultrasound imaging. Robert joined Pentek in 1994, working as an application engineer and system integrator.

To add further flexibility, FPGA manufacturers like Xilinx provide cores for interconnecting AXI4 interfaces of different widths and speeds.
The Xilinx AXI Interconnect IP core can accept and connect AXI4 interfaces with data widths of 32, 64, 128, 256, 512 or 1024 bits, and with different synchronous or asynchronous clock rates.

BLOCK DIAGRAM DESIGN TOOLS
As IP designs become larger and more complex, the job of structuring and visualizing data flows and the hierarchy of the design becomes an increasing challenge. Both Xilinx and Altera have addressed this in recent versions of their Vivado and Quartus Prime tools. For this example, we'll look at Xilinx Vivado and the included IP Integrator tool (Figure 3). Building on the foundation of the common signal interface provided by AXI4, IP Integrator allows IP to be "packaged" into blocks that can be interconnected on a graphical design canvas. Because AXI4 is the common interface used to create the signals entering and leaving the blocks, much of the wiring detail can be abstracted, leaving the interconnects to become "wires" that can be "drawn" between signal ports on each block. As described earlier, Xilinx's AXI Interconnect IP core handles buses of different widths and speeds, further simplifying the interconnection of blocks that might otherwise not be compatible. A standard is only valuable when it's accepted and used. As mentioned earlier, both Altera and Xilinx support AXI4, with virtually all of Xilinx's IP delivered in this format. A noticeable shift to AXI4 support can also be seen from IP suppliers and hardware manufacturers with FPGA-based products. Pentek, as a designer and manufacturer of high-performance FPGA-based data acquisition and processing products, has also embraced AXI4 and block diagram design. To edit the IP design of a Pentek product, an FPGA engineer opens Pentek's Navigator FPGA Design Kit in Vivado. He or she then has immediate access to the product's entire FPGA design as a block diagram. Individual IP cores can be removed, modified, or replaced with custom IP to meet the application's processing requirements.
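Those width constraints are easy to encode as a design-rule check. The helper below is a trivial illustration of the rule as the article states it (power-of-two widths from 32 to 1024 bits); it is not part of any Xilinx tool or API.

```cpp
// Illustrative design-rule check (my sketch, not a Xilinx API):
// the AXI Interconnect accepts power-of-two data widths, 32..1024 bits.
bool isValidAxiWidth(int bits) {
    if (bits < 32 || bits > 1024) return false;
    return (bits & (bits - 1)) == 0;  // power-of-two test
}
```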
Viewing the product's FPGA design as a block diagram enables the designer to see the product's functions at a higher level and simplifies the design process by working at the "interface" and not the "signal" level. If at any time a designer needs to work with the VHDL code directly, it is always accessible in a source window, as is full online documentation of every Pentek IP core.

SYNCHRONIZING IP & SOFTWARE
Up to this point we've been looking mostly at FPGA IP and the challenges faced by IP designers. And while some processing done in FPGAs is fixed, with no runtime interface that needs to be controlled or initiated for operation, much of the IP created for FPGAs looks like a piece of hardware with control and status registers. And just like a piece of hardware, software—typically running on a Windows or Linux based machine—controls the FPGA through an interface like PCIe or Ethernet. But as mentioned earlier, the very fluid nature of hardware designs created with FPGAs creates a challenge for software engineers. During the development of a project or product, the IP and the software to control it will often need to go through many iterations. From initial design to debugging and through feature changes and redesign, the jobs of the IP designer and software engineer are intimately tied together, as changes in the IP require changes in the software. For a small project this can be the same person, but often—especially for large projects—there can be teams of IP and software engineers at work. Here again, the FPGA manufacturers have risen to the challenge, and their latest tool offerings include features to generate templates from the IP design that can be used as the framework of software for control and status of the FPGA functions. The challenge can become greater when modifying existing IP and software. As an example, all Pentek products are delivered with a full suite of IP-based functions.
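The register-style view of FPGA IP can be sketched with a toy model. Everything below is hypothetical: the offsets, bit assignments and the start_core helper are invented for illustration. On real hardware the reads and writes would hit BAR-mapped addresses over PCIe or Ethernet rather than a Python dictionary, and the helper would belong to a BSP module matched to one IP module.

```python
# Hypothetical register map for an FPGA IP block (invented for illustration).
CONTROL = 0x00   # write 1 to bit 0 to enable the core
STATUS  = 0x04   # bit 0 = busy, bit 1 = done
GAIN    = 0x08   # a runtime parameter of the IP block

class MappedIP:
    """Models memory-mapped access to one IP core's registers."""
    def __init__(self):
        self.regs = {}

    def write(self, offset, value):
        self.regs[offset] = value & 0xFFFFFFFF  # 32-bit register width

    def read(self, offset):
        return self.regs.get(offset, 0)

def start_core(ip, gain):
    """BSP-style helper paired with one IP module: set parameters, then enable."""
    ip.write(GAIN, gain)
    ip.write(CONTROL, 0x1)

ip = MappedIP()
start_core(ip, gain=42)
```

The point of pairing each IP module with such a helper is that when a register moves or a bit field changes in the IP, there is exactly one software function to update.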
A typical Pentek product is built around a high-performance FPGA surrounded by additional hardware, including analog-to-digital and digital-to-analog converters, hardware circuitry for generating and synchronizing clocks, SDRAM or SRAM memory, specialized optical interfaces, a PCIe and Ethernet interface and so on. At some level, each of these hardware features is controlled by a piece of IP in the FPGA. Add to that IP-based data processing functions like DMA engines, data acquisition and waveform generator engines, data tagging and metadata creation, FIR filters, digital downconverters and so on.

FIGURE 3: Shown here is a design consisting of IP blocks connected in Xilinx IP Integrator.

FIGURE 4: Pentek's Navigator FPGA Design Kit.

To provide a complete product, all of the IP-based functions need software libraries provided to control the IP. While some product users will be able to satisfy their system requirements with only the suite of IP functions provided, most will need to modify the supplied IP or create custom processing for their application. With each change in IP, a software change is most likely required. Here Pentek has taken a very specific approach to help keep IP and software changes synchronized. The company's Navigator Board Support Package (BSP) is the complementary tool to the Navigator FPGA Design Kit. Designed to work together, every IP module function is matched to a complementary BSP module (Figure 4). As a change is made to an IP module, the matching BSP function can be easily found and the required change can be made in the software.

IP PLUMBING WORK
The IP library also includes modules that are not part of the shipped hardware product but may be used by the IP designer as needed. The DMA engines found in the library are an example.
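Streaming blocks such as DMA engines move data with a valid/ready handshake: a beat transfers only on a clock cycle where the source asserts valid and the sink asserts ready, which is how back-pressure propagates upstream. A minimal behavioral model of that handshake (illustrative only, not Pentek or Xilinx code):

```python
from collections import deque

class StreamSource:
    """Master side: asserts 'valid' whenever it has a beat to send."""
    def __init__(self, beats):
        self.queue = deque(beats)

    @property
    def valid(self):
        return bool(self.queue)

    def advance(self):
        self.queue.popleft()

class StreamSink:
    """Slave side: asserts 'ready' only while it has buffer space."""
    def __init__(self, capacity):
        self.capacity = capacity
        self.received = []

    @property
    def ready(self):
        return len(self.received) < self.capacity

def clock_tick(src, snk):
    """A beat transfers only on a cycle where valid AND ready are both high."""
    if src.valid and snk.ready:
        snk.received.append(src.queue[0])
        src.advance()

src = StreamSource([0x11, 0x22, 0x33])
snk = StreamSink(capacity=2)
for _ in range(5):
    clock_tick(src, snk)
# The sink fills to capacity after two beats; the third stalls (back-pressure).
```

In a real design the stalled beat simply waits in the fabric until downstream ready reasserts, with no software involvement.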
Consider an example where built-in board functions stream data from an analog-to-digital converter, through some default processing and out through the PCIe interface, where it can be sent to a computer for recording. The user in that example may also need to split off the data to feed some custom processing or analysis function. The Navigator IP library includes a number of DMA engines for streaming data. The IP designer can "draw" this block into the board design, connecting it between the existing data streaming path and his or her custom processing IP block. The Navigator Board Support Package includes BSP modules for controlling these DMA modules that can be turned on as needed. While the IP designer still needs to create a software function to control his or her custom processing IP, the Navigator tools provide much of the "plumbing" to enable the designer's custom IP.

As each new generation of FPGA grows in processing power and logic density, the trend is for IP designs to grow larger and more complex to exploit the increasing hardware capabilities. While this constant migration towards higher density hardware and IP designs can deliver advantages in overall size, cost and power, it often complicates the IP developer's and software engineer's jobs by requiring larger and more complex IP and software designs. FPGA manufacturers, as well as IP vendors and FPGA-based product manufacturers like Pentek, recognize this trend. The industry-wide adoption of an interface like AXI4 can make the process of IP design and reuse more efficient, and individual innovations from manufacturers in the FPGA space can help manage the very complex process of IP-based design.

FPGA Solutions Evolve to Meet AI Needs
Brainy System ICs

Long gone now are the days when FPGAs were thought of as simple programmable circuitry for interfacing and glue logic. Today, FPGAs are powerful system chips with on-chip processors, DSP functionality and high-speed connectivity.
By Jeff Child, Editor-in-Chief

Today's FPGAs have now evolved to the point that calling them "systems-on-chips" is redundant. It's now simply a given that the high-end lines of the major FPGA vendors have general-purpose CPU cores on them. Moreover, the flavors of signal processing functionality on today's FPGA chips are ideally suited to the kind of system-oriented DSP functions used in high-end computing. And even better, they've enabled AI (Artificial Intelligence) and machine learning kinds of functionalities to be implemented in much smaller, embedded systems. In fact, over the past 12 months, most of the leading FPGA vendors have been rolling out solutions specifically aimed at using FPGA technology to enable AI and machine learning in embedded systems. The two main FPGA market leaders, Xilinx and Intel's Programmable Solutions Group (formerly Altera), have certainly embraced this trend, as have many of their smaller competitors like Lattice Semiconductor and QuickLogic. Meanwhile, specialists in so-called eFPGA technology like Achronix and Flex Logix have their own compelling twist on FPGA system computing.

PROJECT BRAINWAVE
Exemplifying the trend toward FPGAs facilitating AI processing, Intel's high-performance line of FPGAs is its Stratix 10 family. According to Intel, the Stratix 10 FPGAs are capable of 10 TFLOPS, or 10 trillion floating-point operations per second (Figure 1). In May, Microsoft debuted its Azure Machine Learning Hardware Accelerated Models powered by Project Brainwave, integrated with the Microsoft Azure Machine Learning SDK. Azure's architecture is developed with Intel FPGAs and Intel Xeon processors.
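For a rough sense of what 10 TFLOPS means, this back-of-envelope arithmetic (illustrative only; real throughput depends on utilization, precision and memory bandwidth) relates the peak figure to a deep network such as ResNet-50, which needs roughly 8 billion operations per inference pass:

```python
# Back-of-envelope check of the figures cited in the article.
peak_flops = 10e12          # 10 TFLOPS peak (Stratix 10, per Intel)
flops_per_inference = 8e9   # ~8 billion calculations per ResNet-50 pass

# Theoretical ceiling at 100% utilization -- an upper bound, not a benchmark.
theoretical_inferences_per_s = peak_flops / flops_per_inference
print(theoretical_inferences_per_s)   # 1250.0
```

The interesting part of the Brainwave claim is not the ceiling itself but that the FPGA reaches high throughput without batching, which keeps per-request latency low.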
Intel says its FPGA-powered AI is able to achieve extremely high throughput and can run ResNet-50, an industry-standard deep neural network requiring almost 8 billion calculations, without batching. This is possible using FPGAs because the programmable hardware—including logic, DSP and embedded memory—enables any desired logic function to be easily programmed and optimized for area, performance or power. And because this fabric is implemented in hardware, it can be customized and can perform parallel processing. This makes it possible to achieve orders of magnitude of performance improvement over traditional software or GPU design methodologies.

FIGURE 2: Virtex UltraScale+ FPGAs provide signal processing bandwidth of 21.2 TeraMACs. They deliver up to 500 Mb of total on-chip integrated memory, plus up to 8 GB of HBM Gen2 in-package memory for 460 GB/s of memory bandwidth.

RESOURCES
Achronix | www.achronix.com
Flex Logix Technologies | www.flex-logix.com
Intel PSG (formerly Altera) | www.altera.com
Lattice Semiconductor | www.latticesemi.com
QuickLogic | www.quicklogic.com
Xilinx | www.xilinx.com

In one application example, Intel cites an effort where Canada's National Research Council (NRC) is helping to build the next-generation Square Kilometre Array (SKA) radio telescope to be deployed in remote regions of South Africa and Australia, where viewing conditions are most ideal for astronomical research. The SKA radio telescope will be the world's largest radio telescope, 10,000 times faster and with image resolution 50 times greater than the best radio telescopes we have today. This increased resolution and speed results in an enormous amount of image data generated by these telescopes, processing the equivalent of a year's data on the Internet every few months. NRC's design embeds Intel Stratix 10 SX FPGAs at the Central Processing Facility located at the SKA telescope site in South Africa to perform real-time processing and analysis of collected data at the edge. High-speed analog transceivers allow signal data to be ingested in real time into the core FPGA fabric. After that, the programmable logic can be parallelized to execute any custom algorithm optimized for power efficiency, performance or both, making FPGAs the ideal choice for processing massive amounts of real-time data at the edge.

ACAP FOR NEXT GEN
For its part, Xilinx's high-performance product line is its Virtex UltraScale+ device family (Figure 2). According to the company, these provide the highest performance and integration capabilities in a FinFET node, including the highest signal processing bandwidth at 21.2 TeraMACs of DSP compute performance. They deliver on-chip memory density with up to 500 Mb of total on-chip integrated memory, plus up to 8 GB of HBM Gen2 integrated in-package for 460 GB/s of memory bandwidth. Virtex UltraScale+ devices provide capabilities with integrated IP for PCI Express, Interlaken, 100G Ethernet with FEC and Cache Coherent Interconnect for Accelerators (CCIX). Looking to the next phase of system performance, Xilinx in March announced its strategy toward a new FPGA product category it calls its adaptive compute acceleration platform (ACAP). Touted as going beyond the capabilities of an FPGA, an ACAP is a highly integrated multi-core heterogeneous compute platform that can be changed at the hardware level to adapt to the needs of a wide range of applications and workloads. An ACAP's adaptability, which can be done dynamically during operation, delivers levels of performance and performance-per-watt unmatched by CPUs or GPUs, says Xilinx. An ACAP is well-suited to accelerate a broad set of applications in the emerging era of big data and artificial intelligence.
These include: video transcoding, database, data compression, search, AI inference, genomics, machine vision, computational storage and network acceleration. Software and hardware developers will be able to design ACAP-based products for endpoint, edge and cloud applications. The first ACAP product family, codenamed "Everest," will be developed in TSMC 7 nm process technology, and Xilinx says it will tape out later this year. An ACAP will provide a new generation of FPGA fabric with distributed memory and hardware-programmable DSP blocks on a multicore SoC. This includes one or more software-programmable—yet hardware-adaptable—compute engines, all connected through a network on chip (NoC). An ACAP will also provide highly integrated programmable I/O functionality, ranging from integrated hardware-programmable memory controllers, advanced SerDes technology and leading-edge RF-ADC/DACs, to integrated High Bandwidth Memory (HBM), depending on the device variant. Software developers will be able to target ACAP-based systems using tools like C/C++, OpenCL and Python. An ACAP can also be programmable at the RTL level using FPGA tools. Xilinx says Everest is expected to achieve a 20x performance improvement on deep neural networks compared to today's latest 16 nm Virtex VU9P FPGA.

MACHINE LEARNING FOR IoT
Also in line with the AI trend, back in May Lattice Semiconductor unveiled Lattice sensAI, a technology stack that combines modular hardware kits, neural network IP cores, software tools, reference designs and custom design services. It is aimed at accelerating the integration of machine learning inferencing into broad market IoT applications. Lattice's sensAI includes solutions optimized for ultra-low power consumption—from under 1 mW to 1 W. Package sizes range from 5.5 mm² to 100 mm²
and interfaces available include MIPI CSI-2, LVDS, GigE and more. The sensAI stack (Figure 3) includes the ECP5 device-based Video Interface Platform (VIP), including the Embedded Vision Development Kit, and the iCE40 UltraPlus device-based Mobile Development Platform (MDP). It also includes IP cores such as a Convolutional Neural Network (CNN) accelerator and a Binarized Neural Network (BNN) accelerator. Software tools include a neural network compiler tool for Caffe/TensorFlow to FPGA, Lattice Radiant design software and Lattice Diamond design software. Reference designs are provided for face detection, key phrase detection, object counting, face tracking and speed sign detection. The solution is also supported by an ecosystem of design service partners that enable custom solutions for broad market applications, including smart home, smart city and smart factory.

FIGURE 3: The sensAI stack includes IP cores such as a Convolutional Neural Network (CNN) accelerator and a Binarized Neural Network (BNN) accelerator. Software tools include a neural network compiler tool for Caffe/TensorFlow to FPGA, Lattice Radiant design software and Lattice Diamond design software.

COLLABORATIVE AI SOLUTION
QuickLogic meanwhile is likewise targeting AI at the edge. In May, the company launched its QuickAI platform for endpoint AI applications. The QuickAI platform features technology, software and toolkits from General Vision, Nepes, SensiML and QuickLogic (Figure 4). Together these forge a tightly-coupled ecosystem aimed at solving challenges associated with the implementation of AI for endpoint applications. The QuickAI is based on General Vision's NeuroMem neural network IP, which has been licensed by Nepes and integrated into the Nepes Neuromorphic NM500 AI learning device. Both General Vision and Nepes provide software for configuring and training the neurons in the network. In addition, SensiML provides an analytics toolkit to quickly and easily build smart sensor algorithms for endpoint IoT applications. General Vision's technology enables on-chip exact and fuzzy pattern matching and learning using a scalable parallel architecture of Radial Basis Function neurons.

The parallel architecture results in fixed latency for any number of neurons and a very low, energy-efficient operating frequency. General Vision supplies the Knowledge Builder tool suite and SDK used to train and configure the neurons in a NeuroMem network. The Nepes Neuromorphic NM500 implements the NeuroMem technology in an energy-efficient, small form factor component. This AI-enabling component can be trained in the field to recognize patterns in real time, and multiple devices can be chained to provide any number of neurons. In addition to the NM500, Nepes supplies the Knowledge Studio software tools used for configuring and training the neurons in the NM500 device. SensiML complements the General Vision/Nepes technology by providing the SensiML Analytics Toolkit, which simplifies the task of generating endpoint AI solutions by providing tools that automate the management of training data, optimize the choice of feature extraction algorithms, and automate code generation for the resulting AI solution. QuickLogic's piece of the QuickAI package is the EOS S3 voice and sensor processing platform. It provides ultra-low power, sophisticated audio and sensor processing and an embedded FPGA as the host for the NM500 and the software that implements AI solutions using the NM500. The platform includes the QuickAI Hardware Development Kit (HDK) with EOS S3, two NM500 devices, accelerometer, gyroscope, magnetometer, microphones, a Nordic Bluetooth Low Energy device, flash memory, and an Intel Edison-compatible connector that allows access to Edison daughter boards such as uSD.

FIGURE 4: The QuickAI platform includes the QuickAI HDK with EOS S3, two NM500 devices, accelerometer, gyroscope, a Nordic BLE device, flash memory, and an Intel Edison-compatible connector that allows access to Edison daughter boards such as uSD.
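The RBF-style pattern matching described above, prototype neurons each with an "influence field" and all compared at once, can be sketched in a few lines. This is a simplified behavioral model, not the NM500's actual behavior; the L1 distance metric and the closest-firing-neuron rule are assumptions based on common RBF-classifier practice. On silicon every neuron evaluates in parallel (hence the fixed latency); here a loop stands in for that.

```python
def l1_distance(a, b):
    """Manhattan distance between a pattern and a stored prototype."""
    return sum(abs(x - y) for x, y in zip(a, b))

class RBFNeuron:
    """One committed neuron: a prototype vector, its category, and an
    influence-field radius within which it fires."""
    def __init__(self, prototype, category, influence):
        self.prototype = prototype
        self.category = category
        self.influence = influence

def classify(neurons, pattern):
    """Fuzzy match: among all neurons that fire (distance inside their
    influence field), the closest one wins; otherwise 'unknown'."""
    best = None
    for n in neurons:
        d = l1_distance(pattern, n.prototype)
        if d <= n.influence and (best is None or d < best[0]):
            best = (d, n.category)
    return best[1] if best else "unknown"

net = [
    RBFNeuron([10, 10], "A", influence=5),
    RBFNeuron([50, 50], "B", influence=5),
]
print(classify(net, [12, 11]))   # "A" (distance 3, inside A's field)
print(classify(net, [30, 30]))   # "unknown" (outside every field)
```

Training in this scheme is just committing a new neuron with a prototype and shrinking overlapping influence fields, which is why the devices can learn in the field without back-propagation.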
FPGA ADVANTAGES
As the leading FPGA vendors advance to higher levels of performance, a parallel trend is happening on the embedded FPGA (eFPGA) side of the market. According to Flex Logix, an eFPGA is an IP block that allows an FPGA to be incorporated in an SoC, MCU or any kind of IC. It's the connectivity aspect of eFPGA that's perhaps most compelling. In an FPGA chip, the outer rim of the chip consists of a combination of GPIO, SERDES and specialized PHYs such as DDR3/4. In contrast, an eFPGA is an FPGA fabric without the surrounding ring of GPIO, SERDES and PHYs. Instead, an eFPGA connects to the rest of the chip using standard digital signaling, enabling very wide, very fast on-chip interconnects. Flex Logix provides an AI-focused eFPGA core called EFLX4K AI. The core evolved from its existing DSP eFPGA core. According to the company, the EFLX4K DSP core turns out to have as many or generally more DSP MACs per square millimeter relative to LUTs than other eFPGA and FPGA offerings, but the MAC was designed for digital signal processing and is overkill for AI requirements. AI doesn't need a 22 x 22 multiplier, and it doesn't need pre-adders or some of the other logic in the DSP MAC. With that in mind, Flex Logix architected a new member of the EFLX4K family, the EFLX4K AI core, optimized for deep learning. It has over 10x the GigaMACs/s per square mm of the EFLX4K DSP core. The EFLX4K AI core can be implemented on any process node in 6-8 months on customer demand and can be arrayed interchangeably with the EFLX4K Logic/DSP cores. A single EFLX4K AI core has the same number of inputs/outputs as all cores in the EFLX4K family: 632 in and 632 out, each with an optional flip-flop.

eFPGAs FOR MACHINE LEARNING
Meanwhile, Achronix Semiconductor offers both standalone FPGA and eFPGA products. For its high-performance eFPGA offering, Achronix provides its Speedcore technology.
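The point about AI not needing a wide 22 x 22 multiplier comes down to operand precision: inference commonly runs on 8-bit operands whose products are summed into a wide accumulator, so many narrow MACs fit in the silicon area of one wide one. A sketch of that reduced-precision multiply-accumulate (illustrative; the EFLX4K AI core's actual datapath details aren't given here):

```python
def int8_dot(a, b):
    """Dot product with 8-bit signed operands and a wide accumulator --
    the reduced-precision arithmetic that AI-oriented MAC arrays target."""
    acc = 0
    for x, y in zip(a, b):
        assert -128 <= x <= 127 and -128 <= y <= 127, "operands must be int8"
        acc += x * y   # each product fits in 16 bits; the accumulator stays wide
    return acc

print(int8_dot([1, -2, 3], [4, 5, -6]))   # 1*4 - 2*5 - 3*6 = -24
```

A signal-processing MAC must preserve precision through long filter chains; an inference MAC can discard it, which is where the claimed density advantage comes from.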
The benefit of delivering FPGA technology as an embedded solution is that it can be customized to meet the specific requirements of the target system. For example, in machine learning systems, the compute engine can be a fixed-point DSP function, a floating-point DSP function or a massively parallel engine for convolutional neural networks. Designed specifically to be embedded in SoCs and ASICs, Speedcore IP is a fully permutable architecture technology that can be built with densities ranging from less than 10,000 look-up tables (LUTs) up to two million LUTs, plus large amounts of embedded memory and DSP blocks (Figure 5). Speedcore IP can be customized with unique memory architectures or specialty functions like distributed TCAMs. Achronix's Speedcore FPGA technology is inherently a more secure solution than a multiple-chip solution. That's because the communication between the FPGA and the core SoC is at the silicon level and cannot easily be probed or interrogated to decipher interaction signals. Back in January, Achronix announced that it completed full silicon verification of its Speedcore FPGA production validation chip built on TSMC 16 nm FinFET+ process technology. Rigorous bench and ATE tests were completed across full operating conditions to verify the complete functionality of the Speedcore silicon validation device using the Speedcore16t validation board, a platform available to potential customers. A number of application-specific designs that operate at 500 MHz can be run through the companion ACE design tool suite to evaluate Speedcore capabilities, as well as for proof-of-concept exploration. FPGAs have come a long way since the days when they were basically a sideline, peripheral technology. Today, embedded system developers craft their entire designs with an FPGA as the computing core of the system.
And with the recent trend of AI and machine learning capabilities migrating to FPGAs, they've even eclipsed microprocessors as the computing technology of choice for many cutting-edge applications.

MCUs and Processors Vie for Embedded Mindshare

Embedded system designers have many choices in both categories, but the dividing line between the two keeps blurring.

By Jeff Child, Editor-in-Chief

At one time, the world of microcontrollers and the world of microprocessors were clearly separate. That's slowly changed over the years as the high-performance segment of microcontrollers has become more powerful. At the same time, embedded processors have captured ever more mindshare and market share that used to be exclusively owned by the MCU camp. The lines blurred even further once most MCUs started using Arm-based processor cores. All the leading MCU vendors have a high-performance line of products, some in the 200 MHz and up range. Moreover, some application-specific MCU offerings are designed specifically for the performance needs of a particular market segment—automotive being the prime example. In some cases, these high-end MCUs are vying for design wins against embedded processors that meet the same size, weight and power requirements as MCUs. In this article, we'll examine some of the latest and greatest products and technologies on both sides.

HIGH PERFORMANCE MCU
An example of an MCU vendor's high-performance line of products is Cypress Semiconductor's FM4. FM4 is a portfolio of 32-bit, general-purpose, high-performance MCUs based on the Arm Cortex-M4 processor with FPU and DSP functionality. FM4 microcontrollers operate at frequencies up to 200 MHz and support a diverse set of on-chip peripherals for motor control, factory automation and home appliance applications.
The portfolio delivers the low-latency, reliable, machine-to-machine (M2M) communication required for Industry 4.0, using network-computing technologies to advance design and manufacturing. The FM4 MCU supports an operating voltage range of 2.7 V to 5.5 V. The devices incorporate 256 KB to 2 MB of flash and up to 256 KB of RAM. The fast flash memory combined with a flash accelerator circuit (pre-fetch buffer plus instruction cache) provides zero-wait-state operation up to 200 MHz. A standard DMA and an additional descriptor-based DMA (DSTC), each with an independent bus for data transfer, can be used to further offload the CPU. Figure 1 shows the FM4-216-ETHERNET development platform for developing applications using the Arm Cortex-M4-based FM4 S6E2CC MCU.

The high-performance line of MCUs from STMicroelectronics is its STM32H7 series. An example product from that series is the STM32H753 MCU with Arm's highest-performing embedded core (Cortex-M7). According to ST Micro, it delivers a record performance of 2020 CoreMark/856 DMIPS running at 400 MHz, executing code from embedded flash memory. Other innovations and features implemented by ST further boost performance. These include the Chrom-ART Accelerator for fast and efficient graphical user interfaces, a hardware JPEG codec that allows high-speed image manipulation, highly efficient Direct Memory Access (DMA) controllers, up to 2 MB of on-chip dual-bank flash memory with read-while-write capability, and the L1 cache allowing full-speed interaction with off-chip memory. Multiple power domains allow developers to minimize the energy consumed by their applications, while plentiful I/Os, communication interfaces, and audio and analog peripherals can address a wide range of entertainment, remote-monitoring and control applications. Last year, STMicro announced that its STM32H7 high-performing MCUs are designed with the same security concepts as the Platform Security Architecture (PSA) from Arm announced at that time.
This PSA framework on the STM32H7 MCUs is combined with STM32-family enhanced security features and services. ST's STM32H7 MCU devices integrate hardware-based security features including a True Random-Number Generator (TRNG) and an advanced cryptographic processor, which will simplify protecting embedded applications and global IoT systems against attacks like eavesdropping, spoofing or man-in-the-middle interception.

MCU RUNS LINUX OS
One dividing line that remains between MCUs and microprocessors is their ability to run major operating systems. While most embedded processors can run OSes like Linux, most MCUs lack the memory architecture required to do so. Breaking that barrier, in February MCU vendor Microchip Technology unveiled a System on Module (SOM) featuring the SAMA5D2 microprocessor. The ATSAMA5D27-SOM1 contains the recently released ATSAMA5D27C-D1G-CU System in Package (SiP) (Figure 2). The SOM simplifies design by integrating the power management, non-volatile boot memory, Ethernet PHY and high-speed DDR2 memory onto a small, single-sided PCB. There is a great deal of design effort and complexity associated with creating an industrial-grade MPU-based system running a Linux operating system. The SOM integrates multiple external components and eliminates key design challenges around EMI, ESD and signal integrity. The Arm Cortex-A5-based SAMA5D2 SiP, mounted on the SOM PCB or available separately, integrates 1 Gbit of DDR2 memory, further simplifying the design by removing the high-speed memory interface constraints from the PCB. The impedance matching is done in the package, not manually during development, so the system will function properly at normal and low-speed operation. Three DDR2 memory sizes (128 Mb, 512 Mb and 1 Gb) are available for the SAMA5D2 SiP and are optimized for bare metal, RTOS and Linux implementations respectively.
Users developing Linux-based applications have access to the largest set of device drivers, middleware and application layers for the embedded market at no charge. All of Microchip's Linux development code for the SiP and SOM is mainlined in the Linux communities. This results in solutions where customers can connect external devices, for which drivers are mainlined, to the SOM and SiP with minimal software development.

MCUs FOR AUTONOMOUS CARS
Automotive is one particular area where MCU vendors have tailored high-performance products to meet specific needs. Along such lines, in March Renesas Electronics announced the sample shipment of the industry's first on-chip flash memory microcontroller using a 28 nm process technology. Aimed at the realization of next-generation green cars and autonomous vehicles with higher efficiency and higher reliability, the RH850/E2x Series MCU incorporates up to six 400 MHz CPU cores (Figure 3). According to Renesas, that makes it the first on-chip flash memory automotive MCU to achieve processing performance of 9,600 MIPS. The new MCU series also features built-in flash memory of up to 16 MB, as well as enhanced security functions and functional safety. Under Renesas Autonomy, an open, trusted platform for assisted and automated driving, Renesas provides end-to-end solutions that advance the evolution of vehicles towards next-generation green cars, connected cars and autonomous-driving vehicles. There are two main pillars of the Renesas Autonomy Platform. One is this new 28 nm automotive control MCU. And the other is the R-Car Family of SoCs designed for cloud connectivity and sensing.

FIGURE 2: The SAMA5D2 SiP memory options are optimized for bare metal, RTOS and Linux implementations.

FIGURE 3: The RH850/E2x Series MCU incorporates up to six 400 MHz CPU cores.

In another example of high-performance MCU technology aimed at the automotive space, in June NXP Semiconductors announced a new family of high-performance safe microprocessors to control vehicle dynamics in next-generation electric and autonomous vehicles.
The chip so blurs the lines between MPUs and MCUs that NXP refers to it as a microprocessor/microcontroller. The new NXP S32S device will manage the systems that accelerate, brake and steer vehicles safely, whether under the direct control of a driver or an autonomous vehicle's control. The 800 MHz NXP S32S processor uses an array of the new Arm Cortex-R52 cores, which integrate the highest level of safety features of any Arm processor. The array offers four fully independent ASIL D capable processing paths to support parallel safe computing. In addition, the S32S architecture supports a new "fail availability" capability allowing the device to continue to operate after detecting and isolating a failure—a critical capability for future autonomous applications.

RESOURCES
AMD | www.amd.com
Cypress Semiconductor | www.cypress.com
Infineon Technologies | www.infineon.com
Intel | www.intel.com
Microchip Technology | www.microchip.com
NXP Semiconductors | www.nxp.com
Renesas Electronics | www.renesas.com
STMicroelectronics | www.st.com
Texas Instruments | www.ti.com

AI PLATFORM SUPPORT
In January, Infineon Technologies meanwhile expanded its safe automated driving collaboration with NVIDIA, announcing that its AURIX TC3xx series automotive MCU (Figure 4) will be used in the NVIDIA DRIVE Pegasus AI car computing platform. This supercomputer for autonomous vehicles meets the requirements of Level 5 autonomous driving as defined by the Society of Automotive Engineers (SAE). Infineon now supplies the safety MCU, safety power supply IC, and selected vehicle communication interface ICs for several NVIDIA DRIVE systems. The devices support increasing levels of autonomous driving capability, ranging from auto cruise functionality to auto chauffeur and full autonomy. The collaboration enables users of the platform to access AURIX capabilities through an AUTOSAR-compliant software stack.
The multicore AURIX MCUs help the platform meet the highest possible functional safety standard (ISO 26262 ASIL-D) for Advanced Driver Assistance Systems (ADAS) and self-driving systems. Key features of the AURIX MCUs are relevant to implementing both ADAS and Automated Driving (AD) functionality. That includes advanced support for ASIL-D applications assisted by more than 3,000 DMIPS of safety computational performance, self-test mechanisms in hardware for logic and memory, integrated monitoring and redundant peripherals.

PROCESSORS FOR CAR COCKPITS
For its part, Texas Instruments provides the "Jacinto" family of processors that support a variety of automotive digital cockpit applications including infotainment, head unit co-processing for infotainment, informational ADAS, integrated digital cockpit, digital instrument cluster, head-up display and more. Designed for automotive safety and robustness, the "Jacinto" heterogeneous architecture includes hardware firewalls, allows separation between a High Level OS (HLOS) and a safety OS, and enables implementation of a robust multi-domain software architecture capable of being ASIL-B safety certified. The scalability of the "Jacinto" family allows developers to target specific performance for their applications and leverage headroom using higher-performance variants, if required, without software modifications or hardware changes. The "Jacinto 6" family of processors are all built on the same architecture, offering software and hardware compatibility with the broadest array of highly scalable Arm Cortex-A15 cores for automotive applications.

Coming from the microprocessor side of the fence, in May Intel, Google and Volvo Cars debuted the latest Android (P) Operating System (OS) running Google applications. Powered by an Intel Atom automotive system-on-chip (SoC), Volvo Cars demonstrated the latest in-vehicle infotainment (IVI) experiences in a prototype Volvo XC40.
Advanced features include voice recognition via Google Assistant, access to the Android app ecosystem through the Google Play Store and the ability to use Google Maps natively in the car's IVI system. According to Intel, in order to realize the full potential of IVI, automakers needed to move away from proprietary software solutions. This required an operating system and applications that could be easily customized, updated and scaled. In 2015, Google enabled this by extending its Android OS for the IVI market. With Android running on Intel Atom automotive SoCs, automakers get optimized processing, the latest software versions and security features. The SoC debuted in a commercially available vehicle powered by Android, giving consumers access to the latest automotive Android apps like Google Maps and Google Assistant for hands-free driving assistance.

PROCESSORS FOR EMBEDDED
While Intel and AMD continue to go head to head in the desktop space, both companies offer rich technologies used by embedded applications. While many of these processors weren't necessarily created for embedded systems, many of them wind up there. Occasionally the companies do roll out processors aimed directly at the embedded space. In an example along those lines, in February AMD introduced two new product families—the AMD EPYC Embedded 3000 processor and the AMD Ryzen Embedded V1000 processor. The EPYC Embedded 3000 is designed for a variety of new markets including networking, storage and edge computing devices, while the AMD Ryzen Embedded V1000 targets medical imaging, industrial systems, digital gaming and thin clients.

The EPYC Embedded 3000 is a highly scalable processor family with designs ranging from four cores to 16 cores, available in single-threaded and multi-threaded configurations. Support for thermal design power (TDP) ranges from 30 W to 100 W. Its expansive, integrated I/O includes support for up to 64 PCIe lanes and up to eight channels of 10 GbE.
The chip also has up to 32 MB of shared L3 cache with up to four independent memory channels. The Ryzen Embedded V1000 is an Accelerated Processing Unit (APU) coupling high-performance Zen CPUs and Vega GPUs on a single die, offering up to four CPU cores/eight threads and up to 11 GPU compute units to achieve processing throughput as high as 3.6 TFLOPS (Figure 5). Support for TDP ranges from 12 W to 54 W. I/O capabilities support up to 16 PCIe lanes, dual 10 GbE and several USB options, including up to four USB 3.1/USB-C interconnects, with additional USB, SATA and NVMe support. The chip drives up to four independent displays running in 4K, with the ability to support 5K graphics, and includes support for H.265 decode and encode and VP9 decode. The processor sports dual-channel 64-bit DDR4, with performance up to 3,200 MT/s.

An amazing amount of computing functionality can be squeezed onto a small form factor board these days. These tiny-sized board-level products meet the needs of applications where extremely low SWaP (size, weight and power) beats all other demands.

By Jeff Child, Editor-in-Chief

The magic of semiconductor integration means that tremendous functionality can be squeezed onto one or a handful of chips. And a happy consequence of that trend is that board-level computers can now occupy extremely small form factors. Many of these are non-standard form factors. Non-standard form factors free designers from the size and cost overheads associated with including a standard bus or interconnect architecture. That said, standard form factors such as SMARC and COM Express Type 10 Mini are within the size range of this "tiny" category of embedded processor boards.
In very small systems, often the size and volume of the board take precedence over the need for standards. Instead, the priority is on cramming as much functionality and compute density as possible onto a single-board solution. And because they tend to be literally "single board" solutions, there's often no need to be compatible with multiple companion I/O boards. These tiny form factor boards seem to be targeting very different application areas—areas where slot-card backplanes or PC/104 stacks wouldn't be practical. Interestingly, most of the boards in this article's product gallery are based on processors like NXP's i.MX7, and some use the i.MX8M SoC.

FIGURE 1. COM Express is used in Optovist, a patented mobile vision screening device for vision testing.

An example application that uses tiny embedded processor board technology is Optovist (Figure 1), a patented mobile vision screening device for vision testing developed by Vistec AG. First and foremost, a small form factor with low-power-dissipation processor technology was required. That's because the Optovist vision screening device had to keep to a small footprint (39 mm x 24 mm x 44 cm) in order to be really manageable and portable. Requirements also included light weight and a high level of robustness without vulnerable fans for year-long, reliable mobile usage. Apart from the ultra-small format, the new hardware had more than anything to deliver good performance. Fast and precise graphics performance is an absolute necessity for vision testing. The designers chose a Kontron COM Express mini module, the COMe-mSPi, for Vistec's Optovist system.
