
Competition Between DSP and FPGA

In considering the design option for DSP vs. FPGA, it is helpful to compare both architectures in a FIR filter application, writes Reg Zatrepalek.

One of the most widely used digital signal-processing elements is the finite impulse response, or FIR, filter. Designers use filters to alter the magnitude or frequency content of a data signal, usually to isolate or accentuate a particular region of interest within the sample data spectrum. In this regard, you can think of filters as a method of preconditioning a signal.

In a typical filter application, incoming data samples combine with filter coefficients through carefully synchronised mathematical operations, which are dependent on the filter type and implementation strategy, and then move on to the next processing stage. If the data source and destination are analogue signals, then the samples must first pass through an analogue-to-digital (A/D) converter, and the results must be fed through a digital-to-analogue (D/A) converter.

The simplest form of a FIR filter is implemented through a series of delay elements, multipliers and an adder tree or chain. You can think of the terms in the equation as input samples, output samples and coefficients. If S is a continuous stream of input samples and Y is the resulting filtered stream of output samples, then n and k correspond to a particular instant in time. Thus, to compute the output sample Y(n) at time n, a group of samples at N different points in time, or s(n), s(n-1), s(n-2), … s(n-N+1), is required. The group of N input samples is multiplied by N coefficients and summed together to form the final result, Y.
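The equation itself does not appear in this text. A standard form of the FIR relation that matches the description above is sketched here; the coefficient symbol c_k is an assumed name, and k indexes how far each sample lies in the past relative to time n:

```latex
Y(n) = \sum_{k=0}^{N-1} c_k \, s(n-k)
     = c_0\, s(n) + c_1\, s(n-1) + \dots + c_{N-1}\, s(n-N+1)
```

For a 31-tap filter, N = 31, so each output sample needs 31 multiplications and 30 additions.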


[Figure: block diagram of a simple 31-tap FIR filter (length N = 31)]

Various design tools are available to help select the ideal length of a filter and the coefficient values. The goal is to select the appropriate parameters to achieve the required filter performance. The most popular design tool for choosing these parameters is MATLAB. Once you have selected the filter parameters, the implementation follows the mathematical equation.

The basic steps for implementation of an FIR filter are:

1. Sample the incoming data stream.
2. Organise the input samples in a buffer so that each captured sample may be multiplied by its corresponding filter coefficient.
3. Multiply each data sample by its corresponding coefficient and accumulate the result.
4. Output the filtered result.

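A typical C program for implementing this FIR filter on a processor using a multiply-accumulate approach is shown in the code below. The buffer S holds the N most recent samples, k holds the N coefficients, and n is the current write index into the circular buffer.

    /* Capture the incoming data sample */
    datasample = input();

    /* Push the new data sample onto the circular buffer */
    S[n] = datasample;

    /* Multiply each data sample by its coefficient and accumulate the result.
       S[(n + N - i) % N] is the sample i steps in the past, so k[i] pairs with s(n-i). */
    y = 0;
    for (i = 0; i < N; i++) {
        y += k[i] * S[(n + N - i) % N];
    }

    /* Advance the write index, wrapping at the end of the buffer */
    n = (n + 1) % N;

    /* Output filtered result */
    output(y);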

The implementation illustrated in Figure 3 is known as a multiply-and-accumulate or MAC-type implementation. This is almost certainly the way a filter would be implemented in a classical DSP processor. The maximum performance of a 31-tap FIR filter implemented in this fashion in a typical DSP processor with a core clock rate of 1.2 GHz is about 9.68 MHz, or a maximum incoming data rate of 9.68 Msamples/s.

An FPGA, on the other hand, offers many different implementation and optimisation options. If a very resource-efficient implementation is desired, the MAC engine technique may prove ideal. Using a 31-tap filter as an example illustrates the impact of filter specifications on required logic resources.

Memory is required for data and coefficient storage. This may be a mixture of RAM and ROM internal to the FPGA. RAM is used for the data samples and is implemented using a cyclic RAM buffer. The number of words is equal to the number of filter taps and the bit width is set by the sample size. ROM is required for the coefficients. In the worst case, the number of words will be the same as the number of filter taps, but if symmetry exists, this may be reduced. The bit width must be large enough to support the largest coefficient.

A full multiplier is required since both the data sample and coefficient data change on every cycle. The accumulator adds the results as they are produced. The capture register is needed because the accumulator output changes on every clock cycle as the filter is sampling data. Once a full set of N samples has been accumulated, the output register captures the final result.
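The remark that coefficient storage may be reduced when symmetry exists can be made concrete. When the coefficients are symmetric, so that k[i] equals k[N-1-i], the two samples sharing a coefficient can be added first and multiplied once. The following is a minimal sketch under that assumption; the function names fir_direct and fir_symmetric, and the use of double for samples and coefficients, are illustrative choices and not taken from the original design.

```c
#include <stddef.h>

/* Reference: plain N-tap sum, one multiply per tap.
   s[j] is the sample j steps in the past (s[0] is the newest). */
double fir_direct(const double *k, const double *s, size_t n_taps)
{
    double y = 0.0;
    for (size_t i = 0; i < n_taps; i++) {
        y += k[i] * s[i];
    }
    return y;
}

/* Folded form, valid only when k[i] == k[n_taps - 1 - i].
   Only the first (n_taps + 1) / 2 coefficients are read. */
double fir_symmetric(const double *k_half, const double *s, size_t n_taps)
{
    size_t half = n_taps / 2;
    double y = 0.0;
    for (size_t i = 0; i < half; i++) {
        /* Pre-add the two samples that share coefficient i */
        y += k_half[i] * (s[i] + s[n_taps - 1 - i]);
    }
    if (n_taps % 2 != 0) {
        /* Odd length: the centre tap has no partner */
        y += k_half[half] * s[half];
    }
    return y;
}
```

For a 31-tap filter this reads 16 coefficients and performs 16 multiplications per output instead of 31.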


When used in MAC mode, the DSP48 is a good fit. The input registers, output registers and adder unit are present in the DSP48 slice. The resources required for this 31-tap MAC engine implementation are one DSP48, one 18 kbit block RAM and nine logic slices. There are a few additional slices required for sample and coefficient address generation and control. If a 600 MHz clock were available in the FPGA, this filter could run at an input sample rate of 19.35 MHz, or 19.35 Msamples/s in the FPGA.

If the system specification required a higher-performance FIR filter, a parallel structure could be implemented. The Direct Form I filter structure provides the highest-performance implementation within an FPGA. This structure, which is also commonly referred to as a systolic FIR filter, uses pipelining and adder chains to extract maximum performance from the DSP48 slice. The input is fed into a cascade of registers that acts as the data sample buffer. Each register delivers a sample to a DSP48, where it is multiplied by the respective coefficient. The adder chain stores the partial products that are then successively combined to form the final result. No external logic is required to support the filter and the structure is extendable to support any number of coefficients. This is the structure that can achieve maximum performance, because there is no high-fanout input signal.

The resources required to implement a 31-tap FIR filter are only 31 DSP48 slices. If a 600 MHz clock were available in the FPGA, this filter could perform at an input sample rate of 600 MHz, or 600 Msamples/s in the FPGA.

From this example, you can clearly see that the FPGA not only significantly outperforms a classic digital signal processor, but it does so with much lower clock rates (and therefore lower power consumption).

This example illustrates only a couple of implementation techniques for FIR filters in FPGA. The device may be further tailored to take advantage of data sample rate specifications that may fall in between the extremes of sequential MAC operation and full parallel operation. You may also consider additional trade-offs between performance and resource utilisation involving symmetric coefficients, interpolation, decimation, multiple channels or multirate.
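To make the parallel structure more concrete, the following is a simplified register-level model in C of a systolic FIR filter: two data registers and one adder-chain register per tap, all updated once per simulated clock. It is a sketch, not a description of the DSP48 internals; real slices contain further pipeline registers, and the names SystolicFir, systolic_init and systolic_clock, the four-tap size and the integer data type are assumptions made for illustration.

```c
#include <string.h>

#define TAPS 4  /* kept small for illustration; the article's example uses 31 */

typedef struct {
    long coeff[TAPS];  /* one coefficient per tap */
    long d1[TAPS];     /* first data register of each tap (feeds the multiplier) */
    long d2[TAPS];     /* second data register (feeds the next tap) */
    long acc[TAPS];    /* adder-chain register holding the partial sum */
} SystolicFir;

/* Clear all registers and load the coefficients */
void systolic_init(SystolicFir *f, const long *coeff)
{
    memset(f, 0, sizeof *f);
    memcpy(f->coeff, coeff, sizeof f->coeff);
}

/* Advance one clock edge with input sample x and return the last
   adder-chain register. Taps are visited last-to-first so every
   register is computed from its neighbour's previous-cycle value. */
long systolic_clock(SystolicFir *f, long x)
{
    for (int j = TAPS - 1; j >= 0; j--) {
        long sum_in  = (j == 0) ? 0 : f->acc[j - 1];
        long data_in = (j == 0) ? x : f->d2[j - 1];

        f->acc[j] = sum_in + f->coeff[j] * f->d1[j];
        f->d2[j]  = f->d1[j];
        f->d1[j]  = data_in;
    }
    return f->acc[TAPS - 1];
}
```

Every tap performs one multiply and one add on each clock, so the model accepts one new sample per cycle; the cost is latency. Feeding a single 1 followed by zeros makes the coefficients appear at the output in order, after TAPS calls that return zero.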

Deciding between traditional DSP and FPGA

If the system sample rate is below a few kilohertz and the design is a single-channel implementation, the DSP may be the obvious choice. However, as sample rates increase beyond a couple of megahertz, or if the system requires more than a single channel, FPGAs become more attractive. At high data rates the DSP may struggle to capture, process and output the data without any loss. This is due to the many shared resources, buses and even the core within the processor. The FPGA, however, can dedicate resources to each of these functions.

DSPs are instruction based, not clock based. Typically, three to four instructions are required for any mathematical operation on a single sample. The data must first be captured at the input, then forwarded to the processing core, cycled through that core for each operation and then released through the output. In contrast, the FPGA is clock based, so every clock cycle has the potential ability to perform a mathematical operation on the incoming data stream.

Since the DSP operates on instructions or code, the programming mechanism is standard C or, for higher performance, low-level assembly. This code may have high-level decision trees or branching operations, which may prove difficult to implement in an FPGA.

IP cores are available for FPGAs addressing video, image-processing, communications, automotive, medical and military applications. Often it is simpler to break a high-level system block diagram into FPGA modules and IP cores than it is to map it into C code for DSP implementation.
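The sample rates quoted earlier follow from simple arithmetic: a sequential filter spends a fixed number of clock cycles on each tap, so the maximum sample rate is the clock rate divided by the cycles spent on one output. The sketch below reproduces the three figures. The factor of four cycles per tap for the DSP is an inference from the 'three to four instructions' mentioned above and from the quoted 9.68 Msamples/s; the article does not state it explicitly, and max_sample_rate_hz and the three wrapper functions are hypothetical helper names.

```c
/* Maximum input sample rate of a filter that spends
   taps_per_unit * cycles_per_tap clock cycles on each output sample.
   taps_per_unit is the number of taps computed sequentially by one
   multiplier: all N taps for a MAC engine, 1 for a fully parallel filter. */
double max_sample_rate_hz(double clock_hz, int taps_per_unit, int cycles_per_tap)
{
    return clock_hz / ((double)taps_per_unit * (double)cycles_per_tap);
}

/* The three cases discussed in the article, for a 31-tap filter.
   The value 4 in the DSP case is an assumed cycles-per-tap figure. */
double dsp_rate_hz(void)           { return max_sample_rate_hz(1.2e9, 31, 4); }
double fpga_mac_rate_hz(void)      { return max_sample_rate_hz(600e6, 31, 1); }
double fpga_parallel_rate_hz(void) { return max_sample_rate_hz(600e6, 1, 1); }
```

These evaluate to roughly 9.68, 19.35 and 600 Msamples/s respectively. Under the same model, a design in which a few multipliers each handle several taps would fall between the last two values.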

Moving to FPGA

It is widely accepted that software programmers outnumber hardware designers by a significant margin. The same is true for DSP programmers vs. FPGA designers. However, the transition for system architects or DSP designers to FPGA is not as difficult as software-to-hardware migration. Many resources are available to decrease the learning curve for DSP algorithm development and implementation in FPGAs.

The main hurdle is a change from a sample- and event-based approach toward a clock-based problem description and solution. This transition is much easier to comprehend and apply if it is made at the system architecture and definition stage of the design process. It is not unusual for different engineers and mathematicians to be defining system architecture, DSP algorithms and FPGA implementation somewhat isolated from one another. This process is, of course, much smoother if each member has some knowledge of the challenges the other team members face.

In order to appreciate FPGA implementations, an architect need not be highly proficient at FPGA design. An understanding of the devices, resources and tools is all that is required. These steps can be reduced through the many focused courses that are available.

Reg Zatrepalek is a DSP/FPGA design specialist at Hardent.

Source: http://www.electronicsweekly.com/Articles/2012/05/14/53636/dsp-versus-fpga.htm