DFPAU

Floating Point Arithmetic Coprocessor


The DFPAU is a floating point arithmetic coprocessor designed to assist a CPU in performing floating point computations. It directly replaces C software functions with equivalent, much faster hardware operations, which significantly accelerates system performance. It requires no programming and therefore no modifications to the main software: the DFPAU C driver handles everything automatically during software compilation.
The DFPAU was designed to operate with DCD’s DP8051, but it can also operate with any other 8-, 16- or 32-bit processor. Drivers for all popular 8051 C compilers are delivered with the DFPAU package.
The DFPAU uses specialized algorithms to compute arithmetic functions. It supports addition, subtraction, multiplication, division, square root, comparison, absolute value, and sign change. Input numbers use the IEEE-754 single precision real format. Trigonometric functions are supported indirectly: software subroutines compute them as sequences of add, multiply and divide operations.
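As an illustration of the IEEE-754 single precision format mentioned above, the following C sketch splits a float into its sign, biased exponent and fraction fields. The names fp32_fields, fp32_bits and fp32_split are hypothetical and are not part of the DFPAU driver; the sketch also assumes that float is a 32-bit IEEE-754 type on the host.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Fields of an IEEE-754 single precision number (hypothetical helper type). */
typedef struct {
    uint32_t sign;     /* 1 bit: 0 = positive, 1 = negative */
    uint32_t exponent; /* 8 bits, biased by 127 */
    uint32_t fraction; /* 23 bits, the implicit leading 1 is not stored */
} fp32_fields;

/* Copy the bit pattern of a float into a 32-bit word. */
uint32_t fp32_bits(float value)
{
    uint32_t bits;
    assert(sizeof value == sizeof bits); /* assumes a 32-bit float */
    memcpy(&bits, &value, sizeof bits);
    return bits;
}

/* Split a float into sign, biased exponent and fraction. */
fp32_fields fp32_split(float value)
{
    uint32_t bits = fp32_bits(value);
    fp32_fields f;
    f.sign = bits >> 31;
    f.exponent = (bits >> 23) & 0xFFu;
    f.fraction = bits & 0x7FFFFFu;
    return f;
}
```

For example, fp32_split(1.0f) yields sign 0, exponent 127 and fraction 0, because 1.0 is stored as 1.0 x 2^(127 - 127).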
The DFPAU is a technology independent design that can be implemented in a variety of process technologies.


Key Features

  • Direct replacement for C float software functions such as: +, -, *, /, ==, !=, >=, <=, <, >
  • C interface supplied for all popular compilers: GNU C/C++, 8051 compilers
  • No programming required
  • IEEE-754 Single precision real format support – float type
  • Flexible arguments and result registers location
  • Performs the following functions:
    • FADD, FSUB – addition, subtraction
    • FMUL, FDIV – multiplication, division
    • FSQRT – square root
    • FCHS, FABS – change of sign, absolute value
    • FXAM – examine input data
    • FUCOM – comparison
  • Exceptions built-in routines
  • Masks each exception indicator:
    • Precision lack PE
    • Underflow result UE
    • Overflow result OE
    • Invalid operand IE
    • Division by zero ZE
    • Denormal operand DE
  • Fully synthesizable
  • Static synchronous design
  • Positive edge clocking and no internal tri-states
  • Scan test ready

Applications

  • Math coprocessors
  • DSP algorithms
  • Embedded arithmetic coprocessor
  • Fast data processing & control
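The maskable exception indicators listed under Key Features (PE, UE, OE, IE, ZE, DE) can be modeled in software roughly as follows. This is only a sketch: the bit positions, the names EXC_* and exc_irq_pending, and the convention that a set mask bit suppresses the interrupt are all assumptions made for illustration. The actual register layout and mask polarity are defined by the core's documentation, not by this page.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical bit positions for the six exception indicators. */
enum {
    EXC_PE = 1 << 0, /* precision lack */
    EXC_UE = 1 << 1, /* underflow result */
    EXC_OE = 1 << 2, /* overflow result */
    EXC_IE = 1 << 3, /* invalid operand */
    EXC_ZE = 1 << 4, /* division by zero */
    EXC_DE = 1 << 5  /* denormal operand */
};

#define EXC_ALL 0x3Fu

/* Return 1 when at least one raised exception is not masked.
 * Assumption: a mask bit set to 1 disables the matching indicator. */
int exc_irq_pending(uint8_t status, uint8_t mask)
{
    unsigned raised = (unsigned)status;
    unsigned masked = (unsigned)mask;

    assert((raised & ~EXC_ALL) == 0); /* only six indicators exist */
    assert((masked & ~EXC_ALL) == 0);
    return (raised & ~masked & EXC_ALL) != 0;
}
```

With this model, a division by zero raises a request when ZE is unmasked (exc_irq_pending(EXC_ZE, 0) is 1) and stays silent when ZE is masked (exc_irq_pending(EXC_ZE, EXC_ZE) is 0).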


The tables and figures below illustrate the performance improvement of a system with the DFPAU for two typical CPUs.
The performance of the DFPAU floating point instructions has been compared with the standard C library functions delivered with every commercial C compiler. Each program was executed in the same system environment. The number of clock periods was measured from loading the input data into the work registers to storing the output result after the operation. The results are given in the tables below.
Improvement has been computed as the number of clock cycles required by the CPU alone to compute a floating point operation, divided by the number of clock cycles required by the system of CPU with DFPAU to compute the same operation:

    Improvement = clock cycles (CPU) / clock cycles (CPU + DFPAU)



DP8051 based system

    The following table compares the performance of the DP8051+DFPAU with a standard 8051 microcontroller.

    Device            Improvement
    80C51             1.0
    DP8051            7.3
    DP8051+DFPAU      91.0

    Improvements for particular operations are presented below.

    IEEE-754 FP Instruction      Improvement
    Addition                     73
    Subtraction                  60
    Multiplication               65
    Division                     182
    Square Root                  392
    Sine                         10
    Cosine                       10
    Tangent                      12
    Arc Tangent                  17
    Average speed improvement:   91

32-bit RISC based system

    The table below shows the performance improvements of a sample 32-bit RISC CPU with the DFPAU, compared to the same system without the DFPAU coprocessor.

    Device                       Improvement
    CPU                          1.0
    CPU+DFPAU (arithmetic)       7.5
    CPU+DFPAU (trigonometric)    5.9
    CPU+DFPAU (overall)          6.8

    Improvements for particular operations are presented below.

    IEEE-754 FP Instruction      Improvement
    Addition                     6.4
    Subtraction                  6.5
    Multiplication               5.1
    Division                     6.5
    Square Root                  12.9
    Sine                         5.2
    Cosine                       5.4
    Tangent                      5.8
    Arc Tangent                  7.2
    Average speed improvement:   6.8
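The quoted averages appear to be plain arithmetic means of the per-operation factors. The following C sketch reproduces them under that assumption; the function and array names are hypothetical, and the numbers are copied from the tables above.

```c
#include <assert.h>
#include <stddef.h>

/* Improvement factor: clocks on the CPU alone divided by clocks with the DFPAU. */
double improvement_factor(unsigned long cpu_clocks, unsigned long dfpau_clocks)
{
    assert(dfpau_clocks > 0);
    return (double)cpu_clocks / (double)dfpau_clocks;
}

/* Arithmetic mean of a list of improvement factors. */
double mean_factor(const double *values, size_t count)
{
    double sum = 0.0;
    size_t i;

    assert(count > 0);
    for (i = 0; i < count; ++i)
        sum += values[i];
    return sum / (double)count;
}

/* Order: add, sub, mul, div, sqrt, sin, cos, tan, arctan. */
const double dp8051_factors[9] = { 73, 60, 65, 182, 392, 10, 10, 12, 17 };
const double risc_factors[9] = { 6.4, 6.5, 5.1, 6.5, 12.9, 5.2, 5.4, 5.8, 7.2 };
```

Under this assumption, mean_factor(dp8051_factors, 9) gives about 91.2 and mean_factor(risc_factors, 9) about 6.78, matching the rounded table values of 91 and 6.8. The first five RISC entries average to about 7.5 (arithmetic) and the last four to 5.9 (trigonometric).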


Symbol

[Figure: DFPAU symbol. Inputs: clk, rst, datai (31:0) (note 1), addr (4:0) (note 2), cs, we. Outputs: datao (31:0) (note 1), irq.]

Pins description

Pin                       Type     Description
clk                       input    Global clock
rst                       input    Global reset
datai (31:0) (note 1)     input    Data bus input
addr (4:0) (note 2)       input    Register address to read/write
cs                        input    Chip select for read/write
we                        input    Data write enable
datao (31:0) (note 1)     output   Data bus output
irq                       output   Interrupt request indicator

Block diagram

[Figure: DFPAU block diagram. Blocks: Align, Exponent, Interface, Mantissa, Shifter, Control Unit. Signals: datai (31:0), datao (31:0), addr (4:0), cs, we, irq, clk, rst.]

Units

Align

It analyzes the input numbers for compliance with the IEEE-754 standard. Information about the data classes is passed as a result to the appropriate internal module.
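The kind of data classification described for the Align unit can be sketched in C by inspecting the exponent and fraction fields of an IEEE-754 single precision value. The class names and the functions float_from_bits and fp_classify are hypothetical; the classes actually reported by the core are not listed on this page.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical data classes for an IEEE-754 single precision value. */
typedef enum {
    FP_CLASS_ZERO,
    FP_CLASS_DENORMAL,
    FP_CLASS_NORMAL,
    FP_CLASS_INFINITY,
    FP_CLASS_NAN
} fp_class;

/* Build a float from a raw 32-bit pattern (useful for special values). */
float float_from_bits(uint32_t bits)
{
    float value;
    assert(sizeof value == sizeof bits); /* assumes a 32-bit float */
    memcpy(&value, &bits, sizeof value);
    return value;
}

/* Classify a float by its biased exponent and fraction fields. */
fp_class fp_classify(float value)
{
    uint32_t bits, exponent, fraction;

    memcpy(&bits, &value, sizeof bits);
    exponent = (bits >> 23) & 0xFFu;
    fraction = bits & 0x7FFFFFu;

    if (exponent == 0)
        return fraction == 0 ? FP_CLASS_ZERO : FP_CLASS_DENORMAL;
    if (exponent == 0xFFu)
        return fraction == 0 ? FP_CLASS_INFINITY : FP_CLASS_NAN;
    return FP_CLASS_NORMAL;
}
```

The sign bit is ignored here, so +0 and -0 both map to the zero class.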

Exponent

It performs operations on the exponent part of a number. Addition, subtraction, shifting, comparison and conversion operations are executed in this module. It contains the exponent and work registers.

Interface

It is the interface between an external device and the DFPAU internal 32-bit modules. It contains data, control and status registers. It can be configured to work with 8-, 16- and 32-bit processors.

1 - the data bus can be configured as 8, 16 or 32 bits wide, depending on the processor’s bus size
2 - the address bus is aligned to work with 8-bit (3:0), 16-bit (3:1) or 32-bit (4:2) processors
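Note 2 can be read as a rule for which processor address bits select a DFPAU location at each bus width. The C sketch below extracts that bit field from a processor byte address. The function name dfpau_addr_field is hypothetical, and the sketch says nothing about which register sits at which address, since the register map is not given here.

```c
#include <assert.h>

/* Extract the address bits used by the coprocessor for a given bus width:
 * 8-bit bus uses bits (3:0), 16-bit bus bits (3:1), 32-bit bus bits (4:2). */
unsigned dfpau_addr_field(unsigned cpu_addr, unsigned bus_bits)
{
    assert(bus_bits == 8 || bus_bits == 16 || bus_bits == 32);

    switch (bus_bits) {
    case 8:
        return cpu_addr & 0xFu;        /* bits 3:0, 16 byte locations */
    case 16:
        return (cpu_addr >> 1) & 0x7u; /* bits 3:1, 8 halfword locations */
    default:
        return (cpu_addr >> 2) & 0x7u; /* bits 4:2, 8 word locations */
    }
}
```

For instance, a 32-bit processor access to byte address 0x1C selects field value 7, while a 16-bit processor reaches the same field value at byte address 0x0E.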

Mantissa

It performs operations on the mantissa part of a number. Addition, subtraction, multiplication, division, square root, comparison and conversion operations are executed in this module. It contains the mantissa and work registers.

Shifter

It shifts the mantissa during normalization and denormalization operations. Information about the shifted-out bits is stored for the rounding process.
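One common way to keep information about shifted-out bits is a guard bit plus a sticky bit, which is enough to round correctly afterwards. The C sketch below shows this technique with round to nearest, ties to even, the IEEE-754 default rounding. It is an illustration only: the page does not state how the Shifter stores these bits or which rounding modes the DFPAU implements, and all names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Result of a right shift that remembers what was shifted out. */
typedef struct {
    uint32_t mantissa; /* shifted value */
    int guard;         /* last bit shifted out */
    int sticky;        /* OR of all earlier shifted-out bits */
} shift_result;

/* Shift a 24-bit mantissa right by count bits, tracking guard and sticky. */
shift_result shift_right_sticky(uint32_t mantissa, unsigned count)
{
    shift_result r = { mantissa, 0, 0 };
    unsigned i;

    assert(mantissa <= 0xFFFFFFu); /* 24-bit significand expected */
    for (i = 0; i < count; ++i) {
        r.sticky |= r.guard;               /* older guard bit becomes sticky */
        r.guard = (int)(r.mantissa & 1u);  /* newest shifted-out bit */
        r.mantissa >>= 1;
    }
    return r;
}

/* Round to nearest, ties to even, using the guard and sticky bits. */
uint32_t round_nearest_even(shift_result r)
{
    if (r.guard && (r.sticky || (r.mantissa & 1u)))
        return r.mantissa + 1u;
    return r.mantissa;
}
```

Shifting binary 1011 right by two places leaves 10 with guard and sticky both set, so the value 2.75 rounds up to 3. Shifting binary 110 by two places leaves 1 with only the guard bit set, an exact tie at 1.5 that rounds to the even value 2.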

Control Unit

It manages the execution of all instructions and the internal operations required to execute a particular function.

Performance

Implementation   Speed grade   Logic Cells   Frequency [MHz]
APEX20KE         -1            2640          48
APEX20KC         -7            2640          57
APEX-II          -7            2640          70
STRATIX          -5            2210          115
CYCLONE          -6            2410          91
CYCLONE-II       -6            2280          96
STRATIX-II       -3            1680          169

DFPAU implementation results for ALTERA devices. All features have been included.

Implementation   Speed grade   Slices   Frequency [MHz]
SPARTAN-II       -6            1310     42
SPARTAN-IIE      -7            1310     42
SPARTAN-3        -5            1290     49
VIRTEX           -6            1300     42
VIRTEX-E         -8            1300     48
VIRTEX-II        -5            1300     75
VIRTEX-II pro    -7            1300     84
VIRTEX-4         -11           1300     94

DFPAU implementation results for XILINX devices. All features have been included.

Implementation   Speed grade   LUTs/PFUs   Frequency [MHz]
ORCA 3           -7            2306/346    14
ORCA4            -3            2429/385    32
ispXPGA          -5            2881/747    43

DFPAU implementation results for LATTICE devices. All features have been included.


Family summary

Design     Standard     Arithmetic   Trigonometric   Processor    Single      Double      8/16/32 bit   52-bit
           compliance   operations   operations      interfaces   precision   precision   integers      integers
DFPAU      IEEE-754     +            -               +            +           -           -             -
DFPMU      IEEE-754     +            +               +            +           -           +             -
DFPAU-DP   IEEE-754     +            -               +            +           +           +             +
DFPMU-DP   IEEE-754     +            +               +            +           +           +             +

Arithmetic operations: ADD, SUB, MUL, DIV, SQRT, COMP
Trigonometric operations: SIN, COS, TAN, ARCTAN
Processor interfaces: 8, 16, 32 bit


The main features of each member of the Arithmetic Coprocessors family are summarized in the table above. It gives a brief characterization of each member, helping the user select the most suitable IP Core for the application.