DFPAU-DP
Floating Point Arithmetic Coprocessor - Double Precision
Documentation
The DFPAU-DP is a floating point arithmetic coprocessor designed to assist a CPU in performing floating point computations. It directly replaces C software functions with equivalent, much faster hardware operations, which significantly accelerates system performance. It does not require any programming, and no modifications have to be made to the main software. Everything is done automatically during software compilation by the DFPAU-DP C driver.
The coprocessor was designed to operate with DCD’s DP8051, but it can also work with any other 8-, 16- or 32-bit processor. Drivers for all popular 8051 C compilers are delivered together with the DFPAU-DP package.
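As an illustration of the kind of unmodified application code the paragraphs above refer to, the following is a minimal sketch of ordinary double-precision C. The function names `mean` and `variance` are hypothetical and not part of the DFPAU-DP package. Which operations a given compiler driver actually redirects to the coprocessor is an assumption here; the source only states that the replacement happens at compile time.

```c
#include <assert.h>
#include <stddef.h>

/* Ordinary double-precision code: nothing here refers to the coprocessor.
 * Each +, -, *, / below is an operation of the kind the text says the
 * C driver maps to hardware (an assumption about the toolchain). */
double mean(const double *x, size_t n)
{
    assert(n > 0);
    double sum = 0.0;
    for (size_t i = 0; i < n; ++i)
        sum += x[i];               /* double addition */
    return sum / (double)n;        /* integer-to-double conversion, division */
}

double variance(const double *x, size_t n)
{
    assert(n > 1);
    double m = mean(x, n);
    double acc = 0.0;
    for (size_t i = 0; i < n; ++i) {
        double d = x[i] - m;       /* double subtraction */
        acc += d * d;              /* double multiplication and addition */
    }
    return acc / (double)(n - 1);  /* sample variance */
}
```

The point of the sketch is that the source file contains no coprocessor-specific calls; any acceleration would come from the driver linked in at build time.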
It uses specialized algorithms to compute math functions such as addition, subtraction, multiplication, division, square root and comparison. The DFPAU-DP has built-in instructions for conversion from integer types to floating point types and vice versa. The number format follows the IEEE-754 standard. The coprocessor supports double and single precision real numbers as well as 8-bit, 16-bit, 32-bit and 52-bit integers. It can be used with 8-, 16- and 32-bit processors.
The DFPAU-DP is a technology-independent design that can be implemented in a variety of process technologies.
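To make the IEEE-754 double precision layout mentioned above concrete, here is a small sketch that splits a `double` into its sign, biased exponent and fraction fields. It assumes the host compiler stores `double` as IEEE-754 binary64; the type `double_fields` and the function `split_double` are hypothetical names chosen for this example and say nothing about the core's internal registers.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Fields of an IEEE-754 binary64 value. */
typedef struct {
    unsigned sign;      /* 1 bit */
    unsigned exponent;  /* 11 bits, biased by 1023 */
    uint64_t fraction;  /* 52 bits, without the implicit leading 1 */
} double_fields;

double_fields split_double(double value)
{
    uint64_t bits;
    /* Assumption: double is 64 bits wide on this host. */
    assert(sizeof bits == sizeof value);
    memcpy(&bits, &value, sizeof bits);  /* well-defined type punning */

    double_fields f;
    f.sign = (unsigned)(bits >> 63);
    f.exponent = (unsigned)((bits >> 52) & 0x7FFu);
    f.fraction = bits & ((UINT64_C(1) << 52) - 1);
    return f;
}
```

For example, `1.0` has sign 0, biased exponent 1023 and a zero fraction, while `-2.5` (that is, -1.25 × 2^1) has sign 1, biased exponent 1024 and fraction 2^50.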
Family summary
| Design | Standard compliance | Arithmetic operations (ADD, SUB, MUL, DIV, SQRT, COMP) | Trigonometric operations (SIN, COS, TAN, ARCTAN) | Processor interfaces (8-, 16-, 32-bit) | Single precision | Double precision | 8/16/32-bit integers | 52-bit integers |
|---|---|---|---|---|---|---|---|---|
| DFPAU | IEEE-754 | + | - | + | + | - | - | - |
| DFPMU | IEEE-754 | + | + | + | + | - | + | - |
| DFPAU-DP | IEEE-754 | + | - | + | + | + | + | + |
| DFPMU-DP | IEEE-754 | + | + | + | + | + | + | + |
The main features of each member of the arithmetic coprocessor family are summarized in the table above. It gives a brief characterization of each member to help you select the most suitable IP core for your application.
Performance
Each core has been tested in a variety of FPGA and ASIC technologies. The implementation results are summarized below.

| Implementation | Speed grade | Slices | Frequency [MHz] |
|---|---|---|---|
| VIRTEX-II | -6 | 2015 | 80 |
| VIRTEX-II pro | -7 | 2015 | 97 |
| VIRTEX-4 | -11 | 1975 | 93 |

| Implementation | Speed grade | Logic Cells | Frequency [MHz] |
|---|---|---|---|
| CYCLONE | -6 | 3660 | 79 |
| CYCLONE-II | -6 | 3630 | 71 |
| STRATIX | -5 | 3660 | 84 |
| STRATIX-II | -3 | 2800 | 110 |
| STRATIX-IV | -2 | 2750 | 160 |
Info
The table and figures below illustrate the performance improvement of a system with the DFPAU-DP for a typical 32-bit RISC CPU.
The performance of the DFPAU-DP floating point instructions has been compared to the standard C library functions delivered with every commercial C compiler. Each program was executed in the same system environment. The number of clock periods was measured from loading the input data into the work registers to storing the output result after the operation. The results are given in the table below.
Improvement has been computed as the number of clock cycles required by the CPU alone to compute an FP operation, divided by the number of clock cycles required by the system of CPU with DFPAU-DP to compute the same operation:
Improvement = (clock cycles, CPU only) / (clock cycles, CPU with DFPAU-DP)
32-BIT RISC BASED SYSTEM
The table below shows the performance improvement of the sample 32-bit RISC CPU with the DFPAU-DP, compared to the same system without the DFPAU-DP coprocessor.

| Function | CPU CLK | DFPAU-DP CLK | Improvement |
|---|---|---|---|
| Arithmetic operations | - | - | - |
| Addition | 1376 | 114 | 12.0 |
| Subtraction | 1338 | 114 | 11.7 |
| Multiplication | 1628 | 153 | 10.6 |
| Division | 2964 | 197 | 15.0 |
| Square Root | 3030 | 141 | 21.5 |
| Total | - | - | 14.1 |
| Trigonometric operations | - | - | - |
| Sine | 18730 | 1600 | 11.8 |
| Cosine | 21798 | 2120 | 10.3 |
| Tangent | 37500 | 3695 | 10.1 |
| Arc Tangent | 36790 | 2500 | 14.7 |
| Total | - | - | 11.7 |
| Average speed improvement: | - | - | 11.8 |
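The improvement figures in the rows above follow from dividing the two clock counts. A minimal sketch of that calculation is given here; the function name `improvement` is hypothetical. How the "Total" and "Average" rows are aggregated is not stated in the source, so the sketch covers only the per-operation ratio.

```c
#include <assert.h>

/* Speed-up of one operation: clock cycles on the CPU alone divided by
 * clock cycles on the CPU with the DFPAU-DP. */
double improvement(unsigned cpu_clk, unsigned dfpau_clk)
{
    assert(dfpau_clk > 0);  /* avoid division by zero */
    return (double)cpu_clk / (double)dfpau_clk;
}
```

For the Addition row, `improvement(1376, 114)` gives about 12.07, which the table lists as 12.0; for Square Root, `improvement(3030, 141)` gives about 21.49, listed as 21.5.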
Key Features
- Direct replacement for C double and float software functions such as: +, -, *, /, ==, !=, >=, <=, <, >
- Configurability of all available functions
- C interface supplied for all popular compilers: GNU C/C++, 8051 compilers
- No programming required
- IEEE-754 Double precision real format support – double type
- IEEE-754 Single precision real format support – float type
- 8-bit, 16-bit, 32-bit and 52-bit integer formats supported – integer types
- Flexible location of argument and result registers
- Performs the following functions:
  - FADD, FSUB – addition, subtraction
  - FMUL, FDIV – multiplication, division
  - FSQRT – square root
  - FXAM – examine input data
  - FUCOM – comparison
  - FCLD, FILD – 8-bit, 16-bit integer to double
  - FLLD, FELD – 32-bit, 52-bit integer to double
  - FCST, FIST – double to 8-bit, 16-bit integer
  - FLST, FEST – double to 32-bit, 52-bit integer
  - FFLD – float to double
  - FFST – double to float
- Built-in exception routines
- Masks each exception indicator:
  - Precision loss – PE
  - Underflow result – UE
  - Overflow result – OE
  - Invalid operand – IE
  - Division by zero – ZE
  - Denormal operand – DE
- Fully configurable
- Fully synthesizable
- Static synchronous design
- Positive edge clocking and no internal tri-states
- Scan test ready
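The 52-bit integer conversions in the feature list fit naturally with the double format: an IEEE-754 double carries a 53-bit significand, so any integer that occupies at most 52 bits converts to double without rounding. The sketch below checks this on the host CPU. It assumes the host `double` is IEEE-754 binary64; the names `MAX_52BIT`, `converts_exactly` and `int52_to_double` are hypothetical, and the signedness and register layout of the core's 52-bit format are not given in the source.

```c
#include <assert.h>
#include <stdint.h>

/* Largest magnitude of an integer that occupies at most 52 bits. */
#define MAX_52BIT ((INT64_C(1) << 52) - 1)

/* Returns 1 when converting v to double and back reproduces v exactly. */
int converts_exactly(int64_t v)
{
    double d = (double)v;
    /* 2^63 does not fit in int64_t, so reject it before casting back. */
    if (d >= 9223372036854775808.0)
        return 0;
    return (int64_t)d == v;
}

/* Integer-to-double conversion for values limited to 52 bits; the assert
 * documents the range in which the result is guaranteed to be exact. */
double int52_to_double(int64_t v)
{
    assert(v >= -MAX_52BIT && v <= MAX_52BIT);
    return (double)v;
}
```

By contrast, a value such as 2^53 + 1 needs 54 bits and is rounded during conversion, so `converts_exactly` returns 0 for it.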
Applications
- Math coprocessors
- DSP algorithms
- Embedded arithmetic coprocessor
- Fast data processing & control
Symbol
[Symbol diagram: inputs clk, datai (31:0), addr (4:0), we, cs; outputs datao (31:0), irq.]

Pins description
| Pin | Type | Description |
|---|---|---|
| clk | input | Global clock |
| datai (31:0) | input | Data bus input (see note 1) |
| addr (4:0) | input | Register address to read/write (see note 2) |
| we | input | Data write enable |
| cs | input | Chip select for read/write |
| datao (31:0) | output | Data bus output (see note 1) |
| irq | output | Interrupt request indicator |
Block Diagram
[Block diagram: the units and buses described in the tables below.]

| Unit | Description |
|---|---|
| Align | Analyzes the input numbers for compliance with the IEEE-754 standard. Information about the data classes is passed as a result to the appropriate internal module. |
| Interface | Provides the interface between the external device and the core's internal 32-bit modules. It contains data, control and status registers. It can be configured to work with 8-, 16- and 32-bit processors (see notes 1 and 2). |
| Exponent | Performs operations on the exponent part of a number. The addition, subtraction, shifting, comparison and conversion operations are executed in this module. It contains exponent and work registers. |
| Mantissa | Performs operations on the mantissa part of a number. The addition, subtraction, multiplication, division, square root, comparison and conversion operations are executed in this module. It contains mantissa and work registers. |
| Shifter | Performs mantissa shifting during normalization and denormalization operations. Information about shifted-out bits is stored for the rounding process. |
| Control Unit | Manages the execution of all instructions and the internal operations required to carry out a particular function. |

| Bus | Description |
|---|---|
| Exponent bus | 17-bit wide data bus used for transferring exponents between modules. |
| Mantissa bus | 70-bit wide internal data bus used for transferring mantissas between modules. |
| Control bus | Carries the control signals connected to each module. Main control is performed by the Control Unit. |

Notes:
1 - The data bus (datai, datao) can be configured as 8, 16 or 32 bits wide, depending on the processor's bus size.
2 - The address bus is aligned to work with 8-bit (3:0), 16-bit (3:1) or 32-bit (4:2) processors.
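Note 2 of the Interface description lists which address bits are significant for each processor bus width: (3:0) for 8-bit, (3:1) for 16-bit and (4:2) for 32-bit processors. The following sketch shows one way to read that note, extracting the selected index from the 5-bit address. The interpretation that these bits select a byte, half-word or word location is an assumption, and the names `bus_width` and `reg_index` are hypothetical; the actual register map is not given in the source.

```c
#include <assert.h>

/* Processor bus widths the Interface unit can be configured for. */
typedef enum { BUS_8 = 8, BUS_16 = 16, BUS_32 = 32 } bus_width;

/* Returns the index selected by the significant address bits.
 * Assumption: the bit ranges are taken from note 2, nothing more. */
unsigned reg_index(unsigned addr, bus_width w)
{
    assert(addr < 32u);                        /* addr is 5 bits wide (4:0) */
    switch (w) {
    case BUS_8:  return addr & 0xFu;           /* bits 3:0 */
    case BUS_16: return (addr >> 1) & 0x7u;    /* bits 3:1 */
    case BUS_32: return (addr >> 2) & 0x7u;    /* bits 4:2 */
    }
    assert(!"unsupported bus width");
    return 0;
}
```

Under this reading, a 32-bit processor accessing byte address 0x1C and a 16-bit processor accessing byte address 0x0E both select index 7, because the low-order bits that address bytes within a word or half-word are ignored.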