Accelerated Computing Solutions

We develop custom solutions that maximize throughput in compute-intensive applications, including:

  • Artificial Intelligence and Machine Learning
  • Cryptography and blockchain
  • High Performance Computing (HPC)
  • Real Time and High Speed Signal Processing

Why accelerated computing?

Processors and microcontrollers can implement any algorithm, but their speed, however high, is limited and sometimes insufficient.

One case for accelerated computing is real-time processing, where a continuous stream of data has to be processed and each data unit must be finished before the next one arrives. This occurs in communications, Software Defined Radio, radar processing, and similar applications.
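The real-time constraint can be sketched as a simple timing budget. The example below is a minimal illustration, not a measurement tool: the function names, the sample rate, and the processing times are all hypothetical values chosen for the example.

```python
def meets_real_time(sample_rate_hz: float, processing_time_s: float) -> bool:
    """Return True if one data unit is processed before the next one arrives.

    sample_rate_hz    -- how many data units arrive per second (assumed value)
    processing_time_s -- time needed to process a single data unit (assumed value)
    """
    if sample_rate_hz <= 0:
        raise ValueError("sample_rate_hz must be positive")
    period_s = 1.0 / sample_rate_hz  # time available per data unit
    return processing_time_s <= period_s


def required_speedup(sample_rate_hz: float, processing_time_s: float) -> float:
    """Factor by which processing must be accelerated to keep up with the stream.

    A result of 1.0 or less means the current implementation already keeps up.
    """
    if sample_rate_hz <= 0:
        raise ValueError("sample_rate_hz must be positive")
    return processing_time_s * sample_rate_hz


if __name__ == "__main__":
    # Hypothetical stream: 1 million samples per second, 2 microseconds per sample.
    print(meets_real_time(1e6, 2e-6))
    print(required_speedup(1e6, 2e-6))
```

When the required speedup is greater than one, the processing either has to be parallelized or moved to faster hardware, which is the situation accelerated computing addresses.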

Another situation is when a quick response is required, for example when the algorithm is part of a control loop and must run fast enough to keep the loop responsive and stable.

Yet another situation is competition. In financial applications, blockchain and military systems, the fastest participant simply beats its adversaries.

The road to accelerated computing

Arithmetic coprocessors were separate chips paired with a CPU, as in the older x86 processor series, that executed mathematical operations more efficiently than the CPU itself.

Computer graphics cards evolved from the increasing video processing needs of home and office computers. A GPU (Graphics Processing Unit) contains a large number of dedicated processing units that can work simultaneously, which makes it more effective than a CPU for highly parallel workloads. Nvidia, a leading GPU manufacturer, created CUDA, a parallel computing platform and API that lets applications use the graphics card to accelerate computations. Graphics cards have been used for blockchain calculations and are used for applications such as FEM (Finite Element Method) simulation.

FPGAs appeared in the mid-1980s (processors date back to the 60s) and were probably not initially developed to compete with processors. They have a large array of configurable blocks, with configurable connections among them, so any digital function can be implemented on them. Because they allow parallel computation and are cheaper than developing a custom IC, they are an effective way of accelerating algorithms.
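The idea of a configurable block can be illustrated with a small software model of a look-up table (LUT), a common building block in FPGAs. This is a simplified sketch for intuition only: the `LookUpTable` class and `half_adder` function are hypothetical names, and real FPGA designs are written in an HDL rather than Python.

```python
class LookUpTable:
    """Software model of a configurable logic block.

    The truth table is the 'configuration': loading a different table
    makes the same block compute a different digital function.
    """

    def __init__(self, num_inputs, truth_table):
        if len(truth_table) != 2 ** num_inputs:
            raise ValueError("truth table must have 2**num_inputs entries")
        self.num_inputs = num_inputs
        self.truth_table = list(truth_table)

    def evaluate(self, *inputs):
        if len(inputs) != self.num_inputs:
            raise ValueError("wrong number of inputs")
        # Build the table index from the input bits, first input as the MSB.
        index = 0
        for bit in inputs:
            index = (index << 1) | (1 if bit else 0)
        return self.truth_table[index]


# Two identical blocks, configured differently.
xor_lut = LookUpTable(2, [0, 1, 1, 0])  # configured as XOR
and_lut = LookUpTable(2, [0, 0, 0, 1])  # configured as AND


def half_adder(a, b):
    """Connect the two blocks to form a half adder: returns (sum, carry)."""
    return xor_lut.evaluate(a, b), and_lut.evaluate(a, b)


if __name__ == "__main__":
    for a in (0, 1):
        for b in (0, 1):
            print(a, b, half_adder(a, b))
```

In hardware, both blocks evaluate at the same time, which is where the parallelism comes from; the Python model only shows how configuration determines function.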

ASICs (Application-Specific Integrated Circuits) are 'hard' implementations of functions in a chip. Typically, the design is first evaluated on an FPGA. An ASIC will perform better than an FPGA but requires a much larger investment and large production quantities to be worthwhile.

Multi-core processors have become standard in computers, tablets and smartphones, but not all algorithms can be split efficiently across several cores, and applications have to be written and compiled specifically for multi-core CPUs to benefit.
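The limit on multi-core gains is commonly estimated with Amdahl's law, which relates the achievable speedup to the fraction of the work that can run in parallel. The sketch below is a minimal illustration; the function name and the example fractions are assumptions, not figures from a real application.

```python
def amdahl_speedup(parallel_fraction: float, cores: int) -> float:
    """Estimate the speedup on `cores` cores using Amdahl's law.

    parallel_fraction -- share of the run time that can be parallelized (0..1)
    cores             -- number of cores available (>= 1)
    """
    if not 0.0 <= parallel_fraction <= 1.0:
        raise ValueError("parallel_fraction must be between 0 and 1")
    if cores < 1:
        raise ValueError("cores must be at least 1")
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / cores)


if __name__ == "__main__":
    # Hypothetical algorithm where 90% of the work can be parallelized.
    for n in (1, 2, 4, 8, 64):
        print(n, round(amdahl_speedup(0.9, n), 2))
```

Even with many cores, the serial part bounds the result: with 90% parallel work the speedup can never exceed 10.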

Vector processors are processing units that apply one operation to a whole array of data at a time, typically for vector and matrix operations.
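As a rough software model of this idea, the sketch below computes a*x + y over two arrays in chunks, where each chunk stands for one vector instruction operating on several elements at once. The function name, the `lanes` parameter, and the chunking scheme are assumptions made for illustration; a real vector unit performs each chunk in hardware rather than in a loop.

```python
def axpy_vectorized(a, x, y, lanes=4):
    """Compute a*x[i] + y[i] for all i, processing `lanes` elements per step.

    Returns the result list and the number of vector steps issued.
    A scalar processor would need len(x) steps for the same work.
    """
    if len(x) != len(y):
        raise ValueError("x and y must have the same length")
    if lanes < 1:
        raise ValueError("lanes must be at least 1")
    result = []
    vector_steps = 0
    for start in range(0, len(x), lanes):
        xs = x[start:start + lanes]  # one vector register's worth of x
        ys = y[start:start + lanes]  # one vector register's worth of y
        # Models a single vector multiply-add applied to every lane.
        result.extend(a * xi + yi for xi, yi in zip(xs, ys))
        vector_steps += 1
    return result, vector_steps


if __name__ == "__main__":
    values, steps = axpy_vectorized(2.0, [1, 2, 3, 4, 5], [10, 20, 30, 40, 50])
    print(values, steps)
```

With five elements and four lanes, the model issues two vector steps instead of five scalar ones, which shows where the throughput gain comes from.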

Our Projects

We design FPGA soft cores consisting of clusters of specialized processors (for example vector, convolutional or FFT processors), with their throughput optimized for a given algorithm, which can range from object recognition in video to a neural network.

Katana is an Algorithm Specific Processor Cluster IP optimized for Deep Convolutional Neural Networks, tested on Xilinx Development Boards ZCU102 and ZCU106.

Qabalah is a set of HDL blocks and functions for building and emulating quantum computers on FPGAs in order to run quantum algorithms.

Washi is a Streaming Pipelined Multi Processor Cluster IP designed for Real Time Image Segmentation and Object Recognition, tested on Xilinx Development Boards ZCU102 and ZCU106.