Hardware Accelerators for Data Structures

We developed the first parallel architecture for a dynamic data-structure for approximate membership (i.e., a Bloom filter that also supports deletions). 

The filter architecture supports multiple operations per cycle while preserving sequential order with a fixed and stable throughput that does not depend on the type of operations, the workload, or how populated the filter is. 

We developed a parallel architecture that is based on PRAM emulation for a dynamic filter that supports multiple operations per clock cycle (insertions, deletions, and queries with one-sided errors). 
We demonstrate our invention with a PoC implemented on an FPGA that achieves a stable and fixed throughput of 1.1 − 1.12 billion operations per second with a 100 MHz clock.
Our architecture is based on two key principles: 
1.  Special purpose processors –  We designed special purpose wide-word processors that support only insert, delete, and query instructions on small dictionaries. Each special purpose processor executes these instructions in a single clock cycle. 
2.  Local memories – The memory is partitioned into banks (each bank stores a fraction of the small dictionaries). Every memory bank has a processor that processes operations over the small dictionaries stored in that bank.

Filters are used to avoid slow and costly searches of missing entries in tables. A variation of filters, called retrieval data-structures (i.e., Bloomier Filters) quickly return short values associated with keys in a set. We list a few applications that can benefit from filters.
1. Networking: routing tables, firewall rules.
2. Databases: acceleration of join operations, short sketches of large databases.
3. Bioinformatics: fast evaluation of similarity of sequences.
4. Computer Architecture: cache coherence protocols (e.g., snooping), branch prediction.

Provisional patent application 

Sign up for
our events

    Life Science