Hardware Accelerators for Data Structures
We developed the first parallel architecture for a dynamic data-structure for approximate membership (i.e., a Bloom filter that also supports deletions).
UNMET NEED
The filter architecture supports multiple operations per cycle while preserving sequential order with a fixed and stable throughput that does not depend on the type of operations, the workload, or how populated the filter is.
OUR SOLUTION
We developed a parallel architecture that is based on PRAM emulation for a dynamic filter that supports multiple operations per clock cycle (insertions, deletions, and queries with one-sided errors).
We demonstrate our invention with a PoC implemented on an FPGA that achieves a stable and fixed throughput of 1.1 − 1.12 billion operations per second with a 100 MHz clock.
Our architecture is based on two key principles:
1. Special purpose processors – We designed special purpose wide-word processors that support only insert, delete, and query instructions on small dictionaries. Each special purpose processor executes these instructions in a single clock cycle.
2. Local memories – The memory is partitioned into banks (each bank stores a fraction of the small dictionaries). Every memory bank has a processor that processes operations over the small dictionaries stored in that bank.
APPLICATIONS
Filters are used to avoid slow and costly searches of missing entries in tables. A variation of filters, called retrieval data-structures (i.e., Bloomier Filters) quickly return short values associated with keys in a set. We list a few applications that can benefit from filters.
1. Networking: routing tables, firewall rules.
2. Databases: acceleration of join operations, short sketches of large databases.
3. Bioinformatics: fast evaluation of similarity of sequences.
4. Computer Architecture: cache coherence protocols (e.g., snooping), branch prediction.
INTELLECTUAL PROPERTY
Provisional patent application