Hardware Accelerators for Data Structures

We present the first parallel architecture for a dynamic data structure for approximate membership (i.e., a filter).
The dynamic filter supports insertions, deletions, and approximate membership queries. Our architecture borrows techniques from PRAM emulation to parallelize filters based on two levels of fingerprints. A key component of the architecture is a special-purpose wide-word processor we designed to support operations over small dictionaries.
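The source does not spell out the concrete filter, so the following is only a minimal software sketch of the general idea, under illustrative assumptions: a hash of the key is split into a dictionary index and a short fingerprint, and each small dictionary stores a multiset of fingerprints so that deletions are supported.

```python
import hashlib

class SmallDictFilter:
    """Minimal sketch of a fingerprint-based dynamic filter (parameters
    and hash choice are illustrative, not from the source): a hash is
    split into a dictionary index and a short fingerprint, and each
    small dictionary keeps a count per fingerprint to allow deletions."""

    def __init__(self, num_dicts=1024, fp_bits=8):
        self.num_dicts = num_dicts
        self.fp_bits = fp_bits
        self.dicts = [dict() for _ in range(num_dicts)]  # fingerprint -> count

    def _index_fp(self, key):
        # derive (dictionary index, fingerprint) from one 64-bit hash
        h = int.from_bytes(hashlib.blake2b(key.encode()).digest()[:8], "little")
        return h % self.num_dicts, (h >> 32) & ((1 << self.fp_bits) - 1)

    def insert(self, key):
        i, fp = self._index_fp(key)
        d = self.dicts[i]
        d[fp] = d.get(fp, 0) + 1

    def delete(self, key):
        i, fp = self._index_fp(key)
        d = self.dicts[i]
        if fp in d:
            d[fp] -= 1
            if d[fp] == 0:
                del d[fp]

    def query(self, key):
        i, fp = self._index_fp(key)
        # "no" is always correct; "yes" may rarely be a false positive
        return fp in self.dicts[i]
```

Because only fingerprints are stored, a query can return a false positive when two keys collide on both index and fingerprint, but a negative answer is always correct.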
Among the applications of this invention is a cutting-edge alternative to snoop filters, offering a smaller memory footprint, faster execution, and lower energy consumption.

Our architecture achieves a throughput of ≈ 10 – 25x that of single-threaded dynamic filters implemented in software on modern CPUs.
A unique feature of our architecture is its stable, constant throughput. Previous implementations of filters on CPUs and GPUs do not share this feature: their throughput depends on the workload (e.g., insertions become slower as the dataset grows, and negative queries are slower than positive queries).

We present the first parallel architecture for a dynamic filter that supports insertions, queries, and deletions.
The architecture borrows the PRAM emulation methodology and builds on an “engineered” version of the theoretical filter. Instead of general-purpose processors, we designed a special-purpose wide-word processor with local memory that supports only three instructions: insert, delete, and query.
We demonstrate our invention with a proof of concept (PoC) implemented on an FPGA, which achieves a stable, fixed throughput of 1.1 – 1.12 billion operations per second using a 100 MHz clock.
Our architecture is based on two key principles:
1. Special-purpose processors – We designed special-purpose wide-word processors that support only insert, delete, and query instructions on small dictionaries. Each processor executes these instructions in a single clock cycle.
2. Local memories – The memory is partitioned into banks, each of which stores a fraction of the small dictionaries. Every memory bank has a processor that handles the operations on the small dictionaries stored in that bank; hence, communication is local.
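The two principles above can be sketched in software as follows. The bank count, dictionary count, and routing rule here are illustrative assumptions, not values from the source; the point is that the dictionary index selects a bank, so each operation touches exactly one bank.

```python
# Sketch of the banked layout (all sizes are illustrative assumptions):
# the small dictionaries are partitioned across memory banks, each bank
# has its own processor, and an operation is routed to the single bank
# that holds its target dictionary, so communication stays local.
NUM_BANKS = 4
DICTS_PER_BANK = 256

class Bank:
    def __init__(self):
        # each bank stores a fraction of the small dictionaries
        self.dicts = [set() for _ in range(DICTS_PER_BANK)]

    def execute(self, op, dict_idx, fingerprint):
        # stand-in for the processor's three instructions
        d = self.dicts[dict_idx]
        if op == "insert":
            d.add(fingerprint)
        elif op == "delete":
            d.discard(fingerprint)
        elif op == "query":
            return fingerprint in d

banks = [Bank() for _ in range(NUM_BANKS)]

def route(op, h):
    # the dictionary index (from the hash) selects the bank
    dict_idx = h % (NUM_BANKS * DICTS_PER_BANK)
    bank, local_idx = divmod(dict_idx, DICTS_PER_BANK)
    return banks[bank].execute(op, local_idx, h >> 32)
```

In hardware, the banks operate in parallel on independent operations, which is what yields a throughput independent of the operation mix.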

Applications include:
1. Filters for screening out negative queries.
2. Networking – Packet forwarding and routing, web caching, gossiping, resource discovery, scheduling, etc.
3. Communication Networks – Cognitive radio networks, wireless sensor networks, device-to-device communication, smart grids, etc.
4. Databases – Information retrieval, recommendation, record linkage, deduplication, anonymization, etc.
5. Network security – In wireless networks, filters are employed for authentication, anonymity, firewalling, traceback, misbehavior detection, replay-attack detection, and node-replication detection. In wired networks, they support various security mechanisms, including string matching, IP traceback, spam filtering and e-mail protection, DoS and DDoS attack detection, and anomaly detection.
6. Bioinformatics – To represent sequenced genomes and biometric information such as iris, face, hand shape, and fingerprint data.
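Application 1 above is the classic use of a filter: placed in front of a slow backing store, it rejects most queries for absent keys before they reach the store. The sketch below illustrates this pattern; the class and names are hypothetical, and a plain set stands in for the dynamic filter (a real filter would admit rare false positives, which the store then resolves).

```python
# Sketch of screening out negative queries with a filter (names are
# illustrative, not from the source): the filter answers "definitely
# absent" cheaply, so the slow store is only consulted on (possible) hits.
class FilteredStore:
    def __init__(self):
        self.store = {}          # stand-in for a slow backing store
        self.filter = set()      # stand-in for the dynamic filter
        self.store_lookups = 0   # counts how often the slow store is touched

    def put(self, key, value):
        self.store[key] = value
        self.filter.add(key)     # filter insert mirrors the store insert

    def get(self, key):
        if key not in self.filter:   # filter query: a "no" is always correct
            return None              # negative query screened out, no store access
        self.store_lookups += 1
        return self.store.get(key)   # rare false positives would land here
```

With a real filter, deletions in the store would be mirrored by filter deletions, which is exactly why the dynamic (deletion-supporting) filter of this invention matters for such workloads.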

Provisional patent application
