Distributed Enhanced Suffix Arrays: Efficient Algorithms for Construction and Querying

SC19 Proceedings

Distributed Enhanced Suffix Arrays: Efficient Algorithms for Construction and Querying

Authors: Patrick Flick (Georgia Institute of Technology), Srinivas Aluru (Georgia Institute of Technology)

Abstract: Suffix arrays and trees are important and fundamental string data structures which lie at the foundation of many string algorithms, with important applications in computational biology, text processing, and information retrieval. Recent work enables the efficient parallel construction of suffix arrays and trees requiring at most O(n/p) memory per process in distributed memory.

However, querying these indexes in distributed memory has not been studied extensively. Querying common string indexes such as suffix arrays, enhanced suffix arrays, and FM-Index, all require random accesses into O(n) memory - which in distributed memory algorithms becomes prohibitively expensive.

In this paper, we introduce a novel distributed string index, the Distributed Enhanced Suffix Array (DESA). We present efficient algorithms for the construction and for querying of this distributed data structure, all while requiring only O(n/p) memory per process. We further provide a scalable parallel implementation and demonstrate its performance and scalability.

Presentation: file

Back to Technical Papers Archive Listing