Abstract: Suffix arrays and trees are important and fundamental string data structures which lie at the foundation of many string algorithms, with important applications in computational biology, text processing, and information retrieval. Recent work enables the efficient parallel construction of suffix arrays and trees requiring at most O(n/p) memory per process in distributed memory.
However, querying these indexes in distributed memory has not been studied extensively. Querying common string indexes such as suffix arrays, enhanced suffix arrays, and FM-Index, all require random accesses into O(n) memory - which in distributed memory algorithms becomes prohibitively expensive.
In this paper, we introduce a novel distributed string index, the Distributed Enhanced Suffix Array (DESA). We present efficient algorithms for the construction and for querying of this distributed data structure, all while requiring only O(n/p) memory per process. We further provide a scalable parallel implementation and demonstrate its performance and scalability.
Back to Technical Papers Archive Listing