78
78

Jul 20, 2013
07/13

by
Yasuo Tabei; Yoshimasa Takabatake; Hiroshi Sakamoto

texts

#
eye 78

#
favorite 0

#
comment 0

We solve an open problem related to an optimal encoding of a straight line program (SLP), a canonical form of grammar compression deriving a single string deterministically. We show that an information-theoretic lower bound for representing an SLP with n symbols requires at least 2n+logn!+o(n) bits. We then present a succinct representation of an SLP; this representation is asymptotically equivalent to the lower bound. The space is at most 2n log {rho}(1 + o(1)) bits for rho leq 2sqrt{n}, while...

Source: http://arxiv.org/abs/1304.0917v3

3
3.0

Jun 30, 2018
06/18

by
Yoshimasa Takabatake; Yasuo Tabei; Hiroshi Sakamoto

texts

#
eye 3

#
favorite 0

#
comment 0

Edit distance with moves (EDM) is a string-to-string distance measure that includes substring moves in addition to ordinal editing operations to turn one string to the other. Although optimizing EDM is intractable, it has many applications especially in error detections. Edit sensitive parsing (ESP) is an efficient parsing algorithm that guarantees an upper bound of parsing discrepancies between different appearances of the same substrings in a string. ESP can be used for computing an...

Topics: Data Structures and Algorithms, Computing Research Repository

Source: http://arxiv.org/abs/1408.0467

3
3.0

Jun 30, 2018
06/18

by
Yoshimasa Takabatake; Yasuo Tabei; Hiroshi Sakamoto

texts

#
eye 3

#
favorite 0

#
comment 0

While several self-indexes for highly repetitive texts exist, developing a practical self-index applicable to real world repetitive texts remains a challenge. ESP-index is a grammar-based self-index on the notion of edit-sensitive parsing (ESP), an efficient parsing algorithm that guarantees upper bounds of parsing discrepancies between different appearances of the same subtexts in a text. Although ESP-index performs efficient top-down searches of query texts, it has a serious issue on binary...

Topics: Data Structures and Algorithms, Computing Research Repository

Source: http://arxiv.org/abs/1404.4972

16
16

Jun 28, 2018
06/18

by
Yoshimasa Takabatake; Yasuo Tabei; Hiroshi Sakamoto

texts

#
eye 16

#
favorite 0

#
comment 0

Although several grammar-based self-indexes have been proposed thus far, their applicability is limited to offline settings where whole input texts are prepared, thus requiring to rebuild index structures for given additional inputs, which is often the case in the big data era. In this paper, we present the first online self-indexed grammar compression named OESP-index that can gradually build the index structure by reading input characters one-by-one. Such a property is another advantage which...

Topics: Computing Research Repository, Data Structures and Algorithms

Source: http://arxiv.org/abs/1507.00805

8
8.0

Jun 30, 2018
06/18

by
Tatsuya Ohno; Yoshimasa Takabatake; Tomohiro I; Hiroshi Sakamoto

texts

#
eye 8

#
favorite 0

#
comment 0

We propose a faster algorithm for run-length BWT (RLBWT) in run-compressed space. We improve the state-of-the-art algorithm for RLBWT in terms of empirical construction time. Adopting the dynamic list for maintaining a total order, we can replace rank queries in a dynamic wavelet tree on a run-length compressed string by the direct comparison of labels in a dynamic list. The empirical result for various benchmarks show the efficiency of our algorithm, especially for highly repetitive strings....

Topics: Data Structures and Algorithms, Computing Research Repository

Source: http://arxiv.org/abs/1704.05233

9
9.0

Jun 29, 2018
06/18

by
Shouhei Fukunaga; Yoshimasa Takabatake; I Tomohiro; Hiroshi Sakamoto

texts

#
eye 9

#
favorite 0

#
comment 0

Various grammar compression algorithms have been proposed in the last decade. A grammar compression is a restricted CFG deriving the string deterministically. An efficient grammar compression develops a smaller CFG by finding duplicated patterns and removing them. This process is just a frequent pattern discovery by grammatical inference. While we can get any frequent pattern in linear time using a preprocessed string, a huge working space is required for longer patterns, and the whole string...

Topics: Data Structures and Algorithms, Computing Research Repository

Source: http://arxiv.org/abs/1607.04446

41
41

Sep 22, 2013
09/13

by
Naoya Kishiue; Masaya Nakahara; Shirou Maruyama; Hiroshi Sakamoto

texts

#
eye 41

#
favorite 0

#
comment 0

Practical data structures for the edit-sensitive parsing (ESP) are proposed. Given a string S, its ESP tree is equivalent to a context-free grammar G generating just S, which is represented by a DAG. Using the succinct data structures for trees and permutations, G is decomposed to two LOUDS bit strings and single array in (1+\epsilon)n\log n+4n+o(n) bits for any 0

Source: http://arxiv.org/abs/1101.0080v3

4
4.0

Jun 29, 2018
06/18

by
Yoshimasa Takabatake; Kenta Nakashima; Tetsuji Kuboyama; Yasuo Tabei; Hiroshi Sakamoto

texts

#
eye 4

#
favorite 0

#
comment 0

Although several self-indexes for highly repetitive text collections exist, developing an index and search algorithm with editing operations remains a challenge. Edit distance with moves (EDM) is a string-to-string distance measure that includes substring moves in addition to ordinal editing operations to turn one string into another. Although the problem of computing EDM is intractable, it has a wide range of potential applications, especially in approximate string retrieval. Despite the...

Topics: Data Structures and Algorithms, Computing Research Repository

Source: http://arxiv.org/abs/1602.06688

62
62

Jul 20, 2013
07/13

by
Keisuke Goto; Shirou Maruyama; Shunsuke Inenaga; Hideo Bannai; Hiroshi Sakamoto; Masayuki Takeda

texts

#
eye 62

#
favorite 0

#
comment 0

We consider the problem of {\em restructuring} compressed texts without explicit decompression. We present algorithms which allow conversions from compressed representations of a string $T$ produced by any grammar-based compression algorithm, to representations produced by several specific compression algorithms including LZ77, LZ78, run length encoding, and some grammar based compression algorithms. These are the first algorithms that achieve running times polynomial in the size of the...

Source: http://arxiv.org/abs/1107.2729v1

70
70

Sep 18, 2013
09/13

by
Atsushi Manabe; Kohki Ishikawa; Yoshihiko Itoh; Setsuya Kawabata; Tetsuro Mashimo; Youhei Morita; Hiroshi Sakamoto; Takashi Sasaki; Hiroyuki Sato; Junichi Tanaka; Ikuo Ueda; Yoshiyuki Watase; Satomi Yamamoto; Shigeo Yashiro

texts

#
eye 70

#
favorite 0

#
comment 0

For data analysis of large-scale experiments such as LHC Atlas and other Japanese high energy and nuclear physics projects, we have constructed a Grid test bed at ICEPP and KEK. These institutes are connected to national scientific gigabit network backbone called SuperSINET. In our test bed, we have installed NorduGrid middleware based on Globus, and connected 120TB HPSS at KEK as a large scale data store. Atlas simulation data at ICEPP has been transferred and accessed using SuperSINET. We...

Source: http://arxiv.org/abs/cs/0306051v2