Questions tagged [suffix-tree]
A suffix tree is a data structure that stores all suffixes of a string. It is the basis for many fast algorithms on strings.
suffix-tree
230
questions
1227
votes
7
answers
207k
views
Ukkonen's suffix tree algorithm in plain English
I feel a bit thick at this point. I've spent days trying to fully wrap my head around suffix tree construction, but because I don't have a mathematical background, many of the explanations elude me as ...
95
votes
7
answers
47k
views
Suffix tree and Tries. What is the difference?
I am reading about Tries commonly known as Prefix trees and Suffix Trees.
Although I have found code for a Trie I can not find an example for a Suffix Tree. Also I get the feeling that the code that ...
60
votes
11
answers
73k
views
How to call module written with argparse in iPython notebook
I am trying to pass BioPython sequences to Ilya Stepanov's implementation of Ukkonen's suffix tree algorithm in iPython's notebook environment. I am stumbling on the argparse component.
I have ...
49
votes
5
answers
29k
views
Longest palindrome in a string using suffix tree
I was trying to find the longest palindrome in a string. The brute force solution takes O(n^3) time. I read that there is a linear time algorithm for it using suffix trees. I am familiar with suffix ...
36
votes
1
answer
15k
views
How is it possible to build a suffix tree in linear time?
To build a suffix tree, in the worst case if all the letter of the string are different the complexity would be something like
n + (n-1) + (n-2) ... 1 = n*(n+1)/2
which is O(n^2).
However ...
34
votes
5
answers
1k
views
String analysis
Given a sequence of operations:
a*b*a*b*a*a*b*a*b
is there a way to get the optimal subdivision to enable reusage of substring.
making
a*b*a*b*a*a*b*a*b => c*a*c, where c = a*b*a*b
and then ...
28
votes
1
answer
11k
views
How and when to create a suffix link in suffix tree?
Could anyone give me an example about how and when to create a suffix link in suffix tree?
If my string is ABABABC, but do use a different example if that is better.
Hope to give some pictures to ...
27
votes
1
answer
8k
views
Understanding Ukkonen's algorithm for suffix trees [duplicate]
I'm doing some work with Ukkonen's algorithm for building suffix trees, but I'm not understanding some parts of the author's explanation for it's linear-time complexity.
I have learned the algorithm ...
23
votes
1
answer
13k
views
How to generate suffix trees using a python library? [closed]
I need python library that can construct suffix trees and especially generalised suffix trees. Could you suggest me some libraries. Thanks.
20
votes
3
answers
7k
views
really hard to understand suffix tree
I've been searching for tutorials about suffix tree for quite a while. In SO, I found 2 posts about understanding suffix tree: 1, 2.
But I can't say that I understand how to build one, Oops. In ...
20
votes
3
answers
1k
views
Strange algorithm performance
For context, I wrote this algorithm to get the number of unique substrings of any string. It builds the suffix tree for the string counting the nodes it contains and returns that as the answer. The ...
17
votes
2
answers
6k
views
Approximate substring matching using a Suffix Tree
This article discusses approximate substring matching techniques that utilize a suffix tree to improve matching time. Each answer addresses a different algorithm.
Approximate substring matching ...
17
votes
2
answers
4k
views
Suffix Arrays vs Suffix Trees
I just wanna know, when a suffix tree is superior to an enhanced suffix array.
After reading Replacing suffix trees with enhanced suffix arrays i don't see a reason to use suffix trees anymore. Some ...
15
votes
5
answers
20k
views
Generalized Suffix Tree Java Implementation [closed]
I am looking for a Java implementation of the Generalized Suffix Tree (GST) with the following features:
After the creation of the GST from say 1000 strings I would like find out how many of these ...
14
votes
7
answers
37k
views
Finding the longest repeated substring
What would be the best approach (performance-wise) in solving this problem?
I was recommended to use suffix trees. Is this the best approach?
14
votes
3
answers
14k
views
Looking for the suffix tree implementation in C#?
I've implemented a basic search for a research project. I'm trying to make the search more efficient by building a suffix tree. I'm interested in a C# implementation of the Ukkonen algorith. I don't ...
14
votes
9
answers
7k
views
Longest Non-Overlapping Repeated Substring using Suffix Tree/Array (Algorithm Only)
I need to find the longest non-overlapping repeated substring in a String. I have the suffix tree and suffix array of the string available.
When overlapping is allowed, the answer is trivial (deepest ...
14
votes
2
answers
5k
views
Ukkonen's algorithm for Generalized Suffix Trees
I am currently working on my own Suffix Tree implementation (using C++, but the question remains language agnostic). I studied the original paper from Ukkonen. The article is very clear so I got to ...
11
votes
4
answers
6k
views
Effcient way to find longest duplicate string for Python (From Programming Pearls)
From Section 15.2 of Programming Pearls
The C codes can be viewed here: http://www.cs.bell-labs.com/cm/cs/pearls/longdup.c
When I implement it in Python using suffix-array:
example = open("iliad10....
10
votes
1
answer
2k
views
Kasai Algorithm for Constructing LCP-Array Practical Example
I am attempting to complete the Algorithm's on Strings course on Coursera and am stuck on the method to construct an LCP array described in this video:
https://www.coursera.org/learn/algorithms-on-...
10
votes
1
answer
18k
views
suffix tree implementation in python [closed]
Just wondering if you are aware of any C based extension in python that can help me construct suffix trees/arrays in linear time ?
10
votes
1
answer
3k
views
Finding the Longest Common Substring in a Large Data Set
In the past few days I've researched this extensively, I've read so many things that I am now more confused then ever. How does one find the longest common sub string in a large data set? The idea is ...
10
votes
1
answer
2k
views
how to get longest repeating string in substring from suffix tree
I need to find the longest repeating string in substring. Let's say I have string "bannana"
Wikipedia says following:
In computer science, the longest repeated substring problem is the
problem ...
10
votes
1
answer
352
views
Generating suffix tree of string S[2..m] from suffix tree of string S[1..m]
Is there a fast (O(1) time complexity) way of generating a suffix tree of string S[2..m] from suffix tree of string S[1..m]?
I am familiar with Ukkonen's, so I know how to make fast suffix tree of ...
9
votes
2
answers
11k
views
Is it possible to count the number of distinct substrings in a string in O(n)?
Given a string s of length n, is it possible to count the number of distinct substrings in s in O(n)?
Example
Input: abb
Output: 5 ('abb', 'ab', 'bb', 'a', 'b')
I have done some research but i can'...
9
votes
3
answers
5k
views
Successive adding of char to get the longest word in the dictionary [closed]
Given a dictionary of words and an initial character. find the longest possible word in the dictionary by successively adding a character to the word. At any given instance the word should be valid ...
9
votes
1
answer
5k
views
Maximum and minimum number of nodes in a suffix tree [closed]
What are the maximum and minimum number of nodes in a suffix tree? And how can I prove it?
9
votes
2
answers
1k
views
How to use a Trie data structure to find the sum of LCPs for all possible substrings?
Problem Description:
References: Fun With Strings
Based on the problem description, a naive approach to find sum of length of LCP for all possible substrings (for a given string) is as follows :
#...
9
votes
1
answer
12k
views
Finding all repeated substrings in a string and how often they appear
Problem:
I need all the sequences of characters that meet the following:
Sequence of characters must be present more than once ((LE, 1) is thus invalid).
Sequence of characters must be longer than ...
8
votes
9
answers
8k
views
Efficient String/Pattern Matching in C++ (suffixarray, trie, suffixtree?)
I'm looking for an efficient data structure to do String/Pattern Matching on an really huge set of strings. I've found out about tries, suffix-trees and suffix-arrays. But I couldn't find an ready-to-...
8
votes
6
answers
5k
views
Given a string, find all its permutations that are a word in dictionary
This is an interview question:
Given a string, find all its permutations that are a word in dictionary.
My solution:
Put all words of the dictionary into a suffix tree and then search each ...
8
votes
4
answers
5k
views
Suffix trees in javascript?
Is there a nice implementation of suffix trees in JavaScript? Something that will take a string (and a separator) and make the appropriate suffix tree?
8
votes
3
answers
10k
views
Short, Java implementation of a suffix tree and usage?
I'm looking for a short, simple suffix tree building/usage algorithm in Java. The best I've found so far lies withing the Semantic Discovery Toolkit, but the implementation is several thousand lines ...
8
votes
1
answer
1k
views
Matches overlapping lookahead on LZ77/LZSS with suffix trees
Background: I have an implementation of a generic LZSS backend on C++ (available here. The matching algorithm I use in this version is exceedingly simple, because it was originally meant to compress ...
8
votes
1
answer
1k
views
substring finding from a string
Input: string S = AAGATATGATAGGAT.
Output: Maximal repeats such as GATA (as in positions 3 and 8), GAT (as in position 3, 8 and 13) and so on...
A maximal repeat is a substring t occurs k>1 times in ...
7
votes
4
answers
2k
views
Optimizing construction of a trie over all substrings
I am solving a trie related problem. There is a set of strings S. I have to create a trie over all substrings for each string in S. I am using the following routine:
String strings[] = { ... }; // ...
7
votes
2
answers
2k
views
How to remove substring from suffix tree?
I reviewed a lot of literature, but I dont found any information about deleting or insertion substrings into suffix tree. There are only Ukkonen's or McCreight's algorithms for building tree.
The ...
6
votes
1
answer
3k
views
Can I generate all substrings in complexity < O(n^2)
Currently I am using two nested for loop to generate all the substrings of a string. I heard about Suffix Tree but AFAIK Suffix Tree generates suffix not the substrings. Following is the code which ...
6
votes
4
answers
5k
views
Working with suffix trees in python
I'm relatively new to python and am starting to work with suffix trees. I can build them, but I'm running into a memory issue when the string gets large. I know that they can be used to work with ...
6
votes
0
answers
871
views
Haskell Data Type With References
I'm implementing Ukkonen's algorithm, which requires that all leaves of a tree contain a reference to the same integer, and I'm doing it in Haskell to learn more about the language. However, I'm ...
6
votes
1
answer
2k
views
Find longest common substring of multiple strings using factor oracle enhanced with LRS array
Can we use a factor-oracle with suffix link (paper here) to compute the longest common substring of multiple strings? Here, substring means any part of the original string. For example "abc" is the ...
6
votes
0
answers
296
views
Stream variant of the Longest palindromic substring
Suppose I have a character stream as my input.
What is the most optimal way to find the longest palindromic
substring after each new character is added without reprocessing
the whole string all ...
5
votes
1
answer
4k
views
Suffix tree VS Tries - in plain English , what's the difference?
I've taken a look at this questions , but I still don't see the difference between a Suffix tree and a Trie .
Both have all the substrings of a given string , so where do they differ from one ...
5
votes
1
answer
3k
views
What is a generalized suffix tree?
I saw the Wikipedia page but still am not clear with the idea.
To find the longest common substring of 2 strings (T and S), I've read that we must build a suffix tree for the string T($1)S($2), where`...
5
votes
3
answers
5k
views
How do Suffix Trees work?
I'm going through the data structures chapter in The Algorithm Design Manual and came across Suffix Trees.
The example states:
Input:
XYZXYZ$
YZXYZ$
ZXYZ$
XYZ$
YZ$
Z$
$
...
5
votes
1
answer
651
views
Optimization: Python, Perl, and a C Suffix Tree Library
I've got about 3,500 files that consist of single line character strings. The files vary in size (from about 200b to 1mb). I'm trying to compare each file with each other file and find a common ...
5
votes
4
answers
2k
views
How to speed up calculation of length of longest common substring?
I have two very large strings and I am trying to find out their Longest Common Substring.
One way is using suffix trees (supposed to have a very good complexity, though a complex implementation), and ...
5
votes
1
answer
2k
views
Finding all common, non-overlapping substrings
Given two strings, I would like to identify all common sub-strings from longest to shortest.
I want to remove any "sub-"sub-strings. As an example, any substrings of '1234' would not be included in ...
5
votes
1
answer
1k
views
How is worst case time complexity of constructing suffix tree linear?
I have trouble understanding how the worst case time complexity of constructing a suffix tree is linear - particularly when we need to build a suffix tree for a string that may be composed of ...
5
votes
2
answers
735
views
Algorithm for 2 Pattern String Matching
I need to code an algorithm for longest two pattern prefix/suffix match whose time complexity is O(n+m1+m2), where n is the length of the String and m1, m2 are the lengths of pattern1 and pattern2 ...