Questions tagged [suffix-tree]

A suffix tree is a data structure that stores all suffixes of a string. It is the basis for many fast algorithms on strings.

suffix-tree
Filter by
Sorted by
Tagged with
1227 votes
7 answers
207k views

Ukkonen's suffix tree algorithm in plain English

I feel a bit thick at this point. I've spent days trying to fully wrap my head around suffix tree construction, but because I don't have a mathematical background, many of the explanations elude me as ...
Nathan Ridley's user avatar
95 votes
7 answers
47k views

Suffix tree and Tries. What is the difference?

I am reading about Tries commonly known as Prefix trees and Suffix Trees. Although I have found code for a Trie I can not find an example for a Suffix Tree. Also I get the feeling that the code that ...
Cratylus's user avatar
  • 53.5k
60 votes
11 answers
73k views

How to call module written with argparse in iPython notebook

I am trying to pass BioPython sequences to Ilya Stepanov's implementation of Ukkonen's suffix tree algorithm in iPython's notebook environment. I am stumbling on the argparse component. I have ...
Niels's user avatar
  • 1,533
49 votes
5 answers
29k views

Longest palindrome in a string using suffix tree

I was trying to find the longest palindrome in a string. The brute force solution takes O(n^3) time. I read that there is a linear time algorithm for it using suffix trees. I am familiar with suffix ...
shreyasva's user avatar
  • 13.3k
36 votes
1 answer
15k views

How is it possible to build a suffix tree in linear time?

To build a suffix tree, in the worst case if all the letter of the string are different the complexity would be something like n + (n-1) + (n-2) ... 1 = n*(n+1)/2 which is O(n^2). However ...
shreyasva's user avatar
  • 13.3k
34 votes
5 answers
1k views

String analysis

Given a sequence of operations: a*b*a*b*a*a*b*a*b is there a way to get the optimal subdivision to enable reusage of substring. making a*b*a*b*a*a*b*a*b => c*a*c, where c = a*b*a*b and then ...
Martin Kristiansen's user avatar
28 votes
1 answer
11k views

How and when to create a suffix link in suffix tree?

Could anyone give me an example about how and when to create a suffix link in suffix tree? If my string is ABABABC, but do use a different example if that is better. Hope to give some pictures to ...
lingguang1997's user avatar
27 votes
1 answer
8k views

Understanding Ukkonen's algorithm for suffix trees [duplicate]

I'm doing some work with Ukkonen's algorithm for building suffix trees, but I'm not understanding some parts of the author's explanation for it's linear-time complexity. I have learned the algorithm ...
Vinicius Braz Pinto's user avatar
23 votes
1 answer
13k views

How to generate suffix trees using a python library? [closed]

I need python library that can construct suffix trees and especially generalised suffix trees. Could you suggest me some libraries. Thanks.
ashim's user avatar
  • 24.9k
20 votes
3 answers
7k views

really hard to understand suffix tree

I've been searching for tutorials about suffix tree for quite a while. In SO, I found 2 posts about understanding suffix tree: 1, 2. But I can't say that I understand how to build one, Oops. In ...
Alcott's user avatar
  • 18.2k
20 votes
3 answers
1k views

Strange algorithm performance

For context, I wrote this algorithm to get the number of unique substrings of any string. It builds the suffix tree for the string counting the nodes it contains and returns that as the answer. The ...
mjgalindo's user avatar
  • 866
17 votes
2 answers
6k views

Approximate substring matching using a Suffix Tree

This article discusses approximate substring matching techniques that utilize a suffix tree to improve matching time. Each answer addresses a different algorithm. Approximate substring matching ...
17 votes
2 answers
4k views

Suffix Arrays vs Suffix Trees

I just wanna know, when a suffix tree is superior to an enhanced suffix array. After reading Replacing suffix trees with enhanced suffix arrays i don't see a reason to use suffix trees anymore. Some ...
Nicolas's user avatar
  • 960
15 votes
5 answers
20k views

Generalized Suffix Tree Java Implementation [closed]

I am looking for a Java implementation of the Generalized Suffix Tree (GST) with the following features: After the creation of the GST from say 1000 strings I would like find out how many of these ...
user avatar
14 votes
7 answers
37k views

Finding the longest repeated substring

What would be the best approach (performance-wise) in solving this problem? I was recommended to use suffix trees. Is this the best approach?
kukit's user avatar
  • 307
14 votes
3 answers
14k views

Looking for the suffix tree implementation in C#?

I've implemented a basic search for a research project. I'm trying to make the search more efficient by building a suffix tree. I'm interested in a C# implementation of the Ukkonen algorith. I don't ...
Goran's user avatar
  • 6,816
14 votes
9 answers
7k views

Longest Non-Overlapping Repeated Substring using Suffix Tree/Array (Algorithm Only)

I need to find the longest non-overlapping repeated substring in a String. I have the suffix tree and suffix array of the string available. When overlapping is allowed, the answer is trivial (deepest ...
Genuine Programmer's user avatar
14 votes
2 answers
5k views

Ukkonen's algorithm for Generalized Suffix Trees

I am currently working on my own Suffix Tree implementation (using C++, but the question remains language agnostic). I studied the original paper from Ukkonen. The article is very clear so I got to ...
Rerito's user avatar
  • 6,066
11 votes
4 answers
6k views

Effcient way to find longest duplicate string for Python (From Programming Pearls)

From Section 15.2 of Programming Pearls The C codes can be viewed here: http://www.cs.bell-labs.com/cm/cs/pearls/longdup.c When I implement it in Python using suffix-array: example = open("iliad10....
Hanfei Sun's user avatar
  • 46.2k
10 votes
1 answer
2k views

Kasai Algorithm for Constructing LCP-Array Practical Example

I am attempting to complete the Algorithm's on Strings course on Coursera and am stuck on the method to construct an LCP array described in this video: https://www.coursera.org/learn/algorithms-on-...
roadrunner007's user avatar
10 votes
1 answer
18k views

suffix tree implementation in python [closed]

Just wondering if you are aware of any C based extension in python that can help me construct suffix trees/arrays in linear time ?
Abhi's user avatar
  • 6,165
10 votes
1 answer
3k views

Finding the Longest Common Substring in a Large Data Set

In the past few days I've researched this extensively, I've read so many things that I am now more confused then ever. How does one find the longest common sub string in a large data set? The idea is ...
diffuse's user avatar
  • 101
10 votes
1 answer
2k views

how to get longest repeating string in substring from suffix tree

I need to find the longest repeating string in substring. Let's say I have string "bannana" Wikipedia says following: In computer science, the longest repeated substring problem is the problem ...
Wakan Tanka's user avatar
  • 7,764
10 votes
1 answer
352 views

Generating suffix tree of string S[2..m] from suffix tree of string S[1..m]

Is there a fast (O(1) time complexity) way of generating a suffix tree of string S[2..m] from suffix tree of string S[1..m]? I am familiar with Ukkonen's, so I know how to make fast suffix tree of ...
user2356167's user avatar
9 votes
2 answers
11k views

Is it possible to count the number of distinct substrings in a string in O(n)?

Given a string s of length n, is it possible to count the number of distinct substrings in s in O(n)? Example Input: abb Output: 5 ('abb', 'ab', 'bb', 'a', 'b') I have done some research but i can'...
donrondon's user avatar
  • 103
9 votes
3 answers
5k views

Successive adding of char to get the longest word in the dictionary [closed]

Given a dictionary of words and an initial character. find the longest possible word in the dictionary by successively adding a character to the word. At any given instance the word should be valid ...
AlgoMan's user avatar
  • 2,885
9 votes
1 answer
5k views

Maximum and minimum number of nodes in a suffix tree [closed]

What are the maximum and minimum number of nodes in a suffix tree? And how can I prove it?
user avatar
9 votes
2 answers
1k views

How to use a Trie data structure to find the sum of LCPs for all possible substrings?

Problem Description: References: Fun With Strings Based on the problem description, a naive approach to find sum of length of LCP for all possible substrings (for a given string) is as follows : #...
Saurabh P Bhandari's user avatar
9 votes
1 answer
12k views

Finding all repeated substrings in a string and how often they appear

Problem: I need all the sequences of characters that meet the following: Sequence of characters must be present more than once ((LE, 1) is thus invalid). Sequence of characters must be longer than ...
alvitawa's user avatar
  • 394
8 votes
9 answers
8k views

Efficient String/Pattern Matching in C++ (suffixarray, trie, suffixtree?)

I'm looking for an efficient data structure to do String/Pattern Matching on an really huge set of strings. I've found out about tries, suffix-trees and suffix-arrays. But I couldn't find an ready-to-...
Constantin's user avatar
  • 8,799
8 votes
6 answers
5k views

Given a string, find all its permutations that are a word in dictionary

This is an interview question: Given a string, find all its permutations that are a word in dictionary. My solution: Put all words of the dictionary into a suffix tree and then search each ...
user1002288's user avatar
  • 4,970
8 votes
4 answers
5k views

Suffix trees in javascript?

Is there a nice implementation of suffix trees in JavaScript? Something that will take a string (and a separator) and make the appropriate suffix tree?
silverasm's user avatar
  • 501
8 votes
3 answers
10k views

Short, Java implementation of a suffix tree and usage?

I'm looking for a short, simple suffix tree building/usage algorithm in Java. The best I've found so far lies withing the Semantic Discovery Toolkit, but the implementation is several thousand lines ...
Stefan Kendall's user avatar
8 votes
1 answer
1k views

Matches overlapping lookahead on LZ77/LZSS with suffix trees

Background: I have an implementation of a generic LZSS backend on C++ (available here. The matching algorithm I use in this version is exceedingly simple, because it was originally meant to compress ...
flamewing's user avatar
8 votes
1 answer
1k views

substring finding from a string

Input: string S = AAGATATGATAGGAT. Output: Maximal repeats such as GATA (as in positions 3 and 8), GAT (as in position 3, 8 and 13) and so on... A maximal repeat is a substring t occurs k>1 times in ...
rock's user avatar
  • 145
7 votes
4 answers
2k views

Optimizing construction of a trie over all substrings

I am solving a trie related problem. There is a set of strings S. I have to create a trie over all substrings for each string in S. I am using the following routine: String strings[] = { ... }; // ...
Bhoot's user avatar
  • 2,633
7 votes
2 answers
2k views

How to remove substring from suffix tree?

I reviewed a lot of literature, but I dont found any information about deleting or insertion substrings into suffix tree. There are only Ukkonen's or McCreight's algorithms for building tree. The ...
user2386656's user avatar
6 votes
1 answer
3k views

Can I generate all substrings in complexity < O(n^2)

Currently I am using two nested for loop to generate all the substrings of a string. I heard about Suffix Tree but AFAIK Suffix Tree generates suffix not the substrings. Following is the code which ...
ravi's user avatar
  • 6,260
6 votes
4 answers
5k views

Working with suffix trees in python

I'm relatively new to python and am starting to work with suffix trees. I can build them, but I'm running into a memory issue when the string gets large. I know that they can be used to work with ...
doggysaywhat's user avatar
6 votes
0 answers
871 views

Haskell Data Type With References

I'm implementing Ukkonen's algorithm, which requires that all leaves of a tree contain a reference to the same integer, and I'm doing it in Haskell to learn more about the language. However, I'm ...
Craig's user avatar
  • 255
6 votes
1 answer
2k views

Find longest common substring of multiple strings using factor oracle enhanced with LRS array

Can we use a factor-oracle with suffix link (paper here) to compute the longest common substring of multiple strings? Here, substring means any part of the original string. For example "abc" is the ...
Ray's user avatar
  • 1,655
6 votes
0 answers
296 views

Stream variant of the Longest palindromic substring

Suppose I have a character stream as my input. What is the most optimal way to find the longest palindromic substring after each new character is added without reprocessing the whole string all ...
user avatar
5 votes
1 answer
4k views

Suffix tree VS Tries - in plain English , what's the difference?

I've taken a look at this questions , but I still don't see the difference between a Suffix tree and a Trie . Both have all the substrings of a given string , so where do they differ from one ...
JAN's user avatar
  • 21.6k
5 votes
1 answer
3k views

What is a generalized suffix tree?

I saw the Wikipedia page but still am not clear with the idea. To find the longest common substring of 2 strings (T and S), I've read that we must build a suffix tree for the string T($1)S($2), where`...
batman's user avatar
  • 5,162
5 votes
3 answers
5k views

How do Suffix Trees work?

I'm going through the data structures chapter in The Algorithm Design Manual and came across Suffix Trees. The example states: Input: XYZXYZ$ YZXYZ$ ZXYZ$ XYZ$ YZ$ Z$ $ ...
Anthony's user avatar
  • 34.8k
5 votes
1 answer
651 views

Optimization: Python, Perl, and a C Suffix Tree Library

I've got about 3,500 files that consist of single line character strings. The files vary in size (from about 200b to 1mb). I'm trying to compare each file with each other file and find a common ...
mstcamus's user avatar
5 votes
4 answers
2k views

How to speed up calculation of length of longest common substring?

I have two very large strings and I am trying to find out their Longest Common Substring. One way is using suffix trees (supposed to have a very good complexity, though a complex implementation), and ...
Lazer's user avatar
  • 92.4k
5 votes
1 answer
2k views

Finding all common, non-overlapping substrings

Given two strings, I would like to identify all common sub-strings from longest to shortest. I want to remove any "sub-"sub-strings. As an example, any substrings of '1234' would not be included in ...
mrmagicfluffyman's user avatar
5 votes
1 answer
1k views

How is worst case time complexity of constructing suffix tree linear?

I have trouble understanding how the worst case time complexity of constructing a suffix tree is linear - particularly when we need to build a suffix tree for a string that may be composed of ...
sia831's user avatar
  • 51
5 votes
2 answers
735 views

Algorithm for 2 Pattern String Matching

I need to code an algorithm for longest two pattern prefix/suffix match whose time complexity is O(n+m1+m2), where n is the length of the String and m1, m2 are the lengths of pattern1 and pattern2 ...
user avatar

1
2 3 4 5