jbil.sequence
Class Trie

java.lang.Object
  extended by jbil.sequence.Trie
All Implemented Interfaces:
KMerCounter
Direct Known Subclasses:
FactorTrie

public class Trie
extends java.lang.Object
implements KMerCounter

Class representing an extended character trie over a given alphabet.

Author:
Paulo G. S. da Fonseca

Constructor Summary
Trie(Alphabet alphabet)
          Constructs a new trie for representing strings over a given alphabet.
 
Method Summary
 jbil.sequence.Trie.TrieNode addSequence(Sequence sequence)
          Adds a sequence to the trie and returns its locus.
 jbil.sequence.Trie.TrieNode addSequence(Sequence sequence, int beginIndex, int endIndex)
          Adds a sequence to the trie and returns its locus.
 Pair<java.lang.Integer,java.lang.Integer> countOccurrences(Sequence pattern, int beginIndex, int endIndex, int offset)
          Counts the number of occurrences of the patterns pattern[beginIndex..endIndex-2] and pattern[beginIndex..endIndex-1] respectively, as prefixes of the words represented by the subtries rooted at nodes at a given height.
 Pair<java.lang.Integer,java.lang.Integer> countOccurrences(Sequence pattern, int beginIndex, int endIndex, jbil.sequence.Trie.TrieNode node)
          Counts the number of occurrences of the patterns pattern[beginIndex..endIndex-2] and pattern[beginIndex..endIndex-1] respectively, as prefixes of the words represented by the subtrie rooted at a given node.
 boolean deleteSequence(Sequence sequence)
          Removes a sequence from the trie provided that it belongs to the (multi)set of sequences represented by the trie.
 boolean deleteSequence(Sequence sequence, int beginIndex, int endIndex)
          Deletes a subsequence of a given sequence from the trie.
 Alphabet getAlphabet()
          Returns the base alphabet of the tire.
 int getHeight()
          Gets the height of this trie, i.e., the maximum height of a leaf.
 java.util.List<Sequence> getSequenceList()
          Returns the list of sequences represented by this trie.
 void print()
          Prints a string representation of the trie to System.out.
 void reset()
          Clears the trie.
 int sequenceCount()
          Returns the total number of sequences represented by this trie, taking into account word multiplicity and high order symbols.
 
Methods inherited from class java.lang.Object
equals, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Constructor Detail

Trie

public Trie(Alphabet alphabet)
Constructs a new trie for representing strings over a given alphabet.

Parameters:
alphabet - The alphabet of the strings represented by this trie.
Method Detail

getAlphabet

public Alphabet getAlphabet()
Returns the base alphabet of the tire.


reset

public void reset()
Clears the trie.


getHeight

public int getHeight()
Gets the height of this trie, i.e., the maximum height of a leaf. The root is at height 0, its children at height 1, its grandchildren at height 2 and so on.


getSequenceList

public java.util.List<Sequence> getSequenceList()
Returns the list of sequences represented by this trie.


addSequence

public jbil.sequence.Trie.TrieNode addSequence(Sequence sequence)
Adds a sequence to the trie and returns its locus. If the sequence was already represented in the trie, then this method just updates leaf counts accordingly.

Parameters:
sequence - The sequence to be added.

addSequence

public jbil.sequence.Trie.TrieNode addSequence(Sequence sequence,
                                               int beginIndex,
                                               int endIndex)
Adds a sequence to the trie and returns its locus. If the sequence was already represented in the trie, then this method just updates leaf counts accordingly.

Parameters:
sequence - The sequence to be added.

deleteSequence

public boolean deleteSequence(Sequence sequence)
Removes a sequence from the trie provided that it belongs to the (multi)set of sequences represented by the trie. By saying this we imply that the sequence must be represented directly, and not only as a prefix of other words.

Sequences containing high order symbols are considered as ordinary sequences, meaning that, for instance, 'ANNT' will be removed, only if it was added as so; sequences that were eventually added to the trie and that match this pattern, e.g. 'ACCT', or 'AGNT' will not be removed.

Each delete operation will remove one instance of the sequence, even if the multiset represented by the trie contains several instances of that sequence. This also applies for sequences with high order symbols, but in this case, the leaf counts take into account the multiplicity implied by them. For instance, removing the sequence 'ANNT' will decrease the number of represented string (the leaf count of the root) by 16.

Parameters:
sequence - The sequence to be removed.
Returns:
true/false whether the sequence was/wasn't removed.

deleteSequence

public boolean deleteSequence(Sequence sequence,
                              int beginIndex,
                              int endIndex)
Deletes a subsequence of a given sequence from the trie.

Parameters:
sequence - The base sequence.
beginIndex - The start position of the subsequence.
endIndex - The end position (not inclusive) of the subsequence.
Returns:
true/false whether the subsequence was/wasn't removed.

countOccurrences

public Pair<java.lang.Integer,java.lang.Integer> countOccurrences(Sequence pattern,
                                                                  int beginIndex,
                                                                  int endIndex,
                                                                  jbil.sequence.Trie.TrieNode node)
Counts the number of occurrences of the patterns pattern[beginIndex..endIndex-2] and pattern[beginIndex..endIndex-1] respectively, as prefixes of the words represented by the subtrie rooted at a given node. If beginIndex=endIndex, then the first element of the returned pair will be zero.

Parameters:
pattern - The base pattern.
beginIndex - The start position of the subpattern to be counted.
endIndex - The end position (not inclusive) of the subpattern to be counted.
node - The node at which the search must begin.

countOccurrences

public Pair<java.lang.Integer,java.lang.Integer> countOccurrences(Sequence pattern,
                                                                  int beginIndex,
                                                                  int endIndex,
                                                                  int offset)
Counts the number of occurrences of the patterns pattern[beginIndex..endIndex-2] and pattern[beginIndex..endIndex-1] respectively, as prefixes of the words represented by the subtries rooted at nodes at a given height.

Specified by:
countOccurrences in interface KMerCounter
Parameters:
pattern - The base pattern.
beginIndex - The start position of the subpattern to be counted.
endIndex - The end position (not inclusive) of the subpattern to be counted.
offset - The height of the nodes at which the search must begin.

sequenceCount

public int sequenceCount()
Returns the total number of sequences represented by this trie, taking into account word multiplicity and high order symbols.

Specified by:
sequenceCount in interface KMerCounter

print

public void print()
Prints a string representation of the trie to System.out.