jbil.sequence
Class MarkovFixedLengthSequenceModel

java.lang.Object
  extended by jbil.sequence.AbstractFixedLengthSequenceModel
      extended by jbil.sequence.MarkovFixedLengthSequenceModel
All Implemented Interfaces:
FixedLengthSequenceModel, SequenceModel

public class MarkovFixedLengthSequenceModel
extends AbstractFixedLengthSequenceModel

Probabilistic model for sequences of a fixed length W, which are assumed to be generated by a non-homogeneous Markov chain of order K, that is,
Pr(x_1...x_W) = prod_{i=1}^W Pr(x_i | x_{i-1}...x_{i-K}).
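The factorization above can be sketched in a few lines of plain Java. This is an illustration only, not the jbil implementation: the class name MarkovLikelihoodSketch, the "context|letter" key layout, and the choice to truncate the context at the first K positions (where fewer than K predecessors exist) are all assumptions made for the example.

```java
import java.util.List;
import java.util.Map;

class MarkovLikelihoodSketch {
    // cond.get(i) maps "context|letter" to Pr(letter at position i | context).
    // One table per position, since the chain is non-homogeneous.
    static double likelihood(String word, int order, List<Map<String, Double>> cond) {
        double p = 1.0;
        for (int i = 0; i < word.length(); i++) {
            // Assumption: near the start, the context is the available prefix only.
            int from = Math.max(0, i - order);
            String key = word.substring(from, i) + "|" + word.charAt(i);
            p *= cond.get(i).getOrDefault(key, 0.0);
        }
        return p;
    }

    public static void main(String[] args) {
        // Hypothetical order-1 model over the alphabet {A, B} with W = 2.
        List<Map<String, Double>> cond = List.of(
            Map.of("|A", 0.5, "|B", 0.5),
            Map.of("A|A", 0.9, "A|B", 0.1, "B|A", 0.2, "B|B", 0.8));
        // Pr(AB) = Pr(A) * Pr(B | A)
        System.out.println(likelihood("AB", 1, cond));
    }
}
```

Keeping one table per position is what distinguishes the non-homogeneous chain from a homogeneous one, where a single table would be shared by all positions.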

Author:
Paulo G. S. da Fonseca

Field Summary
static int PVALUE_BRANCH_AND_BOUND
          Branch & bound p-value computation mode.
static int PVALUE_BRUTE_FORCE
          Brute force p-value computation mode.
static int PVALUE_ITERATIVE_REFINEMENT
          Iterative refinement p-value computation mode.
 KMerCounter transitionsTree
           
 
Constructor Summary
MarkovFixedLengthSequenceModel(Alphabet alphabet, int length, int order, KMerCounter transitionsTree)
          Constructs a new K-order Markov sequence model.
 
Method Summary
 Alphabet getAlphabet()
          Returns the base alphabet of the modelled sequences.
 int getPvalueMode()
          Gets the p-value computation mode.
 int length()
          Returns the length of the modelled sequences.
 double likelihood(Sequence word, int beginIndex)
          Returns the likelihood of the subword of the given word starting at beginIndex and with the appropriate length.
 double likelihoodThreshold(double significance, SequenceModel nullModel)
          Given a likelihood value x in [0,1], we define the p-value of x as the probability under a null model for a sequence to have a likelihood greater than or equal to x.
 double positionProbability(int position, Sequence neighbourhoodSeq, int beginIndex, int endIndex, int letterPosition)
          Computes the probability of observing a letter in a given position of the sequence.
 double prefixLikelihood(Sequence sequence, int beginIndex, int endIndex)
          Returns the likelihood of the given prefix under this model, that is, the sum of the probabilities of all words starting with the subword of sequence that starts at position beginIndex and ends at position endIndex-1.
 double pvalue(Sequence word, int beginIndex, SequenceModel nullModel)
          Returns the p-value of the subword of the given word starting at beginIndex and with the appropriate length.
 Sequence sample()
          Samples a sequence with the appropriate length from this model.
 void setObservedDataWeight(double weight)
          A uniform pseudocount of approximately [length-order-1]/[samplesize^observedDataWeight] is added to each k-mer.
 void setPvalueMode(int pvalueMode)
          Sets the p-value computation mode.
 
Methods inherited from class jbil.sequence.AbstractFixedLengthSequenceModel
likelihood, likelihood, likelihoodThreshold, prefixLikelihood, pvalue, pvalue, pvalue, pvalue, pvalue, sample, sampleN, sampleN
 
Methods inherited from class java.lang.Object
equals, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Field Detail

PVALUE_BRUTE_FORCE

public static final int PVALUE_BRUTE_FORCE
Brute force p-value computation mode.

See Also:
Constant Field Values

PVALUE_BRANCH_AND_BOUND

public static final int PVALUE_BRANCH_AND_BOUND
Branch & bound p-value computation mode.

See Also:
Constant Field Values

PVALUE_ITERATIVE_REFINEMENT

public static final int PVALUE_ITERATIVE_REFINEMENT
Iterative refinement p-value computation mode.

See Also:
Constant Field Values

transitionsTree

public KMerCounter transitionsTree
Constructor Detail

MarkovFixedLengthSequenceModel

public MarkovFixedLengthSequenceModel(Alphabet alphabet,
                                      int length,
                                      int order,
                                      KMerCounter transitionsTree)
Constructs a new K-order Markov sequence model.

Parameters:
alphabet - The base alphabet
length - The length of the represented words
order - The order of the Markov chain
transitionsTree - The counter of k-mers necessary for the computation of transition probabilities
Method Detail

getAlphabet

public Alphabet getAlphabet()
Description copied from interface: SequenceModel
Returns the base alphabet of the modelled sequences.


length

public int length()
Description copied from interface: FixedLengthSequenceModel
Returns the length of the modelled sequences.


setObservedDataWeight

public void setObservedDataWeight(double weight)
A uniform pseudocount of approximately [length-order-1]/[samplesize^observedDataWeight] is added to each k-mer. This method adjusts the weight of the observed data. If Double.POSITIVE_INFINITY is passed, an infinitesimal pseudocount is added.
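To show how the weight influences the pseudocount, here is a minimal sketch that evaluates the expression literally. The class and parameter names are hypothetical, and because the documentation gives the expression only approximately ("~"), the exact value used by the class may differ.

```java
class PseudocountSketch {
    // Literal reading of [length-order-1] / [samplesize^observedDataWeight].
    // Assumption: the documented approximation is taken as an exact formula.
    static double pseudocount(int length, int order, double sampleSize, double weight) {
        return (length - order - 1) / Math.pow(sampleSize, weight);
    }

    public static void main(String[] args) {
        // A larger weight shrinks the pseudocount, so observed counts dominate.
        for (double w : new double[] {0.0, 0.5, 1.0, Double.POSITIVE_INFINITY}) {
            System.out.println(w + " -> " + pseudocount(8, 2, 100.0, w));
        }
    }
}
```

Note that with Double.POSITIVE_INFINITY this plain floating-point expression evaluates to 0.0, whereas the method is documented as adding an infinitesimal pseudocount; the sketch does not attempt to reproduce that behaviour.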


likelihood

public double likelihood(Sequence word,
                         int beginIndex)
Description copied from interface: FixedLengthSequenceModel
Returns the likelihood of the subword of the given word starting at beginIndex and with the appropriate length.


prefixLikelihood

public double prefixLikelihood(Sequence sequence,
                               int beginIndex,
                               int endIndex)
Description copied from interface: SequenceModel
Returns the likelihood of the given prefix under this model, that is, the sum of the probabilities of all words starting with the subword of sequence that starts at position beginIndex and ends at position endIndex-1.
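The definition of the prefix likelihood as a sum over all completions can be illustrated by brute force. The sketch below is self-contained and does not use the jbil classes; the two-letter alphabet, the order-1 chain, and the "context|letter" table layout are assumptions chosen to keep the example small.

```java
import java.util.List;
import java.util.Map;

class PrefixLikelihoodSketch {
    static final char[] ALPHABET = {'A', 'B'};

    // Order-1 likelihood of a full word; cond.get(i) maps "context|letter"
    // to Pr(letter at position i | context).
    static double likelihood(String w, List<Map<String, Double>> cond) {
        double p = 1.0;
        for (int i = 0; i < w.length(); i++) {
            String ctx = (i == 0) ? "" : w.substring(i - 1, i);
            p *= cond.get(i).getOrDefault(ctx + "|" + w.charAt(i), 0.0);
        }
        return p;
    }

    // Sums the likelihood of every word of the given length that starts with prefix.
    static double prefixLikelihood(String prefix, int length, List<Map<String, Double>> cond) {
        if (prefix.length() == length) {
            return likelihood(prefix, cond);
        }
        double sum = 0.0;
        for (char c : ALPHABET) {
            sum += prefixLikelihood(prefix + c, length, cond);
        }
        return sum;
    }

    public static void main(String[] args) {
        // Hypothetical model for words of length 3.
        List<Map<String, Double>> cond = List.of(
            Map.of("|A", 0.5, "|B", 0.5),
            Map.of("A|A", 0.9, "A|B", 0.1, "B|A", 0.2, "B|B", 0.8),
            Map.of("A|A", 0.3, "A|B", 0.7, "B|A", 0.6, "B|B", 0.4));
        System.out.println(prefixLikelihood("AB", 3, cond));
    }
}
```

Because each conditional distribution sums to one, the sum over completions collapses to the product of the conditional probabilities along the prefix, so an implementation does not need to enumerate the completions.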


positionProbability

public double positionProbability(int position,
                                  Sequence neighbourhoodSeq,
                                  int beginIndex,
                                  int endIndex,
                                  int letterPosition)
Description copied from interface: SequenceModel
Computes the probability of observing a letter in a given position of the sequence. For that, it may be necessary to supply a sequence containing a neighbourhood of the letter.

Parameters:
position - the position at which the letter appears.
neighbourhoodSeq - A sequence containing the neighbourhood.
beginIndex - The start of the neighbourhood within neighbourhoodSeq.
endIndex - The end of the neighbourhood within neighbourhoodSeq.
letterPosition - The position of the target letter within neighbourhoodSeq.
Returns:
Pr(X[position]=neighbourhoodSeq[letterPosition] | neighbourhood of X[position] = neighbourhoodSeq[beginIndex..endIndex-1])

sample

public Sequence sample()
Description copied from interface: FixedLengthSequenceModel
Samples a sequence with the appropriate length from this model.


getPvalueMode

public int getPvalueMode()
Gets the p-value computation mode.


setPvalueMode

public void setPvalueMode(int pvalueMode)
Sets the p-value computation mode.

Parameters:
pvalueMode - the p-value computation mode to set

pvalue

public double pvalue(Sequence word,
                     int beginIndex,
                     SequenceModel nullModel)
Description copied from interface: FixedLengthSequenceModel
Returns the p-value of the subword of the given word starting at beginIndex and with the appropriate length.

See Also:
SequenceModel.pvalue(Sequence, SequenceModel)

likelihoodThreshold

public double likelihoodThreshold(double significance,
                                  SequenceModel nullModel)
Description copied from interface: SequenceModel
Given a likelihood value x in [0,1], we define the p-value of x as the probability under a null model m0 for a sequence to have a likelihood under this model m greater than or equal to x. The p-value is a monotonically non-increasing function of x. Given a significance level p, this method computes the value
t := max { x in [0,1] | p-value_m0(x;m) >= p }

Parameters:
significance - The p of the description above.
nullModel - The null model m0 of the description above.
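The p-value and threshold definitions can be made concrete with a brute-force sketch that enumerates every word of the modelled length. It is an illustration under stated assumptions, not the algorithm used by this class: the class PvalueSketch, the helper allWords, and the use of ToDoubleFunction<String> to stand in for the model and the null model are all hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.ToDoubleFunction;

class PvalueSketch {
    // Enumerates every word of the given length over the alphabet.
    static List<String> allWords(char[] alphabet, int length) {
        List<String> words = new ArrayList<>();
        words.add("");
        for (int i = 0; i < length; i++) {
            List<String> next = new ArrayList<>();
            for (String w : words) {
                for (char c : alphabet) next.add(w + c);
            }
            words = next;
        }
        return words;
    }

    // p-value of x: null-model probability of the words whose likelihood
    // under the model is at least x.
    static double pvalue(double x, List<String> words,
                         ToDoubleFunction<String> model,
                         ToDoubleFunction<String> nullModel) {
        double sum = 0.0;
        for (String w : words) {
            if (model.applyAsDouble(w) >= x) sum += nullModel.applyAsDouble(w);
        }
        return sum;
    }

    // Largest x in [0,1] whose p-value is at least the significance.
    // The p-value is a step function that only changes at word likelihoods,
    // so it is enough to test those values.
    static double likelihoodThreshold(double significance, List<String> words,
                                      ToDoubleFunction<String> model,
                                      ToDoubleFunction<String> nullModel) {
        if (significance <= 0.0) return 1.0; // every x satisfies the condition
        double best = 0.0; // the p-value of 0 is 1, so 0 qualifies when significance <= 1
        for (String w : words) {
            double x = model.applyAsDouble(w);
            if (x > best && pvalue(x, words, model, nullModel) >= significance) best = x;
        }
        return best;
    }

    public static void main(String[] args) {
        List<String> words = allWords(new char[] {'A', 'B'}, 2);
        // Hypothetical i.i.d. model with Pr(A) = 0.75, and a uniform null model.
        ToDoubleFunction<String> model = w -> {
            double p = 1.0;
            for (char c : w.toCharArray()) p *= (c == 'A') ? 0.75 : 0.25;
            return p;
        };
        ToDoubleFunction<String> nullModel = w -> 0.25;
        System.out.println(likelihoodThreshold(0.5, words, model, nullModel));
    }
}
```

The enumeration grows exponentially with the word length, which is presumably why the class also offers the branch and bound and iterative refinement modes alongside brute force.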