jbil.sequence
Interface SequenceModel

All Known Subinterfaces:
FixedLengthSequenceModel
All Known Implementing Classes:
AbstractFixedLengthSequenceModel, MarkovFixedLengthSequenceModel, UniformFixedLengthSequenceModel

public interface SequenceModel

Generic interface for a sequence model. A sequence model defines a probability distribution over a set of sequences over a given alphabet.

Author:
Paulo G. S. da Fonseca

Method Summary
 Alphabet getAlphabet()
          Returns the base alphabet of the modelled sequences.
 double likelihood(Sequence word)
          Returns the likelihood of the given word under this model.
 double likelihood(Sequence word, int beginIndex, int endIndex)
          Returns the likelihood of the subword of the given word starting at beginIndex and ending at endIndex-1.
 double likelihoodThreshold(double significance)
          Computes the likelihood threshold for the given significance with a default null model.
 double likelihoodThreshold(double significance, SequenceModel nullModel)
          Given a likelihood value x in [0,1], we define the p-value of x as the probability under a null model for a sequence to have a likelihood greater than or equal to x.
 double positionProbability(int position, Sequence neighbourhoodSeq, int beginIndex, int endIndex, int letterPosition)
          Computes the probability of observing a letter in a given position of the sequence.
 double prefixLikelihood(Sequence sequence)
          Returns the likelihood of the given prefix under this model, that is, the sum of the probabilities of all words starting with the given prefix.
 double prefixLikelihood(Sequence sequence, int beginIndex, int endIndex)
          Returns the likelihood of the given prefix under this model, that is, the sum of the probabilities of all words starting with the subword of sequence starting at position beginIndex and ending at position endIndex-1.
 double pvalue(Sequence word)
          Returns the p-value of the given word with a default null model (usually the uniform null model).
 double pvalue(Sequence word, int beginIndex, int endIndex)
          Returns the p-value of the subword of the given word starting at beginIndex and ending at endIndex-1 with a default null model (typically the uniform model).
 double pvalue(Sequence word, int beginIndex, int endIndex, SequenceModel nullModel)
          Returns the p-value of the subword of the given word starting at beginIndex and ending at endIndex-1 with the given null model.
 double pvalue(Sequence word, SequenceModel nullModel)
          The p-value of a given sequence is defined as the probability under a null model for a sequence to have a likelihood greater than or equal to the likelihood of the given sequence under this model.
 Sequence sample(int length)
          Samples a sequence from this model.
 Sequence[] sampleN(int sampleSize, int length)
          Samples a set of i.i.d. sequences from this model.
 

Method Detail

getAlphabet

Alphabet getAlphabet()
Returns the base alphabet of the modelled sequences.


sample

Sequence sample(int length)
Samples a sequence from this model.

Parameters:
length - The length of the sequence to be sampled.

sampleN

Sequence[] sampleN(int sampleSize,
                   int length)
Samples a set of i.i.d. sequences from this model.

Parameters:
sampleSize - The number of sampled sequences.
length - The length of the sampled sequences.
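
As an illustration of what sampling i.i.d. sequences can look like, here is a minimal sketch. It does not use the jbil classes: words are plain int arrays of letter indices, and the class SamplingSketch and the letterProbs parameter (an i.i.d. letter distribution) are assumptions made only for this example.

```java
import java.util.Random;

/** Illustrative only: draws words from a hypothetical i.i.d. letter model. */
class SamplingSketch {

    /** Samples one word of the given length; letters are indices into letterProbs. */
    static int[] sample(int length, double[] letterProbs, Random rng) {
        int[] word = new int[length];
        for (int i = 0; i < length; i++) {
            double u = rng.nextDouble();
            double cumulative = 0.0;
            // Fall back to the last letter if rounding leaves the cumulative sum below u.
            int letter = letterProbs.length - 1;
            for (int a = 0; a < letterProbs.length; a++) {
                cumulative += letterProbs[a];
                if (u < cumulative) {
                    letter = a;
                    break;
                }
            }
            word[i] = letter;
        }
        return word;
    }

    /** Samples sampleSize independent words, each of the given length. */
    static int[][] sampleN(int sampleSize, int length, double[] letterProbs, Random rng) {
        int[][] samples = new int[sampleSize][];
        for (int s = 0; s < sampleSize; s++) {
            samples[s] = sample(length, letterProbs, rng);
        }
        return samples;
    }
}
```

Passing a seeded Random makes the draws reproducible, which is convenient when comparing models on the same sample.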

likelihood

double likelihood(Sequence word)
Returns the likelihood of the given word under this model.


likelihood

double likelihood(Sequence word,
                  int beginIndex,
                  int endIndex)
Returns the likelihood of the subword of the given word starting at beginIndex and ending at endIndex-1.


prefixLikelihood

double prefixLikelihood(Sequence sequence)
Returns the likelihood of the given prefix under this model, that is, the sum of the probabilities of all words starting with the given prefix.


prefixLikelihood

double prefixLikelihood(Sequence sequence,
                        int beginIndex,
                        int endIndex)
Returns the likelihood of the given prefix under this model, that is, the sum of the probabilities of all words starting with the subword of sequence starting at position beginIndex and ending at position endIndex-1.
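
The definition above can be checked by brute force on a toy model. The sketch below is not the jbil implementation: it assumes words of a fixed total length over letters encoded as int indices, and an i.i.d. letter distribution letterProbs. The class PrefixLikelihoodSketch and all of its methods are hypothetical names. It sums the likelihoods of every word that starts with the subword seq[beginIndex..endIndex-1].

```java
/** Illustrative only: prefix likelihood by enumerating all completions. */
class PrefixLikelihoodSketch {

    /** Likelihood of word[beginIndex..endIndex-1] under an i.i.d. letter model. */
    static double likelihood(int[] word, int beginIndex, int endIndex, double[] letterProbs) {
        double p = 1.0;
        for (int i = beginIndex; i < endIndex; i++) {
            p *= letterProbs[word[i]];
        }
        return p;
    }

    /** Sum of the likelihoods of all words of length wordLength starting with seq[beginIndex..endIndex-1]. */
    static double prefixLikelihood(int[] seq, int beginIndex, int endIndex,
                                   int wordLength, double[] letterProbs) {
        int[] word = new int[wordLength];
        int prefixLength = endIndex - beginIndex;
        System.arraycopy(seq, beginIndex, word, 0, prefixLength);
        return sumCompletions(word, prefixLength, letterProbs);
    }

    /** Recursively fills positions filled..word.length-1 with every letter and sums the likelihoods. */
    static double sumCompletions(int[] word, int filled, double[] letterProbs) {
        if (filled == word.length) {
            return likelihood(word, 0, word.length, letterProbs);
        }
        double total = 0.0;
        for (int letter = 0; letter < letterProbs.length; letter++) {
            word[filled] = letter;
            total += sumCompletions(word, filled + 1, letterProbs);
        }
        return total;
    }
}
```

For an i.i.d. model the sum collapses to the likelihood of the prefix itself (up to rounding), because the probabilities of all possible completions add up to 1. Models with dependencies between positions need not simplify this way.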


positionProbability

double positionProbability(int position,
                           Sequence neighbourhoodSeq,
                           int beginIndex,
                           int endIndex,
                           int letterPosition)
Computes the probability of observing a letter in a given position of the sequence. To do so, it may be necessary to supply a sequence containing a neighbourhood of the letter.

Parameters:
position - The position at which the letter appears.
neighbourhoodSeq - A sequence containing the neighbourhood.
beginIndex - The start of the neighbourhood within neighbourhoodSeq.
endIndex - The end (exclusive) of the neighbourhood within neighbourhoodSeq.
letterPosition - The position of the target letter within neighbourhoodSeq.
Returns:
Pr(X[position]=neighbourhoodSeq[letterPosition] | neighbourhood of X[position] = neighbourhoodSeq[beginIndex..endIndex-1])

pvalue

double pvalue(Sequence word)
Returns the p-value of the given word with a default null model (usually the uniform null model).


pvalue

double pvalue(Sequence word,
              int beginIndex,
              int endIndex)
Returns the p-value of the subword of the given word starting at beginIndex and ending at endIndex-1 with a default null model (typically the uniform model).

See Also:
pvalue(Sequence, SequenceModel)

pvalue

double pvalue(Sequence word,
              SequenceModel nullModel)
The p-value of a given sequence is defined as the probability under a null model for a sequence to have a likelihood greater than or equal to the likelihood of the given sequence under this model.

Suppose we have an observed sequence X and we want to perform the statistical test of whether 'X was sampled from the null model' (null hypothesis) against the alternative hypothesis 'X was sampled from this model'. Then the p-value of X is used to reject the null hypothesis if it falls below the significance threshold established for the test.

Parameters:
word - The word whose p-value is to be calculated.
nullModel - The null model.
Returns:
The p-value of the given word defined as above with the given null model.
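
The definition can be made concrete with a brute-force sketch on a toy alphabet. This is not how jbil computes p-values. The class PValueSketch, the int-array encoding of words, and the i.i.d. letter distributions model and nullModel are all assumptions made for illustration. The code enumerates every word of the same length and adds up the null-model probability of those whose likelihood under the model is at least that of the observed word.

```java
import java.util.ArrayList;
import java.util.List;

/** Illustrative only: brute-force p-value over all words of a fixed length. */
class PValueSketch {

    /** Likelihood of a word under an i.i.d. model with the given letter probabilities. */
    static double likelihood(int[] word, double[] letterProbs) {
        double p = 1.0;
        for (int letter : word) {
            p *= letterProbs[letter];
        }
        return p;
    }

    /** Enumerates every word of the given length over letters 0..alphabetSize-1. */
    static List<int[]> allWords(int alphabetSize, int length) {
        int total = 1;
        for (int i = 0; i < length; i++) {
            total *= alphabetSize;
        }
        List<int[]> words = new ArrayList<>();
        for (int code = 0; code < total; code++) {
            int[] word = new int[length];
            int rest = code;
            for (int i = 0; i < length; i++) {
                word[i] = rest % alphabetSize;
                rest /= alphabetSize;
            }
            words.add(word);
        }
        return words;
    }

    /** Null-model probability that a word has model likelihood >= that of the observed word. */
    static double pvalue(int[] word, double[] model, double[] nullModel) {
        // Small relative tolerance so that ties are not broken by rounding.
        double observed = likelihood(word, model) * (1.0 - 1e-12);
        double mass = 0.0;
        for (int[] other : allWords(model.length, word.length)) {
            if (likelihood(other, model) >= observed) {
                mass += likelihood(other, nullModel);
            }
        }
        return mass;
    }
}
```

With a two-letter alphabet, model probabilities {0.8, 0.2} and a uniform null model, the word made of two copies of the first letter is the only length-2 word reaching its own likelihood, so its p-value is the null probability of that single word, 0.25. Enumeration is exponential in the word length, so this only serves to illustrate the definition.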

pvalue

double pvalue(Sequence word,
              int beginIndex,
              int endIndex,
              SequenceModel nullModel)
Returns the p-value of the subword of the given word starting at beginIndex and ending at endIndex-1 with the given null model.

See Also:
pvalue(Sequence, SequenceModel)

likelihoodThreshold

double likelihoodThreshold(double significance)
Computes the likelihood threshold for the given significance with a default null model.

See Also:
likelihoodThreshold(double, SequenceModel)

likelihoodThreshold

double likelihoodThreshold(double significance,
                           SequenceModel nullModel)
Given a likelihood value x in [0,1], we define the p-value of x as the probability under a null model m0 for a sequence to have a likelihood under this model m greater than or equal to x. The p-value is a monotonically non-increasing function of x. Given a significance p, this method computes the value
t := max { x in [0,1] | p-value_m0(x;m) >= p }

Parameters:
significance - The p of the description above.
nullModel - The null model m0 of the description above.
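
The threshold formula above can be evaluated by brute force on a toy example. This sketch is not the jbil implementation and makes several assumptions: words have a fixed length and are encoded as int arrays, both models are i.i.d. letter distributions, and the class LikelihoodThresholdSketch and its methods are hypothetical names. Since the p-value is a step function that only changes at likelihood values actually attained by some word, it is enough to test x = 1 and those attained values.

```java
/** Illustrative only: t = max { x in [0,1] | p-value(x) >= significance } by enumeration. */
class LikelihoodThresholdSketch {

    /** Likelihood of a word under an i.i.d. model with the given letter probabilities. */
    static double likelihood(int[] word, double[] letterProbs) {
        double p = 1.0;
        for (int letter : word) {
            p *= letterProbs[letter];
        }
        return p;
    }

    /** Enumerates every word of the given length over letters 0..alphabetSize-1. */
    static int[][] allWords(int alphabetSize, int length) {
        int total = 1;
        for (int i = 0; i < length; i++) {
            total *= alphabetSize;
        }
        int[][] words = new int[total][length];
        for (int code = 0; code < total; code++) {
            int rest = code;
            for (int i = 0; i < length; i++) {
                words[code][i] = rest % alphabetSize;
                rest /= alphabetSize;
            }
        }
        return words;
    }

    /** Null-model probability that a word has model likelihood >= x (with a small tie tolerance). */
    static double pvalueOf(double x, int[][] words, double[] model, double[] nullModel) {
        double mass = 0.0;
        for (int[] w : words) {
            if (likelihood(w, model) >= x * (1.0 - 1e-12)) {
                mass += likelihood(w, nullModel);
            }
        }
        return mass;
    }

    /** Largest x in [0,1] whose p-value is at least the given significance (assumed to lie in [0,1]). */
    static double likelihoodThreshold(double significance, int length,
                                      double[] model, double[] nullModel) {
        int[][] words = allWords(model.length, length);
        double eps = 1e-12; // absorbs rounding in the summed null-model mass
        if (pvalueOf(1.0, words, model, nullModel) >= significance - eps) {
            return 1.0;
        }
        double best = 0.0; // the p-value of x = 0 is 1, so 0 always qualifies
        for (int[] w : words) {
            double x = likelihood(w, model);
            if (x > best && pvalueOf(x, words, model, nullModel) >= significance - eps) {
                best = x;
            }
        }
        return best;
    }
}
```

For length-2 words with model probabilities {0.8, 0.2} and a uniform null model, the attained likelihoods are roughly 0.64, 0.16 and 0.04 with p-values 0.25, 0.75 and 1, so a significance of 0.25 gives a threshold of about 0.64 and a significance of 0.5 gives about 0.16.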