12/16/2003 Alexander Schliep

Outline: Choosing an initial model collection automatically

I. Initial clustering
---------------------

Perform k-means clustering using Euclidean distances with noise clusters

- choose initial means:
  Rank profiles p according to change(p) := max(p) - min(p).

  a) Low change() values  

  [Filter out near-constant profiles (i.e., max(p) - min(p) < eps)
  altogether. Maybe do not even cluster them: they are unaffected
  by time (Christine)]

  Collect near-constant (NC) profiles, i.e., change(p) < eps, in NC

  Replace profiles in NC by their average value

  Find optimal number of representatives (BIC assuming normal
  distribution of averages, prior on sigma ...) k_NC
	   
  NC means: for each representative, create a vector of proper
  length just consisting of the mean.
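
  A minimal sketch of step a), assuming profiles are plain lists of
  floats. The names split_near_constant and constant_mean are
  hypothetical, and the BIC-based choice of k_NC is not covered here;
  only the collection of NC and the construction of constant mean
  vectors are shown.

```python
def change(p):
    """change(p) := max(p) - min(p), as defined in the outline."""
    return max(p) - min(p)


def split_near_constant(profiles, eps):
    """Return (averages of the NC profiles, remaining profiles)."""
    nc_averages, rest = [], []
    for p in profiles:
        if change(p) < eps:
            # A near-constant profile is replaced by its average value.
            nc_averages.append(sum(p) / len(p))
        else:
            rest.append(p)
    return nc_averages, rest


def constant_mean(value, length):
    """NC mean: a vector of the given length consisting only of value."""
    return [value] * length


# Toy data, made up for illustration.
profiles = [[0.0, 0.1, 0.05], [1.0, -1.0, 0.5], [2.0, 2.1, 2.0]]
nc_averages, rest = split_near_constant(profiles, eps=0.2)
nc_means = [constant_mean(v, 3) for v in nc_averages]
```

  In the outline the k_NC representatives would be chosen among the
  averages first; here every average is turned into a mean vector only
  to keep the example short.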

  b) High change() values

  Consider the following subsets of C = {p| change(p) > c}
  C-   set of profiles where min(p) < 0 and max(p) < eps
  C+   set of profiles where min(p) > -eps
  C+-  set of profiles where min(p) < -eps and max(p) > eps
	
  Sample k times according to the cardinalities |C-|, |C+| and |C+-|
  from the sets C-,  C+, C+-. Use samples as initial means

  [Maybe it would be better to look at the P-, P+, P+- first
  and then form subsets C- etc., maybe taking rank threshold
  instead of value threshold]
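
  One possible reading of step b), sketched under assumptions: the
  k samples are split across C-, C+ and C+- in proportion to the set
  sizes (largest-remainder rounding), and sampling is without
  replacement. The function names are hypothetical. The set
  definitions follow the outline literally, so the sketch assumes
  c >= 2*eps; profiles lying exactly on a boundary are left out.

```python
import random


def partition_by_sign(profiles, c, eps):
    """Split C = {p | change(p) > c} into C-, C+ and C+-."""
    sets = {"C-": [], "C+": [], "C+-": []}
    for p in profiles:
        lo, hi = min(p), max(p)
        if hi - lo <= c:
            continue  # not in C
        if lo < 0 and hi < eps:
            sets["C-"].append(p)
        elif lo > -eps:
            sets["C+"].append(p)
        elif lo < -eps and hi > eps:
            sets["C+-"].append(p)
    return sets


def sample_initial_means(sets, k, rng):
    """Draw k profiles, allocated proportionally to the set sizes."""
    total = sum(len(s) for s in sets.values())
    if k > total:
        raise ValueError("k exceeds the number of available profiles")
    quotas = {name: k * len(s) / total for name, s in sets.items()}
    counts = {name: int(q) for name, q in quotas.items()}
    leftover = k - sum(counts.values())
    # Hand the remaining draws to the sets with the largest remainders.
    by_remainder = sorted(quotas, key=lambda n: quotas[n] - counts[n],
                          reverse=True)
    for name in by_remainder[:leftover]:
        counts[name] += 1
    means = []
    for name, s in sets.items():
        means.extend(rng.sample(s, counts[name]))
    return means


# Toy data, made up for illustration.
profiles = [[-2.0, -1.0, -0.5], [0.5, 2.0, 3.0], [1.0, 2.5, 4.0],
            [-1.5, 0.0, 1.5], [0.0, 0.1, 0.2]]
sets = partition_by_sign(profiles, c=1.0, eps=0.1)
means = sample_initial_means(sets, k=2, rng=random.Random(0))
```

  Drawing k profiles uniformly from all of C would match the
  cardinalities only in expectation; the stratified allocation above
  matches them up to rounding in every draw.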

  c) Clustering

  Perform (k + k_NC)-clustering using Euclidean distance with the
  modified assignment rule (only assign if distance is below some
  threshold, else assign to noise cluster).
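
  The modified assignment rule can be sketched as follows. This is an
  illustration rather than a reference implementation: the label -1
  for the noise cluster, the parameter max_dist and all function names
  are assumptions. Means are recomputed from assigned profiles only,
  so noise profiles do not pull the means.

```python
import math


def assign(profiles, means, max_dist):
    """Nearest-mean assignment; -1 marks the noise cluster."""
    labels = []
    for p in profiles:
        dists = [math.dist(p, m) for m in means]
        j = min(range(len(means)), key=dists.__getitem__)
        labels.append(j if dists[j] < max_dist else -1)
    return labels


def update(profiles, labels, means):
    """Recompute each mean from its members; keep it if it has none."""
    new_means = []
    for j, old in enumerate(means):
        members = [p for p, lab in zip(profiles, labels) if lab == j]
        if members:
            new_means.append([sum(col) / len(members)
                              for col in zip(*members)])
        else:
            new_means.append(list(old))
    return new_means


def noise_kmeans(profiles, means, max_dist, max_iter=100):
    """Alternate assignment and update until the labels stop changing."""
    labels = None
    for _ in range(max_iter):
        new_labels = assign(profiles, means, max_dist)
        if new_labels == labels:
            break
        labels = new_labels
        means = update(profiles, labels, means)
    return labels, means


def within_cluster_variation(profiles, labels, means):
    """Sum of squared distances to the assigned mean, noise excluded."""
    return sum(math.dist(p, means[lab]) ** 2
               for p, lab in zip(profiles, labels) if lab != -1)


# Toy data, made up for illustration.
profiles = [[0.0, 0.0], [0.0, 1.0], [10.0, 10.0], [10.0, 11.0],
            [50.0, 50.0]]
labels, means = noise_kmeans(profiles, [[0.0, 0.0], [10.0, 10.0]],
                             max_dist=5.0)
w = within_cluster_variation(profiles, labels, means)
```

  Excluding noise profiles from the variation rewards pushing
  profiles into the noise cluster, so a comparison across runs may
  need to account for the size of the noise cluster as well.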
	     
  For a fixed k: repeat b) and c) a number of times (100) and pick the
  clustering with minimal within-cluster variation among the 100.

  Repeat for k in [3, 30] and choose the k that is optimal under some
  penalized within-cluster variation criterion (the more clusters,
  the less variation, hence the penalty).
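
  A sketch of the selection over restarts and over k. The outline
  leaves the penalty open; the additive linear penalty used here is
  purely an assumption, and the criterion is written as a cost to be
  minimised (negating it gives an equivalent maximisation). The name
  select_k and all numbers are hypothetical.

```python
def select_k(variation_by_k, penalty_per_cluster):
    """Pick k from {k: [within-cluster variation of each restart]}."""
    # For each k keep the best restart (minimal variation).
    best_by_k = {k: min(ws) for k, ws in variation_by_k.items()}
    # Penalise the number of clusters, then take the cheapest k.
    score = {k: w + penalty_per_cluster * k
             for k, w in best_by_k.items()}
    best_k = min(score, key=score.get)
    return best_k, best_by_k[best_k]


# Made-up variations for two restarts per k.
variation_by_k = {3: [40.0, 42.5], 4: [25.0, 31.0], 5: [24.0, 24.5]}
best_k, best_w = select_k(variation_by_k, penalty_per_cluster=2.0)
```

  With these numbers k = 5 has the lowest raw variation, but the
  penalty makes k = 4 the selected value.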


II. Infer linear models
-----------------------

Given an initial clustering, infer linear models in the following way
(an adaptation of Stolcke's Bayesian Model Merging).

For each cluster from C do the following:

a) Build full length (nr states == nr observations) linear model w/o
   loops. Run Baum-Welch to completion (bound variance from below)

b) For each subsequent pair of states compute a distance between the
   observed emissions. Alternatively, use chi^2-test.

c) Merge the pairs of subsequent states of minimal distance, if 
   above an eps_{merge}-threshold.

   Run Baum-Welch to completion (bound variance from below)

Iterate b) + c) until no more merge-pairs are found
[eps_{merge} could be a decreasing function of the number of states,
to stop merging earlier]
 
For clusters from NC choose single-state models.
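
The control flow of the b) + c) loop can be sketched on a strongly
simplified stand-in. Assumptions, none of which come from the outline:
each state is reduced to a pair (emission mean, weight), the distance
is the absolute difference of means, and a weighted average replaces
the Baum-Welch re-estimation. Because the score is a plain distance,
a pair is merged only while its distance stays below the cutoff; a
test-based score such as a chi^2 p-value would be compared the other
way round.

```python
def merge_adjacent_states(states, eps_merge):
    """Greedily merge the closest pair of subsequent states.

    states is a list of (mean, weight) pairs in left-to-right order.
    """
    states = list(states)
    while len(states) > 1:
        # b) distance between each pair of subsequent states
        dists = [abs(states[i][0] - states[i + 1][0])
                 for i in range(len(states) - 1)]
        i = min(range(len(dists)), key=dists.__getitem__)
        if dists[i] >= eps_merge:
            break  # no more merge pairs
        # c) merge the closest pair; stand-in for re-estimation
        (m1, w1), (m2, w2) = states[i], states[i + 1]
        merged = ((m1 * w1 + m2 * w2) / (w1 + w2), w1 + w2)
        states[i:i + 2] = [merged]
    return states


# Full-length start: one state per observation of a made-up profile.
profile = [0.0, 0.1, 1.0, 1.05, 2.0]
merged = merge_adjacent_states([(x, 1) for x in profile],
                               eps_merge=0.3)
```

Here the five initial states collapse to three, covering two, two and
one observation respectively.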


III. Infer looped models
------------------------

Looped models have a transition from last to first.

Try to merge 0-crossing states, re-estimate and see if 
the loss of likelihood is acceptable.
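
A small sketch of the two checks involved, under one possible reading:
"0-crossing states" are taken to be states whose emission mean lies
within a tolerance of zero, and a merge is accepted if the
log-likelihood drops by at most a fixed amount. The function names,
tol, max_loss and all numbers are hypothetical; the re-estimation
itself is not shown.

```python
def zero_crossing_states(state_means, tol):
    """Indices of states whose emission mean is within tol of zero."""
    return [i for i, m in enumerate(state_means) if abs(m) < tol]


def merge_is_acceptable(loglik_before, loglik_after, max_loss):
    """Accept a merge if the likelihood loss does not exceed max_loss."""
    return loglik_before - loglik_after <= max_loss


# Made-up state means of a linear model for a cyclic profile.
state_means = [0.02, 1.3, 0.9, -0.05, -1.1, -0.8]
candidates = zero_crossing_states(state_means, tol=0.1)
accepted = merge_is_acceptable(-120.0, -123.5, max_loss=5.0)
```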


