Scott Hazelhurst

Supplementary materials

This contains the supplementary material for

Full results of the curated data set

Using Ensembl, we randomly selected 34 non-overlapping genes on mouse chromosome 4 and used BLAST in dbEST to find ESTs that matched them, yielding 2294 ESTs in total. We created four reference clusterings, M60, M100, M150 and M200: in clustering M$x$, all ESTs that matched a particular gene with a score of at least $x$ were clustered together. The performance of wcd and PaCE is shown below.
       Sensitivity      Jaccard Index
       PaCE    wcd      PaCE    wcd
M60    0.58    0.66     0.58    0.66
M100   0.62    0.70     0.61    0.68
M150   0.66    0.73     0.64    0.70
M200   0.70    0.75     0.65    0.67
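
The construction of the reference clusterings M$x$ described above can be sketched as follows. This is only an illustration under stated assumptions, not the script used for the paper: the (EST, gene, score) tuple format, the function name reference_clustering and the toy hits are all hypothetical, and the sketch simply leaves out any EST whose score falls below the threshold, since the text does not say how such ESTs were treated.

```python
from collections import defaultdict


def reference_clustering(hits, min_score):
    """Group ESTs by the gene they match with a score of at least min_score.

    hits: iterable of (est_id, gene_id, score) tuples (hypothetical format).
    Returns a dict mapping gene_id -> set of est_ids.
    """
    clusters = defaultdict(set)
    for est_id, gene_id, score in hits:
        if score >= min_score:
            clusters[gene_id].add(est_id)
    return dict(clusters)


# Made-up hits, purely for illustration (not data from the paper).
toy_hits = [
    ("est1", "geneA", 250),
    ("est2", "geneA", 120),
    ("est3", "geneA", 70),
    ("est4", "geneB", 160),
    ("est5", "geneB", 90),
]

m60 = reference_clustering(toy_hits, 60)    # analogue of M60
m150 = reference_clustering(toy_hits, 150)  # analogue of M150
```

Raising the threshold keeps only the stronger matches, so each reference cluster shrinks as $x$ grows.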

Effects of choosing parameters

We chose to test each tool with the default parameters it is provided with. The same applies to wcd: we made no attempt to tune wcd for this paper. However, we did experiment with one parameter each for PaCE and wcd. The resulting sensitivity (SE) and Jaccard index (JI) are shown.
Effect of PaCE's EndToEndScoreRatioThreshold (EES)
EES SE JI
7 0.84 0.64
10 0.87 0.61
Effect of wcd's window length parameter (l)
l SE JI
100 0.93 0.46
125 0.91 0.61
150 0.86 0.71
200 0.77 0.71

The EES ratio was adjusted as suggested by Ananth Kalyanaraman. Although these results are interesting and a fuller exploration is desirable, for space reasons we have cut them from the paper.
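
For readers who want to reproduce figures of this kind, the sketch below shows one common way to compute sensitivity and the Jaccard index for a clustering against a reference. It assumes the usual pairwise definitions, SE = TP/(TP+FN) and JI = TP/(TP+FP+FN), where a pair of sequences is a true positive if both clusterings put the two sequences in the same cluster. These definitions are an assumption here, since this page does not spell them out, and the function names and toy clusterings are hypothetical.

```python
from itertools import combinations


def co_clustered_pairs(clusters):
    """Return the set of unordered sequence pairs that share a cluster."""
    pairs = set()
    for members in clusters:
        # Sorting gives each pair a canonical (a, b) order.
        for a, b in combinations(sorted(members), 2):
            pairs.add((a, b))
    return pairs


def sensitivity_and_jaccard(reference, predicted):
    """Pairwise sensitivity and Jaccard index of predicted vs. reference."""
    ref = co_clustered_pairs(reference)
    pred = co_clustered_pairs(predicted)
    tp = len(ref & pred)   # pairs joined in both clusterings
    fn = len(ref - pred)   # pairs joined only in the reference
    fp = len(pred - ref)   # pairs joined only in the prediction
    se = tp / (tp + fn) if (tp + fn) else 0.0
    ji = tp / (tp + fp + fn) if (tp + fp + fn) else 0.0
    return se, ji


# Toy clusterings, purely for illustration (not data from the paper).
reference = [{"a", "b", "c"}, {"d", "e"}]
predicted = [{"a", "b"}, {"c", "d", "e"}]
se, ji = sensitivity_and_jaccard(reference, predicted)
```

Under these definitions, a setting that merges clusters more aggressively tends to raise sensitivity while adding false-positive pairs, which lowers the Jaccard index.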

User interface through EMBOSS

The EMBOSS wrappers allow wcd to be installed as part of EMBOSS. We use the wEMBOSS web front end, which allows remote users to use our EMBOSS system over the internet and interact with wcd through a GUI. Certain features of wcd, such as MPI, are currently not supported. The picture below shows the EMBOSS GUI for wcd.
EMBOSS GUI for wcd

Performance of Pthreads version

Performance of wcd 4.5 on the Public Cotton Data set using three different computer systems.