Scott Hazelhurst
Supplementary materials
This page contains the supplementary material for:
- S. Hazelhurst, W. Hide, Z. Liptak, R. Nogueira, R. Starfield. An overview of the wcd EST clustering tool. Bioinformatics (to appear). doi: 10.1093/bioinformatics/btn203.
Full results of the curated data set
Using Ensembl, we randomly selected 34 non-overlapping genes on mouse
chromosome 4 and used BLAST against dbEST to find matching ESTs,
producing 2294 ESTs in total. We created four reference clusterings,
M60, M100, M150 and M200: in clustering M$x$, all ESTs that matched a
particular gene with a score of at least $x$ were clustered together.
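The construction of a reference clustering M$x$ can be sketched as follows. This is a minimal Python sketch under stated assumptions, not the procedure actually used: the function name `reference_clustering`, the `(est, gene, score)` hit format, keeping only an EST's best-scoring gene when it matches several, and treating ESTs with no qualifying hit as singletons are all hypothetical choices made for illustration.

```python
from collections import defaultdict


def reference_clustering(hits, threshold, all_ests):
    """Hypothetical construction of a reference clustering M_x.

    hits: iterable of (est_id, gene_id, score) BLAST hits.
    threshold: the minimum score x for a hit to count.
    all_ests: every EST identifier in the data set.
    """
    # Assumption: an EST with qualifying hits to several genes is
    # assigned only to its best-scoring gene.
    best = {}  # est_id -> (score, gene_id)
    for est, gene, score in hits:
        if score >= threshold and (est not in best or score > best[est][0]):
            best[est] = (score, gene)

    # All ESTs assigned to the same gene form one cluster.
    clusters = defaultdict(set)
    for est, (_, gene) in best.items():
        clusters[gene].add(est)
    result = list(clusters.values())

    # Assumption: ESTs with no hit scoring at least `threshold` are singletons.
    result.extend({est} for est in all_ests if est not in best)
    return result
```

For example, with a threshold of 100, an EST whose only hit scores 90 ends up in a cluster of its own, so raising the threshold from M60 to M200 can only split clusters, never merge them.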
The performance of wcd and PaCE is shown below.
Reference | Sensitivity (PaCE) | Sensitivity (wcd) | Jaccard (PaCE) | Jaccard (wcd) |
M60 | 0.58 | 0.66 | 0.58 | 0.66 |
M100 | 0.62 | 0.70 | 0.61 | 0.68 |
M150 | 0.66 | 0.73 | 0.64 | 0.70 |
M200 | 0.70 | 0.75 | 0.65 | 0.67 |
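Neither metric is defined on this page. Assuming the usual pair-counting definitions for comparing a candidate clustering against a reference (a pair of ESTs is a true positive if it is clustered together in both, a false negative if only the reference groups it, and a false positive if only the candidate does), sensitivity is $TP/(TP+FN)$ and the Jaccard index is $TP/(TP+FN+FP)$. The Python sketch below computes both; the function names are hypothetical.

```python
from itertools import combinations


def same_cluster_pairs(clustering):
    """Return the set of unordered EST pairs that share a cluster."""
    pairs = set()
    for cluster in clustering:
        # Sorting makes each pair's representation canonical: (a, b) with a < b.
        for a, b in combinations(sorted(cluster), 2):
            pairs.add((a, b))
    return pairs


def sensitivity_and_jaccard(reference, candidate):
    """Pair-counting sensitivity and Jaccard index of `candidate` vs `reference`."""
    ref = same_cluster_pairs(reference)
    cand = same_cluster_pairs(candidate)
    tp = len(ref & cand)  # together in both clusterings
    fn = len(ref - cand)  # together in the reference only
    fp = len(cand - ref)  # together in the candidate only
    # By convention in this sketch, an empty denominator gives a score of 1.0.
    se = tp / (tp + fn) if tp + fn else 1.0
    ji = tp / (tp + fn + fp) if tp + fn + fp else 1.0
    return se, ji
```

For a reference `[{"a", "b", "c"}, {"d"}]` and a candidate `[{"a", "b"}, {"c", "d"}]`, only the pair (a, b) is shared, so sensitivity is 1/3 and the Jaccard index is 1/4. Because the Jaccard index also penalises false positives, it can never exceed sensitivity, consistent with every row of the table.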
Effects of choosing parameters
We decided to test each tool with its default parameters as provided.
We did the same for wcd and made no attempt to tune it for this paper.
However, we did experiment with one parameter each of PaCE and wcd;
the resulting sensitivity (SE) and Jaccard index (JI) are shown below.
Effect of PaCE's EndToEndScoreRatioThreshold (EES)
EES | SE | JI |
7 | 0.84 | 0.64 |
10 | 0.87 | 0.61 |
Effect of wcd's window length parameter (l)
l | SE | JI |
100 | 0.93 | 0.46 |
125 | 0.91 | 0.61 |
150 | 0.86 | 0.71 |
200 | 0.77 | 0.71 |
The EES ratio was adjusted as suggested by Ananth Kalyanaraman.
These results are interesting and merit fuller exploration, but for
space reasons we cut them from the paper.
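To make the role of the window length l concrete, the sketch below computes a windowed d² score between two ESTs. Here $d^2(x, y) = \sum_w (c_x(w) - c_y(w))^2$, where $c_x(w)$ counts occurrences of the k-letter word $w$ in $x$, and the windowed score is the minimum over all pairs of length-l windows. It assumes d² as the comparison measure and is a naive illustration only: it omits the filtering heuristics and reverse-complement comparison a practical clustering tool needs, and the names `d2` and `windowed_d2` and the default `k=6` are assumptions rather than wcd's actual code or defaults.

```python
from collections import Counter


def word_counts(seq, k):
    """Count every overlapping k-letter word in seq."""
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def d2(x, y, k):
    """Sum over all k-words of the squared difference in occurrence counts."""
    cx, cy = word_counts(x, k), word_counts(y, k)
    # Counter returns 0 for missing words, so the union of keys covers all terms.
    return sum((cx[w] - cy[w]) ** 2 for w in cx.keys() | cy.keys())


def windowed_d2(s, t, window, k=6):
    """Minimum d2 over all pairs of length-`window` windows of s and t.

    Naive sketch: every window pair is compared directly. A sequence
    shorter than the window is used whole.
    """
    ws = [s[i:i + window] for i in range(max(1, len(s) - window + 1))]
    wt = [t[j:j + window] for j in range(max(1, len(t) - window + 1))]
    return min(d2(a, b, k) for a in ws for b in wt)
```

For `s = "TTTTACGTACGT"` and `t = "ACGTACGTGGGG"` with k = 2, the whole-sequence d² is 20, but with a window of 8 the shared segment `ACGTACGT` gives a windowed score of 0. If the acceptance threshold is fixed, a shorter window compares fewer words and so tends to produce smaller scores; this is one plausible reading of why l = 100 gave higher sensitivity but a lower Jaccard index than l = 200, though the sketch does not reproduce those numbers.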
User interface through EMBOSS
The EMBOSS wrappers allow wcd to be installed as part of EMBOSS. We
use the wEMBOSS web front end, which allows remote users to access our
EMBOSS system over the internet and interact with wcd through a
GUI. Some features of wcd, such as MPI, are not currently supported
through this interface.
Performance of the Pthreads version
Performance of wcd 4.5 on the Public Cotton Data set using three different computer systems.