Comparing two sequences - wcd Manual 0.

3.6 Comparing two sequences

Warning: These options may be removed in later versions of d2-cluster, and should be treated with caution.

These options are included to allow exploration and evaluation of data rather than for clustering purposes. The problem is that optimisations made for performance reasons have meant that they do not give completely accurate answers. For example, if we find windows where the d2 score is less than a threshold, we announce success; we don't try to find the pair of windows with the smallest overlap. In subsequent releases there may be a separate program which provides these facilities (though with less efficient code).
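The effect of this optimisation can be illustrated with a small Python sketch. It is not wcd's code. It assumes the usual definition of the d2 score between two windows (the sum, over all words, of the squared difference in word counts), and it assumes that a fully accurate answer would be the smallest score over all window pairs. The function names and the window and word lengths are hypothetical, chosen only for illustration.

```python
from collections import Counter

def word_counts(window, k):
    """Count every word of length k in a window."""
    return Counter(window[i:i + k] for i in range(len(window) - k + 1))

def d2(win_a, win_b, k):
    """Sum of squared differences in word counts (assumed d2 definition)."""
    ca, cb = word_counts(win_a, k), word_counts(win_b, k)
    return sum((ca[w] - cb[w]) ** 2 for w in ca.keys() | cb.keys())

def windows(seq, length):
    """All windows of the given length; the whole sequence if it is shorter."""
    if len(seq) <= length:
        return [seq]
    return [seq[i:i + length] for i in range(len(seq) - length + 1)]

def first_score_below(seq_a, seq_b, threshold, length, k):
    """Early exit: return the first window score below the threshold, else None."""
    for wa in windows(seq_a, length):
        for wb in windows(seq_b, length):
            score = d2(wa, wb, k)
            if score < threshold:
                return score  # success is announced here; later pairs are never scored
    return None

def minimum_score(seq_a, seq_b, length, k):
    """Exhaustive search: the smallest score over all window pairs."""
    return min(d2(wa, wb, k)
               for wa in windows(seq_a, length)
               for wb in windows(seq_b, length))
```

On the toy input `"ACGT"` and `"ACGAACGT"` with window length 4, word length 2 and threshold 3, the early-exit version stops at the first window pair with a score of 2, although a later pair of identical windows scores 0. Both answers say "below threshold", but only the exhaustive one is the true minimum.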

All of these options can be combined with the options that change the threshold, window size and word size.

Usage: wcd [--compare|-E] <filename> <ind1> <ind2>
       wcd [--abbrev_compare|-e] <filename> <ind1> <ind2>
       wcd [--pairwise|-e] <filename> <ind1> <ind2>

wcd --compare dataf i j
Compares sequences i and j from the data file dataf and prints the following:
- The i-th sequence, the j-th sequence, and the reverse complement of the j-th sequence.
- A line with the following information:
  - i and j
  - An estimate of the number of samples of the j-th sequence that appear in the i-th.
  - An estimate of the number of samples of the j-th sequence that appear in the reverse complement of the i-th.
  - An estimate of the number of words of the j-th sequence that appear in the i-th.
  - An estimate of the number of words of the j-th sequence that appear in the reverse complement of the i-th.
  - The d2 score between i and j.
  - The d2 score between i and the reverse complement of j.

wcd --abbrev_compare dataf i j
Prints the minimum of (1) the d2 score of i and j and (2) the d2 score of i and the reverse complement of j.

wcd --pairwise dataf i j
First prints a table of the d2 scores of all windows of sequence i compared with all windows of sequence j, then does the same with the reverse complement (RC) of sequence j.
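
What these options report can be mimicked on toy data with the following Python sketch. It is an illustration under assumptions, not wcd's implementation: the d2 score of two windows is taken to be the sum of squared differences in word counts, the d2 score of two sequences is taken to be the minimum over all window pairs, and all function names and parameters are hypothetical.

```python
from collections import Counter

COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reverse_complement(seq):
    """Complement each base, then reverse the sequence."""
    return seq.translate(COMPLEMENT)[::-1]

def d2(win_a, win_b, k):
    """Assumed d2 definition: squared word-count differences, summed."""
    ca = Counter(win_a[i:i + k] for i in range(len(win_a) - k + 1))
    cb = Counter(win_b[i:i + k] for i in range(len(win_b) - k + 1))
    return sum((ca[w] - cb[w]) ** 2 for w in set(ca) | set(cb))

def pairwise_table(seq_i, seq_j, length, k):
    """Rows are windows of seq_i, columns are windows of seq_j."""
    # max(1, ...) keeps one window (the whole sequence) when it is shorter than length
    rows = [seq_i[a:a + length] for a in range(max(1, len(seq_i) - length + 1))]
    cols = [seq_j[b:b + length] for b in range(max(1, len(seq_j) - length + 1))]
    return [[d2(r, c, k) for c in cols] for r in rows]

def abbrev_compare(seq_i, seq_j, length, k):
    """Minimum of the forward score and the reverse-complement score."""
    forward = min(map(min, pairwise_table(seq_i, seq_j, length, k)))
    reverse = min(map(min, pairwise_table(seq_i, reverse_complement(seq_j), length, k)))
    return min(forward, reverse)
```

For example, `"AACG"` and `"CGTT"` share only one word of length 2 in the forward direction, but `"CGTT"` is the reverse complement of `"AACG"`, so the sketch's `abbrev_compare` returns 0 for that pair.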

Usage: wcd [--cluster_compare|-D] <dataf> <clusterf>

Takes two arguments: a data file with sequences, and a file that gives two clusters. This cluster file should contain exactly two lines. Each line should contain the indices of the sequences belonging to a cluster. The indices should be separated by spaces, and the line terminated by a full stop.

The program will then compare each sequence in one cluster with each sequence in the other and print out the d2 scores (both positive and RC). Each line of output holds the result of one comparison: first the two indices, then the two d2 scores.
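
The cluster file format and the set of comparisons it implies can be sketched in Python as follows. This is a hedged illustration only: the function names are hypothetical, the sample indices are made up, and the order in which the pairs are enumerated is an assumption, since the text above does not state the order wcd uses.

```python
from itertools import product

def parse_cluster_file(text):
    """Parse two lines of space-separated indices, each ending in a full stop."""
    lines = [ln.strip() for ln in text.splitlines() if ln.strip()]
    if len(lines) != 2:
        raise ValueError("cluster file must contain exactly two lines")
    clusters = []
    for ln in lines:
        if not ln.endswith("."):
            raise ValueError("each line must be terminated by a full stop")
        # Drop the full stop, then split on whitespace
        clusters.append([int(tok) for tok in ln[:-1].split()])
    return clusters

def comparisons(clusters):
    """Every (index from first cluster, index from second cluster) pair."""
    first, second = clusters
    return list(product(first, second))

# Made-up example: a cluster of three sequences and a cluster of two
example = "0 3 7.\n12 15.\n"
pairs = comparisons(parse_cluster_file(example))
```

With clusters of three and two sequences, six comparisons result, so six lines of output would be expected.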