This can be used to merge two known clusterings. The input is two FASTA files with the sequences and two files that give the clusterings.
It is assumed that the two FASTA files are disjoint.
Usage: wcd [--merge,-m] <seqf1> <clf1> <seqf2> <clf2>
merge two clusterings
Here you merge two clusterings that have already been computed. The four arguments are: the first FASTA file, the first clustering file, the second FASTA file, and the second clustering file. These are mandatory.
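As an illustration of the argument order, here is a minimal Python sketch that assembles the merge command from the usage line above. The helper name `merge_command` and the file names are hypothetical; only the `--merge` flag and the order of the four arguments come from the usage line.

```python
from typing import List


def merge_command(seqf1: str, clf1: str, seqf2: str, clf2: str,
                  wcd: str = "wcd") -> List[str]:
    """Build the argument list for a merge run.

    Order follows the usage line: first FASTA file, first clustering,
    second FASTA file, second clustering. All four are mandatory.
    """
    args = [seqf1, clf1, seqf2, clf2]
    if any(not a for a in args):
        raise ValueError("all four file arguments are mandatory")
    return [wcd, "--merge", *args]


# Hypothetical file names, for illustration only.
cmd = merge_command("a.fasta", "a.cl", "b.fasta", "b.cl")
print(" ".join(cmd))  # prints: wcd --merge a.fasta a.cl b.fasta b.cl
```

The resulting list can be passed to `subprocess.run` if you want to drive wcd from a script.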
The --constraint option may be of particular use here. This can
be used to constrain the first input file and its related
clustering. You can use the --constraint2 option to constrain the
second input file.
The files that specify the clustering must be in the compressed clustering format. The sequences are referred to by index number (the position of the sequence in the input file), numbered from 0. Each cluster is given on a line by itself, terminated by a full stop: the indices of the sequences in the cluster are printed out, separated by spaces.
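The format described above can be read with a few lines of Python. This is a sketch based only on that description, not on wcd's own code: the function names are hypothetical, and the exact spacing before the full stop is an assumption (the parser accepts both `0 3 4.` and `0 3 4 .`).

```python
from typing import List


def parse_cluster_table(text: str) -> List[List[int]]:
    """Parse a cluster table: one cluster per line, space-separated
    0-based sequence indices, terminated by a full stop."""
    clusters = []
    for lineno, raw in enumerate(text.splitlines(), start=1):
        line = raw.strip()
        if not line:
            continue  # tolerate blank lines
        if not line.endswith("."):
            raise ValueError(
                f"line {lineno}: cluster is not terminated by a full stop")
        clusters.append([int(tok) for tok in line[:-1].split()])
    return clusters


def format_cluster_table(clusters: List[List[int]]) -> str:
    """Write clusters back out, one per line (assumes no space before
    the full stop)."""
    return "".join(" ".join(str(i) for i in c) + ".\n" for c in clusters)


table = "0 3 4.\n1.\n2 5.\n"
print(parse_cluster_table(table))  # prints: [[0, 3, 4], [1], [2, 5]]
```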
The output is a new cluster table in the same format as the input cluster table. The indices shown in the table are:
Another useful option for merging is:
[--constraint2, -k] filename
Give the constraint file for the second input data file. This is optional. The constraint file enables you to ensure that certain sequences are not clustered together, or to ignore certain sequences while clustering. See Format of Constraint File for details of the required format of the constraint file and its semantics.
This can be used to add a number of new sequences to an existing clustering. It is assumed that the new sequences do not exist in the original file.
The input is two FASTA files and a cluster table for the first file. The remarks above apply here.
Usage: wcd [--recluster,-r] <clf1> <seqf1>
recluster from a more lenient clustering
This takes a clustering based on a more lenient (or just different)
criterion and reclusters using d2-scores as the basis for
clustering. The input cluster table is used as a scheme: for each
cluster in the table, wcd does a d2-clustering on the sequences in
that cluster, ignoring all the other sequences. wcd will never
compare the sequences in one cluster with the sequences in another,
so the resulting clustering is a finer partition of the input
clustering.
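The effect of reclustering can be sketched in Python. This is an illustration of the idea only, not wcd's implementation: the `similar` predicate is a hypothetical stand-in for the d2 test, and linking matching pairs transitively with a union-find structure is an assumption made for the sketch.

```python
from typing import Callable, Dict, List


def find(parent: Dict[int, int], i: int) -> int:
    """Return the representative of i, with path halving."""
    while parent[i] != i:
        parent[i] = parent[parent[i]]
        i = parent[i]
    return i


def recluster(scheme: List[List[int]],
              similar: Callable[[int, int], bool]) -> List[List[int]]:
    """Cluster each input cluster on its own; sequences in different
    input clusters are never compared."""
    result: List[List[int]] = []
    for cluster in scheme:
        parent = {i: i for i in cluster}
        for pos, a in enumerate(cluster):
            for b in cluster[pos + 1:]:
                if similar(a, b):  # stand-in for the d2 comparison
                    parent[find(parent, a)] = find(parent, b)
        groups: Dict[int, List[int]] = {}
        for i in cluster:
            groups.setdefault(find(parent, i), []).append(i)
        result.extend(groups.values())
    return result


# Hypothetical similarity relation: 0~1, 2~3 and 3~4.
pairs = [{0, 1}, {2, 3}, {3, 4}]
out = recluster([[0, 1, 2], [3, 4]], lambda a, b: {a, b} in pairs)
print(out)  # prints: [[0, 1], [2], [3, 4]]
```

Sequences 2 and 3 would match, but they sit in different input clusters and so are never compared; every output cluster is a subset of one input cluster.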