

3 Structure Plots

3.1 Data input format

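Genesis requires two input files and an optional third file:

  - an ancestry data file (an Admixture Q file or CLUMPP output),
  - a PLINK-style fam file, and
  - optionally, a phenotype file containing population labels.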

3.2 Inputting Data

Genesis produces structure charts from the output of the Admixture or CLUMPP tools, together with a PLINK-style fam file and, optionally, a phenotype file containing population labels.
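Before importing, it can help to check that the ancestry file and the fam file describe the same individuals. The sketch below is not part of Genesis; it is a minimal example assuming the usual Admixture Q layout (one individual per line, K whitespace-separated proportions) and the standard six-column PLINK fam layout. The function names and the 0.01 row-sum tolerance are our own choices.

```python
from typing import Iterable, List, Tuple


def parse_q(lines: Iterable[str]) -> List[List[float]]:
    """Parse an Admixture-style Q file: one individual per line,
    K whitespace-separated ancestry proportions."""
    return [[float(x) for x in line.split()] for line in lines if line.strip()]


def parse_fam(lines: Iterable[str]) -> List[Tuple[str, str]]:
    """Return (FID, IID) pairs from a PLINK fam file
    (columns: FID IID father mother sex phenotype)."""
    return [tuple(line.split()[:2]) for line in lines if line.strip()]


def check_inputs(q_rows: List[List[float]], fam_ids: List[Tuple[str, str]],
                 tol: float = 0.01) -> int:
    """Hypothetical sanity check that the Q and fam files line up; returns K."""
    if not q_rows:
        raise ValueError("Q file contains no individuals")
    if len(q_rows) != len(fam_ids):
        raise ValueError(f"{len(q_rows)} Q rows but {len(fam_ids)} fam rows")
    k = len(q_rows[0])
    for n, row in enumerate(q_rows, start=1):
        if len(row) != k:
            raise ValueError(f"row {n} has {len(row)} columns, expected {k}")
        # Proportions are rounded in the file, so allow a small tolerance.
        if abs(sum(row) - 1.0) > tol:
            raise ValueError(f"row {n} sums to {sum(row):.3f}, not 1")
    return k


# Usage (file names are hypothetical):
# with open("data.3.Q") as q, open("data.fam") as f:
#     k = check_inputs(parse_q(q), parse_fam(f))
```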

To input these files, click File, then New Admixture, or click the New Admixture button on the toolbar. On the screen that opens, click Import Data File and navigate to the ancestry data file produced by Admixture or CLUMPP. Then click Import Fam File and navigate to the fam file. Finally, click Import Pheno File and navigate to the phenotype file.

You can import multiple data files into the same project by clicking Import Data File again. Data files can be imported in any order relative to the fam and pheno files.

In the drop-down menu, select the column of the phenotype file that will be used to group the data. To draw the graph, click Finish, or click Next to open the Appearance Options menu (see below).
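Conceptually, choosing a phenotype column partitions the individuals in the fam file into groups. The sketch below illustrates that idea only; it is not how Genesis is implemented. It assumes a hypothetical phenotype layout of whitespace-separated columns `FID IID label1 label2 ...` with no header line, where `column` counts from the first label column.

```python
from typing import Dict, Iterable, List, Tuple


def parse_pheno(lines: Iterable[str]) -> Dict[Tuple[str, str], List[str]]:
    """Map (FID, IID) -> label columns (assumed layout: FID IID label1 label2 ...)."""
    table: Dict[Tuple[str, str], List[str]] = {}
    for line in lines:
        fields = line.split()
        if fields:
            table[(fields[0], fields[1])] = fields[2:]
    return table


def group_individuals(fam_ids: List[Tuple[str, str]],
                      pheno: Dict[Tuple[str, str], List[str]],
                      column: int) -> Dict[str, List[int]]:
    """Group fam row indices by the label in the chosen phenotype column.
    Groups appear in order of first occurrence in the fam file."""
    groups: Dict[str, List[int]] = {}
    for idx, key in enumerate(fam_ids):
        label = pheno[key][column]  # KeyError if an individual has no phenotype row
        groups.setdefault(label, []).append(idx)
    return groups
```

The returned indices refer to rows of the fam file, and therefore also to rows of every Q file imported alongside it.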

Editing Phenotype Column/Importing Additional Data Files

If a graph has already been plotted and you wish to change the phenotype column used to group the data, or to import additional data files, you can return to the initial menu by clicking the Data Options button on the toolbar or by clicking Graph, then Data Options.

3.3 Appearance Options

The Appearance Options menu can be opened by clicking Next after importing the files on the New Admixture screen, by clicking the Appearance Options button on the toolbar, or by clicking Graph, then Appearance Options.

3.4 Interacting with the Graph

Other settings can be changed by interacting with the graph directly: clicking certain elements lets you view or modify their options.

3.5 Annotating the Graph

3.6 Useful scripts

3.6.1 structure2CLUMPP: Wrapper script for the Structure tool

Genesis natively supports Admixture Q files and CLUMPP output files. CLUMPP's output format is derived from the output of the Structure tool, so our naming convention is somewhat inaccurate: the CLUMPP format is really a subset of the Structure format. When Structure runs, it produces log information, summary information about the population, and the inferred ancestry of individuals. The inferred ancestry is what Genesis needs, and it can be found in the middle of the output file.

The script structure2CLUMPP takes one mandatory argument: the name of a Structure output file.

python structure2CLUMPP testdata1.out_f 

By default, output is written to standard output. The --outbase flag sets the base of the output file name; the script appends the suffix .K.Q to this base, where K is the number of ancestry columns (for consistency with Admixture's naming).

python structure2CLUMPP --outbase data testdata1.out_f 

If testdata1.out_f contains 4 ancestry columns, this produces the file data.4.Q.
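The naming rule can be sketched as follows: write to standard output when no base is given, otherwise to `<outbase>.<K>.Q`. This is an illustration of the behaviour described above, not the script's source; the function names are hypothetical, and joining values with single spaces is our own formatting choice.

```python
import sys
from typing import List, Optional


def q_output_name(outbase: str, rows: List[List[str]]) -> str:
    """<outbase>.<K>.Q, where K is the number of ancestry columns."""
    return f"{outbase}.{len(rows[0])}.Q"


def format_q(rows: List[List[str]]) -> str:
    """One individual per line, proportions separated by single spaces."""
    return "".join(" ".join(row) + "\n" for row in rows)


def emit(rows: List[List[str]], outbase: Optional[str] = None) -> Optional[str]:
    """Write to stdout by default, or to <outbase>.<K>.Q; return the file name."""
    text = format_q(rows)
    if outbase is None:
        sys.stdout.write(text)
        return None
    name = q_output_name(outbase, rows)
    with open(name, "w") as fh:
        fh.write(text)
    return name
```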

The script also has a flag, --produce-fam FAM_NAME, which can be used to write a bare-bones fam file to FAM_NAME if needed.
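The exact contents of the fam file written by --produce-fam are not described here. As a hypothetical sketch only, a minimal record could be built from each Structure ancestry line, assuming the layout `index label (%miss) pop : q1 q2 ...` and using the population as FID, the label as IID, 0 for unknown parents and sex, and -9 for a missing phenotype (the usual PLINK conventions).

```python
from typing import Tuple


def split_structure_line(line: str) -> Tuple[str, str, str]:
    """Split a Structure ancestry line into (label, pop, ancestry text).
    Assumes the layout 'index label (%miss) pop : q1 q2 ...'."""
    left, right = line.split(":", 1)
    fields = left.split()
    return fields[1], fields[-1], right.strip()


def bare_fam_line(label: str, pop: str) -> str:
    """Hypothetical minimal fam record: FID=pop, IID=label, unknown parents (0),
    unknown sex (0), missing phenotype (-9)."""
    return f"{pop} {label} 0 0 0 -9"
```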

The full usage is

usage: structure2CLUMPP [-h] [--outbase OUTBASE] [--produce-fam FAM_NAME] N

produce admixture style output from structure output

positional arguments:
  N                     structure file

optional arguments:
  -h, --help            show this help message and exit
  --outbase OUTBASE     output file name base (default output to standard out)
  --produce-fam FAM_NAME
                        produce fam file

Technical details. The output of the Structure program contains various information, including log and FST data. Immediately after the lines that read

Inferred ancestry of individuals:
        Label (%Miss) Pop:  Inferred clusters

comes the inferred ancestry of each individual, followed by a blank line. This is the part we want. An extract might look like this:

  7        7    (0)    1 :  0.017 0.014 0.970 
  8        8    (0)    1 :  0.009 0.005 0.986 
  9        9    (0)    1 :  0.353 0.116 0.531 

Here K=3, and the ancestry of each individual is given to the right of the colon (for our purposes, everything to the left of the colon can be ignored). The structure2CLUMPP script extracts this part of the Structure output and writes it in the same form as CLUMPP output, which Genesis can read.
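The extraction just described can be sketched as a small state machine: search for the marker line, skip the column-title line, then keep the text to the right of the colon until the first blank line. This is a minimal sketch assuming the plain layout shown in the extract; it is not the script's actual source, and Structure runs configured to print extra per-individual columns may need different handling.

```python
from typing import Iterable, List

MARKER = "Inferred ancestry of individuals:"


def extract_ancestry(lines: Iterable[str]) -> List[List[str]]:
    """Return the ancestry columns (right of the colon) for each individual."""
    rows: List[List[str]] = []
    state = "search"  # search -> header -> body
    for line in lines:
        if state == "search":
            if line.strip().startswith(MARKER):
                state = "header"
        elif state == "header":
            # Skip the column-title line "Label (%Miss) Pop:  Inferred clusters".
            state = "body"
        elif not line.strip():
            break  # a blank line ends the ancestry block
        else:
            rows.append(line.split(":", 1)[1].split())
    return rows


# Usage: with open("testdata1.out_f") as fh: rows = extract_ancestry(fh)
```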

3.6.2 sortfamQwithin.py: ordering individuals by colour

Although Genesis directly supports some ordering of individuals, we have a script with more sophisticated functionality, which will in time be migrated into Genesis. Individuals are usually ordered by group in the fam file, but the ordering within each group is arbitrary. In admixed populations this can produce confusing pictures: adjacent individuals may have very different admixture proportions, giving a jagged chart in which patterns are hard to discern. The script sortfamQwithin.py consistently sorts a fam file and its Q file(s) so that, within each group, individuals are sorted by the dominant ancestral population for that group. The script is documented here:
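One way to realise "sorted by the dominant ancestral population for that group" is sketched below. It is not the code of sortfamQwithin.py: here we assume the dominant component is the one with the highest mean proportion within the group, and individuals are then sorted by that component in descending order. The resulting permutation must be applied to the fam file and to every Q file so that they stay consistent.

```python
from typing import Dict, List, Sequence, TypeVar

T = TypeVar("T")


def within_group_order(q_rows: List[List[float]],
                       groups: Dict[str, List[int]]) -> List[int]:
    """Return a permutation of row indices: groups kept in order, and within
    each group individuals sorted by that group's dominant component."""
    order: List[int] = []
    for members in groups.values():
        k = len(q_rows[members[0]])
        # Mean proportion of each ancestral component within this group.
        means = [sum(q_rows[i][c] for i in members) / len(members) for c in range(k)]
        dominant = max(range(k), key=lambda c: means[c])
        order.extend(sorted(members, key=lambda i: q_rows[i][dominant], reverse=True))
    return order


def apply_order(rows: Sequence[T], order: List[int]) -> List[T]:
    """Reorder fam lines or Q rows with the same permutation."""
    return [rows[i] for i in order]
```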

http://www.bioinf.wits.ac.za/software/poputils/

3.6.3 Other scripts

Other useful scripts, such as fams2phe and popifyfam.py, can be used to create phenotype files. They are available at http://www.bioinf.wits.ac.za/software/poputils/