Genesis takes as input one mandatory file (the PCA data file) and one optional file (the phenotype file).
We have scripts that convert from other popular PCA formats (PLINK, flashpca) to a format that Genesis understands. These scripts are discussed in the section Advice on data formats below. We hope that future versions of Genesis will handle these formats natively.
To input Eigenstrat files, click File, then click New PCA, or click the New PCA button on the toolbar. On the screen that opens, click Import Data File and navigate to the PCA data file output by the Eigenstrat software. Then optionally click Import Phenotype File and navigate to the phenotype data file.
To input SNPRelate data, click File, then click New PCA, or click the New PCA button on the toolbar. On the screen that opens, click Import Data File and navigate to the PCA data file output by the SNPRelate package. The SNPRelate file includes the phenotype information in the data file itself.
In the drop-down menus, select the two or three principal components to plot as the axes and select the column of the phenotype file that will be used to group the data. To draw the graph, click Finish, or click Next to access the Appearance Options menu (described below).
If a graph has already been plotted and you wish to change which principal components form the axes, or which column of the phenotype data is used to group the data, you can return to the initial menu by clicking the Data Options button on the toolbar, or by clicking Graph and then Data Options.
The Appearance Options menu can be accessed from the New PCA screen by clicking Next after importing the files, by clicking the Appearance Options button on the toolbar, or by clicking Graph and then Appearance Options.
To set or change the heading, open the Appearance Options menu and type the heading into the text box labelled “Set Heading”. To change the font of the heading, click Select Heading Font and select the font of choice.
To show or hide the border, open the Appearance Options menu and check or uncheck the Show Border checkbox.
To show or hide the axes, axis labels, grid and scale, open the Appearance Options menu and check or uncheck the relevant checkboxes.
To set the position of the key, open the Appearance Options menu and select the key position from the drop-down menu. To hide the key, select No Key from the drop-down menu. To change the font of the key, click Select Key Font and select the font of choice.
Other settings and options can be changed by interacting with the graph. Certain elements can be clicked to view or modify their options.
To select an individual subject on the graph, click on the subject. This brings up a subject dialog where you can view the data about the subject (from the phenotype data file), change the subject's icon, delete the subject, hide it, or place it on top.
To change an individual subject's icon, click the subject on the graph and, from the dialog that opens, select the icon shape and colour. Selecting the shape Default sets the shape of the icon to the group's shape, and checking Clear all icon data specific to this individual resets both the icon's shape and colour to the group's.
To place an individual subject on top, click the subject on the graph and, from the dialog that opens, check the Place this individual on top checkbox and click Done.
To change a population group's name, click on the group in the key. This will bring up the Population Group dialog. From here you can set the group name and click Done.
To change a population group's icon, click on the group in the key. This will bring up the Population Group dialog. From here you can set the shape and colour of the icon and click Done.
To change the order of the population groups in the key, click on the group in the key. This will bring up the Population Group dialog. From here you can click Shift Up or Shift Down to shift the group's position in the key.
To create a label and annotate the graph, right-click on the graph where the label is to be placed, and click Create Label at Mouse Pointer. Then enter the label's text and click OK to place the label.
To edit a label that has been created, click on the label to bring up the Label dialog. From this dialog you can edit the label's text, reposition the label or delete the label. Labels can also be moved by dragging; holding the Shift key while dragging snaps the label to a grid.
Genesis provides very simple functionality for adding lines and arrows. Click on the relevant icon and then place the line or arrow by (a) clicking on the point where it should start, (b) dragging to the point where it should finish, and (c) releasing the mouse button. Currently no line editing functionality is provided, other than being able to delete a line.
To hide a subject from the graph, click on the subject on the graph to bring up the subject dialog. From this menu you can choose to hide the subject from the graph.
To hide a population group from the graph, click on the group in the key to bring up the Population Group dialog and check the box labelled Hide this group from the graph.
To show a subject or group that has been hidden, click on the Select Hidden Individuals or Groups button and, from the drop-down menu that pops up, select the name of the subject or group that you wish to show. From the dialog that opens you can now uncheck the Hide this Individual from the Graph or Hide this group from the Graph checkbox.
To search for a subject in the graph by name, click the Search for individual button in the toolbar. In the dialog, enter the name (first, last or both) of the individual you wish to find and click Ok. If the individual is found in the data, it will be selected and the subject dialog for that individual will open. If the individual is not found, a message will be displayed.
To rotate a 3D PCA plot, click the Show/Hide 3D PCA Rotate Panel button in the toolbar. This will bring up the rotate panel, which contains a slider that can be dragged to rotate the graph about the z-axis.
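Geometrically, a rotation about the z-axis changes only the coordinates on the other two axes and leaves the z coordinate unchanged. The following is a minimal sketch of that transformation, not Genesis's actual implementation: the function name is hypothetical, and how the slider position maps to an angle (and which principal component is drawn on the z-axis) is an assumption.

```python
import math


def rotate_about_z(point, degrees):
    """Rotate a 3D point (x, y, z) about the z-axis by the given angle.

    Hypothetical helper for illustration only; Genesis's own rotation
    code may be organised differently.
    """
    x, y, z = point
    theta = math.radians(degrees)
    cos_t, sin_t = math.cos(theta), math.sin(theta)
    # Standard 2D rotation applied to (x, y); z is left untouched.
    return (x * cos_t - y * sin_t, x * sin_t + y * cos_t, z)
```

Applying the function to every plotted sample with the same angle turns the whole cloud of points around the vertical axis, which is what dragging the slider appears to do.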
Eigenstrat output is directly supported by Genesis.
The SNPRelate R package of Zheng et al. [2012] can be used to perform principal component analysis. However, because SNPRelate is an R package whose output is fully programmable in R, there is no default SNPRelate output format. We support the following output: a file that contains the eigenvalues, followed by the eigenvectors, produced using the following R commands.
pca <- snpgdsPCA(genofile, snp.id=snpset)
write.table(pca$eigenval, "pca.rel", sep="\t", quote=FALSE)
tab1 <- data.frame(sample.id = pca$sample.id,
                   pop = factor(pop_code)[match(pca$sample.id, sample.id)],
                   EV1 = pca$eigenvect[,1], EV2 = pca$eigenvect[,2],
                   EV3 = pca$eigenvect[,3], EV4 = pca$eigenvect[,4],
                   EV5 = pca$eigenvect[,5], EV6 = pca$eigenvect[,6],
                   EV7 = pca$eigenvect[,7], EV8 = pca$eigenvect[,8],
                   EV9 = pca$eigenvect[,9], EV10 = pca$eigenvect[,10],
                   stringsAsFactors = FALSE)
write.table(tab1, "pca.rel", sep="\t", quote=FALSE, append=TRUE)
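To make the resulting file layout concrete, here is a minimal Python sketch that splits such a file into its eigenvalue block and its eigenvector table. It is not part of Genesis, and the function names are hypothetical. It assumes the layout that R's write.table would normally produce for the commands above: a one-field header line, then "row name, tab, eigenvalue" lines, then a header line beginning with sample.id, then rows of "row name, sample ID, population, EV1, EV2, ...". Check a real file before relying on this.

```python
def _to_float(text):
    # R writes missing values as "NA"; map them to NaN (an assumption
    # about how you would want to treat them).
    return float("nan") if text == "NA" else float(text)


def parse_pca_rel(lines):
    """Split pca.rel-style lines into (eigenvalues, samples).

    samples is a list of (sample_id, population, [EV1, EV2, ...]).
    Hypothetical helper; the layout it expects is described above.
    """
    eigenvalues, samples = [], []
    in_table = False
    for raw in lines:
        fields = raw.rstrip("\n").split("\t")
        if not in_table:
            if fields[0] == "sample.id":
                in_table = True  # header of the eigenvector table
            elif len(fields) == 2:
                # "row name <TAB> eigenvalue"
                eigenvalues.append(_to_float(fields[1]))
            # Any other line (e.g. the one-field header) is skipped.
            continue
        if len(fields) < 4:
            continue  # blank or malformed row
        sample_id, pop = fields[1], fields[2]
        samples.append((sample_id, pop, [_to_float(v) for v in fields[3:]]))
    return eigenvalues, samples
```

A typical use would be `parse_pca_rel(open("pca.rel"))`, which returns the eigenvalues in order and one tuple per sample.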
FlashPCA is designed to perform PCA on very large data sets. It takes as input a PLINK BED and BIM file and produces eigenvectors or principal components. We have a script, flashpca2evec, which converts the data into a format that Genesis can read. Because the flashpca output has no information about the sample IDs, flashpca2evec also needs the fam file as input. This script requires Python 2.7.
By default, flashpca calls its output files eigenvalues.txt and eigenvectors.txt, and this is (by default) what flashpca2evec expects. For example:
flashpca2evec --fam data.fam --out data.evec
However, if the files have other names, the appropriate flags can be used:
flashpca2evec --fam data.fam --eigenval file1.evals --eigenvec sample.csv --out data.evec
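The reason the fam file is needed can be illustrated with a short sketch of this kind of conversion. This is not the bundled flashpca2evec script, and the function names are hypothetical. It assumes that the eigenvector file has one row per sample in the same order as the fam file, that the fam file starts with the family ID and individual ID columns, and that the output is an Eigenstrat-style layout (a "#eigvals:" header line, then one "FID:IID values label" line per sample). The exact layout written by flashpca2evec may differ.

```python
def fam_sample_ids(fam_lines):
    """Build FID:IID sample identifiers from the lines of a PLINK fam file."""
    ids = []
    for line in fam_lines:
        fields = line.split()
        if fields:  # skip blank lines
            ids.append(fields[0] + ":" + fields[1])
    return ids


def to_evec_lines(fam_lines, eigenvalue_lines, eigenvector_lines, label="???"):
    """Combine sample IDs, eigenvalues and eigenvectors into evec-style lines.

    Hypothetical helper. label is a placeholder population column; using
    "???" for an unknown population is an assumption.
    """
    sample_ids = fam_sample_ids(fam_lines)
    rows = [line.split() for line in eigenvector_lines if line.strip()]
    if len(sample_ids) != len(rows):
        raise ValueError("fam file and eigenvector file differ in length")
    eigenvalues = [line.strip() for line in eigenvalue_lines if line.strip()]
    out = ["#eigvals: " + " ".join(eigenvalues)]
    for sample_id, row in zip(sample_ids, rows):
        # Pair each eigenvector row with the sample ID on the same fam row.
        out.append(" ".join([sample_id] + row + [label]))
    return out
```

The length check matters because the pairing is purely positional: if the fam file does not correspond to the data set that flashpca analysed, the IDs would be attached to the wrong samples.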
PLINK 2 [Purcell and Chang 2014] (and its alpha release, PLINK 1.9) supports PCA directly. Genesis can handle these files natively but assumes that PLINK's default naming convention is used (e.g., a .eigenvec suffix). If this convention is not followed, Genesis will not be able to recognise the file type, and plink2evec is bundled for that case. plink2evec converts the PLINK output files into the format that Genesis can read.
By default, PLINK calls its output files plink.eigenval and plink.eigenvec, and this is (by default) what plink2evec expects. For example:
plink2evec --out result.pca.evec
However, if the files have other names, the appropriate flags can be used:
plink2evec --eigenval file1.evals --eigenvec sample.csv --out data.evec
And if, as is common in PLINK usage, the names of the eigenvector and eigenvalue files were set using the plink --out flag, then plink2evec can use its --bfile flag:
plink --bfile sample --pca --out sample
plink2evec --bfile sample --out sample.pca.evec
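The three ways of naming the input files described above can be summarised in a small sketch. It is not taken from plink2evec itself. The function name is hypothetical, and the precedence (explicit file names first, then the --bfile prefix, then the plink default) is an assumption about how such a script would reasonably behave.

```python
def resolve_plink_pca_files(eigenval=None, eigenvec=None, bfile=None):
    """Return the (eigenvalue file, eigenvector file) names to read.

    Hypothetical helper mirroring the --eigenval, --eigenvec and --bfile
    flags; the precedence between them is assumed, not documented.
    """
    # Without --bfile, fall back to PLINK's default output prefix.
    prefix = bfile if bfile is not None else "plink"
    eigenval_file = eigenval if eigenval else prefix + ".eigenval"
    eigenvec_file = eigenvec if eigenvec else prefix + ".eigenvec"
    return eigenval_file, eigenvec_file
```

With no arguments this yields plink.eigenval and plink.eigenvec; with bfile="sample" it yields sample.eigenval and sample.eigenvec, matching the files written by the plink command above.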