Auxiliary data for the H3Agwas workflow : https://github.com/h3abionet/h3agwas
Strand data for the H3A Chip Manifest 3
This data is provided with no warranty other than that it is what we used for alignment against the GRCh37 build.
- Reference file for the H3AChip manifest A3: Zip file is here (18 January 2018).
The h3a_A3.ref file can be used as the "reference" parameter for the topbottom.nf workflow. It contains the base on the + strand of the Build 37 reference genome at each position on the chip. Please note that there are two other files:
- h3a_A3.err: This is the list of SNPs that we could not align to the reference genome. Alignment is done by mapping the probes that Illumina uses against the + and - strands of the reference genome. In the vast majority of cases, the matching is excellent. Some flexibility is allowed to deal with issues such as highly polymorphic regions, indels and homopolymers, but some probes do not align even when generous edit distances are allowed. In our case, the bulk of these mismatches map to multiple positions. See the Illumina mapping information.
- h3_A3.wrn: These SNPs do map, but only after relaxing the alignment constraints. They are also in the ref file above. Most of these are OK but, again, you should be skeptical.
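To illustrate what mapping a probe against the + and - strands means, here is a minimal Python sketch. It is not the code we used: the function names are hypothetical, it only checks exact matches, and it does not model the edit-distance tolerance described above.

```python
# Illustrative sketch only: exact matching, no edit-distance tolerance.
# The function names are hypothetical and not part of the h3agwas workflow.
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of an upper-case DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def probe_strand(probe, reference):
    """Report the strand(s) of `reference` on which `probe` matches exactly.

    Returns '+', '-', 'both', or None if there is no exact match.
    """
    probe = probe.upper()
    reference = reference.upper()
    on_plus = probe in reference
    # A probe lies on the - strand if its reverse complement
    # occurs in the + strand sequence.
    on_minus = reverse_complement(probe) in reference
    if on_plus and on_minus:
        return "both"
    if on_plus:
        return "+"
    if on_minus:
        return "-"
    return None
```

A probe that returns None here would need a more tolerant aligner before being declared unmappable, which is the situation the warning and error files record.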
Note that all the SNPs in the warning and error files are in the reference file with the base as specified by the reference genome. It is up to you to decide later whether to include or exclude them.
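If you decide to exclude the flagged SNPs, a filter along the following lines may help. This is a sketch under a loudly stated assumption: it treats the first whitespace-separated field of each line as the SNP identifier in the ref, err and wrn files alike. Check the actual file layout before relying on it. The function names and the output file name are hypothetical.

```python
# Sketch only. ASSUMPTION: in every file, the first whitespace-separated
# field on each line is the SNP identifier. Verify this against the real files.


def read_ids(path):
    """Collect the first field of every non-empty line in `path`."""
    ids = set()
    with open(path) as handle:
        for line in handle:
            fields = line.split()
            if fields:
                ids.add(fields[0])
    return ids


def filter_reference(ref_path, out_path, exclude_ids):
    """Copy `ref_path` to `out_path`, skipping lines whose first field
    is in `exclude_ids`. Returns the number of lines written."""
    written = 0
    with open(ref_path) as src, open(out_path, "w") as dst:
        for line in src:
            fields = line.split()
            if fields and fields[0] in exclude_ids:
                continue
            dst.write(line)
            written += 1
    return written


# Example usage (the output file name is hypothetical):
# flagged = read_ids("h3a_A3.err") | read_ids("h3_A3.wrn")
# filter_reference("h3a_A3.ref", "h3a_A3.filtered.ref", flagged)
```

Dropping only the err SNPs and keeping the wrn SNPs is an equally reasonable choice; pass whichever set of identifiers matches your decision.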
Suggested reading
I recommend the following paper for understanding alignment and strand nomenclature:
- S. C. Nelson, K. F. Doheny, C. C. Laurie, and D. B. Mirel, "Is 'forward' the same as 'plus'?...and other adventures in SNP allele nomenclature." Trends in Genetics, vol. 28, no. 8, pp. 361-363, 2012. http://dx.doi.org/10.1016/j.tig.2012.05.002