How to interpret reference and alternative alleles from raw plink data?
2.3 years ago
genqs • 0

I am exporting raw genotype data for an LD analysis from plink with this command:

plink --bfile chr1file --recode A --chr 1 --from-bp 123456 --to-bp 987654 --maf 0.001 --out gene_locus_ld.txt

The header of the output file gene_locus_ld.txt.raw looks like this, for example:

FID IID PAT MAT SEX PHENOTYPE 
1:123456:GC:G_GC 1:123454:T:TGTC_TGTC 1:12343:A:G_G 1:12345:A:G_G 1:1234:G:A_G 
1:12345:G:T_T 1:50226471:G:A_A 1:123453:C:T_T 1:12341536:C:T_T

My question is: for a column name like "1:12343:A:G_G", which of the three alleles shown is the reference allele and which is the alternative? Should I read the pair separated by ":" or the pair separated by "_"? In this example, would I take A:G or G_G?

I have read the description of the .raw file in plink's documentation, but I'm not sure whether the answer is there and I'm just missing it, because my variant IDs are not rsIDs like the ones in their example:

.raw (additive + dominant component file)
Produced by "--recode A" and "--recode AD", for use with R. This format cannot be loaded by PLINK.

A text file with a header line, and then one line per sample with V+6 (for "--recode A") or 2V+6 (for "--recode AD") fields, where V is the number of variants. The first six fields are:

FID         Family ID
IID         Within-family ID
PAT         Paternal within-family ID
MAT         Maternal within-family ID
SEX         Sex (1 = male, 2 = female, 0 = unknown)
PHENOTYPE   Main phenotype value

This is followed by one or two fields per variant:

<Variant ID>_<counted allele> Allelic dosage (0/1/2/'NA' for diploid variants, 0/2/'NA' for haploid)
<Variant ID>_HET  Dominant component (1 = het, 0 otherwise). Requires "--recode AD".

If 'include-alt' was specified, the header line also names alternate allele codes in parentheses, e.g. 'rs5939319_G(/A)'.
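
To make the quoted "<Variant ID>_<counted allele>" layout concrete, here is a minimal Python sketch that splits one header name at its last underscore. The function name describe_column is hypothetical, and the further split of the variant ID on ":" is only an assumption that the IDs follow a CHR:POS:ALLELE:ALLELE pattern, as they appear to in the header above. It does not handle the 'include-alt' form such as 'rs5939319_G(/A)'.

```python
def describe_column(name):
    """Split a .raw header name of the form <Variant ID>_<suffix>.

    Hypothetical helper for illustration; not part of plink.
    """
    # Split at the LAST underscore, so variant IDs that themselves
    # contain underscores are kept intact.
    variant_id, sep, suffix = name.rpartition("_")
    if not sep:
        raise ValueError(f"no '_' found in column name: {name!r}")

    info = {
        "variant_id": variant_id,
        "counted_allele": None,
        "is_het_column": False,
    }
    if suffix == "HET":
        # Dominant-component column, only produced by --recode AD.
        info["is_het_column"] = True
    else:
        info["counted_allele"] = suffix

    # Assumption: the variant ID looks like CHR:POS:ALLELE:ALLELE.
    # IDs in any other form (e.g. rsIDs) are left unsplit.
    parts = variant_id.split(":")
    if len(parts) == 4 and parts[1].isdigit():
        chrom, pos, first, second = parts
        info["chrom"] = chrom
        info["pos"] = int(pos)
        info["id_alleles"] = (first, second)
    return info


if __name__ == "__main__":
    for col in ["1:12343:A:G_G", "1:123456:GC:G_GC", "1:1234:G:A_G"]:
        print(col, "->", describe_column(col))
```

Read this way, "1:12343:A:G_G" is the variant ID "1:12343:A:G" followed by the counted allele "G", so the dosage in that column is the number of copies of G. The header alone does not say whether the counted allele is the reference or the alternative; the order of the two alleles inside the ID depends on how the IDs in the .bim file were built. As far as I know, PLINK 1.9 counts the A1 allele by default and --recode-allele can change that, but that is worth checking against the .bim file.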

plink GWAS LD genomics • 650 views