How to Convert Illumina Final Report File To Plink
9 months ago
g744695539 • 0

Hi, all!

I only have a single file like the one below (an Illumina final report). How can I convert it to a format that PLINK can use?

[Header]
GSGT Version    2.0.4
Processing Date 2/14/2022 10:18 AM
Content         GSAMD-24v2-0_20024620_A1.bpm
Num SNPs        759993
Total SNPs      759993
Num Samples     1104
Total Samples   1104
[Data]
SNP Name        Sample Group    Sample Name     Sample ID       Sample Index    GC Score        Theta   R       B Allele Freq   Log R Ratio
1:100292476                     1       1       0.2752  0.205   0.962   0.0000  -0.0748
1:101064936                     1       1       0.2564  0.137   3.483   0.0000  0.2236
1:103380393                     1       1       0.8938  0.950   0.681   0.9889  -0.2339
1:104303716                     1       1       0.4682  0.970   0.751   1.0000  -0.1289
1:104864464                     1       1       0.2388  0.740   1.837   0.8692  0.1015
1:106737318                     1       1       0.8230  0.024   0.585   0.0000  -0.0870
Illumina plink • 1.2k views

Try this tool: SNPchimpRepo


Hi, I see that this tool requires two inputs, but I don’t have the SNP_Map (original from Illumina).


Build a map file yourself. Note that every SNP Name in the final report file must also appear in the map file; otherwise the conversion will stop with an error.
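
The check described above can be sketched in Python before running the converter. This is only an illustration, not part of SNPchimpRepo: it assumes both files are tab-delimited, that the final report has a `[Data]` line followed by a header row with a `SNP Name` column, and that the map file has a `Name` column. The function names and the `demo:missing` SNP are hypothetical.

```python
import csv
import io


def final_report_snp_names(lines):
    """Collect the distinct SNP names from the [Data] section of a final report."""
    it = iter(lines)
    # Skip the [Header] block, up to and including the [Data] marker.
    for line in it:
        if line.strip() == "[Data]":
            break
    # The first remaining line is the column header row.
    reader = csv.DictReader(it, delimiter="\t")
    return {row["SNP Name"] for row in reader}


def map_snp_names(lines):
    """Collect the SNP names listed in the map file (assumed 'Name' column)."""
    reader = csv.DictReader(lines, delimiter="\t")
    return {row["Name"] for row in reader}


def missing_from_map(report_lines, map_lines):
    """Return the SNP names present in the final report but absent from the map."""
    return sorted(final_report_snp_names(report_lines) - map_snp_names(map_lines))


# Tiny in-memory example; "demo:missing" is a made-up SNP name.
report = io.StringIO(
    "[Header]\n"
    "Num SNPs\t3\n"
    "[Data]\n"
    "SNP Name\tSample ID\tGC Score\n"
    "1:100292476\t1\t0.2752\n"
    "1:101064936\t1\t0.2564\n"
    "demo:missing\t1\t0.9000\n"
)
snp_map = io.StringIO(
    "Index\tName\tChromosome\tPosition\tSNP\n"
    "1\t1:100292476\t1\t100292476\t[A/G]\n"
    "2\t1:101064936\t1\t101064936\t[A/G]\n"
)
missing = missing_from_map(report, snp_map)
print(missing)
```

For real files, pass open file handles instead of the `io.StringIO` objects. An empty list means every SNP in the final report has a map entry.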


I downloaded the GSAMD-24v2-0_20024620_A1-b37.strand file from https://www.well.ox.ac.uk/~wrayner/strand/index.html and generated a map file from it. Then, based on the strand file, I added the Allele1 - Top and Allele2 - Top columns to the final report. For example, for 1:100292476, both Allele1 - Top and Allele2 - Top are A because the BAF is 0. Similarly, for 1:104303716, both Allele1 - Top and Allele2 - Top are G because the BAF is 1. Could you check whether anything is wrong? The map file and the edited final report are shown below.

head GSAMD-24v2-0_20024620_A1-b37.SnpMap.txt
Index   Name    Chromosome  Position    SNP
1   1:100292476 1   100292476   [A/G]
2   1:101064936 1   101064936   [A/G]
3   1:103380393 1   103380393   [A/G]
4   1:104303716 1   104303716   [A/G]
5   1:104864464 1   104864464   [A/G]
...
164 1:156105002 1   156105002   [D/I]
167 1:156107472 1   156107465   [D/I]
168 1:156108399 1   156108399   [D/I]
[Header]
GSGT Version    2.0.4
Processing Date 2/14/2022 10:18 AM
Content         GSAMD-24v2-0_20024620_A1.bpm
Num SNPs        759993
Total SNPs      759993
Num Samples     1104
Total Samples   1104
[Data]
SNP Name        Sample Group    Sample Name     Sample ID       Sample Index    GC Score        Theta   R       B Allele Freq   Log R Ratio    Allele1 - Top    Allele2 - Top
1:100292476                     1       1       0.2752  0.205   0.962   0.0000  -0.0748    A    A
1:101064936                     1       1       0.2564  0.137   3.483   0.0000  0.2236    A    A
1:103380393                     1       1       0.8938  0.950   0.681   0.9889  -0.2339    A    G
1:104303716                     1       1       0.4682  0.970   0.751   1.0000  -0.1289    G    G
1:104864464                     1       1       0.2388  0.740   1.837   0.8692  0.1015    A    G
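
Once the allele columns are in place, one possible route into PLINK is its long-format input (`.lgen` plus `.map` and `.fam`). The sketch below is an assumption-laden illustration, not the method used by any tool in this thread: it assumes tab-delimited files, uses `Sample ID` for both the family and individual ID, and assumes no-calls are written as `-` (recoded to PLINK's default missing code `0`). The function names are hypothetical.

```python
import csv
import io


def data_rows(lines):
    """Yield dict rows from the [Data] section of a final report."""
    it = iter(lines)
    # Skip the [Header] block, up to and including the [Data] marker.
    for line in it:
        if line.strip() == "[Data]":
            break
    yield from csv.DictReader(it, delimiter="\t")


def recode(allele):
    """Map an assumed no-call symbol '-' to PLINK's default missing code '0'."""
    return "0" if allele == "-" else allele


def to_lgen(report_lines):
    """Build LGEN lines: family ID, individual ID, SNP ID, allele 1, allele 2."""
    out = []
    for row in data_rows(report_lines):
        sample = row["Sample ID"]  # assumption: reused as family and individual ID
        a1 = recode(row["Allele1 - Top"])
        a2 = recode(row["Allele2 - Top"])
        out.append(" ".join([sample, sample, row["SNP Name"], a1, a2]))
    return out


def to_plink_map(snp_map_lines):
    """Build .map lines: chromosome, SNP ID, genetic distance (0), bp position."""
    reader = csv.DictReader(snp_map_lines, delimiter="\t")
    return [" ".join([r["Chromosome"], r["Name"], "0", r["Position"]]) for r in reader]


# Two rows taken from the example above, reduced to the columns that are used.
report = io.StringIO(
    "[Header]\n"
    "[Data]\n"
    "SNP Name\tSample ID\tAllele1 - Top\tAllele2 - Top\n"
    "1:100292476\t1\tA\tA\n"
    "1:103380393\t1\tA\tG\n"
)
snp_map = io.StringIO(
    "Index\tName\tChromosome\tPosition\tSNP\n"
    "1\t1:100292476\t1\t100292476\t[A/G]\n"
    "3\t1:103380393\t1\t103380393\t[A/G]\n"
)
lgen_lines = to_lgen(report)
map_lines = to_plink_map(snp_map)
print("\n".join(lgen_lines))
print("\n".join(map_lines))
```

Written out as `prefix.lgen` and `prefix.map`, together with a matching `prefix.fam`, these files should be loadable with something like `plink --lfile prefix --make-bed`. Check the PLINK documentation for the exact format requirements of your version.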

Not sure if this would be useful: https://--gist.github.com/RyanSchu/301ea0a77a21414391b54193dfcea9e0

Please remove the -- before gist in the link to see the code. I added it so that Biostars does not render the code in-line, since it is not my code.
