6.9 years ago
inkprs
Hi,
I need to join two VCF files, file1.vcf and file2.vcf, using the coordinate mapping in file.coords.
The file.coords file below was generated with the MUMmer tool.
[S1] [E1] [S2] [E2]
1 10 100 110
20 30 120 130
30 40 130 140
40 50 140 150
[S1] start of the alignment region in the reference sequence
[E1] end of the alignment region in the reference sequence
[S2] start of the alignment region in the query sequence
[E2] end of the alignment region in the query sequence
file1.vcf:
#CHROM POS ID REF ALT QUAL FILTER INFO FORMAT NA00001 NA00002 NA00003
chr1 1 rs6054257 G A 29 PASS NS=3;DP=14;AF=0.5;DB;H2 GT:GQ:DP:HQ 0|0:48:1:51,51 1|0:48:8:51,51 1/1:43:5:.,.
chr1 22 . T A 3 q10 NS=3;DP=11;AF=0.017 GT:GQ:DP:HQ 0|0:49:3:58,50 0|1:3:5:65,3 0/0:41:3
chr1 45 rs6040355 A G,T 67 PASS NS=2;DP=10;AF=0.333,0.667;AA=T;DB GT:GQ:DP:HQ 1|2:21:6:23,27 2|1:2:0:18,2 2/2:35:4
file2.vcf:
#CHROM POS ID REF ALT QUAL FILTER INFO FORMAT NA00001 NA00002 NA00003
chr1 100 rs6054257 G A 29 PASS NS=3;DP=14;AF=0.5;DB;H2 GT:GQ:DP:HQ 0|0:48:1:51,51 1|0:48:8:51,51 1/1:43:5:.,.
chr1 145 . T A 3 q10 NS=3;DP=11;AF=0.017 GT:GQ:DP:HQ 0|0:49:3:58,50 0|1:3:5:65,3 0/0:41:3
chr1 122 rs6040355 A G,T 67 PASS NS=2;DP=10;AF=0.333,0.667;AA=T;DB GT:GQ:DP:HQ 1|2:21:6:23,27 2|1:2:0:18,2 2/2:35:4
Here, position 1 maps to 100, 22 maps to 122, and 45 maps to 145 (for example, 22 lies in the block 20-30, which aligns to 120-130). I need to join these two VCF files based on the mapping from file.coords.
I am looking for a Python- or SQL-based solution.
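To make the intended mapping concrete, here is a minimal Python sketch of the logic, not a full solution. It assumes each alignment block is gap-free and on the forward strand, so a position keeps its offset from [S1] when moved to [S2], and that chromosome names match between the two files. The names `map_position` and `join_records` are hypothetical, and the coordinates are hard-coded from the example above rather than parsed from file.coords.

```python
from bisect import bisect_right

# Alignment blocks from the example file.coords as (S1, E1, S2, E2), sorted by S1.
COORDS = [
    (1, 10, 100, 110),
    (20, 30, 120, 130),
    (30, 40, 130, 140),
    (40, 50, 140, 150),
]


def map_position(pos, coords=COORDS):
    """Map a file1 position to file2 coordinates; return None if unaligned.

    Assumption: the alignment inside a block has no indels, so the offset
    from S1 is the same as the offset from S2.
    """
    starts = [block[0] for block in coords]
    i = bisect_right(starts, pos) - 1  # last block starting at or before pos
    if i < 0:
        return None
    s1, e1, s2, _e2 = coords[i]
    if pos > e1:
        return None  # pos falls in a gap between blocks
    return s2 + (pos - s1)


def join_records(vcf1_lines, vcf2_lines, coords=COORDS):
    """Pair each file1 record with the file2 record at its mapped position."""
    by_pos = {}
    for line in vcf2_lines:
        if line.startswith("#"):
            continue  # skip header lines
        fields = line.split()  # assumes no whitespace inside a field
        by_pos[(fields[0], int(fields[1]))] = fields

    joined = []
    for line in vcf1_lines:
        if line.startswith("#"):
            continue
        fields = line.split()
        target = map_position(int(fields[1]), coords)
        match = by_pos.get((fields[0], target))
        if match is not None:
            joined.append((fields, match))
    return joined
```

Calling `map_position(22)` on the example blocks gives 122, and `join_records` returns pairs of (file1 fields, file2 fields) for records whose mapped positions coincide. Records in unaligned regions are dropped in this sketch.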