Hi,
I need to join two VCF files, file1.vcf and file2.vcf, using the coordinate mapping in file.coords.
The file.coords file below was generated with the MUMmer tool.
[S1] [E1] [S2] [E2]
1 10 100 110
20 30 120 130
30 40 130 140
40 50 140 150
[S1] start of the alignment region in the reference sequence
[E1] end of the alignment region in the reference sequence
[S2] start of the alignment region in the query sequence
[E2] end of the alignment region in the query sequence
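As a rough illustration of how these four columns can be used, here is a minimal Python sketch that translates a reference position into a query position. It assumes every block is a gap-free, forward-strand alignment, so the offset inside a block is the same on both sides; real MUMmer alignments can contain indels or reversed blocks, which this does not handle. The function names load_coords and ref_to_query are made up for this example.

```python
from bisect import bisect_right

def load_coords(lines):
    """Parse whitespace-separated S1 E1 S2 E2 rows, skipping header or blank lines."""
    blocks = []
    for line in lines:
        fields = line.split()
        if len(fields) < 4 or not fields[0].isdigit():
            continue  # e.g. the "[S1] [E1] [S2] [E2]" header line
        s1, e1, s2, e2 = map(int, fields[:4])
        blocks.append((s1, e1, s2, e2))
    blocks.sort()  # sort by reference start so we can binary-search
    return blocks

def ref_to_query(blocks, pos):
    """Return the query coordinate for a reference position, or None if unaligned."""
    starts = [b[0] for b in blocks]
    i = bisect_right(starts, pos) - 1  # last block starting at or before pos
    if i < 0:
        return None
    s1, e1, s2, _e2 = blocks[i]
    if pos > e1:
        return None  # pos falls in a gap between blocks
    # Assumption: same offset from the block start on both sequences.
    return s2 + (pos - s1)

blocks = load_coords([
    "[S1] [E1] [S2] [E2]",
    "1 10 100 110",
    "20 30 120 130",
    "30 40 130 140",
    "40 50 140 150",
])
```

With the sample rows above, ref_to_query(blocks, 22) lands in the 20-30 block and gives 120 + 2.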
file1.vcf:
#CHROM POS ID REF ALT QUAL FILTER INFO FORMAT NA00001 NA00002 NA00003
chr1 1 rs6054257 G A 29 PASS NS=3;DP=14;AF=0.5;DB;H2 GT:GQ:DP:HQ 0|0:48:1:51,51 1|0:48:8:51,51 1/1:43:5:.,.
chr1 22 . T A 3 q10 NS=3;DP=11;AF=0.017 GT:GQ:DP:HQ 0|0:49:3:58,50 0|1:3:5:65,3 0/0:41:3
chr1 45 rs6040355 A G,T 67 PASS NS=2;DP=10;AF=0.333,0.667;AA=T;DB GT:GQ:DP:HQ 1|2:21:6:23,27 2|1:2:0:18,2 2/2:35:4
file2.vcf:
#CHROM POS ID REF ALT QUAL FILTER INFO FORMAT NA00001 NA00002 NA00003
chr1 100 rs6054257 G A 29 PASS NS=3;DP=14;AF=0.5;DB;H2 GT:GQ:DP:HQ 0|0:48:1:51,51 1|0:48:8:51,51 1/1:43:5:.,.
chr1 145 . T A 3 q10 NS=3;DP=11;AF=0.017 GT:GQ:DP:HQ 0|0:49:3:58,50 0|1:3:5:65,3 0/0:41:3
chr1 122 rs6040355 A G,T 67 PASS NS=2;DP=10;AF=0.333,0.667;AA=T;DB GT:GQ:DP:HQ 1|2:21:6:23,27 2|1:2:0:18,2 2/2:35:4
Here, position 1 in file1.vcf maps to 100 in file2.vcf, 22 maps to 122, and 45 maps to 145. I need to join the two VCF files on this mapping from file.coords.
I am looking for a Python- or SQL-based solution.
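One possible way to combine both, sketched with Python's built-in sqlite3 module: load the coords rows and the two VCFs into in-memory tables, then join on the translated position. This is only a sketch under some assumptions: the blocks are gap-free and forward-strand (so the offset within a block carries over unchanged), both VCFs use the same chromosome name, and the first five VCF columns can be split on whitespace. The table names and the helpers vcf_rows and join_vcfs are invented for the example.

```python
import sqlite3

def vcf_rows(lines):
    """Yield (chrom, pos, id, ref, alt, full_line) for each non-header VCF line."""
    for line in lines:
        line = line.rstrip("\n")
        if not line or line.startswith("#"):
            continue
        chrom, pos, vid, ref, alt = line.split(None, 5)[:5]
        yield chrom, int(pos), vid, ref, alt, line

def join_vcfs(coords, vcf1_lines, vcf2_lines):
    """coords is an iterable of (s1, e1, s2, e2) integer tuples."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE coords (s1 INTEGER, e1 INTEGER, s2 INTEGER, e2 INTEGER)")
    for name in ("vcf1", "vcf2"):
        con.execute(
            f"CREATE TABLE {name} "
            "(chrom TEXT, pos INTEGER, id TEXT, ref TEXT, alt TEXT, line TEXT)"
        )
    con.executemany("INSERT INTO coords VALUES (?, ?, ?, ?)", coords)
    con.executemany("INSERT INTO vcf1 VALUES (?, ?, ?, ?, ?, ?)", vcf_rows(vcf1_lines))
    con.executemany("INSERT INTO vcf2 VALUES (?, ?, ?, ?, ?, ?)", vcf_rows(vcf2_lines))
    # DISTINCT guards against a position that sits on the shared edge of two blocks.
    rows = con.execute(
        """
        SELECT DISTINCT v1.chrom, v1.pos, v2.pos, v1.line, v2.line
        FROM vcf1 AS v1
        JOIN coords AS c ON v1.pos BETWEEN c.s1 AND c.e1
        JOIN vcf2 AS v2 ON v2.chrom = v1.chrom
                       AND v2.pos = c.s2 + (v1.pos - c.s1)
        ORDER BY v1.pos
        """
    ).fetchall()
    con.close()
    return rows
```

Each returned row holds the chromosome, the file1 position, the matching file2 position, and the two original VCF lines, so the records can be written out side by side. Records in file1.vcf that fall outside every block, or that have no counterpart in file2.vcf, are dropped by this inner join; a LEFT JOIN would keep them.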
Hello inkprs!
You have a couple of open questions in which you got helpful suggestions/answers and should provide feedback. For this reason, we have closed your question.
If an answer was helpful, you should upvote it; if it resolved your question, you should mark it as accepted.
Interesting guidelines for posting can be found in the following posts:
If you disagree, please tell us why in a reply below. We'll be happy to talk about it.
Cheers!