Question: (Closed) Merging 2 files based on the 3rd mapping file
18 months ago, inkprs wrote:

Hi,

I need to join two VCF files, file1.vcf and file2.vcf, based on a mapping file, file.coords.

The file.coords file below was generated with the MUMmer tool.

[S1]    [E1]    [S2]    [E2]

1       10      100     110
20      30      120     130
30      40      130     140 
40      50      140     150

[S1] start of the alignment region in the reference sequence

[E1] end of the alignment region in the reference sequence

[S2] start of the alignment region in the query sequence

[E2] end of the alignment region in the query sequence

file1.vcf:

#CHROM   POS     ID        REF  ALT    QUAL FILTER  INFO                              FORMAT      NA00001        NA00002        NA00003
chr1     1       rs6054257 G      A       29   PASS   NS=3;DP=14;AF=0.5;DB;H2           GT:GQ:DP:HQ 0|0:48:1:51,51 1|0:48:8:51,51 1/1:43:5:.,.
chr1     22      .         T      A       3    q10    NS=3;DP=11;AF=0.017               GT:GQ:DP:HQ 0|0:49:3:58,50 0|1:3:5:65,3   0/0:41:3
chr1     45      rs6040355 A      G,T     67   PASS   NS=2;DP=10;AF=0.333,0.667;AA=T;DB GT:GQ:DP:HQ 1|2:21:6:23,27 2|1:2:0:18,2   2/2:35:4

file2.vcf:

#CHROM   POS     ID        REF  ALT    QUAL FILTER  INFO                              FORMAT      NA00001        NA00002        NA00003
chr1     100     rs6054257 G      A       29   PASS   NS=3;DP=14;AF=0.5;DB;H2           GT:GQ:DP:HQ 0|0:48:1:51,51 1|0:48:8:51,51 1/1:43:5:.,.
chr1     145     .         T      A       3    q10    NS=3;DP=11;AF=0.017               GT:GQ:DP:HQ 0|0:49:3:58,50 0|1:3:5:65,3   0/0:41:3
chr1     122     rs6040355 A      G,T     67   PASS   NS=2;DP=10;AF=0.333,0.667;AA=T;DB GT:GQ:DP:HQ 1|2:21:6:23,27 2|1:2:0:18,2   2/2:35:4

Here, 1 maps to 100, 22 maps to 122, and 45 maps to 145. I need to join these two VCF files based on the mapping from file.coords.
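
The three examples are consistent with keeping each position's offset from the start of its block, that is, query position = S2 + (POS - S1). A minimal Python sketch of that lookup follows. It assumes the simplified four-column layout shown above (real MUMmer output may carry extra header lines and columns) and a gap-free, forward-strand alignment inside each block. The names `parse_coords` and `map_position` are hypothetical and only for illustration.

```python
# Sketch only: translate a reference position to a query position using
# the [S1] [E1] [S2] [E2] blocks from file.coords.
COORDS_TEXT = """\
[S1]    [E1]    [S2]    [E2]

1       10      100     110
20      30      120     130
30      40      130     140
40      50      140     150
"""


def parse_coords(text):
    """Return a list of (s1, e1, s2, e2) integer tuples, skipping non-numeric lines."""
    blocks = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) >= 4 and all(f.isdigit() for f in fields[:4]):
            blocks.append(tuple(int(f) for f in fields[:4]))
    return blocks


def map_position(pos, blocks):
    """Map a reference position to the query, or return None if no block covers it.

    Assumption: the offset from the block start is preserved (no indels,
    forward strand). The first block that contains the position wins.
    """
    for s1, e1, s2, e2 in blocks:
        if s1 <= pos <= e1:
            return s2 + (pos - s1)
    return None


blocks = parse_coords(COORDS_TEXT)
for pos in (1, 22, 45):
    print(pos, "->", map_position(pos, blocks))
```

Positions that fall between blocks (for example 15) return `None`, so they can be reported separately instead of being silently mismatched.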

I am looking for a Python- or SQL-based solution.


Hello inkprs!

You have a couple of open questions in which you got helpful suggestions/answers and should provide feedback. For this reason, we have closed your question.

If an answer was helpful, you should upvote it; if the answer resolved your question, you should mark it as accepted.

Interesting guidelines for posting can be found in the following posts:

If you disagree, please tell us why in a reply below. We'll be happy to talk about it.

Cheers!

written 18 months ago by WouterDeCoster
18 months ago, RamRS (Houston, TX) wrote:

Why SQL?

I'd go with awk for this. The logic is to take each coordinate from the first file, add 100 to it, and match it against the other file; you can implement that in any language that can process tab-separated files.
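
Since the question asks for Python, here is a minimal sketch of that join, not a definitive implementation. Note that a fixed offset of 100 would send position 1 to 101, whereas the question maps 1 to 100, so the sketch derives the offset per block from the file.coords intervals instead. It assumes tab-separated VCF lines, the same chromosome name in both files, and gap-free forward-strand blocks. The function names and the truncated five-column demo rows are hypothetical and only for illustration.

```python
import io

# (S1, E1, S2, E2) blocks as listed in file.coords in the question.
BLOCKS = [(1, 10, 100, 110), (20, 30, 120, 130), (30, 40, 130, 140), (40, 50, 140, 150)]

# Demo data: first five VCF columns only, taken from the question.
FILE1 = ("#CHROM\tPOS\tID\tREF\tALT\n"
         "chr1\t1\trs6054257\tG\tA\n"
         "chr1\t22\t.\tT\tA\n"
         "chr1\t45\trs6040355\tA\tG,T\n")
FILE2 = ("#CHROM\tPOS\tID\tREF\tALT\n"
         "chr1\t100\trs6054257\tG\tA\n"
         "chr1\t145\t.\tT\tA\n"
         "chr1\t122\trs6040355\tA\tG,T\n")


def map_position(pos, blocks):
    """Return S2 + (pos - S1) for the first block containing pos, else None."""
    for s1, e1, s2, e2 in blocks:
        if s1 <= pos <= e1:
            return s2 + (pos - s1)
    return None


def read_vcf(handle):
    """Return {(chrom, pos): fields} for data lines, skipping '#' and blank lines."""
    records = {}
    for line in handle:
        if line.startswith("#") or not line.strip():
            continue
        fields = line.rstrip("\n").split("\t")
        records[(fields[0], int(fields[1]))] = fields
    return records


def join_vcfs(vcf1, vcf2, blocks):
    """Pair each file1 record with the file2 record at its mapped position."""
    joined = []
    for (chrom, pos), row1 in vcf1.items():
        mapped = map_position(pos, blocks)
        row2 = vcf2.get((chrom, mapped)) if mapped is not None else None
        if row2 is not None:
            joined.append((row1, row2))
    return joined


vcf1 = read_vcf(io.StringIO(FILE1))
vcf2 = read_vcf(io.StringIO(FILE2))
joined = join_vcfs(vcf1, vcf2, BLOCKS)
for row1, row2 in joined:
    print("\t".join(row1 + row2))
```

For real files, replace the `io.StringIO` objects with `open("file1.vcf")` and `open("file2.vcf")`. Records without a partner are dropped here (an inner join); keeping them would be a small change in `join_vcfs`.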
