Question: Merge two VCFs, keeping only the intersection of REF/ALT alleles
Posted 6 weeks ago by just learning:

Hi all,

I would like to merge two VCF files, chr1g.vcf.gz and chr1hk.vcf.gz, so that the resulting file contains only their intersection, meaning only the records present in both files with the same REF/ALT alleles.

chr1g.vcf.gz excerpt:

CHROM  POS    ID  REF     ALT    QUAL  FILTER   INFO
chr1   10031  NA  T       C      NA    AS_VQSR  AC=0,etc
chr1   10055  NA  T       C      NA    AS_VQSR  AC=0,etc
chr1   10061  NA  T       C      NA    AS_VQSR  AC=0,etc
chr1   10061  NA  T       TAACC  NA    AS_VQSR  AC=0,etc
chr1   10109  NA  A       T      NA    AS_VQSR  AC=0,etc
chr1   10109  NA  AACCCT  A      NA    AS_VQSR  AC=0,etc
...

chr1hk.vcf.gz excerpt:

CHROM  POS    ID  REF     ALT    QUAL  FILTER   INFO
chr1   10055  NA  T       C      NA    AS_VQSR  AC=0,etc
chr1   10061  NA  T       C      NA    AS_VQSR  AC=0,etc
chr1   10109  NA  A       T      NA    AS_VQSR  AC=0,etc
chr1   10109  NA  AACCCT  A      NA    AS_VQSR  AC=0,etc
...

Desired output after merging:

CHROM  POS    ID  REF     ALT    QUAL  FILTER   INFO
chr1   10055  NA  T       C      NA    AS_VQSR  AC=0,etc
chr1   10061  NA  T       C      NA    AS_VQSR  AC=0,etc
chr1   10109  NA  A       T      NA    AS_VQSR  AC=0,etc
chr1   10109  NA  AACCCT  A      NA    AS_VQSR  AC=0,etc

...

The command I have been using is:

bcftools merge --merge none chr1g.vcf.gz chr1hk.vcf.gz > chr1merge.vcf

This merges records whose REF/ALT alleles match, but the output is the union of the two original files. How can I modify it to keep only the intersection?

Thank you!
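For reference, the "intersection" asked for here can be stated precisely: keep a record from the first file only if some record in the second file has the same CHROM, POS, REF and ALT. The tool-free sketch below does exactly that with gzip and awk. shared_by_allele is a hypothetical helper written for illustration and is not part of bcftools or bedtools. It assumes one ALT allele per record, as in the excerpts; multiallelic records would need splitting first, for example with bcftools norm -m-. It also keeps the keys of the second file in a temporary file and in memory.

```shell
#!/usr/bin/env bash
# shared_by_allele: hypothetical helper written for this example.
# Prints the header of FIRST, then only the records of FIRST whose CHROM, POS,
# REF and ALT also occur together in a record of SECOND.
shared_by_allele() {
  local first=$1 second=$2 keys
  keys=$(mktemp) || return 1
  # Collect the CHROM/POS/REF/ALT key of every data line in SECOND.
  # gzip -cdf decompresses .vcf.gz input and passes plain .vcf through unchanged.
  gzip -cdf "$second" | awk -F'\t' -v OFS='\t' '!/^#/ { print $1, $2, $4, $5 }' > "$keys"
  # Read the key file first, then stream FIRST from stdin ("-"):
  # header lines are kept, data lines only if their key was seen in SECOND.
  gzip -cdf "$first" | awk -F'\t' -v keyfile="$keys" '
    FILENAME == keyfile { seen[$1, $2, $3, $4] = 1; next }
    /^#/                { print; next }
    ($1, $2, $4, $5) in seen
  ' "$keys" -
  rm -f "$keys"
}

# Usage with the file names from the question:
# shared_by_allele chr1g.vcf.gz chr1hk.vcf.gz > chr1_shared.vcf
```

On the excerpts above, this keeps chr1:10061 T>C but drops chr1:10061 T>TAACC, because only the T>C allele is present in chr1hk.vcf.gz.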

Answer by Pierre Lindenbaum (France/Nantes/Institut du Thorax - INSERM UMR1087), 6 weeks ago:

> How can I tweak it to keep only the intersection?

Process the output of bcftools merge with bcftools isec.


just learning replied (6 weeks ago):

Thank you for the reply! I was trying to avoid running bcftools isec after the merge because it outputs four extremely large datasets; however, if this is the only option, I will work with it.
Answer by Elucidata, 16 days ago:

One common tool for intersecting VCF files is bedtools intersect (also available under its older name, intersectBed). Depending on the flags used, it reports either the entries that overlap between files or the entries found in only one of them. Note that bedtools compares records by genomic position (interval overlap), not by REF/ALT alleles. Records at the same position with different alleles therefore still count as overlapping.

See the bedtools intersect documentation for the full range of options.

In your case, you can keep the entries of chr1g.vcf.gz that overlap an entry in chr1hk.vcf.gz, written out as the full chr1g.vcf.gz records (including REF/ALT), with the following command:

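bedtools intersect -header -wa -u -a chr1g.vcf.gz -b chr1hk.vcf.gz > intersect.vcf

or, equivalently:

intersectBed -header -wa -u -a chr1g.vcf.gz -b chr1hk.vcf.gz > intersect.vcf

(-u writes each chr1g.vcf.gz record only once, even if it overlaps several chr1hk.vcf.gz records, as happens at chr1:10109 in the excerpts.)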

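To write the matching entries of chr1hk.vcf.gz instead, swap the two files (-a chr1hk.vcf.gz -b chr1g.vcf.gz). The -wb flag does something different. It appends the overlapping chr1hk.vcf.gz record after the chr1g.vcf.gz entry on each output line, so the result is no longer a plain VCF.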
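Because bedtools only checks position, one way to also require matching REF/ALT alleles is to ask bedtools for both records (-wa -wb) and post-filter with awk. This is a minimal sketch. keep_allele_matches is a hypothetical helper. It assumes the header of the -a file ends with the usual #CHROM line, which it uses to count how many columns belong to the -a record on each output line.

```shell
#!/usr/bin/env bash
# keep_allele_matches: hypothetical helper written for this example.
# Input (stdin): output of  bedtools intersect -header -wa -wb -a A -b B
# on two VCFs, i.e. A's header lines, then data lines holding the full A record
# followed by the full B record. Output: the header plus only those A records
# whose CHROM, POS, REF and ALT are identical to the paired B record.
keep_allele_matches() {
  awk -F'\t' -v OFS='\t' '
    /^##/     { print; next }          # meta-information lines of A
    /^#CHROM/ { n = NF; print; next }  # column header of A: A has n columns
    $1 == $(n + 1) && $2 == $(n + 2) && $4 == $(n + 4) && $5 == $(n + 5) {
      line = $1
      for (i = 2; i <= n; i++) line = line OFS $i  # keep only the A columns
      print line
    }'
}

# Usage with the file names from the question:
# bedtools intersect -header -wa -wb -a chr1g.vcf.gz -b chr1hk.vcf.gz \
#   | keep_allele_matches > intersect.vcf
```

With the excerpts above, the chr1g.vcf.gz record chr1:10061 T>TAACC overlaps chr1hk.vcf.gz's T>C at the same position, so plain bedtools reports it. This filter drops it.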
