Output a VCF restricted to variant sites listed in a file
8.2 years ago

I'm looking for a tool that will output a VCF restricted to the variant sites (not genomic intervals) listed in a tab-delimited file. I understand GATK SelectVariants can do this, but I'd prefer an alternative tool that doesn't reorder samples.

8.2 years ago

I don't see why this wouldn't work, as long as your BED file is composed of 1 bp variant sites and not intervals:

bedtools intersect -a my.vcf -b my.bed -wa >output

If you have something different in mind, then example data would be helpful.
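As a rough illustration of what "1 bp variant sites" means in BED terms, here is a minimal sketch that turns a tab-delimited site list into single-base BED records. The file names `sites.tsv` and `sites.bed` and the two-column CHROM/POS layout are assumptions made for this example, not something given in the question.

```shell
# Hypothetical site list: CHROM <tab> POS (1-based, as in a VCF)
printf 'chr1\t100\nchr1\t250\nchr2\t42\n' > sites.tsv

# BED is 0-based and half-open, so a single site at POS becomes [POS-1, POS)
awk -F '\t' 'BEGIN { OFS = "\t" } { print $1, $2 - 1, $2 }' sites.tsv > sites.bed

cat sites.bed
```

Each output record spans exactly one base, so intersecting against it selects individual sites rather than intervals.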


This works nicely, but is there a way to keep the VCF metadata so I can pipe to bcftools?


Do you mean the header? A couple of redirects will get you there:

cat <(head -n 10000 my.vcf | grep "^#") <(bedtools intersect -a my.vcf -b my.bed -wa) | downstreamToolX
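
If bedtools is not at hand, the same header-plus-matching-records idea can be sketched in plain awk. This is only an illustration under assumptions: the toy `my.vcf`, the site list `sites.tsv` (CHROM and POS columns) and the output name `subset.vcf` are made up here. Note also that bedtools intersect has a `-header` flag that prints the header of the `-a` file before the results, which may be simpler than the redirects.

```shell
# Toy VCF with two samples (hypothetical data); fields are tab-separated
{
  printf '##fileformat=VCFv4.2\n'
  printf '#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\tS1\tS2\n'
  printf 'chr1\t100\t.\tA\tG\t.\tPASS\t.\tGT\t0/1\t0/0\n'
  printf 'chr1\t200\t.\tC\tT\t.\tPASS\t.\tGT\t0/0\t1/1\n'
  printf 'chr2\t42\t.\tG\tA\t.\tPASS\t.\tGT\t0/1\t0/1\n'
} > my.vcf

# Sites to keep: CHROM <tab> POS (1-based)
printf 'chr1\t100\nchr2\t42\n' > sites.tsv

# First file (NR == FNR): load the sites into a lookup table.
# Second file: print every header line, then only records whose
# CHROM/POS pair is in the table. Lines are printed unchanged,
# so the sample columns keep their original order.
awk -F '\t' '
  NR == FNR { keep[$1 FS $2] = 1; next }
  /^#/      { print; next }
  ($1 FS $2) in keep
' sites.tsv my.vcf > subset.vcf

cat subset.vcf
```

Because the header comes through intact, the result can be piped into a downstream tool in place of writing `subset.vcf`.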

great, thanks!

8.2 years ago

Convert your VCF / target regions to BED. For a VCF it's something like:

grep -v "^#" your.vcf | awk -F '\t' '{printf("%s\t%d\t%d\n",$1,int($2)-1,int($2)+length($4)-1);}' > input.bed

and then use bcftools view with the option --regions-file, or bedtools intersect.
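
To check the coordinate arithmetic in the conversion, here is a small worked example on made-up records. The toy `your.vcf` contents are hypothetical; the awk expression is the same one used in the conversion step.

```shell
# Toy records (hypothetical): a SNV at chr1:100 and a 3-base REF allele at chr1:200
{
  printf '##fileformat=VCFv4.2\n'
  printf '#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\n'
  printf 'chr1\t100\t.\tA\tG\t.\tPASS\t.\n'
  printf 'chr1\t200\t.\tACG\tA\t.\tPASS\t.\n'
} > your.vcf

# BED start = POS - 1 (0-based); BED end = POS - 1 + length(REF) (half-open)
grep -v "^#" your.vcf |
  awk -F '\t' '{ printf("%s\t%d\t%d\n", $1, int($2) - 1, int($2) + length($4) - 1) }' > input.bed

cat input.bed
```

The SNV covers one base (99 to 100) and the 3-base REF allele covers three (199 to 202), so each interval spans exactly the reference bases its record touches. As far as I know, bcftools region queries generally expect a bgzip-compressed, indexed VCF, so that step may need to come first.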
