Is it possible to subset a VCF file every 10 Mbp across a particular chromosome?
4.5 years ago
Hann ▴ 110

Hello all,

I have a VCF file (SNP data) for 157 individuals of a cereal crop. I have extracted only the SNPs on chromosome 9 as follows:

vcftools --vcf file.vcf --chr chr09 --recode --recode-INFO-all --out chr9_output

I would now like to split this chromosome 9 VCF file into subsets, one per 10 Mbp window. Is this possible?

Thanks,

snp • 1.5k views

You want to split your vcf in 10 Mbp windows? Or do you want to extract a certain 10 Mbp window?

I want one VCF file with the SNPs from 1 to 10 Mbp, another VCF file with the SNPs from 10 Mbp to 20 Mbp, and so on.

One idea that I think will work is to use bedtools intersect like this:

bedtools intersect -a Fonio_cultivated_noSingletons_chr09A.recode.vcf -bed wind.bed -wo test

wind.bed is a tab-delimited file with a single line, as follows:

chr09A   1   1000000

But I am getting this error: Error: unable to open file or unable to determine types for file Fonio_cultivated_noSingletons_chr09A.recode.vcf

That means your VCF is not formatted properly; maybe the header is missing. Also, -bed is the wrong flag: use -b (the bedtools docs explain what -bed actually does). tabix will also be notably faster. Simply make a file with the windows you want and loop through it with the tabix code linked in the answer below.
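
For the bedtools route, a minimal sketch of the corrected call might look like the following. It assumes the VCF has a valid header. Note that BED coordinates are 0-based and half-open, so the window covering positions 1 to 10,000,000 is written as `0 10000000`. The helper name make_bed_windows, the 60 Mbp chromosome length, and the output file name are placeholders, not values from this thread.

```shell
# Print 0-based, half-open BED windows that tile one chromosome.
# Arguments: chromosome name, chromosome length (bp), window size (bp).
# make_bed_windows is a hypothetical helper, not part of bedtools.
make_bed_windows() {
    chrom=$1; length=$2; size=$3
    start=0
    while [ "$start" -lt "$length" ]; do
        end=$((start + size))
        # Clip the last window to the end of the chromosome.
        if [ "$end" -gt "$length" ]; then end=$length; fi
        printf '%s\t%d\t%d\n' "$chrom" "$start" "$end"
        start=$end
    done
}

# Example usage (the 60 Mbp length is an assumed placeholder):
#   make_bed_windows chr09A 60000000 10000000 | head -n 1 > wind.bed
#   bedtools intersect -header \
#       -a Fonio_cultivated_noSingletons_chr09A.recode.vcf \
#       -b wind.bed > window_1.vcf
```

Here -header keeps the VCF header in the output, and the result is redirected from standard output rather than passed as an argument.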

Very good! worked! Thanks a lot!

4.5 years ago
ATpoint 81k

Compress your file with bgzip, then index it with tabix. From there you can extract any given interval, e.g. your 10 Mb window, with a simple command as described in the manual: http://www.htslib.org/doc/tabix.html#EXAMPLE
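
The bgzip/tabix steps above, looped over every window, can be sketched roughly as follows. This is an illustration under stated assumptions, not a tested pipeline: the function names make_windows and split_vcf, the output file naming, and the 60 Mbp chromosome length are all placeholders. The chromosome name passed in must match the contig name used in the VCF, and tabix regions are 1-based and inclusive.

```shell
# Print tabix-style regions (chrom:start-end, 1-based, inclusive)
# that tile one chromosome.
# Arguments: chromosome name, chromosome length (bp), window size (bp).
make_windows() {
    chrom=$1; length=$2; size=$3
    start=1
    while [ "$start" -le "$length" ]; do
        end=$((start + size - 1))
        # Clip the last window to the end of the chromosome.
        if [ "$end" -gt "$length" ]; then end=$length; fi
        printf '%s:%d-%d\n' "$chrom" "$start" "$end"
        start=$((end + 1))
    done
}

# Write one VCF per window from a bgzip-compressed, tabix-indexed VCF.
# Arguments: VCF path, then the same three arguments as make_windows.
split_vcf() {
    vcf=$1
    make_windows "$2" "$3" "$4" | while read -r region; do
        # Output name is an assumption, e.g. chr09_1_10000000.vcf
        out=$(printf '%s' "$region" | tr ':-' '__').vcf
        # -h keeps the VCF header in every output file.
        tabix -h "$vcf" "$region" > "$out"
    done
}

# Example usage (chromosome length is an assumed placeholder):
#   bgzip chr9_output.recode.vcf
#   tabix -p vcf chr9_output.recode.vcf.gz
#   split_vcf chr9_output.recode.vcf.gz chr09 60000000 10000000
```

vcftools writes its recoded output to chr9_output.recode.vcf when given --out chr9_output, which is why that file name appears in the usage lines.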
