Question: How to extract specific chromosomes from a VCF file
MAPK (United States), 2.8 years ago, wrote:

I have a VCF file with 23 chromosomes and some other unwanted contigs. I want to extract chromosomes 1 to 5 into a single VCF file, including the header lines. What is the most efficient way to do this? Thanks

grep -w '^#\|^[1-5]' my.vcf > my_new.vcf

Or, if your chromosome names have a chr prefix:

grep -w '^#\|^chr[1-5]' my.vcf > my_new.vcf
Reply by rbagnall, 2.8 years ago.

Thanks. How can I update the VCF header?

Reply by MAPK, 2.8 years ago.

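It is better to extend the pattern with ^#CHROM so the column-header line is kept: with -w, the pattern ^# does not match #CHROM, because the # is followed by a word character (C). If this line is missing, tools like VCFtools will complain.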

grep -w '^#\|^#CHROM\|^chr[1-5]' my.vcf > my_new.vcf
Reply by ATpoint, 19 months ago.
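As an alternative to grep -w, a minimal awk sketch (assuming a tab-separated VCF) compares the CHROM column exactly, so the #CHROM line is kept along with every other header line, and chr10 to chr19 cannot slip through. The regex also accepts names without the chr prefix. The toy VCF and the vcf/out temporary files are hypothetical test data.

```shell
# Hypothetical toy VCF, written to a temporary file for the demonstration
vcf=$(mktemp)
out=$(mktemp)
{
  printf '##fileformat=VCFv4.2\n'
  printf '##contig=<ID=chr1,length=1000>\n'
  printf '#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\n'
  printf 'chr1\t100\t.\tA\tG\t50\tPASS\t.\n'
  printf 'chr5\t200\t.\tC\tT\t50\tPASS\t.\n'
  printf 'chr10\t300\t.\tG\tA\t50\tPASS\t.\n'
  printf 'chrUn_x\t400\t.\tT\tC\t50\tPASS\t.\n'
} > "$vcf"

# Keep every line starting with # (## meta lines and the #CHROM line), plus
# records whose first tab-separated column is exactly 1..5 or chr1..chr5.
awk -F '\t' '/^#/ || $1 ~ /^(chr)?[1-5]$/' "$vcf" > "$out"
cat "$out"
```

On a real file, replace "$vcf" with my.vcf and "$out" with my_new.vcf.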

See the related thread: How to split vcf file by chromosome?

Reply by genomax, 2.8 years ago.

Thanks, but that only extracts one chromosome per file, right? I want chr1 to chr5 in one file.

Reply by MAPK, 2.8 years ago.

mg, 12 months ago, wrote:

bcftools can be used, and this will preserve the header as well.

bcftools view input.vcf.gz --regions chr1

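To extract multiple chromosomes, pass them as a comma-separated list, e.g. --regions chr1,chr5 (this selects only chr1 and chr5; for chr1 to chr5, list all five: --regions chr1,chr2,chr3,chr4,chr5). Note that --regions requires a bgzip-compressed, indexed input (create the index with bcftools index or tabix); for an unindexed file, --targets takes the same list and streams through the whole file instead.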


Note that this method is more robust than grep because it always keeps the full VCF header. However, it does not modify the header, so the unselected chromosomes keep their contig lines (e.g. ##contig=<ID=chr1>). So don't rely on bcftools view -h subset.vcf to check which chromosomes are left in your VCF file.

Reply by Johan Zicola, 9 months ago.
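To see which chromosomes actually remain, one option is to read the CHROM column of the data lines instead of the header. A minimal sketch on a hypothetical toy file whose header still declares chr10 but whose records only contain chr1 and chr5 (the vcf and chroms names are illustrative):

```shell
# Hypothetical toy VCF: the ##contig lines still mention chr10,
# but no chr10 record is left after subsetting.
vcf=$(mktemp)
{
  printf '##fileformat=VCFv4.2\n'
  printf '##contig=<ID=chr1,length=1000>\n'
  printf '##contig=<ID=chr5,length=1000>\n'
  printf '##contig=<ID=chr10,length=1000>\n'
  printf '#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\n'
  printf 'chr1\t100\t.\tA\tG\t50\tPASS\t.\n'
  printf 'chr5\t200\t.\tC\tT\t50\tPASS\t.\n'
} > "$vcf"

# Skip all header lines, take column 1 (CHROM) and list each name once
chroms=$(grep -v '^#' "$vcf" | cut -f1 | sort -u)
echo "$chroms"
```

For a compressed file, the same pipeline can start from bcftools view -H subset.vcf.gz (records without the header) or from zcat.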

This worked well for me, too! For many chromosomes, use -R, --regions-file <file>, which restricts the output to the regions listed in a file.

Reply by FatihSarigol, 5 weeks ago.
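Writing the regions file by hand gets tedious for many chromosomes. A minimal awk sketch, assuming the header has ##contig lines with ID and length keys and that no attribute value contains a comma, turns them into a tab-delimited CHROM, START, END file (1-based and inclusive, the default format -R reads) for chr1 to chr5. The toy file and the vcf/regions names are hypothetical.

```shell
# Hypothetical toy header with contig lengths
vcf=$(mktemp)
regions=$(mktemp)
{
  printf '##fileformat=VCFv4.2\n'
  printf '##contig=<ID=chr1,length=1000>\n'
  printf '##contig=<ID=chr2,length=2000>\n'
  printf '##contig=<ID=chr10,length=3000>\n'
  printf '##contig=<ID=chrUn_x,length=500>\n'
  printf '#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\n'
} > "$vcf"

# Parse each ##contig=<key=value,...> line into ID and length, and print
# "ID <tab> 1 <tab> length" for chromosomes 1..5 (with or without chr).
awk '
  /^##contig=</ {
    id = ""; len = ""
    line = $0
    sub(/^##contig=</, "", line)
    sub(/>$/, "", line)
    n = split(line, kv, ",")
    for (i = 1; i <= n; i++) {
      split(kv[i], p, "=")
      if (p[1] == "ID") id = p[2]
      if (p[1] == "length") len = p[2]
    }
    if (id ~ /^(chr)?[1-5]$/ && len != "") print id "\t1\t" len
  }
  /^#CHROM/ { exit }   # meta lines end here; stop reading
' "$vcf" > "$regions"
cat "$regions"
```

Assuming an indexed input, the file would then be passed as bcftools view -R regions.txt input.vcf.gz; -T (--targets-file) reads the same file but streams through an unindexed VCF.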

Vincent Laufer (United States), 2.8 years ago, wrote:

In addition to the solutions already posted, you might try VCFtools:

http://vcftools.sourceforge.net/man_latest.html

The manual lists the following options:

    SITE FILTERING OPTIONS
    These options are used to include or exclude certain sites from any analysis being performed by the program.

    POSITION FILTERING

    --chr <chromosome> 
    --not-chr <chromosome>

Includes or excludes sites with identifiers matching <chromosome>. **These options may be used multiple times to include or exclude more than one chromosome.**

When combined with --recode, this writes a new VCF and preserves the header. In addition, the grep command posted above in the comments also keeps the header, because its pattern is an OR that grabs lines starting with # as well as lines starting with chr1, chr2, chr3, etc.


Can you do something like --chr 1-23?

Reply by jespinoz, 23 months ago.

That won't work, and neither does separating chromosome names with commas (although commas do work for bcftools view --regions 1,2,3). Instead, repeat the option for each chromosome:

vcftools --gzvcf <file.vcf.gz> --chr 1 --chr 2 [etc. until 23] --recode --out subset_chr1-23

Reply by Johan Zicola, 9 months ago.
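Typing 23 separate --chr options is error-prone. A minimal POSIX sh sketch collects them in the positional parameters, so each flag and chromosome name stays a separate argument; file.vcf.gz and the vcftools call are hypothetical and left commented out.

```shell
# Build "--chr 1 --chr 2 ... --chr 23" in the positional parameters
set --
i=1
while [ "$i" -le 23 ]; do
  set -- "$@" --chr "$i"
  i=$((i + 1))
done
echo "$@"

# Assuming vcftools is installed and file.vcf.gz exists (hypothetical name):
# vcftools --gzvcf file.vcf.gz "$@" --recode --out subset_chr1-23
```

Quoting "$@" expands to exactly one argument per flag and per name, which is what vcftools expects for repeated options.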

ATpoint (Germany), 19 months ago, wrote:

Keep in mind that the posted solution only works for single-digit chromosomes (chr1, chr2, chr3, ...), not for chr10 to chr22 or X. Using chr[1-22] will not work either: a bracket expression matches a single character, so [1-22] is the same as [12], and two-digit numbers need their own pattern. If you want all regular chromosomes (1-22 and X) but want to discard unplaced (chrUn) and random contigs and the like from a VCF, use:

grep -w '^#\|^#CHROM\|^chr[1-9]\|^chr1[0-9]\|^chr2[0-2]\|^chrX' in.vcf > out.vcf
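The point about chr[1-22] is easy to check on a hypothetical list of names. This sketch contrasts the single-character class with one alternative per digit pattern; grep -x requires the whole line to match, and -E enables the (a|b) alternation.

```shell
# Hypothetical list of chromosome names, one per line
names='chr1
chr2
chr3
chr10
chr22'

# [1-22] is a single-character class (the range 1-2 plus a repeated 2),
# so with -x (whole-line match) only chr1 and chr2 are selected.
bad=$(printf '%s\n' "$names" | grep -x 'chr[1-22]')

# One alternative per digit pattern covers 1-9, 10-19 and 20-22.
good=$(printf '%s\n' "$names" | grep -xE 'chr([1-9]|1[0-9]|2[0-2])')

echo "$bad"
echo "$good"
```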

Hi, I want to extract only chr21 from a VCF. How can I do it? I tried the commands above, but they produce no output. Can anyone help? Thanks for your suggestions.

Reply by Ramana, 15 months ago.
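One possible reason such commands print nothing is that the file names chromosomes without the chr prefix (21 rather than chr21), so a pattern containing chr never matches. A minimal awk sketch that keeps the header and accepts either spelling; the toy data and the vcf/out names are hypothetical.

```shell
# Hypothetical toy VCF using names without the chr prefix
vcf=$(mktemp)
out=$(mktemp)
{
  printf '##fileformat=VCFv4.2\n'
  printf '#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\n'
  printf '2\t100\t.\tA\tG\t50\tPASS\t.\n'
  printf '21\t200\t.\tC\tT\t50\tPASS\t.\n'
  printf '22\t300\t.\tG\tA\t50\tPASS\t.\n'
} > "$vcf"

# Keep header lines and records whose CHROM is exactly 21 or chr21
awk -F '\t' '/^#/ || $1 == "21" || $1 == "chr21"' "$vcf" > "$out"
cat "$out"
```

The bcftools answer works the same way, as long as the name given to --regions matches the naming used in the file.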
Powered by Biostar version 2.3.0