Filter vcf file
6.7 years ago
samocarp • 0

Hello all, I'd like to remove entire lines from a VCF file using a text file that contains the following columns: #CHROM POS

What's the best way to do this?

Thank you

SNP alignment • 2.7k views
6.7 years ago
miaowzai ▴ 390

You can use bcftools. https://samtools.github.io/bcftools/bcftools.html

bcftools view can be used for subsetting.

In your case you want to remove the sites listed in a file, and the -r and -R options do not accept exclusion (https://github.com/samtools/bcftools/issues/527). You will have to use -T instead, with a ^ prefix on the file name to request the complement. Your command should look like: bcftools view -T ^FILE vcf_file.vcf.gz > output.vcf

Note that -T only checks for the start position.

Just make sure your region file has the right format. From the bcftools documentation:

Regions can be specified either on command line or in a VCF, BED, or tab-delimited file (the default). The columns of the tab-delimited file are: CHROM, POS, and, optionally, POS_TO, where positions are 1-based and inclusive. The columns of the tab-delimited BED file are also CHROM, POS and POS_TO (trailing columns are ignored), but coordinates are 0-based, half-open. To indicate that a file be treated as BED rather than the 1-based tab-delimited file, the file must have the ".bed" or ".bed.gz" suffix (case-insensitive). Uncompressed files are stored in memory, while bgzip-compressed and tabix-indexed region files are streamed. Note that sequence names must match exactly, "chr20" is not the same as "20". Also note that chromosome ordering in FILE will be respected, the VCF will be processed in the order in which chromosomes first appear in FILE. However, within chromosomes, the VCF will always be processed in ascending genomic coordinate order no matter what order they appear in FILE. Note that overlapping regions in FILE can result in duplicated out of order positions in the output. This option requires indexed VCF/BCF files.
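Since the list in the question is described as a text file with a "#CHROM POS" header, it may need converting to the tab-delimited CHROM, POS layout described above. Here is a minimal sketch, assuming the list is whitespace-separated with one site per line and 1-based positions; the function name make_targets and the file names are made up for illustration.

```shell
# Hypothetical helper: turn a whitespace-separated site list into a
# tab-delimited CHROM<TAB>POS file (positions assumed to be 1-based).
# $1 = input list, possibly starting with a "#CHROM POS" header line.
make_targets() {
    awk -v OFS='\t' '
        /^#/ || NF < 2 { next }   # skip header/comment lines and blank or short lines
        { print $1, $2 }          # keep only CHROM and POS, joined by a tab
    ' "$1"
}

# Example use (file names are placeholders):
#   make_targets sites.txt > targets.tsv
#   bcftools view -T ^targets.tsv vcf_file.vcf.gz > output.vcf
```

Check that the chromosome names in the resulting file match the ones in the VCF exactly ("chr20" vs "20"), as the quoted documentation points out.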
