Hello all, I'd like to remove entire lines from a VCF file using a text file containing the following columns: #CHROM POS
What's the best way to do this?
Thank you
You can use bcftools (https://samtools.github.io/bcftools/bcftools.html). bcftools view can be used for subsetting.
In your case, you want to remove the sites listed in a file, but the -r and -R arguments do not accept exclusion (https://github.com/samtools/bcftools/issues/527). You will have to use -T instead.
Your command should look like this, where FILE is your positions file:

bcftools view -T ^FILE vcf_file.vcf.gz > output.vcf
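If bcftools is not available, the same exclusion can be sketched in plain Python for an uncompressed VCF. This is an alternative approach, not what bcftools does internally. The function names (load_sites, exclude_sites, filter_vcf_file) are hypothetical, and the sketch assumes the positions file is whitespace-separated with CHROM and POS in the first two columns. Like -T, it matches records on their exact (CHROM, POS) only.

```python
from typing import Iterable, Iterator, Set, Tuple

Site = Tuple[str, int]


def load_sites(lines: Iterable[str]) -> Set[Site]:
    """Parse CHROM/POS pairs, skipping blank lines and '#' header/comment lines."""
    sites: Set[Site] = set()
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        chrom, pos = line.split()[:2]  # whitespace-separated; extra columns ignored
        sites.add((chrom, int(pos)))
    return sites


def exclude_sites(vcf_lines: Iterable[str], sites: Set[Site]) -> Iterator[str]:
    """Yield VCF lines, dropping data records whose (CHROM, POS) is in sites."""
    for line in vcf_lines:
        if line.startswith("#"):
            yield line  # keep ## meta-information and the #CHROM header untouched
            continue
        fields = line.split("\t", 2)  # only CHROM and POS are needed
        if (fields[0], int(fields[1])) not in sites:
            yield line


def filter_vcf_file(positions_path: str, vcf_path: str, output_path: str) -> None:
    """Write vcf_path to output_path without the sites listed in positions_path."""
    with open(positions_path) as pos_file:
        excluded = load_sites(pos_file)
    with open(vcf_path) as vcf_in, open(output_path, "w") as vcf_out:
        vcf_out.writelines(exclude_sites(vcf_in, excluded))
```

For a bgzipped vcf_file.vcf.gz you could open the input with gzip.open(vcf_path, "rt") instead of open; for large files, bcftools will still be the faster and safer choice.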
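Note that -T, by default, only checks the start position (POS) of each record. Unlike -R, it also streams through the whole file instead of jumping to regions via the index.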
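To make the POS-only behaviour concrete, here is a simplified Python model of the two kinds of check. It is not bcftools' actual code: the function names are hypothetical, and the record's end is approximated as POS + len(REF) - 1. The practical consequence is that a deletion starting before a listed position, but covering it, is not removed by -T ^FILE.

```python
def pos_in_region(pos: int, start: int, end: int) -> bool:
    """POS-only check, the default for -t/-T: only the record's start position counts."""
    return start <= pos <= end


def record_overlaps(pos: int, ref: str, start: int, end: int) -> bool:
    """Simplified overlap check, roughly what -r/-R do by default.

    The record is assumed to span POS..POS+len(REF)-1 (1-based, inclusive).
    """
    record_end = pos + len(ref) - 1
    return pos <= end and record_end >= start


# A record at POS 100 with a 5-bp REF allele spans positions 100-104.
deletion_pos, deletion_ref = 100, "ACGTA"
# Listing position 102 in the targets file does not hit it with POS-only matching...
print(pos_in_region(deletion_pos, 102, 102))  # False
# ...although the record does overlap position 102.
print(record_overlaps(deletion_pos, deletion_ref, 102, 102))  # True
```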
Just make sure your file has the right format (from the bcftools documentation):
Regions can be specified either on command line or in a VCF, BED, or tab-delimited file (the default). The columns of the tab-delimited file are: CHROM, POS, and, optionally, POS_TO, where positions are 1-based and inclusive. The columns of the tab-delimited BED file are also CHROM, POS and POS_TO (trailing columns are ignored), but coordinates are 0-based, half-open. To indicate that a file be treated as BED rather than the 1-based tab-delimited file, the file must have the ".bed" or ".bed.gz" suffix (case-insensitive). Uncompressed files are stored in memory, while bgzip-compressed and tabix-indexed region files are streamed. Note that sequence names must match exactly, "chr20" is not the same as "20". Also note that chromosome ordering in FILE will be respected, the VCF will be processed in the order in which chromosomes first appear in FILE. However, within chromosomes, the VCF will always be processed in ascending genomic coordinate order no matter what order they appear in FILE. Note that overlapping regions in FILE can result in duplicated out of order positions in the output. This option requires indexed VCF/BCF files.
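If your positions file is space-separated or still carries the #CHROM POS header, a small Python sketch like the one below can rewrite it into the plain tab-delimited CHROM/POS layout described above. Assumptions: the hypothetical to_targets_lines helper drops '#' and blank lines to be safe, and it leaves POS unchanged because VCF positions are already 1-based, which is what the tab-delimited format expects.

```python
from typing import Iterable, List


def to_targets_lines(lines: Iterable[str]) -> List[str]:
    """Convert whitespace-separated CHROM/POS lines into tab-delimited lines.

    Header/comment lines starting with '#' and blank lines are dropped. POS is
    kept as-is because VCF positions are already 1-based.
    """
    out: List[str] = []
    for lineno, line in enumerate(lines, start=1):
        fields = line.split()
        if not fields or fields[0].startswith("#"):
            continue
        if len(fields) < 2:
            raise ValueError(f"line {lineno}: expected CHROM and POS, got {line!r}")
        chrom, pos = fields[0], int(fields[1])  # int() raises ValueError on junk
        if pos < 1:
            raise ValueError(f"line {lineno}: POS must be 1-based, got {pos}")
        out.append(f"{chrom}\t{pos}\n")
    return out
```

For example, open your original file as src and a new file as dst, then call dst.writelines(to_targets_lines(src)). Give the output file a name without a .bed suffix (for example exclude_sites.txt, a hypothetical name); otherwise bcftools will read the coordinates as 0-based.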