Question: vcftools: extract only selected chromosome regions from a 4 GB VCF file
9 months ago by
dev.info.2021 wrote:

Hi, I have a 4 GB *.vcf file and would like to extract only the chromosome regions that I need and write them to a new file.

For example, these regions:

 17    7571720    7590868c
 3    10141635    10153670

I saved them to a *.bed file and tried this command:

vcftools --gzvcf /home/user/Documents/*.vcf --bed /home/user/Documents/list.bed --out /home/sentinel/Documents/test

It returns: No data left for analysis!

VCFtools - 0.1.17
(C) Adam Auton and Anthony Marcketta 2009

Parameters as interpreted:
--gzvcf /home/user/Documents/*.vcf
--out /home/user/Documents/test
--recode
--bed list.bed

Using zlib version: 1.2.11
Warning: Expected at least 2 parts in INFO entry: ID=BLOCKAVG_min30p3a,Number=0,Type=Flag,Description="Non-variant multi-site block. Non-variant blocks are defined independently for each sample. All sites in such a block are constrained to be non-variant, have the same filter value, and have sample values {GQX,DP,DPF} in range [x,y], y <= max(x+3,(x*1.3)).">
Warning: Expected at least 2 parts in INFO entry: ID=BLOCKAVG_min30p3a,Number=0,Type=Flag,Description="Non-variant multi-site block. Non-variant blocks are defined independently for each sample. All sites in such a block are constrained to be non-variant, have the same filter value, and have sample values {GQX,DP,DPF} in range [x,y], y <= max(x+3,(x*1.3)).">
Warning: Expected at least 2 parts in INFO entry: ID=BLOCKAVG_min30p3a,Number=0,Type=Flag,Description="Non-variant multi-site block. Non-variant blocks are defined independently for each sample. All sites in such a block are constrained to be non-variant, have the same filter value, and have sample values {GQX,DP,DPF} in range [x,y], y <= max(x+3,(x*1.3)).">
Warning: Expected at least 2 parts in INFO entry: ID=BLOCKAVG_min30p3a,Number=0,Type=Flag,Description="Non-variant multi-site block. Non-variant blocks are defined independently for each sample. All sites in such a block are constrained to be non-variant, have the same filter value, and have sample values {GQX,DP,DPF} in range [x,y], y <= max(x+3,(x*1.3)).">
Warning: Expected at least 2 parts in INFO entry: ID=BLOCKAVG_min30p3a,Number=0,Type=Flag,Description="Non-variant multi-site block. Non-variant blocks are defined independently for each sample. All sites in such a block are constrained to be non-variant, have the same filter value, and have sample values {GQX,DP,DPF} in range [x,y], y <= max(x+3,(x*1.3)).">
Warning: Expected at least 2 parts in INFO entry: ID=BLOCKAVG_min30p3a,Number=0,Type=Flag,Description="Non-variant multi-site block. Non-variant blocks are defined independently for each sample. All sites in such a block are constrained to be non-variant, have the same filter value, and have sample values {GQX,DP,DPF} in range [x,y], y <= max(x+3,(x*1.3)).">
Warning: Expected at least 2 parts in FORMAT entry: ID=GQX,Number=1,Type=Integer,Description="Empirically calibrated genotype quality score for variant sites, otherwise minimum of {Genotype quality assuming variant position,Genotype quality assuming non-variant position}">
Warning: Expected at least 2 parts in FORMAT entry: ID=GQX,Number=1,Type=Integer,Description="Empirically calibrated genotype quality score for variant sites, otherwise minimum of {Genotype quality assuming variant position,Genotype quality assuming non-variant position}">
Warning: Expected at least 2 parts in FORMAT entry: ID=FT,Number=1,Type=String,Description="Sample filter, 'PASS' indicates that all filters have passed for this sample">
Warning: Expected at least 2 parts in FORMAT entry: ID=DPI,Number=1,Type=Integer,Description="Read depth associated with indel, taken from the site preceding the indel">
Warning: Expected at least 2 parts in FORMAT entry: ID=PL,Number=G,Type=Integer,Description="Normalized, Phred-scaled likelihoods for genotypes as defined in the VCF specification">
After filtering, kept 1 out of 1 Individuals
Outputting VCF file...
Read 2 BED file entries.
After filtering, kept 0 out of a possible 41203829 Sites
No data left for analysis!
Run Time = 40.00 seconds

Any ideas?


@dev.info.2021 Use bedtools intersect (https://bedtools.readthedocs.io/en/latest/content/tools/intersect.html), but make sure that your VCF is well formatted.

Comment written 9 months ago by cpad0112
9 months ago by
Jorge Amigo
Santiago de Compostela, Spain
Jorge Amigo wrote:

It's difficult to say anything without knowing the content of your VCF file, but here are a couple of suggestions:

  1. Have you checked that your VCF and BED files refer to the same reference? A quick and dirty check is to make sure that both files either use or omit the "chr" prefix, although the proper check would be to get the reference information from the VCF header and build the BED file with the corresponding reference positions.

  2. You're using the --gzvcf option with a plain VCF file. I don't know whether vcftools can handle plain-text files with the --gzvcf option, but an easy check would be to use the plain --vcf option instead.
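The quick "chr" prefix check from the first suggestion can be sketched with standard Unix tools. This is only an illustration: the toy VCF and BED files below are made up so that the snippet runs on its own, and the variable names are hypothetical. Point `vcf` and `bed` at your own files to use it.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Toy inputs (hypothetical) so the sketch runs on its own; replace with real paths.
workdir=$(mktemp -d)
vcf="$workdir/demo.vcf"
bed="$workdir/list.bed"
printf '##fileformat=VCFv4.1\n#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\n' > "$vcf"
printf 'chr3\t10141700\t.\tA\tG\t50\tPASS\t.\n' >> "$vcf"
printf 'chr17\t7571800\t.\tC\tT\t50\tPASS\t.\n' >> "$vcf"
printf '17\t7571720\t7590868\n3\t10141635\t10153670\n' > "$bed"

# Chromosome names used by the VCF records (header lines start with '#').
grep -v '^#' "$vcf" | cut -f1 | sort -u > "$workdir/vcf_chroms.txt"
# Chromosome names used by the BED intervals.
cut -f1 "$bed" | sort -u > "$workdir/bed_chroms.txt"

# Names that appear in the BED but not in the VCF: those intervals can never match.
missing=$(comm -13 "$workdir/vcf_chroms.txt" "$workdir/bed_chroms.txt" | paste -sd' ' -)
echo "BED chromosomes not found in VCF: ${missing:-none}"
```

If the list is not empty, the two files most likely use different naming schemes (for example "17" versus "chr17"), which would explain a result of 0 kept sites.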

As an additional suggestion, I'd consider using bcftools. It's much faster than vcftools, and with a 4 GB VCF file that will make a difference. These would be the commands to use, assuming that you want to process all VCF files present (*.vcf) and that those files are not compressed:

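for file in *.vcf; do
  # compress with bgzip (replaces $file with $file.gz), then index with tabix
  bgzip -f "$file" && tabix -f -p vcf "$file.gz"
  # keep only the records that fall in the regions listed in the BED file
  bcftools view -R file.bed "$file.gz" > "${file%.vcf}.filtered.vcf"
done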
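To make explicit what a region filter does, here is a dependency-free sketch in plain awk. It is not how vcftools or bcftools work internally, only an illustration that treats BED intervals as 0-based and half-open, so a site is kept when start < POS <= end on the same chromosome. The toy files and variable names are hypothetical.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Toy inputs (hypothetical) so the sketch runs on its own; replace with real paths.
workdir=$(mktemp -d)
vcf="$workdir/demo.vcf"
bed="$workdir/list.bed"
out="$workdir/demo.filtered.vcf"
printf '##fileformat=VCFv4.1\n#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\n' > "$vcf"
printf 'chr3\t10141700\t.\tA\tG\t50\tPASS\t.\n' >> "$vcf"   # inside the chr3 interval
printf 'chr3\t20000000\t.\tA\tG\t50\tPASS\t.\n' >> "$vcf"   # outside every interval
printf 'chr17\t7571800\t.\tC\tT\t50\tPASS\t.\n' >> "$vcf"   # inside the chr17 interval
printf 'chr17\t7571720\t.\tC\tT\t50\tPASS\t.\n' >> "$vcf"   # equals the BED start: excluded
printf 'chr17\t7571720\t7590868\nchr3\t10141635\t10153670\n' > "$bed"

awk -F'\t' '
  # First file (BED): remember every interval as numbers.
  NR == FNR { n++; chrom[n] = $1; start[n] = $2 + 0; end[n] = $3 + 0; next }
  # Second file (VCF): always keep the header lines.
  /^#/ { print; next }
  # Keep a record when its position falls inside any interval on its chromosome.
  {
    pos = $2 + 0
    for (i = 1; i <= n; i++)
      if ($1 == chrom[i] && pos > start[i] && pos <= end[i]) { print; break }
  }
' "$bed" "$vcf" > "$out"

kept=$(grep -vc '^#' "$out" || true)
echo "kept $kept sites"
```

The loop over every interval is fine for a handful of regions; for large BED files an indexed tool such as the bcftools command above is the better choice.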
9 months ago by
dev.info.2021 wrote:

I already found the solution:

1] My BED file was missing the "chr" prefix:

 chr17    7571720    7590868c
 chr3    10141635    10153670

2] The command was missing --recode, which is needed to write the filtered VCF:

vcftools --gzvcf /home/user/Documents/*.vcf --bed /home/user/Documents/list.bed --out /home/sentinel/Documents/test.vcf --recode

It works perfectly.
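For reference, the BED fix from step 1 can be scripted rather than edited by hand. The following is a minimal sketch that assumes the VCF uses "chr"-prefixed names, as in this thread. The file names are hypothetical, and the toy BED is created only so that the snippet runs on its own.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Toy BED without the "chr" prefix (hypothetical); replace with your own file.
workdir=$(mktemp -d)
bed="$workdir/list.bed"
fixed="$workdir/list.chr.bed"
printf '17\t7571720\t7590868\n3\t10141635\t10153670\n' > "$bed"

# Strip any existing "chr" prefix, then add it back, so the step is safe to repeat.
# Fields are split on whitespace and written out tab-separated.
awk 'BEGIN { OFS = "\t" } { sub(/^chr/, "", $1); $1 = "chr" $1; print }' "$bed" > "$fixed"

cat "$fixed"
```

The rewritten file can then be passed to --bed in place of the original list.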
