Question: extract and generate a VCF file with only "PASS"
1
gravatar for zengtony743
4.0 years ago by
zengtony74370
Canada
zengtony74370 wrote:
I have a VCF file and I want to generate a new VCF file with the variants which have only FILTER as "PASS" left

I tried

 1) grep 4751snpf.vcf > 4751PASS.vcf | grep “FILTER=PASS”
 2) grep 4751snpf.vcf > 4751PASS.vcf | grep “PASS”

They both failed. Here is the VCF file looks like:

#CHROM  POS ID  REF ALT QUAL    FILTER  INFO    FORMAT  MO_4751

chr10   3124905 .   A   C   7797.30 PASS    ABHet=0.588;ABHom=0.995;
AC=1;AF=0.500;AN=2;BaseQRankSum=-4.919;DP=8;Dels=0.00;FS=7.563;HaplotypeScore=0.
4537;InbreedingCoeff=-0.5755;MQ=55.56;MQ0=1;MQRankSum=-4.937;OND=3.265e-03;QD=12
.97;ReadPosRankSum=5.683    GT:AD:DP:GQ:PL  0/1:7,1:8:9:9,0,228
chr10   3176529 .   G   T   6361.10 PASS    ABHet=0.651;ABHom=0.864;
AC=1;AF=0.500;AN=2;BaseQRankSum=-9.379;DP=14;Dels=0.01;FS=0.292;HaplotypeScore=5
.7125;InbreedingCoeff=-0.7938;MQ=55.81;MQ0=0;MQRankSum=-7.360;OND=0.035;QD=8.60;
ReadPosRankSum=-2.204   GT:AD:DP:GQ:PL  0/1:9,5:14:99:109,0,266
chr10   3257558 .   A   C   349.31  LowCoverage ABHet=0.543;ABHo
m=1.00;AC=1;AF=0.500;AN=2;BaseQRankSum=-1.917;DP=3;Dels=0.00;FS=1.412;HaplotypeS
core=0.4039;InbreedingCoeff=0.0676;MQ=54.44;MQ0=0;MQRankSum=-2.510;QD=15.88;Read
PosRankSum=-1.127   GT:AD:DP:GQ:PL  0/1:2,1:3:26:26,0,63
chr10   3505793 .   A   C   406.07  LowCoverage ABHet=0.556;ABHo
m=0.885;AC=1;AF=0.500;AN=2;BaseQRankSum=-0.055;DP=2;Dels=0.00;FS=136.819;Haploty
peScore=0.2675;InbreedingCoeff=-0.1981;MQ=48.36;MQ0=0;MQRankSum=-3.192;OND=0.090
;QD=7.66;ReadPosRankSum=-1.637  GT:AD:DP:GQ:PL  0/1:1,1:2:21:21,0,27
filter vcf • 5.5k views
ADD COMMENTlink modified 8 weeks ago by carmen.bugarin.diz0 • written 4.0 years ago by zengtony74370

Just found that vcftool has --remove-filtered-all. Has anyone tried it?

ADD REPLYlink written 4.0 years ago by zengtony74370
4
gravatar for Vivek
4.0 years ago by
Vivek2.4k
Denmark
Vivek2.4k wrote:

You are grepping the file name instead of the pattern and writing it to another file before using a pipe to grep for the pattern you want. Try this command instead.

awk -F '\t' '{if($0 ~ /\#/) print; else if($7 == "PASS") print}' 4751snpf.vcf > 4751PASS.vcf

Edited to add an apastrophe.

ADD COMMENTlink modified 4.0 years ago • written 4.0 years ago by Vivek2.4k

Thanks, Vivek. I am running the one that you posted It shows this, Is it normal?

$ awk -F '\t' '{if($0 ~ /#/) print; else if($7 == "PASS") print} 4751snpf.vcf > 4751PASS1.vcf &

ADD REPLYlink written 4.0 years ago by zengtony74370
3

There's an apostrophe missing between "print}" and "4751". I think the correct syntax is:

awk -F '\t' '{if($0 ~ /\#/) print; else if($7 == "PASS") print}' 4751snpf.vcf > 4751PASS1.vcf
ADD REPLYlink written 4.0 years ago by spvensko220

Ah thank you, I typed that post right before leaving home and didn't get a chance to check the error.

ADD REPLYlink written 4.0 years ago by Vivek2.4k
3
gravatar for finswimmer
13 months ago by
finswimmer13k
Germany
finswimmer13k wrote:
$ bcftools view -i 'ID="PASS"' input.vcf > output.vcf
ADD COMMENTlink written 13 months ago by finswimmer13k
1

I like bcftools, it is so fast and can handel larege vcf files from whole genome sequencing (WGS) with high efficiency. However, the bcftools option -i 'ID="PASS"' will not work for all versions of vcf files, at least it did not work with me with illumenia WGS SNV.vcf files. To make sure it will work with all types of vcf files, I would use the following command instead:

bcftools view -f PASS input.vcf > output.vcf -Ahmed Alhendi

ADD REPLYlink written 5 months ago by Ahmed Alhendi50
2
gravatar for spvensko
4.0 years ago by
spvensko220
Raleigh, NC, USA
spvensko220 wrote:

I believe this should work:

awk '$7=="PASS" {print $0}' 4751snpf.vcf > 4751PASS.vcf
ADD COMMENTlink written 4.0 years ago by spvensko220

Thanks, spvensko, it works by producing the following file, however, all information on the header line lost too

chr10   3176529 .   G   T   6361.10 PASS    ABHet=0.651;ABHom=0.864;
AC=1;AF=0.500;AN=2;BaseQRankSum=-9.379;DP=14;Dels=0.01;FS=0.292;HaplotypeScore=5
.7125;InbreedingCoeff=-0.7938;MQ=55.81;MQ0=0;MQRankSum=-7.360;OND=0.035;QD=8.60;
ReadPosRankSum=-2.204   GT:AD:DP:GQ:PL  0/1:9,5:14:99:109,0,266
chr10   3696700 .   T   G   988.21  PASS    ABHet=0.604;ABHom=0.869;
AC=1;AF=0.500;AN=2;BaseQRankSum=1.305;DP=5;Dels=0.00;FS=387.692;HaplotypeScore=0
.3825;InbreedingCoeff=-0.3615;MQ=49.17;MQ0=0;MQRankSum=-5.599;OND=0.066;QD=5.95;
ReadPosRankSum=-7.691   GT:AD:DP:GQ:PL  0/1:3,2:5:44:44,0,61
chr10   4033401 .   G   A,T 16302.20    PASS    AC=1,0;AF=0.500,
0.00;AN=2;BaseQRankSum=-7.317;DP=34;Dels=0.00;FS=235.717;HaplotypeScore=12.5074;
InbreedingCoeff=-0.4964;MQ=55.46;MQ0=1;MQRankSum=-29.193;QD=6.18;ReadPosRankSum=
-5.443  GT:AD:DP:GQ:PL  0/1:27,7,0:34:99:111,0,818,186,836,1023
chr10   4403027 .   A   G   3098.16 PASS    ABHet=0.673;ABHom=0.977;
AC=1;AF=0.500;AN=2;BaseQRankSum=6.290;DP=10;Dels=0.00;FS=0.000;HaplotypeScore=6.
4682;InbreedingCoeff=-0.4427;MQ=49.74;MQ0=0;MQRankSum=5.366;OND=8.775e-03;QD=7.9
4;ReadPosRankSum=-9.160 GT:AD:DP:GQ:PL  0/1:8,2:10:42:42,0,149
ADD REPLYlink written 4.0 years ago by zengtony74370
1

You should follow Vivek's approach (A: extract and generate a VCF file with only "PASS") as it maintains the header lines.

ADD REPLYlink written 4.0 years ago by spvensko220

Thanks, Vivek. I am running the one that you posted It shows this, Is it normal?

$ awk -F '\t' '{if($0 ~ /\#/) print; else if($7 == "PASS") print} 4751snpf.vcf > 4751PASS1.vcf &
> 
> 
> 
> 
> 
> 
> 
> 
>
ADD REPLYlink modified 4.0 years ago • written 4.0 years ago by zengtony74370
1
gravatar for Lakshman Teja
7 months ago by
BANGALORE, INDIA
Lakshman Teja40 wrote:

Hello,
You can try the below GATK command to filter variants by 'PASS':

gatk --java-options '-Xmx20G -XX:+UseParallelGC -XX:ParallelGCThreads=8' SelectVariants -R reference.fasta -V snps.vcf.gz --exclude-filtered true -O filter.vcf.gz

Alternatively, you can use vcffilter tools:

vcffilter -f "FILTER = PASS" snps.vcf.gz > filter.vcf.gz

Hope this works

-Lakshman

ADD COMMENTlink written 7 months ago by Lakshman Teja40
1
gravatar for Ahmed Alhendi
5 months ago by
University of Southampton, UK
Ahmed Alhendi50 wrote:

If you only want to filter for PASS records then you don't need to use 'AWK '. This simply can be done using bcftools.

bcftools view -f PASS input.vcf > output.vcf
ADD COMMENTlink written 5 months ago by Ahmed Alhendi50
0
gravatar for osowiecki
13 months ago by
osowiecki0
osowiecki0 wrote:
$ zcat ./4_strelka.vcf.gz | grep -P "\tPASS\t|^#" > test
ADD COMMENTlink modified 13 months ago by finswimmer13k • written 13 months ago by osowiecki0
0
gravatar for carmen.bugarin.diz
8 weeks ago by
King's College London
carmen.bugarin.diz0 wrote:

vcftools --remove-filtered-all --recode 4751snpf.vcf > 4751PASS.vcf

ADD COMMENTlink written 8 weeks ago by carmen.bugarin.diz0
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.3.0
Traffic: 1140 users visited in the last hour