How can I filter a vcf file by DP and GQ using the R package VariantAnnotation?
2
1
Entering edit mode
9.1 years ago
r.jaffe ▴ 10

Hi guys, need some help filtering my vcf files with the VariantAnnotation package (v1.12.9) from Bioconductor. I have a vcf file (v4.2) containing 28238 SNPs for 160 individuals, and I'm running R version 3.1.3

I opened my file and ran:

hist(geno(vcf)$DP) ##check the distribution of Read Depth
hist(geno(vcf)$GQ) ##check the distribution of Phred-Scaled Genotype Quality

I then tried to filter out all samples with a DP<10 and a GQ<20 and create a new vcf file:

vcf2 <- vcf[geno(vcf)$DP>10 && geno(vcf)$GQ>20]

The problem is that this code does not seem to be working, because both files have exactly the same dimensions:

dim(vcf) ##28238   160
dim(vcf2) ##28238   160

So my questions are:

  1. How can I filter vcf files by DP and GQ?
  2. Is it better to eliminate SNPs that have low DP & GQ, or to re-code genotypes with low DP and GQ as missing genotypes (./.)?

Any help would be much appreciated, thanks!

Rodolfo

VariantAnnotation Bioconductor R vcf • 8.2k views
ADD COMMENT
0
Entering edit mode

To fix your code, use & (vector and) rather than && (scalar and) as in response to your other version of this question.

ADD REPLY
1
Entering edit mode
9.1 years ago
NB ▴ 960

Hi,

Not sure how to do it in R but I usually use vcftools to filter my variants based on DP and GQ

vcftools --vcf myfile.vcf --minGQ 20 --minDP 10 --recode --out new_myfile

You can also use GATK's --filterExpression to do the same.

ADD COMMENT
0
Entering edit mode

Thank you sukmb!

Yes I used vcftools and bcftools already, but was wandering if I could do it in R.

ADD REPLY
0
Entering edit mode
9.1 years ago

when you look at this:

str(geno(vcf)$DP>10)

it probably returns a logical matrix

logi [1:28238, 1:160]

if you use that to subset a data.frame how is R supposed to behave? subset rows, columns, or cells? How do you subset cells?

Maybe you meant to filter by combined depth across samples, which is in info

ADD COMMENT
0
Entering edit mode

Thank you Jeremy. Yes I get a logical matrix, but I still don`t know how to subset the rows of my vcf...

ADD REPLY
0
Entering edit mode

I think you need to be clear about what you are asking for - samples, snps, or sample/snp incidents

ADD REPLY

Login before adding your answer.

Traffic: 3101 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6