Question: RNAseq variant filtering criteria
prasundutta87 wrote (22 months ago):

Hello,

Can anyone guide me on, or share, which parameters should be considered when filtering RNA-seq variants, and why? I haven't found any published material on it, as nothing is standardised for this. I am working on a non-model organism (water buffalo) for which there is no truth set (such as dbSNP, although dbSNP data is itself questionable). If anyone has come across a document describing RNA-seq variant filtering criteria, I would be grateful if they could share it with me.

I have made some distribution plots, but I am unable to decide on thresholds from them. Parameters chosen: QUAL, DP (read depth), GQ (genotype quality) and SP (Phred-scaled strand-bias P-value).
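Without a truth set, one way to turn distribution plots into numbers is to cut each metric at a chosen percentile of its own distribution. This is only a sketch: the function names and the 5th/95th percentile defaults are hypothetical illustrations, not an established standard, and a percentile cut removes that fraction of calls whether or not they are errors.

```python
import math


def nearest_rank_percentile(values, pct):
    """Smallest value such that at least pct% of the values are <= it."""
    if not values:
        raise ValueError("values must not be empty")
    if not 0 < pct <= 100:
        raise ValueError("pct must be in (0, 100]")
    ordered = sorted(values)
    rank = math.ceil(pct * len(ordered) / 100)  # 1-based nearest rank
    return ordered[rank - 1]


def suggest_cutoffs(metrics, low_pct=5, high_pct=95):
    """Propose one cutoff per metric from its empirical distribution.

    metrics maps a metric name to a list of per-variant values.
    QUAL, DP and GQ: higher is better, so the lower tail is cut.
    SP (Phred-scaled strand-bias P-value): higher means more bias,
    so the upper tail is cut.
    The percentile defaults are arbitrary, illustrative choices.
    """
    cutoffs = {}
    for name, values in metrics.items():
        if name == "SP":
            cutoffs[name] = ("max", nearest_rank_percentile(values, high_pct))
        else:
            cutoffs[name] = ("min", nearest_rank_percentile(values, low_pct))
    return cutoffs
```

The per-variant values would come from the called VCF; inspecting how many calls each proposed cutoff removes, before committing to it, is probably more informative than the cutoff value itself.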

PS: variants were called with bcftools mpileup and bcftools call. Kindly let me know if any more information is needed for this question.
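Once thresholds are chosen, they can be applied record by record. A minimal sketch for a single-sample VCF, assuming QUAL and INFO/DP are present and that FORMAT/GQ and FORMAT/SP were requested at calling time (for example with `bcftools call -f GQ` and `bcftools mpileup -a FORMAT/SP`). The threshold defaults are hypothetical placeholders, not recommendations. Note that in RNA-seq, depth tracks expression, so a DP cutoff preferentially removes variants in lowly expressed genes.

```python
def parse_info(info):
    """Turn 'DP=25;MQ=60;INDEL' into {'DP': '25', 'MQ': '60', 'INDEL': True}."""
    out = {}
    for item in info.split(";"):
        if "=" in item:
            key, value = item.split("=", 1)
            out[key] = value
        elif item and item != ".":
            out[item] = True  # flag-type INFO field
    return out


def passes_hard_filters(vcf_line, min_qual=30.0, min_dp=10, min_gq=20, max_sp=20):
    """Return True if a single-sample VCF data line passes every threshold.

    Threshold defaults are hypothetical placeholders.
    """
    fields = vcf_line.rstrip("\n").split("\t")
    qual = fields[5]
    info = parse_info(fields[7])
    # Pair FORMAT keys (e.g. GT:PL:SP:GQ) with the first sample's values
    sample = dict(zip(fields[8].split(":"), fields[9].split(":")))

    if qual == "." or float(qual) < min_qual:
        return False
    if int(info.get("DP", 0)) < min_dp:
        return False
    gq = sample.get("GQ", ".")
    if gq == "." or int(gq) < min_gq:
        return False
    sp = sample.get("SP", ".")
    if sp != "." and int(sp) > max_sp:  # high SP = strong strand bias
        return False
    return True
```

The same logic can also be written as a `bcftools filter -e` expression; the Python version just makes each test explicit and easy to count separately.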

grant.hovhannisyan wrote (22 months ago):

Here is the GATK pipeline, with some recommendations on variant filtering: https://gatkforums.broadinstitute.org/gatk/discussion/3892/the-gatk-best-practices-for-variant-calling-on-rnaseq-in-full-detail. Overall, though, the answer to your question is tricky, and for different organisms (at least taxonomically distant ones) the parameters should differ. In my case (yeasts), the GATK hard-filtering parameters eliminate 70% of SNPs. I suspect the GATK filters were designed for human data, but I am not sure.
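As we read the linked GATK article, its RNA-seq-specific suggestion is to filter clusters of at least 3 SNPs within a 35-base window (passed to VariantFiltration as `-window 35 -cluster 3` there), alongside FS > 30.0 and QD < 2.0. FS and QD are GATK annotations that, as far as we know, bcftools does not emit under those names, but the cluster filter depends only on positions and so can be applied to any caller's output. The sketch below is our own approximation of that idea, not GATK's code; its exact edge behaviour may differ from VariantFiltration.

```python
from collections import defaultdict


def flag_snp_clusters(snps, cluster=3, window=35):
    """Flag SNPs that belong to a run of >= `cluster` SNPs spanning <= `window` bp.

    snps: iterable of (chrom, pos) tuples. Returns a set of flagged (chrom, pos).
    Approximates the GATK -cluster/-window idea; not GATK's implementation.
    """
    by_chrom = defaultdict(list)
    for chrom, pos in snps:
        by_chrom[chrom].append(pos)

    flagged = set()
    for chrom, positions in by_chrom.items():
        positions.sort()
        # Slide a group of `cluster` consecutive SNPs along the chromosome
        for i in range(len(positions) - cluster + 1):
            j = i + cluster - 1
            if positions[j] - positions[i] + 1 <= window:
                flagged.update((chrom, p) for p in positions[i:j + 1])
    return flagged
```

Counting how many calls each filter flags, one filter at a time, shows which criterion is responsible for a large loss such as the 70% mentioned above.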

prasundutta87 replied (22 months ago):

Thanks for sharing this; I had already gone through it. You are right that GATK is developed, tested and validated with human data in mind, and anything other than human may not perform as expected with GATK pipelines. Furthermore, their RNA-seq pipeline is not as thoroughly validated or tested as their DNA-seq variant-calling pipeline (as stated in their document).

andrew.j.skelton73 (London) wrote (22 months ago):

As @grant.hovhannisyan shared, the GATK RNA-seq best practices suggest hard filters, and there's not much more you can do beyond them. Hard filtering is, by definition, hard. Without truth sets (of which there are none for RNA-seq, regardless of species), you can't use VQSR, so you have to rely on hard filters.
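Without a truth set, one commonly used sanity check is the transition/transversion (Ts/Tv) ratio of the call set before and after a candidate filter. Each base has one possible transition and two possible transversions, so random errors pull Ts/Tv toward about 0.5, while real variation usually shows a clearly higher ratio; a filter that raises Ts/Tv is plausibly removing errors. This is a heuristic, not a validation, and in RNA-seq, A-to-I editing (read as A>G) can inflate transitions. `bcftools stats` also reports Ts/Tv; the sketch below, with its hypothetical function name, just shows the arithmetic.

```python
TRANSITIONS = {("A", "G"), ("G", "A"), ("C", "T"), ("T", "C")}
BASES = set("ACGT")


def ts_tv_ratio(snps):
    """Transitions / transversions for an iterable of (ref, alt) alleles.

    Indels, multi-base alleles and non-ACGT bases are skipped.
    Returns inf when there are no transversions.
    """
    ts = tv = 0
    for ref, alt in snps:
        ref, alt = ref.upper(), alt.upper()
        if len(ref) != 1 or len(alt) != 1 or ref == alt:
            continue
        if ref not in BASES or alt not in BASES:
            continue
        if (ref, alt) in TRANSITIONS:
            ts += 1
        else:
            tv += 1
    return ts / tv if tv else float("inf")
```

Comparing `ts_tv_ratio` on the calls kept versus the calls removed by a given filter is often more telling than the ratio of the kept set alone.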

prasundutta87 replied (22 months ago):

I agree. That's one of the main reasons I am stuck.