Question

SNP analysis based on RNAseq data using GATK PIPELINE

0

7.0 years ago

Bioinfonext ▴ 460

Hi...

I called SNPs in two contrasting genotypes from RNAseq data using the GATK pipeline. In some records, the reference (REF) column contains a string of nucleotides while the alternate (ALT) column contains only a single nucleotide. What does this mean?

After calling SNPs with GATK, how should I filter the raw VCF to keep only high-confidence SNPs?

CHROM  POS   ID  REF      ALT  QUAL    FILTER  INFO
R1     1119  .   C        T    311.78  .
R1     1132  .   CACTTGG  C    302.75  .
R1     1275  .   T        C    146.9   .
.

SNP • 2.0k views

ADD COMMENT • link 7.0 years ago by Bioinfonext ▴ 460

0

7.0 years ago

Bioinfonext ▴ 460

Thanks a lot. Could you also suggest which quality score to use as a cutoff for identifying high-confidence SNPs? What other parameters can be used to filter the raw VCF?

ADD COMMENT • link 7.0 years ago by Bioinfonext ▴ 460

0

It depends. Typically, that's something you'll need to research.

http://gatkforums.broadinstitute.org/gatk/discussion/2806/howto-apply-hard-filters-to-a-call-set
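As a rough illustration of what hard filtering does, here is a minimal Python sketch that labels each record by its INFO annotations. The thresholds (QD < 2.0, FS > 30.0) are values often quoted as starting points for RNAseq calls; they are assumptions here, not cutoffs recommended in this thread, and the function names and example INFO strings are hypothetical. In practice GATK's own filtering tool would be used instead of hand-rolled code.

```python
# Assumed thresholds; tune them for your own data.
MIN_QD = 2.0   # QualByDepth: flag records below this value
MAX_FS = 30.0  # FisherStrand: flag records above this value


def parse_info(info: str) -> dict:
    """Turn 'QD=12.3;FS=1.2;DB' into {'QD': '12.3', 'FS': '1.2', 'DB': True}."""
    fields = {}
    for item in info.split(";"):
        if not item or item == ".":
            continue
        key, sep, value = item.partition("=")
        fields[key] = value if sep else True
    return fields


def hard_filter_label(info: str) -> str:
    """Return 'PASS' or a ';'-joined list of the filters the record failed."""
    fields = parse_info(info)
    failed = []
    # Annotations that are absent from a record are simply not evaluated.
    if "QD" in fields and float(fields["QD"]) < MIN_QD:
        failed.append("QD")
    if "FS" in fields and float(fields["FS"]) > MAX_FS:
        failed.append("FS")
    return ";".join(failed) if failed else "PASS"


def filter_vcf_lines(lines):
    """Yield VCF lines with the FILTER column (index 6) filled in.

    Header lines pass through unchanged; a fully valid VCF would also
    need matching ##FILTER header entries, which this sketch omits.
    """
    for line in lines:
        if line.startswith("#"):
            yield line
            continue
        cols = line.rstrip("\n").split("\t")
        cols[6] = hard_filter_label(cols[7])
        yield "\t".join(cols) + "\n"
```

Calling `hard_filter_label` on the INFO string of each record and keeping only those labelled `PASS` gives the filtered call set; records are labelled instead of dropped so the reason for each rejection stays visible.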

ADD REPLY • link 7.0 years ago by mforde84 ★ 1.4k

0

7.0 years ago

Bioinfonext ▴ 460

Thanks a lot. One more question:

Do I need to perform indel realignment and base recalibration when calling SNPs from RNAseq data?

If so, how do I obtain the two files needed for indel realignment: -known indels.vcf \ -targetIntervals intervalListFromRTC.interval

ADD COMMENT • link 7.0 years ago by Bioinfonext ▴ 460

0

Please use ADD COMMENT/ADD REPLY when responding to existing posts to keep threads logically organized.

Supplemental questions should not be entered as New Answers.

ADD REPLY • link 7.0 years ago by GenoMax 141k

0

From some limited experience with variant calling on RNAseq data, I'm not entirely sure base quality score recalibration adds much to the sensitivity and specificity of the calls. Indel realignment just seems like a good idea, imo. The question is a bit moot anyway, I think: if you have the computational resources, adding a couple of hours to the analysis is a tradeoff a lot of people accept just to make sure their analysis is thorough. Ideally, if you want to know how changes to your pipeline affect call quality, you need to run the different configurations on a known data set (i.e., one with validated calls) and then pick the one that gives the best results. Unfortunately, there's no real a priori way to settle a lot of these issues. But hey, this is science, so why not test it out.
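The comparison against validated calls can be sketched in a few lines of Python. This is a minimal sketch, assuming variants are matched by exact (CHROM, POS, REF, ALT) identity, which can miss indels written in different but equivalent ways; the function names are hypothetical.

```python
def variant_keys(vcf_lines):
    """Collect (chrom, pos, ref, alt) keys from VCF data lines."""
    keys = set()
    for line in vcf_lines:
        if line.startswith("#") or not line.strip():
            continue  # skip header and blank lines
        chrom, pos, _id, ref, alt = line.rstrip("\n").split("\t")[:5]
        # Multi-allelic records list several ALT alleles separated by commas.
        for allele in alt.split(","):
            keys.add((chrom, int(pos), ref, allele))
    return keys


def precision_recall(called, truth):
    """Precision and recall of a call set against a validated truth set."""
    true_positives = len(called & truth)
    precision = true_positives / len(called) if called else 0.0
    recall = true_positives / len(truth) if truth else 0.0
    return precision, recall
```

Running each pipeline configuration (for example, with and without recalibration) on the same validated sample and comparing the resulting precision and recall shows whether the extra step changes the calls enough to matter.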

ADD REPLY • link 7.0 years ago by mforde84 ★ 1.4k

score 1 · Accepted Answer · 2017-04-25

1

7.0 years ago

mforde84 ★ 1.4k

That would be a deletion of ACTTGG; if the REF and ALT columns were reversed, it would be an insertion.
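The same rule can be written as a short Python sketch that compares the lengths of the REF and ALT alleles. It assumes a single ALT allele per record and the usual VCF layout in which an indel's shorter allele is the shared leading base; the function names are hypothetical.

```python
def classify_variant(ref: str, alt: str) -> str:
    """Classify one REF/ALT pair by comparing allele lengths."""
    if len(ref) == 1 and len(alt) == 1:
        return "SNP"
    if len(ref) > len(alt):
        return "deletion"
    if len(ref) < len(alt):
        return "insertion"
    return "MNP"  # same length, more than one base


def changed_bases(ref: str, alt: str) -> str:
    """Return the bases deleted or inserted (empty for equal-length alleles).

    Assumes the shorter allele is a prefix of the longer one.
    """
    shorter, longer = sorted((ref, alt), key=len)
    return longer[len(shorter):]


# The three records from the question.
records = [("R1", 1119, "C", "T"), ("R1", 1132, "CACTTGG", "C"), ("R1", 1275, "T", "C")]
for chrom, pos, ref, alt in records:
    print(chrom, pos, classify_variant(ref, alt), changed_bases(ref, alt))
```

To keep only SNPs, retain the records for which `classify_variant` returns `"SNP"`.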

ADD COMMENT • link 7.0 years ago by mforde84 ★ 1.4k