Question: SNP analysis based on RNAseq data using GATK PIPELINE
2.3 years ago by
Bioinfonext150
Korea
Bioinfonext150 wrote:

Hi...

I have called SNPs in two contrasting genotypes from RNAseq data using the GATK pipeline. In some cases, the REF column contains a string of nucleotides while the ALT column contains only a single nucleotide. What does this mean?

After calling SNPs with GATK, how should I filter the raw VCF to keep only confirmed SNPs?

CHROM  POS   ID  REF      ALT  QUAL    FILTER  INFO
R1     1119  .   C        T    311.78  .
R1     1132  .   CACTTGG  C    302.75  .
R1     1275  .   T        C    146.9   .
2.3 years ago by
mforde841.2k
mforde841.2k wrote:

That is a deletion of ACTTGG relative to the reference (REF is CACTTGG, ALT is C). If the REF and ALT columns were swapped, it would be an insertion.
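To make the rule above concrete, here is a minimal Python sketch that labels a record by comparing the lengths of REF and ALT. The function name `classify_variant` is made up for illustration, and the sketch assumes a single ALT allele (no comma-separated or symbolic alleles).

```python
def classify_variant(ref: str, alt: str) -> str:
    """Classify one REF/ALT pair by comparing allele lengths."""
    if len(ref) == 1 and len(alt) == 1:
        return "SNP"
    if len(ref) > len(alt):
        return "deletion"   # bases present in REF are missing in ALT
    if len(ref) < len(alt):
        return "insertion"  # ALT carries extra bases
    return "MNP"            # same length, more than one base


# The three records posted in the question: (CHROM, POS, REF, ALT)
records = [
    ("R1", 1119, "C", "T"),
    ("R1", 1132, "CACTTGG", "C"),
    ("R1", 1275, "T", "C"),
]

for chrom, pos, ref, alt in records:
    print(chrom, pos, ref, alt, classify_variant(ref, alt))
```

Keeping only the records labelled "SNP" is one simple way to separate SNPs from indels before any quality filtering.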

2.3 years ago by
Bioinfonext150
Korea
Bioinfonext150 wrote:

Thanks a lot. Could you also suggest which quality score can be used as a cutoff to identify confirmed SNPs? What other parameters can be used to filter the raw VCF?


It depends. Typically, that's something you'll need to research for your own data.

http://gatkforums.broadinstitute.org/gatk/discussion/2806/howto-apply-hard-filters-to-a-call-set

written 2.3 years ago by mforde84
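As a rough illustration of what hard filtering looks like in practice, the Python sketch below flags records whose INFO annotations fall outside fixed cutoffs. Everything specific here is an assumption: it presumes the INFO column carries `QD` and `FS` annotations (the INFO column is empty in the excerpt posted above), the helper names are hypothetical, and the cutoffs are illustrative starting points rather than validated values.

```python
def parse_info(info: str) -> dict:
    """Turn a VCF INFO string such as 'QD=1.5;FS=12.0;DB' into a dict."""
    fields = {}
    for item in info.split(";"):
        if not item or item == ".":
            continue
        key, sep, value = item.partition("=")
        fields[key] = value if sep else True  # flag entries carry no value
    return fields


# Illustrative cutoffs (assumptions); tune them on your own data.
MIN_QD = 2.0   # flag the record if QD is below this
MAX_FS = 30.0  # flag the record if FS is above this


def failed_filters(info: str) -> list:
    """Return the names of the filters a record fails (empty list = pass)."""
    fields = parse_info(info)
    failed = []
    if "QD" in fields and float(fields["QD"]) < MIN_QD:
        failed.append("QD")
    if "FS" in fields and float(fields["FS"]) > MAX_FS:
        failed.append("FS")
    return failed


print(failed_filters("QD=1.2;FS=45.3"))  # fails both filters
print(failed_filters("QD=25.4;FS=0.0"))  # passes
```

Records with a missing annotation are left unflagged in this sketch; whether that is the right choice depends on the data set.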
2.3 years ago by
Bioinfonext150
Korea
Bioinfonext150 wrote:

Thanks a lot. Please also advise:

Do I need to perform indel realignment and base recalibration when calling SNPs from RNAseq data?

If so, how do I obtain the two files needed for indel realignment: -known indels.vcf and -targetIntervals intervalListFromRTC.interval


Please use ADD COMMENT/ADD REPLY when responding to existing posts to keep threads logically organized.

Supplemental questions should not be entered as New Answers.

written 2.3 years ago by genomax

From my limited experience with variant calling on RNAseq data, I'm not sure base quality score recalibration adds much to the sensitivity and specificity of the calls. Indel realignment just seems like a good idea, imo. The question is a bit moot anyway: if you have the computational resources, adding a couple of hours to the analysis is a tradeoff many people accept to make sure their analysis is thorough.

Ideally, if you want to know how changes to your pipeline affect call quality, you need to run the different pipeline variants on a known data set (i.e., one with validated calls) and then decide which gives you the best results. Unfortunately, there's no real a priori way to settle a lot of these issues. But hey, this is science, so why not test it out?

modified 2.3 years ago • written 2.3 years ago by mforde84
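The comparison against validated calls suggested above could be scored along these lines. This is only a sketch: the function name `evaluate_calls` is hypothetical, variants are assumed to be represented as (CHROM, POS, REF, ALT) tuples, and the example sets are toy data, not real results.

```python
def evaluate_calls(called: set, truth: set) -> dict:
    """Compare a call set with a validated truth set of variant tuples."""
    tp = len(called & truth)  # called and validated
    fp = len(called - truth)  # called but not validated
    fn = len(truth - called)  # validated but missed
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    return {"tp": tp, "fp": fp, "fn": fn,
            "sensitivity": sensitivity, "precision": precision}


# Toy data for illustration only (hypothetical variants).
truth = {("R1", 100, "A", "G"), ("R1", 200, "C", "T"), ("R1", 300, "G", "A")}
pipeline_a = {("R1", 100, "A", "G"), ("R1", 200, "C", "T"), ("R1", 400, "T", "C")}
pipeline_b = {("R1", 100, "A", "G")}

for name, calls in [("pipeline_a", pipeline_a), ("pipeline_b", pipeline_b)]:
    print(name, evaluate_calls(calls, truth))
```

Running each pipeline variant (for example with and without recalibration) on the same validated sample and comparing these numbers gives a direct, if simple, basis for choosing between them.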