Question: SNP analysis based on RNAseq data using GATK PIPELINE
22 months ago by
Bioinfonext120
Korea
Bioinfonext120 wrote:

Hi...

I called SNPs in two contrasting genotypes from RNA-seq data using the GATK pipeline. In some records, the REF column contains a string of nucleotides while the ALT column contains only a single nucleotide. What does this mean?

After calling SNPs with GATK, how should I filter the raw VCF to keep only confirmed SNPs?

CHROM  POS   ID  REF      ALT  QUAL    FILTER  INFO
R1     1119  .   C        T    311.78  .
R1     1132  .   CACTTGG  C    302.75  .
R1     1275  .   T        C    146.9   .
22 months ago by
mforde841.2k
mforde841.2k wrote:

That would be a deletion of ACTTGG relative to the reference (REF is CACTTGG, ALT is C). If the REF and ALT columns were reversed, it would be an insertion.
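To make the REF/ALT reading concrete, here is a minimal Python sketch that classifies the three records from the question by comparing allele lengths. The function names `classify_variant` and `indel_bases` are made up for illustration, and the sketch assumes one ALT allele per record (no comma-separated or symbolic alleles).

```python
def classify_variant(ref: str, alt: str) -> str:
    """Classify a record with a single ALT allele by comparing allele lengths."""
    if len(ref) == 1 and len(alt) == 1:
        return "SNP"
    if len(ref) > len(alt):
        return "deletion"
    if len(ref) < len(alt):
        return "insertion"
    return "MNP"  # equal lengths greater than 1: multi-nucleotide substitution


def indel_bases(ref: str, alt: str) -> str:
    """Return the bases removed (deletion) or added (insertion).

    Assumes the usual VCF layout where the shorter allele is a prefix
    (the anchor base) of the longer one.
    """
    longer, shorter = (ref, alt) if len(ref) > len(alt) else (alt, ref)
    if not longer.startswith(shorter):
        raise ValueError("alleles do not share an anchor prefix")
    return longer[len(shorter):]


# CHROM, POS, REF, ALT taken from the table in the question
records = [
    ("R1", 1119, "C", "T"),
    ("R1", 1132, "CACTTGG", "C"),
    ("R1", 1275, "T", "C"),
]

for chrom, pos, ref, alt in records:
    kind = classify_variant(ref, alt)
    detail = indel_bases(ref, alt) if kind in ("deletion", "insertion") else ""
    print(chrom, pos, kind, detail)
```

Keeping only the records classified as "SNP" is one simple way to separate SNPs from indels before any quality filtering.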

22 months ago by
Bioinfonext120
Korea
Bioinfonext120 wrote:

Thanks a lot. Can you also suggest which quality score can be used as a cutoff to identify confirmed SNPs? What other parameters can be used to filter the raw VCF?


It depends. Typically that's something you'll need to research.

http://gatkforums.broadinstitute.org/gatk/discussion/2806/howto-apply-hard-filters-to-a-call-set
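As a rough illustration of what a hard filter does, the sketch below flags a VCF record when its INFO annotations cross fixed thresholds. Everything specific here is an assumption rather than something stated in this thread: the cutoffs (QD < 2.0, FS > 30.0) are values often quoted as starting points for RNA-seq call sets, the INFO values in the example lines are invented, and `hard_filter` is a hypothetical helper, not a GATK API. In practice GATK's own filtering tool would do this step.

```python
def parse_info(info: str) -> dict:
    """Split a VCF INFO string such as 'QD=25.1;FS=0.0' into a dict of strings."""
    fields = {}
    for item in info.split(";"):
        if "=" in item:
            key, value = item.split("=", 1)
            fields[key] = value
    return fields


def hard_filter(vcf_line: str, min_qd: float = 2.0, max_fs: float = 30.0) -> list:
    """Return the names of the filters a tab-separated VCF record fails.

    Thresholds are assumed starting points, not validated cutoffs.
    Annotations missing from INFO are skipped rather than failed.
    """
    cols = vcf_line.rstrip("\n").split("\t")
    info = parse_info(cols[7])
    failed = []
    if "QD" in info and float(info["QD"]) < min_qd:
        failed.append("QD")
    if "FS" in info and float(info["FS"]) > max_fs:
        failed.append("FS")
    return failed


# Hypothetical records: the INFO values are invented for illustration
good = "R1\t1119\t.\tC\tT\t311.78\t.\tQD=25.1;FS=0.0"
bad = "R1\t2000\t.\tA\tG\t30.5\t.\tQD=1.5;FS=45.2"

print(hard_filter(good))
print(hard_filter(bad))
```

An empty list means the record passes both checks; the right thresholds still depend on the data set, as the reply says.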

written 22 months ago by mforde841.2k
22 months ago by
Bioinfonext120
Korea
Bioinfonext120 wrote:

Thanks a lot. Please also advise:

Do I need to do indel realignment and base recalibration when calling SNPs from RNA-seq data?

If yes, how do I get these two files for indel realignment: -known indels.vcf \ -targetIntervals intervalListFromRTC.interval


Please use ADD COMMENT/ADD REPLY when responding to existing posts to keep threads logically organized.

Supplemental questions should not be entered as New Answers.

written 22 months ago by genomax62k

From some limited experience with variant calling on RNA-seq data, I'm not entirely sure base quality score recalibration adds much to the sensitivity and specificity of the calls. Indel realignment just seems like a good idea, imo. The question is a bit moot anyway, I think: if you have the computational resources, adding a couple of hours to the analysis is a tradeoff a lot of people accept just to make sure the analysis is thorough. Ideally, if you want to know how changes to your pipeline affect call quality, you need to run the different variations on a known data set (i.e., one with validated calls) and then pick whichever gives the best results. Unfortunately there's no real a priori way to settle a lot of these issues. But hey, this is science, so why not test it out.
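A comparison like that can be scored with a few lines of Python. This is only a sketch: variants are keyed as (CHROM, POS, REF, ALT) tuples, the "validated" set and both call sets are toy data invented for illustration, and `call_metrics` is a hypothetical helper name.

```python
def call_metrics(called: set, truth: set) -> dict:
    """Score a call set against validated calls using exact-match variant keys."""
    tp = len(called & truth)   # called and validated
    fp = len(called - truth)   # called but not validated
    fn = len(truth - called)   # validated but missed
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0  # recall is the sensitivity
    return {"tp": tp, "fp": fp, "fn": fn, "precision": precision, "recall": recall}


# Toy validated calls (invented)
truth = {
    ("R1", 100, "A", "G"),
    ("R1", 250, "C", "T"),
    ("R1", 400, "G", "A"),
    ("R1", 900, "T", "C"),
}

# Toy output of two pipeline variants (invented)
pipelines = {
    "with_recalibration": {
        ("R1", 100, "A", "G"), ("R1", 250, "C", "T"),
        ("R1", 400, "G", "A"), ("R1", 777, "A", "C"),
    },
    "without_recalibration": {
        ("R1", 100, "A", "G"), ("R1", 250, "C", "T"),
        ("R1", 555, "G", "T"), ("R1", 777, "A", "C"),
    },
}

for name, called in pipelines.items():
    print(name, call_metrics(called, truth))
```

Exact-key matching is a simplification: the same indel can be represented at different positions, so real comparisons usually normalize variants first.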

modified 22 months ago • written 22 months ago by mforde841.2k