Question: RNA-seq dedupe PCR contamination before or after mapping
3.6 years ago by
umn_bist330
umn_bist330 wrote:

Will deduping (marking duplicates) with Picard before mapping affect my variant calling? Is it common practice to dedupe after mapping?

I ask because my FastQC reports still flag a lot of k-mer content, overrepresented sequences, and bad GC content. I figured these could be corrected by removing PCR contamination. This is after trimming adapters and low-quality bases (quality threshold 10) using BBDuk.

rna-seq • 1.8k views
modified 3.6 years ago by Carlo Yague4.6k • written 3.6 years ago by umn_bist330

It depends on what data you have, but a slight bimodal distribution of GC content seems to be the norm in whole-exome data. (I haven't figured out why, but it appears to be commonplace.)
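One way to see how bimodal the GC distribution is in a given sample is to bin the per-read GC fractions directly. The following is a rough Python sketch of that idea, not what FastQC itself runs; the function names `gc_counts` and `gc_histogram` and the toy reads are made up for illustration.

```python
def gc_counts(seq):
    """Return (number of G/C bases, number of called A/C/G/T bases) for one read."""
    seq = seq.upper()
    gc = seq.count("G") + seq.count("C")
    called = gc + seq.count("A") + seq.count("T")  # N and other symbols are ignored
    return gc, called


def gc_histogram(reads, bins=10):
    """Count reads per GC-fraction bin; bin i covers [i/bins, (i+1)/bins)."""
    counts = [0] * bins
    for read in reads:
        gc, called = gc_counts(read)
        if called == 0:
            continue  # skip reads with no called bases
        # Integer arithmetic avoids floating-point edge cases; 100% GC goes in the last bin.
        idx = min(gc * bins // called, bins - 1)
        counts[idx] += 1
    return counts


# Toy reads (illustrative only): one AT-only, one GC-only, one mixed.
reads = ["ATATATATAT", "GCGCGCGCGC", "ATGCATGCAT"]
print(gc_histogram(reads))
```

Two separated peaks in the resulting counts would correspond to the bimodal shape described above.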

written 3.6 years ago by andrew.j.skelton735.8k

I'm working with tumor/normal paired-end (PE) RNA-seq samples from TCGA. The distribution varies across the board: in some samples the deviation is slight, in others drastic. I fear that mapping my reads without correcting the GC and k-mer bias may muddle my variant calling downstream.

http://p08i.imgup.net/ScreenShot4fe0.png

http://i86i.imgup.net/ScreenShot1738.png

http://t38i.imgup.net/ScreenShote90f.png

modified 3.6 years ago • written 3.6 years ago by umn_bist330

I highly recommend you look at the GATK Best Practices, which include caveats for using RNA-seq data (provided the samples have suitable depth): https://www.broadinstitute.org/gatk/guide/best-practices.php

written 3.6 years ago by andrew.j.skelton735.8k
3.6 years ago by
Carlo Yague4.6k
Belgium
Carlo Yague4.6k wrote:

Overrepresented sequences and skewed GC content are expected in RNA-seq data. They usually come from the most highly expressed transcripts (such as rRNA). However, they can also come from PCR duplicates, and those can badly skew variant calling. For this reason, although people usually don't dedupe RNA-seq data for differential expression analysis, deduping is still recommended for variant calling.

A reference: http://journals.plos.org/plosone/article?id=10.1371/journal.pone.0058815
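On the before-or-after question: Picard's MarkDuplicates works on aligned reads, because it decides what counts as a duplicate from mapping coordinates, so in practice it runs after mapping. Below is a minimal Python sketch of that idea on toy data. The `AlignedPair` record and `mark_duplicates` function are hypothetical stand-ins for illustration and are not Picard's implementation.

```python
from collections import defaultdict
from typing import NamedTuple


class AlignedPair(NamedTuple):
    """Toy stand-in for a mapped read pair (hypothetical, not a real BAM record)."""
    name: str
    chrom: str
    start: int          # 5' mapped position of read 1
    mate_start: int     # 5' mapped position of read 2
    strand: str         # '+' or '-'
    base_qual_sum: int  # sum of base qualities, used to pick the copy to keep


def mark_duplicates(pairs):
    """Return the names of pairs flagged as duplicates.

    Pairs sharing chromosome, strand and both 5' positions are treated as
    copies of one fragment; the copy with the highest base-quality sum is kept.
    """
    groups = defaultdict(list)
    for pair in pairs:
        key = (pair.chrom, pair.strand, pair.start, pair.mate_start)
        groups[key].append(pair)

    duplicates = set()
    for members in groups.values():
        best = max(members, key=lambda p: p.base_qual_sum)
        duplicates.update(p.name for p in members if p is not best)
    return duplicates


pairs = [
    AlignedPair("r1", "chr1", 1000, 1180, "+", 2400),
    AlignedPair("r2", "chr1", 1000, 1180, "+", 2550),  # same coordinates as r1
    AlignedPair("r3", "chr1", 1000, 1210, "+", 2300),  # same start, different mate position
    AlignedPair("r4", "chr2", 500, 700, "-", 2500),
]
print(sorted(mark_duplicates(pairs)))  # prints ['r1']
```

None of the grouping keys exist before alignment, which is why coordinate-based deduplication cannot be applied to raw FASTQ files.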

written 3.6 years ago by Carlo Yague4.6k