Question: RNA-seq dedupe PCR contamination before or after mapping
4.1 years ago by
umn_bist370 wrote:

Will deduplicating (marking duplicates) with Picard before mapping affect my variant calling? Or is it common practice to deduplicate after mapping?

I ask because my FastQC reports still flag a lot of k-mers, overrepresented sequences, and abnormal GC content. I figured these could be corrected by removing PCR contamination. This is after trimming adapters and low-quality bases (quality threshold 10) with BBDuk.

rna-seq • 2.0k views
modified 4.1 years ago by Carlo Yague4.9k • written 4.1 years ago by umn_bist370

It depends on what data you have, but a slightly bimodal distribution of GC content seems to be the norm in whole-exome data (I haven't figured out why, but it appears to be commonplace).

written 4.1 years ago by andrew.j.skelton735.9k

I'm working with tumor/normal paired-end RNA-seq samples from TCGA. The GC distribution varies across the board: in some samples the bimodality is slight, in others it is drastic. I fear that mapping my reads without correcting GC and k-mer bias may muddle my variant calling downstream.

modified 4.1 years ago • written 4.1 years ago by umn_bist370

I highly recommend you look at the GATK Best Practices; they include caveats for using RNA-seq data (provided the samples have suitable depth).

modified 10 weeks ago by RamRS26k • written 4.1 years ago by andrew.j.skelton735.9k
4.1 years ago by
Carlo Yague4.9k wrote:

Overrepresented sequences and skewed GC content are expected in RNA-seq data. They usually come from the most highly expressed transcripts (such as rRNA). However, they can also come from PCR duplicates, and those can badly skew variant calling. For this reason, although people usually don't deduplicate RNA-seq data for differential expression analysis, deduplication is still recommended for variant calling.
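On the before-or-after question: Picard's MarkDuplicates works on aligned reads, because it recognizes duplicates by their mapping coordinates rather than by sequence alone, so it runs after mapping. Here is a minimal Python sketch of that idea. The `AlignedPair` record and `mark_duplicates` function are hypothetical simplifications for illustration, not Picard's actual implementation.

```python
from collections import defaultdict
from typing import NamedTuple


class AlignedPair(NamedTuple):
    """Hypothetical, simplified stand-in for a mapped read pair."""
    name: str
    chrom: str
    start: int        # 5' position of the first mate
    mate_start: int   # 5' position of the second mate
    strand: str       # orientation of the first mate, "+" or "-"
    qual_sum: int     # sum of base qualities, used to pick the copy to keep


def mark_duplicates(pairs):
    """Return the names of pairs flagged as duplicates.

    Pairs sharing chromosome, both 5' positions and orientation are
    treated as one duplicate set; the copy with the highest base-quality
    sum is kept and the rest are flagged.
    """
    groups = defaultdict(list)
    for p in pairs:
        groups[(p.chrom, p.start, p.mate_start, p.strand)].append(p)

    duplicates = set()
    for group in groups.values():
        best = max(group, key=lambda p: p.qual_sum)
        duplicates.update(p.name for p in group if p is not best)
    return duplicates


pairs = [
    AlignedPair("r1", "chr1", 100, 250, "+", 700),
    AlignedPair("r2", "chr1", 100, 250, "+", 650),  # same coordinates as r1
    AlignedPair("r3", "chr1", 100, 260, "+", 690),  # different mate position
    AlignedPair("r4", "chr2", 100, 250, "+", 640),  # different chromosome
]
print(sorted(mark_duplicates(pairs)))  # ['r2']
```

Note that independent fragments from a very highly expressed transcript can also land on identical coordinates and get flagged this way, which is one reason deduplication is usually skipped for expression analysis.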

Some reference :

written 4.1 years ago by Carlo Yague4.9k