Question: Many overrepresented sequences in RNA-seq data. Should they be trimmed out?

dgtiezzi (United Kingdom) wrote, 4.1 years ago:

I downloaded data from the SRA database, and FastQC shows many overrepresented sequences with "No Hit" as their possible source. I BLASTed some of them, and they match rRNA and mtDNA. The per-sequence GC content plot looks odd because of these contaminants. Should I trim them out before alignment, or should I ignore them? I assume they will not align to the reference genome; is that right?

Tags: rna-seq

karl.stamm (United States) wrote, 4.1 years ago:

RNA-seq is expected to show duplication when a gene is highly expressed. FastQC reports "No Hit" simply because it searches only a small database of known artifacts, such as adapter and primer sequences. Chances are this is a real, spliced transcript that FastQC doesn't know about.
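
To check whether an overrepresented "No Hit" sequence is just a highly expressed transcript, you can count exact read sequences yourself and then BLAST or align the most frequent ones. The sketch below is a simplified stand-in for FastQC's module, not its actual implementation: it flags sequences above 0.1% of all reads (the warning threshold FastQC documents for this module), but it counts every read and skips FastQC's memory-saving shortcuts. The function names and toy reads are hypothetical.

```python
from collections import Counter
from typing import Iterable, Iterator, List, Tuple


def read_fastq_sequences(lines: Iterable[str]) -> Iterator[str]:
    """Yield the sequence line (2nd line) of every 4-line FASTQ record."""
    for i, line in enumerate(lines):
        if i % 4 == 1:
            yield line.strip()


def overrepresented(seqs: Iterable[str],
                    threshold: float = 0.001) -> List[Tuple[str, int, float]]:
    """Return (sequence, count, fraction) for sequences above `threshold` of all reads."""
    counts = Counter(seqs)
    total = sum(counts.values())
    hits = [(s, n, n / total) for s, n in counts.items() if n / total > threshold]
    # Most frequent first, like FastQC's table.
    return sorted(hits, key=lambda h: h[1], reverse=True)


if __name__ == "__main__":
    # Toy FASTQ: one sequence repeated, as a highly expressed transcript would be.
    fastq: List[str] = []
    for i in range(5):
        fastq += [f"@r{i}", "ACGTACGTAC", "+", "IIIIIIIIII"]
    fastq += ["@r5", "TTTTGGGGCC", "+", "IIIIIIIIII"]
    # A high threshold is used only because the toy file is tiny.
    for seq, n, frac in overrepresented(read_fastq_sequences(fastq), threshold=0.2):
        print(seq, n, round(frac, 3))
    # prints: ACGTACGTAC 5 0.833
```

On real data you would pass the lines of an uncompressed FASTQ file and keep the default 0.001 threshold.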

 

Chirag Nepal (Copenhagen) wrote, 4.1 years ago:

If they are rRNA, they should align to the genome. You could first map the reads to rRNA sequences and keep only the reads that do not map. Check the quality and the over-represented sequences of those unmapped reads; if they look OK, map them to the genome.
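
A minimal sketch of the "map to rRNA, keep the unmapped reads" step, assuming single-end reads, an aligner that writes plain-text SAM, and read names in the FASTQ that match the SAM QNAME exactly (no /1 suffixes). Rather than relying on the aligner to report unmapped reads, it collects the names of reads that did map (FLAG bit 0x4, "segment unmapped", not set) and drops them from the original FASTQ. The function names and the `rRNA_ref` reference name are hypothetical; paired-end data would need both mates handled together.

```python
from typing import Iterable, Iterator, List, Set

UNMAPPED = 0x4  # SAM FLAG bit 0x4: segment unmapped


def mapped_read_names(sam_lines: Iterable[str]) -> Set[str]:
    """Collect QNAMEs of reads with at least one mapped record (headers skipped)."""
    names: Set[str] = set()
    for line in sam_lines:
        if not line.strip() or line.startswith("@"):
            continue
        fields = line.rstrip("\n").split("\t")
        qname, flag = fields[0], int(fields[1])
        if not flag & UNMAPPED:
            names.add(qname)
    return names


def drop_reads(fastq_lines: List[str], exclude: Set[str]) -> Iterator[str]:
    """Yield the lines of 4-line FASTQ records whose read name is not in `exclude`."""
    for i in range(0, len(fastq_lines), 4):
        record = fastq_lines[i:i + 4]
        # Read name: header without the leading '@' and without any trailing comment.
        name = record[0][1:].split()[0]
        if name not in exclude:
            yield from record


if __name__ == "__main__":
    # Toy data: r1 hit the hypothetical rRNA reference, r2 did not.
    sam = [
        "@SQ\tSN:rRNA_ref\tLN:5000",
        "r1\t0\trRNA_ref\t100\t42\t10M\t*\t0\t0\tACGTACGTAC\tIIIIIIIIII",
        "r2\t4\t*\t0\t0\t*\t*\t0\t0\tTTTTGGGGCC\tIIIIIIIIII",
    ]
    fastq = ["@r1", "ACGTACGTAC", "+", "IIIIIIIIII",
             "@r2 extra", "TTTTGGGGCC", "+", "IIIIIIIIII"]
    print(list(drop_reads(fastq, mapped_read_names(sam))))
    # prints: ['@r2 extra', 'TTTTGGGGCC', '+', 'IIIIIIIIII']
```

The surviving records are the reads you would then check with FastQC and map to the genome.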

The read quality is good throughout. My goal is to analyze gene expression, and I'm going to align the reads to gene regions. Could those rRNA sequences interfere with my analysis? They shouldn't align to genes, right?

Reply written 4.1 years ago by dgtiezzi

When aligned to the genome, they will map to the rRNA genes. This will probably skew quantification, so remove the rRNA reads first and map the remaining reads.

Reply written 4.1 years ago by Chirag Nepal
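
A toy calculation shows one way leftover rRNA can skew quantification: if rRNA reads count toward library size, every other gene's normalized value shrinks, and by different amounts in samples with different rRNA content. The sketch assumes plain counts-per-million (CPM) normalization, which is only one common choice; the gene names and counts are invented.

```python
from typing import Dict, Iterable


def cpm(counts: Dict[str, int], exclude: Iterable[str] = ()) -> Dict[str, float]:
    """Counts per million, optionally leaving some features out of the library size."""
    skip = set(exclude)
    kept = {g: c for g, c in counts.items() if g not in skip}
    total = sum(kept.values())
    return {g: c * 1e6 / total for g, c in kept.items()}


# Hypothetical samples: identical mRNA counts, but sample B has far more rRNA reads.
sample_a = {"GENE1": 500, "GENE2": 1500, "rRNA": 2000}
sample_b = {"GENE1": 500, "GENE2": 1500, "rRNA": 8000}

# rRNA counted in the library size: GENE1 looks 2.5x lower in sample B.
print(cpm(sample_a)["GENE1"], cpm(sample_b)["GENE1"])
# prints: 125000.0 50000.0

# rRNA removed first: GENE1 is identical in both samples.
print(cpm(sample_a, exclude=["rRNA"])["GENE1"], cpm(sample_b, exclude=["rRNA"])["GENE1"])
# prints: 250000.0 250000.0
```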

For human data, I think how well they align depends on whether your reference includes the unassembled contigs. The rRNA genes actually lie on several different chromosomes and on MT, but most of my rRNA reads align to one of those extra contigs. Not everyone uses a reference version that includes all the extra contigs (listed after the main chromosomes in the FASTA), so if you have a lot of rRNA reads, you could see different alignment rates when you don't use the complete reference.

Reply written 4.1 years ago by Michele Busby
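
To see where rRNA reads land with a particular reference build (main chromosomes versus extra contigs), you can tally primary alignments per reference sequence, i.e. the RNAME column of SAM. A minimal sketch, assuming plain-text SAM input; it skips unmapped (0x4), secondary (0x100) and supplementary (0x800) records so each read is counted at most once. The contig names in the toy input are made up.

```python
from collections import Counter
from typing import Iterable

UNMAPPED, SECONDARY, SUPPLEMENTARY = 0x4, 0x100, 0x800


def reads_per_reference(sam_lines: Iterable[str]) -> Counter:
    """Count primary, mapped alignments per RNAME (SAM column 3)."""
    counts: Counter = Counter()
    for line in sam_lines:
        if not line.strip() or line.startswith("@"):
            continue
        fields = line.rstrip("\n").split("\t")
        flag, rname = int(fields[1]), fields[2]
        if flag & (UNMAPPED | SECONDARY | SUPPLEMENTARY):
            continue  # count each read once, at its primary alignment
        counts[rname] += 1
    return counts


if __name__ == "__main__":
    sam = [
        "@SQ\tSN:chr21\tLN:1000",
        "@SQ\tSN:chrM\tLN:1000",
        "@SQ\tSN:chrUn_example\tLN:1000",
        "r1\t0\tchrUn_example\t10\t60\t4M\t*\t0\t0\tACGT\tIIII",
        "r2\t0\tchrUn_example\t20\t60\t4M\t*\t0\t0\tACGT\tIIII",
        "r3\t16\tchr21\t30\t60\t4M\t*\t0\t0\tACGT\tIIII",
        "r3\t256\tchrM\t5\t0\t4M\t*\t0\t0\tACGT\tIIII",
        "r4\t4\t*\t0\t0\t*\t*\t0\t0\tACGT\tIIII",
    ]
    for name, n in reads_per_reference(sam).most_common():
        print(name, n)
    # prints:
    # chrUn_example 2
    # chr21 1
```

Comparing these tallies between two reference builds would show whether rRNA reads shift between contigs or go unaligned.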