Should I remove an overrepresented sequence from one FASTQ file from my RNA-Seq data?
3.5 years ago
robeaumont • 0

Hello,

I'm new to RNA-Seq analysis and couldn't find an answer to this question elsewhere. I have an overrepresented sequence in one of my FASTQ files (I did paired-end sequencing), which is for one of five drug-treated samples (five control samples, same five drug-treated). This sequence makes up 0.107% of total reads, which triggers an amber warning. The sequence was not overrepresented in any of the other control or drug-treated samples, so I'm unsure whether this is contamination or something specific to this sample (the FastQC report did not identify the source of the sequence: 'No Hit'). I blasted the sequence and the hit was an inflammatory mediator, which isn't totally unrelated to what the research question is addressing.

Should I remove this sequence? Or is it more likely just biological variance, with this one sample responding more strongly to the treatment?

Thanks in advance!

fastqc overrepresented RNA-Seq sequences • 1.2k views

The only over-represented sequences you'd have to remove are adapter dimers (if there are any).
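For anyone who wants a quick way to rule out the adapter case, here is a minimal Python sketch. It assumes Illumina TruSeq-style adapters, whose common prefix is `AGATCGGAAGAGC`; other library kits use different sequences, so swap in whatever your kit documents. The function name and the `min_overlap` parameter are made up for illustration.

```python
# Common prefix of Illumina TruSeq-style adapters (an assumption about the kit;
# replace it with the adapter sequence documented for your library prep).
ILLUMINA_ADAPTER_PREFIX = "AGATCGGAAGAGC"


def looks_like_adapter(seq, adapter=ILLUMINA_ADAPTER_PREFIX, min_overlap=10):
    """Return True if seq contains the first min_overlap bases of adapter.

    This is a crude substring check, not an alignment: it will miss
    adapters with sequencing errors or shorter partial matches.
    """
    return adapter[:min_overlap].upper() in seq.upper()
```

A sequence that FastQC reports as 'No Hit' did not match FastQC's own contaminant list, so a check like this is mostly a sanity test before moving on.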

3.5 years ago
GenoMax 148k

"I blasted the sequence and the hit was an inflammatory mediator, which isn't totally unrelated to what the research question is addressing."

There is your answer. Sounds like an expected observation.

There is no rule that every category in FastQC needs to be "green" before you can continue with further analysis.

3.5 years ago

FastQC was designed for examining DNA, so many of the issues it flags are not problems in RNA-Seq. An overrepresented sequence taking up a fraction of a percent is not a problem for you.
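For context, FastQC raises the overrepresented-sequence warning once a sequence exceeds 0.1% of reads and a failure above 1%, so 0.107% is only just over the warning line. If you want to see how the same sequence behaves in the other samples, a rough Python sketch along these lines would do it. It assumes standard four-line FASTQ records and counts reads that contain the sequence as a substring, which is not exactly how FastQC tallies them; the function names are made up for illustration.

```python
import gzip
from pathlib import Path


def open_fastq(path):
    """Open a plain or gzip-compressed FASTQ file in text mode."""
    path = Path(path)
    if path.suffix == ".gz":
        return gzip.open(path, "rt")
    return open(path, "rt")


def count_reads_with_sequence(lines, query):
    """Return (matching_reads, total_reads) for an iterable of FASTQ lines.

    Assumes four lines per record, with the read sequence on the
    second line of each record.
    """
    total = 0
    matching = 0
    for i, line in enumerate(lines):
        if i % 4 == 1:  # sequence line of the record
            total += 1
            if query in line:
                matching += 1
    return matching, total


def percent(matching, total):
    """Percentage of reads that matched; 0.0 for an empty file."""
    return 100.0 * matching / total if total else 0.0
```

To use it, open each sample with `open_fastq(...)`, pass the handle and the overrepresented sequence to `count_reads_with_sequence`, and compare the percentages across the control and treated samples. A sequence that is present everywhere but higher in one treated sample looks more like expression than contamination.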

3.5 years ago
Mensur Dlakic ★ 28k

The whole purpose of doing RNA-Seq is to measure changes in gene expression. Or, as you call it, to find over-represented (and under-represented) signals. It sounds like you got exactly what you should have, so it is now a matter of properly quantifying your data.
