Question

Should I remove an overrepresented sequence from one Fastq file from my RNA Seq data?

2.8 years ago

robeaumont • 0

Hello,

I'm new to RNA-Seq analysis and couldn't find an answer to this question elsewhere. I have an overrepresented sequence in one of my FASTQ files (I did paired-end sequencing). The file belongs to one of the five drug-treated samples (five control samples, and the same five samples drug-treated). This sequence makes up 0.107% of total reads, which triggers an amber warning in FastQC. It was not overrepresented in any of the other control or drug-treated samples, so I'm unsure whether it is contamination or something particular to this sample (the FastQC report did not identify a possible source: "No Hit"). I BLASTed the sequence, and the hit was an inflammatory mediator, which is not entirely unrelated to what the research question addresses.

Should I remove this sequence? Or is this likely just biological variation, with this one sample responding more strongly to the treatment?

Thanks in advance!

The only over-represented sequences you'd have to remove are adapter dimers (if there are any).
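Adapter dimers are fragments in which the two adapters ligated directly to each other with no cDNA insert, so read 1 starts straight away with adapter sequence. A rough sketch for estimating how many reads look like that, assuming the common Illumina TruSeq adapter prefix `AGATCGGAAGAGC` (check which adapter your library kit actually uses). The function names are hypothetical, only exact matches are detected, and in practice a dedicated adapter-trimming tool is the usual way to deal with them:

```python
from typing import Iterable

# Assumption: the 13 bp prefix shared by standard Illumina TruSeq adapters.
# Replace with the adapter used by your library prep kit.
ADAPTER_PREFIX = "AGATCGGAAGAGC"


def is_adapter_dimer(seq: str, adapter: str = ADAPTER_PREFIX, max_insert: int = 0) -> bool:
    """True if the adapter begins within the first `max_insert` bases,
    i.e. the read carries (almost) no cDNA insert. Exact match only,
    so reads with sequencing errors in the adapter are missed."""
    pos = seq.find(adapter)
    return 0 <= pos <= max_insert


def adapter_dimer_fraction(seqs: Iterable[str], max_insert: int = 0) -> float:
    """Fraction of reads classified as adapter dimers (0.0 for no reads)."""
    total = dimers = 0
    for seq in seqs:
        total += 1
        if is_adapter_dimer(seq, max_insert=max_insert):
            dimers += 1
    return dimers / total if total else 0.0
```

If this fraction is negligible, there is nothing to remove on that front.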

Reply by Dunois, 2.8 years ago

score 3 · Answer 1 · 2021-06-20

I blasted the sequence and the hit was an inflammatory mediator, which isn't totally unrelated to what the research question is addressing.

There is your answer. That sounds like an expected observation.

There is no rule that every category in FastQC needs to be "green" before you can continue with further analysis.

score 3 · Answer 2 · 2021-06-20

2.8 years ago

swbarnes2 14k

FastQC was designed for examining DNA sequencing data, so many of the issues it flags are not problems in RNA-Seq. An overrepresented sequence making up only a fraction of a percent of your reads is not a problem for you.
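For reference, FastQC's "Overrepresented sequences" module warns when a single sequence accounts for more than 0.1% of all reads and fails above 1%, which is why 0.107% produced the amber flag. A minimal sketch of that check, assuming exact whole-read counting (FastQC's own implementation differs in detail); the function names are hypothetical:

```python
from collections import Counter
from typing import Iterable, Iterator, List, Tuple

# Thresholds documented for FastQC's "Overrepresented sequences" module
WARN_FRACTION = 0.001  # > 0.1% of all reads -> warning (amber)
FAIL_FRACTION = 0.01   # > 1% of all reads -> failure (red)


def fastq_sequences(lines: Iterable[str]) -> Iterator[str]:
    """Yield the sequence line (second line of each 4-line FASTQ record)."""
    for i, line in enumerate(lines):
        if i % 4 == 1:
            yield line.strip()


def overrepresented(seqs: Iterable[str]) -> List[Tuple[str, int, float, str]]:
    """Return (sequence, count, percent of reads, status) for every sequence
    above the warning threshold, most frequent first.

    Simplification: counts exact whole-read duplicates only.
    """
    counts = Counter(seqs)
    total = sum(counts.values())
    flagged = []
    for seq, n in counts.most_common():
        fraction = n / total
        if fraction <= WARN_FRACTION:
            break  # all remaining sequences are rarer still
        status = "fail" if fraction > FAIL_FRACTION else "warn"
        flagged.append((seq, n, 100 * fraction, status))
    return flagged
```

To run it on a file, pass an open handle: `overrepresented(fastq_sequences(open("sample_R1.fastq")))` (hypothetical filename), or `gzip.open(path, "rt")` for a gzipped FASTQ. Note that the thresholds say nothing about biology: a highly expressed transcript in one responsive sample can cross 0.1% without being contamination.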

score 1 · Answer 3 · 2021-06-20

2.8 years ago

Mensur Dlakic ★ 27k

The whole purpose of doing RNA-Seq is to measure changes in gene expression, or, as you put it, to find over-represented (and under-represented) signals. It sounds like you got exactly what you should have, so it is now a matter of properly quantifying your data.