Specific over-represented sequence in FastQC
3.7 years ago

Hi guys, I had a quick question about an over-represented sequence in human transcriptomic data. I've already trimmed adapters and aligned to the genome with STAR. After running FastQC on one of the BAM files, I got a warning about an over-represented sequence, "GGTGGCGCGTGCCTGTAGTCCCAGCTACTCGGGAGGCTGAGGTGGGAGGA". It comes up just over 36,000 times, and after a quick Google search I have seen it come up in a few other instances online. A BLAST search showed that it shares 100% identity with a few regions, such as "Homo sapiens RNA component of signal recognition particle 7SL2 (RN7SL2), small cytoplasmic RNA".

Does anyone know what might be happening, and whether it is going to be a problem going forward?

Thanks in advance!

rna-seq fastQC • 776 views

Since it only shows up 36,000 times out of what I'm assuming is 30 million+ reads (roughly 0.1% of reads, if that assumption holds), I wouldn't worry about it yet. Go on to alignment and see how it looks.
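
To put that count in proportion, here is a minimal Python sketch of the arithmetic. The 30 million total is only the assumption made in this reply, not a number from the original post, and `overrepresented_fraction` is a hypothetical helper name used for illustration.

```python
def overrepresented_fraction(sequence_count: int, total_reads: int) -> float:
    """Return the share of reads accounted for by one sequence."""
    if total_reads <= 0:
        raise ValueError("total_reads must be positive")
    return sequence_count / total_reads


# 36,000 is the count reported by FastQC in the question.
# 30,000,000 is an ASSUMED library size; substitute the real read count.
fraction = overrepresented_fraction(36_000, 30_000_000)
print(f"{fraction:.2%} of reads")
```

With a smaller library the share goes up accordingly, so it is worth plugging in the actual total read count from the FastQC basic statistics before deciding whether to ignore the sequence.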
