Overrepresented sequences poly(C) followed by poly(T)
6.1 years ago

I'm working on an RNA-seq project, and FastQC keeps identifying overrepresented sequences consisting of poly(C) followed by poly(T). I see a range from

CCCCCCTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTT to CCCCCCCCCCCCTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTT

I know the poly(T) is probably from how the RNA was enriched, but where is the poly(C) coming from? Has anybody else seen this before?

RNA-Seq rna-seq next-gen • 2.5k views
4.1 years ago

There have been a few threads on this topic already:

In conclusion, I would just remove the standard adapters that are known to Cutadapt (or whichever program you are using), filter and trim reads based on length and quality, and then proceed to alignment. My feeling is that trimming and filtering reads mainly affects quality metrics such as percent alignment. Most 'junk' reads, including poly(A) and poly(T), will not align anyway.
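For anyone who wants to see what trimming and filtering does to reads like the ones in the question, here is a minimal sketch in plain Python. It is not Cutadapt and not what any particular trimmer does internally. The function names (`trim_poly_ct`, `mean_phred`, `keep_read`) and all thresholds (at least 6 C, at least 10 T, minimum length 20, minimum mean quality 20, Phred+33 encoding) are assumptions chosen for illustration only.

```python
import re

# Assumed pattern: a leading run of >= 6 C followed by >= 10 T.
# These cutoffs are illustrative, not taken from the thread.
LEADING_POLY_CT = re.compile(r"^C{6,}T{10,}")


def trim_poly_ct(seq, qual):
    """Strip a leading poly(C)+poly(T) stretch from a read and its qualities."""
    match = LEADING_POLY_CT.match(seq)
    if match:
        cut = match.end()
        return seq[cut:], qual[cut:]
    return seq, qual


def mean_phred(qual, offset=33):
    """Mean Phred score of a quality string (Phred+33 encoding assumed)."""
    if not qual:
        return 0.0
    return sum(ord(ch) - offset for ch in qual) / len(qual)


def keep_read(seq, qual, min_len=20, min_mean_q=20):
    """Keep a read only if it is long enough and of sufficient mean quality."""
    return len(seq) >= min_len and mean_phred(qual) >= min_mean_q


if __name__ == "__main__":
    # A read that is nothing but poly(C)+poly(T) is trimmed to empty and dropped.
    junk = "CCCCCC" + "T" * 44
    seq, qual = trim_poly_ct(junk, "I" * len(junk))
    print(repr(seq), keep_read(seq, qual))

    # A read with a real insert after the homopolymer prefix survives.
    read = "CCCCCCC" + "T" * 12 + "ACGT" * 6
    seq, qual = trim_poly_ct(read, "I" * len(read))
    print(seq, keep_read(seq, qual))
```

Reads that consist only of the homopolymer stretch end up empty and fail the length filter, which is consistent with the point above that such reads would mostly not have aligned anyway.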

Kevin
