Question

Overrepresented sequences poly(C) followed by poly(T)

6.1 years ago

Jakesaunders • 0

I'm working on an RNA-seq project, and FastQC keeps identifying overrepresented sequences consisting of poly(C) followed by poly(T). They range from

CCCCCCTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTT to CCCCCCCCCCCCTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTT

I know the poly(T) is probably from how the RNA was enriched, but where is the poly(C) coming from? Has anybody else seen this before?

RNA-Seq rna-seq next-gen • 2.5k views

updated 4.1 years ago by Kevin Blighe 87k • written 6.1 years ago by Jakesaunders • 0

score 0 · Answer 1 · 2020-03-01

There have been a few threads on this topic already:

In conclusion, I would just remove the standard adapters that are known to Cutadapt (or whichever program you are using) from the sequences, filter / trim reads based on length and quality, and then proceed to alignment. My feeling is that trimming and filtering mainly affect quality metrics such as percent alignment. Most 'junk' reads, including poly(A) and poly(T), will not align anyway.
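To make the trim-then-filter idea concrete, here is a minimal Python sketch, not a replacement for a dedicated trimmer. It strips a leading poly(C) + poly(T) run like the ones shown in the question, then keeps a read only if enough sequence of reasonable quality is left. The run lengths in the pattern (6 and 20), the length and quality thresholds, the Phred+33 encoding, and the function names are all assumptions chosen for illustration, so adjust them to your own data.

```python
import re

# Assumed pattern: a read that starts with at least 6 C's followed by
# at least 20 T's. Both minimum run lengths are illustrative guesses.
POLY_CT = re.compile(r"^C{6,}T{20,}")


def trim_poly_ct(seq, qual):
    """Remove a leading poly(C)+poly(T) run from a read.

    Returns the trimmed (sequence, quality) pair; reads without the
    run are returned unchanged.
    """
    match = POLY_CT.match(seq)
    if match is None:
        return seq, qual
    end = match.end()
    return seq[end:], qual[end:]


def mean_phred(qual, offset=33):
    """Mean Phred score of a quality string (Phred+33 assumed)."""
    if not qual:
        return 0.0
    return sum(ord(ch) - offset for ch in qual) / len(qual)


def keep_read(seq, qual, min_len=20, min_mean_q=20):
    """Keep a read only if it is long enough and of adequate mean quality.

    Both thresholds are hypothetical defaults, not recommendations.
    """
    return len(seq) >= min_len and mean_phred(qual) >= min_mean_q


if __name__ == "__main__":
    # A read with real sequence after the homopolymer run is trimmed and kept.
    read = "CCCCCC" + "T" * 25 + "ACGT" * 6
    seq, qual = trim_poly_ct(read, "I" * len(read))
    print(seq, keep_read(seq, qual))

    # A read that is nothing but poly(C)+poly(T) ends up empty and is dropped.
    junk = "CCCCCC" + "T" * 44
    seq, qual = trim_poly_ct(junk, "I" * len(junk))
    print(repr(seq), keep_read(seq, qual))
```

In practice a dedicated trimmer will do this faster and more robustly; the sketch is only meant to show what "trim, then filter on length and quality" amounts to for reads like these.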

Kevin