Question: RNA-seq GC content Bimodal distribution
Is it normal for human RNA-seq data to have a bimodal distribution in GC content?

For DNA-seq, a bimodal distribution in GC content is a sign of contamination.

How about RNA-seq?

That's rather extreme GC content (around 90%), and such a secondary peak is not expected in an RNA-seq run for a specific species. What is expected is some noise at the start of the per-base nucleotide distribution due to not-so-random hexamers.

A very specific contaminant or PCR amplicon would be visible as a sharp peak, and would also be found as an overrepresented sequence. If the peak is quite sharp and close to the main distribution, it could represent read-through into adapters. Neither of these scenarios fits here.

In your case, if you want to find out what caused this, check whether some of your sequences are runs of GpC or a triplet repeat. Are any such sequences listed as overrepresented sequences?
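If you want a quick way to test for GpC runs or triplet repeats, something like the sketch below might help. It is only a crude heuristic of my own (not what FastQC does), and the function names `periodicity` and `best_period` are made up for illustration: it scores how often each base equals the base k positions earlier, so a pure dinucleotide repeat scores 1.0 at k=2 and a pure triplet repeat scores 1.0 at k=3.

```python
def periodicity(seq, k):
    """Fraction of positions i >= k where seq[i] == seq[i - k].

    A perfect tandem repeat with unit length k scores 1.0.
    Sequences no longer than k have nothing to compare and score 0.0.
    """
    seq = seq.upper()
    if len(seq) <= k:
        return 0.0
    matches = sum(1 for i in range(k, len(seq)) if seq[i] == seq[i - k])
    return matches / (len(seq) - k)


def best_period(seq, max_period=3):
    """Return (k, score) for the period 1..max_period with the highest score."""
    scores = [(k, periodicity(seq, k)) for k in range(1, max_period + 1)]
    return max(scores, key=lambda pair: pair[1])


# Hypothetical usage on a few reads pulled from the high-GC peak:
# for read in ["GCGCGCGCGCGC", "CGGCGGCGGCGG"]:
#     print(read, best_period(read))
```

Reads that score close to 1.0 at k=2 or k=3 are candidates for simple repeats; anything well below that is probably a more complex sequence worth identifying another way.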

It might also be contamination from a different species, although I can't think of anything with an overall GC content that high. You could try fastq_screen to check for other species you work with regularly (or e.g. a food source). Alternatively, fish out some of the high-GC sequences and BLAST them to see what they are.
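Fishing out the high-GC reads could look roughly like the sketch below. It assumes a plain four-line-per-record FASTQ (optionally gzipped); the 0.8 threshold, the limit of 100 reads, the file names and the helper names are all placeholders I picked for illustration, so adjust them to where your second peak actually sits.

```python
import gzip
from itertools import islice


def gc_fraction(seq):
    """Fraction of G/C among A/C/G/T bases; N and other symbols are ignored."""
    seq = seq.upper()
    gc = seq.count("G") + seq.count("C")
    acgt = gc + seq.count("A") + seq.count("T")
    return gc / acgt if acgt else 0.0


def high_gc_reads(lines, threshold=0.8):
    """Yield (read_id, sequence) for 4-line FASTQ records with GC >= threshold."""
    it = iter(lines)
    while True:
        record = list(islice(it, 4))
        if len(record) < 4:  # end of input or truncated final record
            break
        header, seq = record[0].strip(), record[1].strip()
        if gc_fraction(seq) >= threshold:
            # Drop the leading '@' and anything after the first whitespace.
            yield header[1:].split()[0], seq


def write_fasta(reads, path, limit=100):
    """Write at most `limit` reads to a FASTA file suitable for pasting into BLAST."""
    with open(path, "w") as out:
        for read_id, seq in islice(reads, limit):
            out.write(f">{read_id}\n{seq}\n")


def open_fastq(path):
    """Open a FASTQ file as text, transparently handling .gz files."""
    return gzip.open(path, "rt") if path.endswith(".gz") else open(path)


# Hypothetical usage (file names are placeholders):
# with open_fastq("sample_R1.fastq.gz") as fh:
#     write_fasta(high_gc_reads(fh, threshold=0.8), "high_gc.fa")
```

A hundred reads or so is usually plenty to see whether the hits all point at the same thing.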



I see that second peak all the time with rRNA-depleted libraries. rRNA depletion is never complete, and the residual rRNA shows up as the lower-density peak at ~85%. You want that peak to be as low as possible.

