Increased GC content after trimming RNAseq data
10 weeks ago
guillaume.rbt ▴ 990

Hi,

I've trimmed RNA-seq data (PE 100 bp reads) with Trim Galore to remove adapters.

After trimming I've noticed an increase in GC content (see below), specifically for the reverse reads and at high GC % values.

I'm having trouble understanding where it comes from.

And I wonder whether I should worry about it, or maybe skip the trimming step, as adapter contamination is rather low.

Any input is welcome!

Before trimming:

After trimming:

Personally, I don't think this is a cause for concern. It is probably due to the reads being shortened by trimming, which shifts the per-read GC% distribution; trimming does not add G or C bases to the reads. Those are just warnings, and the GC% distribution of all your samples still roughly follows a normal distribution. I would proceed with the trimmed samples without worrying too much; this will most likely have no effect on downstream data processing.
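If you want to check this on your own files, here is a minimal Python sketch (not something I ran on your data) that bins per-read GC% so the untrimmed and trimmed FASTQ files can be compared. It assumes standard 4-line FASTQ records; the function names and the file names in the comments are hypothetical. The toy read at the end shows how removing an adapter-like tail raises the GC% of one read without adding any G or C.

```python
import gzip
from collections import Counter

def gc_percent(seq):
    """Return the GC content of one read as a percentage (0.0 for an empty read)."""
    if not seq:
        return 0.0
    seq = seq.upper()
    return 100.0 * (seq.count("G") + seq.count("C")) / len(seq)

def read_sequences(path):
    """Yield the sequence line of each record in a 4-line FASTQ file (plain or .gz)."""
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rt") as handle:
        for i, line in enumerate(handle):
            if i % 4 == 1:  # second line of every record holds the bases
                yield line.strip()

def gc_histogram(seqs, bin_width=5):
    """Count reads per GC% bin and return {bin_start: read_count}."""
    counts = Counter()
    for seq in seqs:
        # Reads with exactly 100% GC are folded into the last bin.
        start = min(int(gc_percent(seq) // bin_width) * bin_width, 100 - bin_width)
        counts[start] += 1
    return dict(sorted(counts.items()))

# Toy read: a GC-rich insert followed by the start of the Illumina adapter.
read = "GCGCGGCCGC" + "AGATCGGAAGAGC"
trimmed = read[:10]  # what is left once the adapter tail is removed
print(round(gc_percent(read), 1), gc_percent(trimmed))

# Hypothetical usage on real files (names are placeholders):
# before = gc_histogram(read_sequences("sample_R2.fastq.gz"))
# after = gc_histogram(read_sequences("sample_R2_val_2.fq.gz"))
```

Comparing the two histograms bin by bin should show whether the shift is limited to the high-GC tail, as in the plots above.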

seqkit grep -s -p '".*[CG]{10,70}.*[CG]{5,70}.*[CG]{10,70}"' -r < your.fastq > CGrich.fastq


should extract at least some of those very CG-rich reads?
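To make clear what that pattern selects, here is a rough Python equivalent as a sketch. It assumes the regex is applied to the read sequence only, and the names `GC_RICH` and `is_gc_rich` are made up for illustration. A read matches when it contains a run of at least 10 C/G bases, later a run of at least 5, and later another run of at least 10.

```python
import re

# Same pattern as in the seqkit command, without the surrounding ".*" and
# without the extra double quotes, which seqkit needs only because the
# pattern contains commas.
GC_RICH = re.compile(r"[CG]{10,70}.*[CG]{5,70}.*[CG]{10,70}")

def is_gc_rich(seq):
    """Return True if the read contains the three C/G runs described above."""
    return GC_RICH.search(seq) is not None

example = "G" * 10 + "AT" + "C" * 5 + "AT" + "G" * 10
print(is_gc_rich(example), is_gc_rich("ACGT" * 25))
```

The runs may be adjacent, so a single stretch of 25 or more C/G bases also matches.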

I haven't tried this, but it's a good idea!

Which organism do these reads come from?

They're cultured human cells (keratinocytes transfected with HPV + fibroblasts).