Increased GC content after trimming RNAseq data
10 weeks ago
guillaume.rbt ▴ 990

Hi,

I've trimmed RNA-seq data (paired-end, 100 bp reads) with Trim Galore to remove adapters.

After trimming, I noticed an increase in GC content (see below), specifically in the reverse reads and at high GC % values.

I'm having trouble understanding where it comes from.

I also wonder whether I should worry about it, or perhaps skip the trimming step, since adapter contamination is rather low.

Any input is welcome!

before trimming :

after trimming :

rnaseq • 465 views

Personally, I don't think this is a cause for concern. It is probably due to the reads being shorter after trimming, which changes the overall GC% distribution; trimming itself does not introduce more G or C bases. Those are just warnings, and the GC content of all your samples still roughly follows a normal distribution. I would proceed with the trimmed samples without worrying too much: this will most likely have no effect on downstream data processing.
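To make the point about read shortening concrete, here is a minimal sketch of how removing a low-GC tail raises the GC% of a single read without adding any G or C bases. The sequences below are made up for illustration (a GC-rich insert followed by an AT-rich 3' tail standing in for whatever the trimmer removes), and `gc_percent` is a hypothetical helper, not part of Trim Galore or any other tool. It shows one possible mechanism and is not a diagnosis of this particular dataset.

```python
def gc_percent(seq: str) -> float:
    """Return the percentage of G and C bases in a sequence (0.0 for empty input)."""
    seq = seq.upper()
    if not seq:
        return 0.0
    gc = seq.count("G") + seq.count("C")
    return 100.0 * gc / len(seq)


# Hypothetical read: a GC-rich insert followed by an AT-rich 3' tail.
insert = "GCGCCGGCGCATGCCGGCGC"   # 20 bases, made up for illustration
tail = "A" * 10 + "T" * 10        # 20 bases with no G or C
raw_read = insert + tail          # what the sequencer reported
trimmed_read = insert             # what remains after trimming the tail

print(f"raw:     {gc_percent(raw_read):.1f}% GC over {len(raw_read)} bp")
print(f"trimmed: {gc_percent(trimmed_read):.1f}% GC over {len(trimmed_read)} bp")
```

The number of G and C bases is identical in both reads; only the denominator changes, so the trimmed read moves to a higher bin in a per-read GC histogram.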


seqkit grep -s -p '".*[CG]{10,70}.*[CG]{5,70}.*[CG]{10,70}"' -r < your.fastq > CGrich.fastq


That should extract at least some of those very GC-rich reads.
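If seqkit is not at hand, a rough alternative is to filter on the overall GC fraction of each read instead of matching runs of C/G. This is a different criterion from the regex above, so it will not select exactly the same reads. The sketch below uses only the Python standard library and assumes a plain, uncompressed FASTQ with exactly four lines per record; the function names and the 0.7 threshold are assumptions chosen for illustration.

```python
import io
from typing import Iterator, TextIO, Tuple

Record = Tuple[str, str, str]  # (header, sequence, quality)


def read_fastq(handle: TextIO) -> Iterator[Record]:
    """Yield records from a 4-line-per-record FASTQ stream.

    Stops at end of file or at the first blank header line.
    """
    while True:
        header = handle.readline().rstrip("\n")
        if not header:
            return
        seq = handle.readline().rstrip("\n")
        handle.readline()  # skip the '+' separator line
        qual = handle.readline().rstrip("\n")
        yield header, seq, qual


def gc_fraction(seq: str) -> float:
    """Fraction of G and C bases in a sequence (0.0 for an empty sequence)."""
    seq = seq.upper()
    if not seq:
        return 0.0
    return (seq.count("G") + seq.count("C")) / len(seq)


def filter_gc_rich(handle: TextIO, min_gc: float = 0.7) -> Iterator[Record]:
    """Yield only the reads whose GC fraction is at least min_gc (assumed cutoff)."""
    for header, seq, qual in read_fastq(handle):
        if gc_fraction(seq) >= min_gc:
            yield header, seq, qual


# Tiny in-memory example with two made-up reads.
demo = io.StringIO(
    "@read1\nGCGCGCGCGC\n+\nIIIIIIIIII\n"
    "@read2\nATATATATGC\n+\nIIIIIIIIII\n"
)
kept = [header for header, _, _ in filter_gc_rich(demo, min_gc=0.7)]
print(kept)
```

For a real file, pass an open file handle (for example `open("your.fastq")`) in place of `demo`, and write each kept record back out as four lines. The extracted reads can then be inspected or searched against a database to see what they are.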


I haven't tried this, but it's a good idea!


Which organism do these reads come from?


It's human cultured cells (keratinocytes transfected with HPV + fibroblasts).