Increased GC content after trimming RNAseq data
21 months ago
guillaume.rbt ★ 1.0k

Hi,

I've trimmed RNA-seq data (PE 100 bp reads) with Trim Galore to remove adapters.

After trimming I noticed an increase in GC content (see below), specifically for the reverse reads and at high GC % values.

I'm having trouble understanding where it comes from.

I also wonder whether I should worry about it, or perhaps skip the trimming step, since adapter contamination is rather low.

Any input is welcome!

Before trimming: [image not preserved]

After trimming: [image not preserved]

rnaseq • 1.0k views

Personally, I do not think this is a matter of concern. It might be due to the shortening of reads after trimming, which alters the overall GC% content; trimming itself does not introduce more GCs. Those are just warnings, and the GC% content for all your samples still roughly follows a normal distribution. I would proceed with the trimmed samples without worrying too much; this will most likely have no effect on downstream data processing.
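To make the point about read shortening concrete, here is a minimal sketch in Python. The read sequence and the trim position are made up for illustration, and `gc_percent` is a hypothetical helper, not part of Trim Galore or FastQC. It only shows that per-read GC% is a ratio over read length, so removing bases from the 3' end can shift it without adding any G or C.

```python
def gc_percent(seq: str) -> float:
    """Return the percentage of G/C bases in a sequence (0.0 if empty)."""
    if not seq:
        return 0.0
    seq = seq.upper()
    gc = seq.count("G") + seq.count("C")
    return 100.0 * gc / len(seq)


# Hypothetical 20 bp read: a GC-rich insert followed by an A/T-rich 3' tail.
read = "GCGCGGCCGC" + "ATATTTAAAT"

# Pretend the trimmer removed the last 10 bases (assumption for the demo).
trimmed = read[:10]

before = gc_percent(read)
after = gc_percent(trimmed)
print(f"before trimming: {before:.1f}%  after trimming: {after:.1f}%")
```

The same number of G and C bases is present before and after; only the denominator changed. Whether this is what happened in the real data depends on what the trimmed 3' ends actually contained.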


Thank you very much for your advice.


Have you already taken a look at those reads? Something like

seqkit grep -s -p '".*[CG]{10,70}.*[CG]{5,70}.*[CG]{10,70}"' -r < your.fastq > CGrich.fastq

should extract at least some of those very GC-rich reads.
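If seqkit is not at hand, a similar inspection can be sketched in plain Python by filtering on per-read GC% instead of a regular expression. This is a different criterion from the pattern above, and everything here is an assumption for illustration: the function names, the 70% threshold, and the toy records. It also assumes a simple FASTQ with exactly four lines per record.

```python
import io
from typing import Iterator, TextIO, Tuple

Record = Tuple[str, str, str, str]  # header, sequence, separator, quality


def read_fastq(handle: TextIO) -> Iterator[Record]:
    """Yield records from a 4-line-per-record FASTQ; stops at the first empty header."""
    while True:
        header = handle.readline().rstrip("\n")
        if not header:
            return
        seq = handle.readline().rstrip("\n")
        sep = handle.readline().rstrip("\n")
        qual = handle.readline().rstrip("\n")
        yield header, seq, sep, qual


def gc_percent(seq: str) -> float:
    """Return the percentage of G/C bases in a sequence (0.0 if empty)."""
    if not seq:
        return 0.0
    seq = seq.upper()
    return 100.0 * (seq.count("G") + seq.count("C")) / len(seq)


def gc_rich_records(handle: TextIO, min_gc: float = 70.0) -> Iterator[Record]:
    """Yield only records whose sequence has GC% >= min_gc (arbitrary default)."""
    for record in read_fastq(handle):
        if gc_percent(record[1]) >= min_gc:
            yield record


# Toy input standing in for a real FASTQ file.
demo = io.StringIO(
    "@read1\nGCGCGGCCGC\n+\nIIIIIIIIII\n"
    "@read2\nATATTTAAAT\n+\nIIIIIIIIII\n"
)
kept = [record[0] for record in gc_rich_records(demo)]
print(kept)
```

For a real file, the `io.StringIO` object would be replaced by `open("your.fastq")`, or `gzip.open(path, "rt")` for compressed input, and the kept records written back out four lines at a time.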


I haven't tried this, but it's a good idea!


Which organism do these reads come from?


They are cultured human cells (keratinocytes transfected with HPV + fibroblasts).
