Question: GC content bias in RNA-seq data of A. thaliana
rioualen (France), 9 weeks ago:

Hello,

I'm currently working on RNA-seq data from A. thaliana, and I have questions about quality and GC content. I guess it's normal for RNA-seq data to have a higher GC% than the genome itself, since coding sequences are usually biased toward GC. However, the A. thaliana genome has a GC content of 36%, while my samples reach 51-53%. Isn't that a bit too much?
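For comparing a sample's GC% with the genome's 36%, here is a minimal sketch of how a pooled GC percentage can be computed from FASTQ reads. It assumes plain 4-line FASTQ records and ignores N bases; the function names are hypothetical, not part of FastQC or any other tool. FastQC itself plots a per-read GC distribution; for equal-length reads without Ns, the pooled value below equals the mean of that distribution.

```python
import io
from typing import Iterable, Iterator, TextIO


def fastq_sequences(handle: TextIO) -> Iterator[str]:
    """Yield the sequence line of each record in a 4-line-per-record FASTQ stream."""
    for i, line in enumerate(handle):
        if i % 4 == 1:  # line 2 of every record holds the bases
            yield line.strip()


def gc_percent(seqs: Iterable[str]) -> float:
    """Pooled GC% over all A/C/G/T bases; N and other symbols are ignored."""
    gc = at = 0
    for s in seqs:
        s = s.upper()
        gc += s.count("G") + s.count("C")
        at += s.count("A") + s.count("T")
    total = gc + at
    return 100.0 * gc / total if total else 0.0


if __name__ == "__main__":
    # Tiny in-memory FASTQ: one GC-only read and one AT-only read.
    demo = io.StringIO("@r1\nGGCC\n+\nIIII\n@r2\nATAT\n+\nIIII\n")
    print(f"{gc_percent(fastq_sequences(demo)):.1f}")  # 50.0
```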

I'm asking because, although the sequencing quality looked OK in the FastQC reports, I get a very low mapping rate, around 10-20% of reads. Only one sample maps above 60%, and that one has a GC content of 44%.
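To double-check a mapping rate independently of the aligner's own log, here is a minimal sketch that counts mapped records in SAM text using the standard FLAG bits (0x4 unmapped, 0x100 secondary, 0x800 supplementary). The function name is hypothetical; in practice samtools flagstat gives this summary. Secondary and supplementary records are skipped so each read (each mate, for paired-end data) is counted once.

```python
from typing import Iterable

UNMAPPED = 0x4        # segment unmapped
SECONDARY = 0x100     # secondary alignment
SUPPLEMENTARY = 0x800 # supplementary alignment


def mapping_rate(sam_lines: Iterable[str]) -> float:
    """Percentage of primary SAM records whose unmapped bit (0x4) is not set."""
    total = mapped = 0
    for line in sam_lines:
        if line.startswith("@") or not line.strip():
            continue  # skip header and blank lines
        flag = int(line.split("\t")[1])  # FLAG is the 2nd column
        if flag & (SECONDARY | SUPPLEMENTARY):
            continue  # count each read only once
        total += 1
        if not (flag & UNMAPPED):
            mapped += 1
    return 100.0 * mapped / total if total else 0.0


if __name__ == "__main__":
    sam = [
        "@HD\tVN:1.6",
        "r1\t0\tChr1\t100\t60\t4M\t*\t0\t0\tACGT\tIIII",
        "r2\t4\t*\t0\t0\t*\t*\t0\t0\tACGT\tIIII",
        "r3\t16\tChr2\t50\t60\t4M\t*\t0\t0\tACGT\tIIII",
    ]
    print(f"{mapping_rate(sam):.1f}%")  # 2 of 3 primary records mapped
```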

I tried mapping with bowtie2 and subread-align, both with default parameters (-N 0, i.e. no mismatches allowed in the seed, for bowtie2, and -M 3, i.e. up to 3 mismatches, for subread-align).

I'm a bit confused here. Does anyone have an idea?

EDIT

I tried aligning to the TAIR10 assembly instead of Araport11, and now I get a mapping rate above 90% for every sample! I'm still confused, but at least it works...


Is this paired-end data? If so, you could try aligning the reads separately as single-end data. If the alignment rate then looks reasonable, try increasing the maximum fragment (insert) size (see the aligners' advanced parameters). I would also recommend the STAR aligner. I don't know much about subread-align, but bowtie2 is designed for DNA data (it does not perform spliced alignment).

(comment by e.rempel, 9 weeks ago)
Answer by theobroma22, 9 weeks ago:

Assuming the analysis was done correctly, keep in mind that 36% and 52% are proportions (ratios), not absolute amounts. Your transcriptome is much smaller than the genome and, as you said, has a higher GC content, so it's quite plausible that nothing is wrong.
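The point about ratios can be made concrete: the GC% of a read pool reflects only the sequences that are actually sampled, weighted by how many copies of each are present, not the genome as a whole. The sketch below uses two made-up 10 bp "transcripts" and abundance weights (hypothetical values, not A. thaliana data) to show how abundant GC-rich molecules pull the pooled percentage up. It does not model library-prep or PCR GC bias.

```python
from typing import Iterable, Tuple


def pooled_gc(seqs_with_copies: Iterable[Tuple[str, float]]) -> float:
    """GC% of all bases when each sequence is present in `copies` molecules.

    Bases contributed by a sequence = copies * length, so GC-rich sequences
    that are highly abundant dominate the pooled percentage.
    """
    gc = total = 0.0
    for seq, copies in seqs_with_copies:
        s = seq.upper()
        gc += copies * (s.count("G") + s.count("C"))
        total += copies * len(s)
    return 100.0 * gc / total if total else 0.0


if __name__ == "__main__":
    gc_rich = "GCGCGCATAT"  # hypothetical transcript, 60% GC
    at_rich = "ATATATATGC"  # hypothetical transcript, 20% GC
    # Equal abundance: plain average of the two sequences.
    print(pooled_gc([(gc_rich, 1), (at_rich, 1)]))  # 40.0
    # GC-rich transcript 9x more abundant: pool shifts toward its composition.
    print(pooled_gc([(gc_rich, 9), (at_rich, 1)]))  # 56.0
```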
