What GC content should I expect?


12 months ago

BioinfGuru ★ 2.1k

Hi all,

I am analysing a large pig RNA-seq dataset of over 200 samples across 5 tissues. According to NCBI, the GC content of the pig genome is 42%. I understand that RNA-seq data should show a higher GC content than whole-genome data.

While testing the pipeline on a few random samples, the FastQC GC content graphs showed a roughly normal distribution peaking around 50%, with 1 sample shifted slightly left at around 47%. The only over-represented sequences reported were adapters.
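For anyone wanting to reproduce those numbers outside FastQC, here is a minimal Python sketch of a per-read GC histogram. It assumes plain 4-line FASTQ records; the function names (`gc_percent`, `read_gc_histogram`, `mean_gc`) and the toy reads are made up for illustration, and it is not FastQC's own implementation.

```python
import io
from collections import Counter


def gc_percent(seq):
    """GC percentage of one read, counting only A/C/G/T (N and others ignored)."""
    seq = seq.upper()
    gc = seq.count("G") + seq.count("C")
    at = seq.count("A") + seq.count("T")
    total = gc + at
    return 100.0 * gc / total if total else 0.0


def read_gc_histogram(handle):
    """Count reads per integer GC% bin from a text handle of 4-line FASTQ records."""
    hist = Counter()
    for i, line in enumerate(handle):
        if i % 4 == 1:  # second line of each record is the sequence
            seq = line.strip()
            if seq:
                hist[round(gc_percent(seq))] += 1
    return hist


def mean_gc(hist):
    """Mean GC% over all reads in the histogram."""
    n = sum(hist.values())
    return sum(gc * count for gc, count in hist.items()) / n if n else 0.0


# Toy in-memory FASTQ (hypothetical reads, for illustration only)
toy_fastq = io.StringIO("@r1\nGGCC\n+\nIIII\n@r2\nATAT\n+\nIIII\n")
toy_hist = read_gc_histogram(toy_fastq)
print(dict(toy_hist), mean_gc(toy_hist))
```

For a real, gzip-compressed file the handle could come from `gzip.open(path, "rt")`, where `path` is whatever your sample file is called.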

So I have 2 questions:

1) What should I expect the RNA-seq GC content to be? My guess is ~47% (genome GC + 5%). Should I be concerned if all samples return a GC content of 50%?

2) If there are a small number of samples showing a lower GC content by a few percent than the rest, should they be removed from the analysis? How should that be handled? Is a few % nothing to be concerned about?
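To make question 2 concrete, this is roughly how samples could be flagged for a closer look. It is only a sketch: the sample names, the GC values and the 5-point cut-off (`max_diff`) are invented for illustration, and a flag here means "inspect", not "remove".

```python
from statistics import median


def flag_gc_outliers(sample_gc, max_diff=5.0):
    """Names of samples whose mean GC% is more than max_diff points from the cohort median.

    max_diff is an arbitrary illustrative threshold, not an established cut-off.
    """
    centre = median(sample_gc.values())
    return sorted(name for name, gc in sample_gc.items() if abs(gc - centre) > max_diff)


# Hypothetical per-sample mean GC% values
samples = {
    "liver_1": 50.1,
    "liver_2": 49.8,
    "muscle_1": 50.3,
    "muscle_2": 47.0,
    "brain_1": 41.5,
}
print(flag_gc_outliers(samples))
```

With these made-up values, a sample sitting at 47% stays within the threshold and is not flagged. With several tissues it may make more sense to run the comparison within each tissue rather than across the whole cohort.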

Thanks in advance,

Kenneth

RNA-seq GC-content • 564 views

updated 12 months ago by ATpoint 87k • written 12 months ago by BioinfGuru ★ 2.1k


My recommendation is to ignore meaningless metrics such as GC content and focus on relevant QC. That is, mapping rate and how the samples look downstream, e.g. in a PCA to assess group separation.