Question: What is the GC content across different human chromosomes?
Dan12345130 wrote:

Does anyone know the GC content of the different human chromosomes?

Tags: gc, chromosome
Written 7.8 years ago by Dan12345130; modified 3.4 years ago by sacha

Funny that there are three different GC% answers for Chr1 ...

Reply written 7.8 years ago by Martin A Hansen

Depends on the genome build and version that you use - it's perfectly 'legit', as they say in Cockney London slang.

The truth of the matter is that we do not have an honest representation of the true GC content because the reference genome builds exclude / mask telomeric and centromeric regions, where GC content is high.

Thus, all values represented in this thread are based on the genome builds and are not reflective of the actual GC content, which would be larger and which would differ from individual to individual.

Reply written 19 months ago by Kevin Blighe

Simon Cockell wrote:

**EDIT**

OK, so I felt bad about not actually answering your question, so here you go (generated by the method outlined below):

#Sequence   GC content
chr1          0.43
chr2          0.40
chr3          0.40
chr4          0.38
chr5          0.40
chr6          0.40
chr7          0.41
chr8          0.40
chr9          0.43
chr10         0.42
chr11         0.42
chr12         0.41
chr13         0.40
chr14         0.43
chr15         0.44
chr16         0.45
chr17         0.46
chr18         0.40
chr19         0.48
chr20         0.44
chr21         0.43
chr22         0.49
chrX          0.40
chrY          0.46
chrM          0.44

**EDIT ENDS**

The GC content of human chromosomal DNA is very heterogeneous, which makes chromosome-wide statistics relatively meaningless. It has been shown that the human genome is a mosaic of GC-rich and GC-poor regions, each around 300 kb in length, called isochores.

You can plot these regions of varying GC content using the EMBOSS program isochore. For example, for chromosome 1:

wget http://hgdownload.cse.ucsc.edu/goldenPath/hg19/chromosomes/chr1.fa.gz
gunzip chr1.fa.gz
isochore -sequence chr1.fa -outfile chr1.isochore -graph png

This gives the following result:

[Figure: Isochores of Chr 1]
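
For readers without EMBOSS installed, the idea behind such a plot (GC content measured in windows along the sequence) can be sketched in a few lines of Python. This is only a rough illustration, not isochore's actual algorithm; the function name `windowed_gc`, the window and step sizes, and the toy sequence are all made up for the example.

```python
def windowed_gc(seq, window=1000, step=1000):
    """Return (start, gc_fraction) for each full window along seq.

    Only A/C/G/T are counted; a window with no such bases yields None.
    """
    seq = seq.upper()
    profile = []
    for start in range(0, len(seq) - window + 1, step):
        chunk = seq[start:start + window]
        gc = chunk.count("G") + chunk.count("C")
        acgt = gc + chunk.count("A") + chunk.count("T")
        profile.append((start, gc / acgt if acgt else None))
    return profile

# Toy sequence (not real data): a GC-only block, an AT-only block, an N block
toy = "GGCC" * 5 + "ATAT" * 5 + "NNNN" * 5
profile = windowed_gc(toy, window=20, step=20)
print(profile)  # [(0, 1.0), (20, 0.0), (40, None)]
```

On a real chromosome the window would need to be far larger (isochores are on the order of hundreds of kb), and the resulting profile could then be plotted with any plotting tool.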

You could also get the sequences of the individual chromosomes and work out their overall GC content, again using EMBOSS, this time with geecee:

geecee -sequence chr1.fa

This gives an answer of 43% for chromosome 1.

Written 7.8 years ago by Simon Cockell; modified 7.8 years ago

Well, the number I got for chr1 with my program is 41.7%. I guess EMBOSS is counting "N" as 50% GC, but it should not do that! We should send a bug report, I think.

Reply written 7.8 years ago by lh3

Hmm, yes, agreed: the treatment of 'N' will affect the results considerably. From http://emboss.sourceforge.net/apps/cvs/emboss/apps/geecee.html - 'It sums the number of G and C bases in the input sequence(s) and writes the result to file as the fraction (in the interval 0.0 to 1.0) of the length of the whole sequence.'

Reply written 7.8 years ago by Simon Cockell
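
To make the point about 'N' concrete, here is a small Python sketch comparing three ways a tool might treat ambiguous bases when reporting a GC fraction. The three conventions and the function name `gc_fractions` are illustrative assumptions, and the counts are made up; nothing here is taken from the EMBOSS source.

```python
def gc_fractions(a, c, g, t, n):
    """Return the GC fraction under three conventions for handling N."""
    gc = c + g
    acgt = a + c + g + t
    total = acgt + n
    return {
        "exclude_n": gc / acgt,                  # N ignored entirely
        "n_in_denominator": gc / total,          # N counted as a non-GC base
        "n_as_half_gc": (gc + 0.5 * n) / total,  # N counted as 50% GC
    }

# Toy composition (made-up counts, not a real chromosome)
result = gc_fractions(a=30, c=20, g=20, t=30, n=25)
for name, value in result.items():
    print(f"{name}\t{value:.3f}")
```

With these toy counts the three conventions give 0.400, 0.320 and 0.420, so the same sequence can yield visibly different "GC content" values depending only on how N is handled.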

lh3 wrote:

GRCh37/hg19/b37:

1   0.417439
2   0.402438
3   0.396943
4   0.382479
5   0.395163
6   0.396109
7   0.407513
8   0.401757
9   0.413168
10  0.415849
11  0.415657
12  0.40812
13  0.385265
14  0.408872
15  0.42201
16  0.447894
17  0.455405
18  0.39785
19  0.483603
20  0.441257
21  0.408325
22  0.479881
X   0.394963
Y   0.391288
MT  0.443626

Done by:

seqtk comp hs37m.fa.gz | awk '/^[0-9MXY]/{x=$4+$5;y=x+$3+$6;print $1"\t"x/y}'

ChrY has lots of ambiguous bases and that is why my result differs most on chrY in comparison to the EMBOSS result. EMBOSS is wrong, IMHO.

Written 7.8 years ago by lh3
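
If seqtk is not available, the same calculation (G+C divided by A+C+G+T, so ambiguous bases are left out of the denominator) can be sketched in plain Python. This is a minimal sketch and will be much slower than seqtk on a whole genome; the function name `gc_per_sequence` and the demo records are made up, and upper-casing soft-masked (lowercase) bases before counting is an assumption of this example.

```python
import io

def gc_per_sequence(handle):
    """Yield (name, gc_fraction) per FASTA record, counting only A/C/G/T.

    Records with no A/C/G/T bases yield NaN.
    """
    name, gc, acgt = None, 0, 0
    for line in handle:
        line = line.strip()
        if line.startswith(">"):
            if name is not None:
                yield name, (gc / acgt if acgt else float("nan"))
            # Sequence name is the first word of the header line
            name = (line[1:].split() or [""])[0]
            gc = acgt = 0
        elif name is not None:
            bases = line.upper()  # assumption: treat soft-masked bases as normal
            gc_here = bases.count("G") + bases.count("C")
            gc += gc_here
            acgt += gc_here + bases.count("A") + bases.count("T")
    if name is not None:
        yield name, (gc / acgt if acgt else float("nan"))

# Tiny in-memory FASTA (made-up records) to show the behaviour
demo_fasta = io.StringIO(">chrA demo\nACGTNN\nggcc\n>chrB\nNNNN\n")
demo = list(gc_per_sequence(demo_fasta))
for seq_name, frac in demo:
    print(f"{seq_name}\t{frac:.6f}")
```

For a gzipped genome file, the handle could be opened with `gzip.open(path, "rt")` from the standard library and passed to the same function.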

Martin A Hansen wrote:

I was about to tell you, but then someone crashed the server. Here is how (using Biopieces):

read_fasta -i /home/DATA/downloads/Homo_sapiens/human_hg19.fasta.gz | analyze_gc | write_tab -ck SEQ_NAME,GC% -x

#SEQ_NAME       GC%
gi|89161184|ref|AC_000044.1| Homo sapiens chromosome 1, alternate assembly Celera, whole genome shotgun sequence        40.77

Written 7.8 years ago by Martin A Hansen

Well, AC_000044 is the Celera assembly, not hg19. In addition, Perl is notoriously inefficient for looping through each base.

Reply written 7.8 years ago by lh3

sacha wrote:

Using bedtools nuc on hg19:

1 chr1 0.377295
2 chr2 0.394172
3 chr3 0.390478
4 chr4 0.375491
5 chr5 0.388130
6 chr6 0.387498
7 chr7 0.397821
8 chrX 0.384356
9 chr8 0.392218
10 chr9 0.351521
11 chr10 0.402901
12 chr11 0.403720
13 chr12 0.397843
14 chr13 0.319767
15 chr14 0.336276
16 chr15 0.336248
17 chr16 0.391037
18 chr17 0.436335
19 chr18 0.380423
20 chr20 0.416613
21 chrY 0.172677
22 chr19 0.456450
23 chr22 0.326388
24 chr21 0.297838

Written 3.4 years ago by sacha

Ehm, your results are remarkably different from what was obtained earlier in this topic. Also, I am not sure this topic was worth reviving after 4.4 years.

Reply written 3.4 years ago by WouterDeCoster

Someone else revived it just now... after 6 years! They up-voted lh3's answer. I then also added my own comment at the very top.

Reply written 19 months ago by Kevin Blighe