Does anyone know the GC content of the different human chromosomes?
**EDIT**
OK, so I felt bad about not actually answering your question, so here you go (generated by the method outlined below):
#Sequence   GC content
chr1          0.43
chr2          0.40
chr3          0.40
chr4          0.38
chr5          0.40
chr6          0.40
chr7          0.41
chr8          0.40
chr9          0.43
chr10         0.42
chr11         0.42
chr12         0.41
chr13         0.40
chr14         0.43
chr15         0.44
chr16         0.45
chr17         0.46
chr18         0.40
chr19         0.48
chr20         0.44
chr21         0.43
chr22         0.49
chrX          0.40
chrY          0.46
chrM          0.44
**EDIT ENDS**
The GC content of human chromosomal DNA is very heterogeneous, which makes chromosome-wide statistics relatively meaningless. It has been shown that the human genome is a mosaic of GC-rich and GC-poor regions, each around 300 kb in length, called isochores.
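To make the heterogeneity point concrete, here is a minimal Python sketch of a windowed GC calculation. It is only an illustration, not how any particular tool works: the function name `windowed_gc`, the non-overlapping windows, and the choice to skip N in the denominator are my assumptions, and the 300 kb default just echoes the isochore scale mentioned above.

```python
def windowed_gc(seq, window=300_000):
    """Yield (start, gc_fraction) for consecutive non-overlapping windows.

    Assumptions for this sketch: only A, C, G and T count towards the
    denominator, and a window without any of them yields None.
    """
    seq = seq.upper()  # UCSC FASTA files are soft-masked (lowercase repeats)
    for start in range(0, len(seq), window):
        chunk = seq[start:start + window]
        gc = chunk.count("G") + chunk.count("C")
        acgt = gc + chunk.count("A") + chunk.count("T")
        yield start, (gc / acgt if acgt else None)


# Toy sequence: a GC-rich block, an AT-rich block, a gap of Ns, a mixed block.
toy = "GCGCGCGCGC" + "ATATATATAT" + "NNNNNNNNNN" + "GCATGCATGC"
profile = list(windowed_gc(toy, window=10))
for start, frac in profile:
    print(start, frac)
```

A single number for the whole toy sequence would hide the fact that the first two windows sit at opposite extremes, which is the same problem a chromosome-wide average has.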
You can plot these regions of varying GC content using the EMBOSS program isochore. For example, for chromosome 1:
wget http://hgdownload.cse.ucsc.edu/goldenPath/hg19/chromosomes/chr1.fa.gz
gunzip chr1.fa.gz
isochore -sequence chr1.fa -outfile chr1.isochore -graph png
Gives the following result:

You could also get the sequences of the individual chromosomes and work out their overall GC content, again using EMBOSS, this time with geecee:
   geecee -sequence chr1.fa
This gives an answer of 43% for chromosome 1.
Hmm, yes, agreed: the treatment of 'N' will affect the results considerably. From the geecee documentation (http://emboss.sourceforge.net/apps/cvs/emboss/apps/geecee.html): 'It sums the number of G and C bases in the input sequence(s) and writes the result to file as the fraction (in the interval 0.0 to 1.0) of the length of the whole sequence.'
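To show how much the denominator matters, here is a small Python sketch of the two conventions. It follows the quoted description (G+C over the whole sequence length) for one mode and G+C over A+C+G+T for the other; the function `gc_fraction` and its `include_n` flag are hypothetical names for illustration, not anything from EMBOSS.

```python
def gc_fraction(seq, include_n=True):
    """Return the GC fraction of seq under one of two denominators.

    include_n=True  -> divide by the full sequence length (Ns included),
                       as in the quoted description.
    include_n=False -> divide by the number of A, C, G and T only.
    """
    seq = seq.upper()
    gc = seq.count("G") + seq.count("C")
    if include_n:
        denom = len(seq)
    else:
        denom = gc + seq.count("A") + seq.count("T")
    return gc / denom if denom else 0.0


# Toy sequence: 4 G/C, 4 A/T and 8 ambiguous bases.
toy = "GGCC" + "AATT" + "N" * 8
whole = gc_fraction(toy, include_n=True)
acgt_only = gc_fraction(toy, include_n=False)
print(whole, acgt_only)
```

With half of the toy sequence made of Ns, the whole-length fraction is half of the ACGT-only one, so a sequence with many ambiguous bases can shift a lot between the two conventions.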
GRCh37/hg19/b37:
1   0.417439
2   0.402438
3   0.396943
4   0.382479
5   0.395163
6   0.396109
7   0.407513
8   0.401757
9   0.413168
10  0.415849
11  0.415657
12  0.40812
13  0.385265
14  0.408872
15  0.42201
16  0.447894
17  0.455405
18  0.39785
19  0.483603
20  0.441257
21  0.408325
22  0.479881
X   0.394963
Y   0.391288
MT  0.443626
Done by:
seqtk comp hs37m.fa.gz | awk '/^[0-9MXY]/{x=$4+$5;y=x+$3+$6;print $1"\t"x/y}'
ChrY has lots of ambiguous bases and that is why my result differs most on chrY in comparison to the EMBOSS result. EMBOSS is wrong, IMHO.
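For anyone without seqtk, the same ratio, (C+G)/(A+C+G+T) per sequence, can be sketched in plain Python. This is my own rough equivalent of the one-liner above, not a port of seqtk: the names `fasta_gc` and `_ratio` are made up for the example, and unlike the awk filter it reports every record rather than only names starting with a digit, M, X or Y.

```python
import io


def _ratio(counts):
    """GC fraction over unambiguous bases; NaN if the record has none."""
    gc = counts["C"] + counts["G"]
    total = gc + counts["A"] + counts["T"]
    return gc / total if total else float("nan")


def fasta_gc(handle):
    """Yield (name, gc_fraction) for each record of a FASTA text stream."""
    name, counts = None, None
    for line in handle:
        line = line.strip()
        if line.startswith(">"):
            if name is not None:
                yield name, _ratio(counts)
            # Keep only the first word of the header as the record name.
            name = (line[1:].split() or [""])[0]
            counts = dict.fromkeys("ACGT", 0)
        elif name is not None:
            upper = line.upper()  # count soft-masked (lowercase) bases too
            for base in "ACGT":
                counts[base] += upper.count(base)
    if name is not None:
        yield name, _ratio(counts)


# Toy input standing in for a real genome file.
demo = io.StringIO(">1 toy\nACGTNNNN\nacgt\n>Y\nGGGNNNNNNA\n")
results = dict(fasta_gc(demo))
print(results)
```

For a real gzipped FASTA, something like `gzip.open(path, "rt")` from the standard library should give a suitable text handle. Expect it to be much slower than seqtk on a whole genome.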
I was about to tell you, but then someone crashed the server. Here is how (using Biopieces):
read_fasta -i /home/DATA/downloads/Homo_sapiens/human_hg19.fasta.gz | analyze_gc | write_tab -ck SEQ_NAME,GC% -x
#SEQ_NAME       GC%
gi|89161184|ref|AC_000044.1| Homo sapiens chromosome 1, alternate assembly Celera, whole genome shotgun sequence        40.77
Using bedtools nuc on hg19:
1         chr1  0.377295 
2         chr2  0.394172 
3         chr3  0.390478 
4         chr4  0.375491 
5         chr5  0.388130 
6         chr6  0.387498 
7         chr7  0.397821 
8         chrX  0.384356 
9         chr8  0.392218 
10        chr9  0.351521 
11       chr10  0.402901 
12       chr11  0.403720 
13       chr12  0.397843 
14       chr13  0.319767 
15       chr14  0.336276 
16       chr15  0.336248 
17       chr16  0.391037 
18       chr17  0.436335 
19       chr18  0.380423 
20       chr20  0.416613 
21        chrY  0.172677 
22       chr19  0.456450 
23       chr22  0.326388 
24       chr21  0.297838  
Funny that there are three different GC% answers for Chr1 ...
Depends on the genome build and version that you use - it's perfectly 'legit', as they say in Cockney London slang.
The truth of the matter is that we do not have an honest representation of the true GC content because the reference genome builds exclude / mask telomeric and centromeric regions, where GC content is high.
Thus, all values represented in this thread are based on the genome builds and are not reflective of the actual GC content, which would be larger and which would differ from individual to individual.