How to calculate the GC content of a BAM file?
17 months ago
Sh1von ▴ 30

I want a single overall GC content statistic for the whole file, not the GC content of each read.

17 months ago
GenoMax 141k

Using the BBMap suite (it also reports some other statistics, which you can ignore):

$ reformat.sh -Xmx2g in=your.bam out=stdout.fa | stats.sh -Xmx2g in=stdin.fa


Could not find sambamba.
Found samtools 1.16
Input is being processed as unpaired
Input:                          9207996 reads           684032432 bases
Output:                         9207996 reads (100.00%)         684032432 bases (100.00%)

Time:                           14.205 seconds.
Reads Processed:       9207k    648.22k reads/sec
Bases Processed:        684m    48.15m bases/sec
A       C       G       T       N       IUPAC   Other   GC      GC_stdev
0.2179  0.2731  0.2738  0.2351  0.0000  0.0000  0.0000  0.5470  0.0632
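
If only the GC value is needed, the summary table can be filtered with a small helper. This is a minimal sketch, not part of BBMap: `extract_gc` is a hypothetical function name, and it assumes the table layout matches the output shown above (a header row starting with `A C G T`, followed by one row of values).

```shell
# extract_gc (hypothetical helper): read stats.sh-style text on stdin
# and print the value found under the "GC" column header.
# Assumes the header row starts with "A C G T" and the values are on the next line.
extract_gc() {
  awk '
    $1 == "A" && $2 == "C" && $3 == "G" && $4 == "T" {
      for (i = 1; i <= NF; i++) if ($i == "GC") col = i   # locate the GC column
      if ((getline) > 0 && col) print $col                # read the values row
      exit
    }'
}
```

Applied to the table above, this prints `0.5470`, which is the sum of the C and G fractions (0.2731 + 0.2738) up to rounding. Piping the command above straight into `extract_gc` assumes the table is written to standard output, which is worth checking on your installation.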
17 months ago

A slow version using samtools and standard Unix tools (prints the overall GC percentage):

 samtools view  in.bam | cut -f 10 | fold -w 1 | awk '($1=="G" || $1=="C") {N++;} END {print (N/(1.0*NR)*100);}'
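
The same idea can be sketched without splitting every base onto its own line, by letting awk count matches per read. This is an illustrative variant under stated assumptions, not the poster's method: `gc_percent` is a hypothetical function name, lowercase bases are counted as well, records whose SEQ field is `*` are skipped, and N bases stay in the denominator.

```shell
# gc_percent (hypothetical helper): read one sequence per line on stdin
# and print the overall GC percentage across all lines.
gc_percent() {
  awk '
    $0 != "*" {                      # skip records with no stored sequence
      total += length($0)            # count every base, including N
      gc    += gsub(/[GCgc]/, "")    # gsub returns the number of matches replaced
    }
    END {
      if (total > 0) printf "%.2f\n", 100 * gc / total
      else print "NA"                # no bases seen
    }'
}

# Usage sketch (in.bam is a placeholder). -F 0x900 drops secondary and
# supplementary alignments so that each read is counted only once:
#   samtools view -F 0x900 in.bam | cut -f 10 | gc_percent
```

For example, `printf 'GGCC\nAATT\n' | gc_percent` prints `50.00`. Whether to exclude secondary and supplementary alignments, or N bases, depends on what "overall" should mean for your data.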