GC content calculation in percentage
21 months ago
Neo_42 ▴ 10

Hello!

I have a set of DNA sequences saved in a CSV file, and I would like to know their GC content as a percentage. Is there a Linux command I can use to calculate it?

Many thanks!

Linux GC-Content • 1.2k views
21 months ago
mark.ziemann ★ 1.9k

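There is a bioinformatics tool suite for Linux called EMBOSS that is well suited to tasks like this. The EMBOSS command you want is geecee. Note that it reports GC content as a fraction rather than a percentage, so multiply the value by 100 (the 0.44 in the output below corresponds to 44%).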

$ sudo apt update && sudo apt install emboss
$ geecee mtdna.fa 
Calculate fractional GC content of nucleic acid sequences
Output file [nc_012920.geecee]: 
$ cat nc_012920.geecee
#Sequence   GC content
NC_012920.1   0.44
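
Since the question asks for a percentage and geecee prints a fraction, the report can be converted with a few lines of Python. This is only a minimal sketch: the function name geecee_to_percent is made up for illustration, and it assumes the two-column layout shown in the output above (a header line starting with #, then a sequence name followed by a fraction).

```python
def geecee_to_percent(lines):
    """Convert geecee report lines ('name fraction') to (name, percent) pairs."""
    results = []
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and the header line
        # The fraction is assumed to be the last whitespace-separated field.
        name, fraction = line.rsplit(None, 1)
        results.append((name, float(fraction) * 100))
    return results


# Report text copied from the geecee run above.
report = """#Sequence   GC content
NC_012920.1   0.44
"""

for name, percent in geecee_to_percent(report.splitlines()):
    print(f"{name}\t{percent:.0f}%")
```

To run it on the real report, pass an open file handle for nc_012920.geecee instead of report.splitlines().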
21 months ago
Ernest Bonat ▴ 10

In Python using the PyDNA library:

https://medium.com/mlearning-ai/apply-machine-learning-algorithms-for-genomics-data-classification-132972933723#c97e

dna_sequence_string = "ATATATCCCGGGAATTTTCGTAGTTAGGCTGATTTTATTGGCGCGAAAATTT"
gc_content = PyDNA.dna_count_gc_content(dna_sequence_string)
print("DNA sequence string:\n{}".format(dna_sequence_string))
print("GC-content:\n{}".format(gc_content))

Result:

DNA sequence string: ATATATCCCGGGAATTTTCGTAGTTAGGCTGATTTTATTGGCGCGAAAATTT

GC-content: 36.5%
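
If PyDNA is not available, the same figure can be reproduced with the Python standard library alone. The sketch below is an illustration, not the PyDNA implementation: gc_content_percent is a hypothetical helper, and the CSV layout (a header row with columns named id and sequence) is an assumption, since the original question does not show the file.

```python
import csv
import io


def gc_content_percent(sequence):
    """Percentage of G and C bases in a DNA string (case-insensitive)."""
    sequence = sequence.strip().upper()
    if not sequence:
        return 0.0
    gc_count = sequence.count("G") + sequence.count("C")
    # Every character counts toward the length, including N or other ambiguity codes.
    return 100.0 * gc_count / len(sequence)


dna_sequence_string = "ATATATCCCGGGAATTTTCGTAGTTAGGCTGATTTTATTGGCGCGAAAATTT"
# 19 of the 52 bases are G or C, which gives 36.5%.
print(f"GC-content: {gc_content_percent(dna_sequence_string):.1f}%")

# Assumed CSV layout: a header row with columns named "id" and "sequence".
csv_text = "id,sequence\nseq1,GGCC\nseq2,ATGC\n"
for row in csv.DictReader(io.StringIO(csv_text)):
    print(row["id"], f"{gc_content_percent(row['sequence']):.1f}%")
```

For a real file, replace io.StringIO(csv_text) with open("sequences.csv", newline=""), where sequences.csv is a placeholder file name, and adjust the column names to match the header.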