How To Compute The Nucleotide Composition Of A Large Number Of Sequences (Preferably With An Online Tool)
4
0
8.8 years ago

I need a server or software to calculate the nucleotide composition of many sequences simultaneously. Can you help me? (Anything other than the CAIcal server.) Please answer as soon as possible.

Thank you

server tool • 5.8k views
2

When you say nucleotide composition, what do you mean exactly? It is also important to understand the scope of the problem: how many sequences do you need to process, and how long is the longest?

Solutions could be as simple as writing something like:

s = "ATGCA"
# how many As?
s.count("A")


If your sequences are longer than, say, ten million bases, or you have tens of millions of sequences to process, then the solution needs to be a little more sophisticated.
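For a modest number of sequences, the `count` idea above extends naturally to a whole FASTA file. The following is only a minimal sketch, not a tested tool: the function name `fasta_composition` and the toy input are made up for illustration, and it assumes plain FASTA text where header lines start with `>`. It reads line by line, so a long sequence is never held in memory as a single string.

```python
from collections import Counter
import io


def fasta_composition(handle):
    """Yield (name, Counter) for each record in a FASTA-formatted text stream.

    `handle` is any iterable of text lines, e.g. an open file.
    """
    name, counts = None, Counter()
    for line in handle:
        line = line.strip()
        if not line:
            continue  # ignore blank lines
        if line.startswith(">"):
            # A new header: emit the previous record, if there was one.
            if name is not None:
                yield name, counts
            parts = line[1:].split()
            name = parts[0] if parts else ""
            counts = Counter()
        else:
            # Count bases case-insensitively (soft-masked bases are lowercase).
            counts.update(line.upper())
    if name is not None:
        yield name, counts


# Toy input standing in for a real file (hypothetical data).
demo = io.StringIO(">seq1 example\nATGCA\nacgt\n>seq2\nGGCC\n")
for name, counts in fasta_composition(demo):
    total = sum(counts.values())
    print(name, {base: counts[base] for base in "ACGT"}, "length:", total)
```

To run it on a real file, replace `demo` with something like `open("sequences.fasta")` (the file name is a placeholder). Any symbol other than A, C, G, and T, such as N, is still counted in the `Counter` and in the length, but is not shown in the printed summary.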

4
8.8 years ago

If the number of sequences is not too large, you can save the following HTML page and open it in a browser (it uses the HTML5 File API to load the data):

EDIT: I've updated my code. It now handles multiple FASTA files.

1

That's a neat solution Pierre!

2

Indeed, I'm even willing to overlook your ridiculous bracket usage in this case and commend you for a very nice solution! :)

0

I've updated the code (it now handles multiple FASTA files).

1
8.8 years ago
mlongo ▴ 40

An easy command-line tool, faCount from UCSC (download page for Linux: http://hgdownload.soe.ucsc.edu/admin/exe/linux.x86_64/), will give you the nucleotide composition of each sequence, or a summary of the total nucleotide content of the input FASTA file.

(faSize is another handy one; it summarizes the number of reads and the total number of nucleotides.)

0
8.8 years ago
qiyunzhu ▴ 430

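Simply type the following command in a directory that contains only your sequence files:

# skip FASTA header lines (starting with ">"), then count every "A"
grep -hv "^>" * | grep -o "A" | wc -l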

0
8.8 years ago

You can also use any k-mer counter and set k=1, since counting 1-mers is the same as counting individual nucleotides. My own Python k-mer counter is here (it needs the khmer and screed Python libraries).
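To illustrate why k=1 gives the nucleotide composition, here is a small pure-Python sketch. It is not the counter linked above and does not use khmer or screed; the function name `count_kmers` is hypothetical and the approach is the naive sliding-window one, so it is only suitable for small inputs.

```python
from collections import Counter


def count_kmers(seq, k=1):
    """Count all overlapping substrings of length k in seq (case-insensitive)."""
    if k < 1:
        raise ValueError("k must be at least 1")
    seq = seq.upper()
    # Slide a window of width k along the sequence, one base at a time.
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


mono = count_kmers("ATGCA", k=1)  # single-nucleotide composition
di = count_kmers("ATGCA", k=2)    # dinucleotide counts
print(mono)
print(di)
```

With k=1 the result has one entry per nucleotide; larger values of k give dinucleotide, trinucleotide, and longer counts from the same function.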