Question: How To Compute The Nucleotide Composition Of A Large Number Of Sequences (Preferably With An Online Tool)
0
6.4 years ago by
mahsa.alemi0 wrote:

I need a server or software that can calculate the nucleotide composition of many sequences simultaneously. Can you help me? (Anything except the CAIcal server.) Please answer as soon as possible.

Thank you

server tool • 4.8k views
modified 6.4 years ago by mlongo40 • written 6.4 years ago by mahsa.alemi0
2

When you say nucleotide composition, what do you mean exactly? It is also important to understand the scope of the problem: how many sequences do you need to process, and how long is the longest?

Solutions could be as simple as writing something like:

s = "ATGCA"
# how many As?
s.count("A")

If your sequences are longer than, say, ten million bases, or you have tens of millions of sequences to process, then the solution needs to be a little more sophisticated.
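For many sequences in one FASTA file, the same counting idea can be applied record by record. The following is only a minimal sketch using the Python standard library; the function name `fasta_composition` is made up for illustration, and it assumes plain FASTA input where header lines start with ">". It reads line by line, so the whole file never has to be held in memory.

```python
from collections import Counter


def fasta_composition(lines):
    """Yield (sequence_id, Counter of bases) for each FASTA record.

    `lines` is any iterable of text lines, for example an open file handle.
    """
    seq_id, counts = None, Counter()
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new header closes the previous record, if there was one.
            if seq_id is not None:
                yield seq_id, counts
            parts = line[1:].split()
            # Use the first word of the header as the sequence id.
            seq_id, counts = (parts[0] if parts else ""), Counter()
        elif seq_id is not None:
            # Upper-case so that soft-masked (lower-case) bases are counted too.
            counts.update(line.upper())
    if seq_id is not None:
        yield seq_id, counts


# Small in-memory example; pass an open file handle for real data.
example = [">seq1 demo", "ATGCA", "ttga", ">seq2", "GGCC"]
for name, counts in fasta_composition(example):
    print(name, sum(counts.values()), dict(counts))
```

For a real file, something like `with open("sequences.fasta") as handle:` followed by the same loop over `fasta_composition(handle)` should work; the file name here is a placeholder.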

modified 6.4 years ago • written 6.4 years ago by Istvan Albert ♦♦ 81k
4
6.4 years ago by
France/Nantes/Institut du Thorax - INSERM UMR1087
Pierre Lindenbaum122k wrote:

If the number of sequences is not too big, you can save and open the following HTML page (it uses the HTML5 File API to load the data):

EDIT: I've updated my code. It now handles multiple FASTA files.

modified 4.5 years ago • written 6.4 years ago by Pierre Lindenbaum122k
1

That's a neat solution Pierre!

written 6.4 years ago by Istvan Albert ♦♦ 81k
2

Indeed, I'm even willing to overlook your ridiculous bracket usage in this case and commend you for a very nice solution! :)

written 6.4 years ago by Daniel Standage3.9k

Original post: http://plindenbaum.blogspot.fr/2011/04/playing-with-html5-file-api-translating.html (DNA -> protein)

written 6.4 years ago by Pierre Lindenbaum122k

I've updated the code (multiple FASTA files)

written 6.4 years ago by Pierre Lindenbaum122k
1
6.4 years ago by
mlongo40 wrote:

faCount, an easy command-line tool from UCSC (download page for Linux: http://hgdownload.soe.ucsc.edu/admin/exe/linux.x86_64/), will give you the nucleotide composition of each sequence, or it can give you a summary of the total nucleotide content of the input FASTA.

(faSize is another handy one; it summarizes the number of reads and the total number of nucleotides.)

modified 6.4 years ago • written 6.4 years ago by mlongo40
0
6.4 years ago by
qiyunzhu420
Buffalo
qiyunzhu420 wrote:

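Simply type this command (the first grep drops FASTA header lines so that only sequence letters are counted; -h suppresses the file-name prefix):

grep -hv '^>' * | grep -o "A" | wc -l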
written 6.4 years ago by qiyunzhu420
0
6.4 years ago by
Manu Prestat3.9k
Marseille, France
Manu Prestat3.9k wrote:

You can also use any k-mer counter and set k=1. My own Python k-mer counter is here (it needs the khmer and screed Python libraries).
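To illustrate why k=1 gives the nucleotide composition, here is a small sketch of a generic k-mer counter. It uses only the Python standard library, not khmer or screed, and the function name `count_kmers` is hypothetical; it is not the counter linked above.

```python
from collections import Counter


def count_kmers(sequence, k=1):
    """Count every overlapping k-mer in `sequence`, ignoring case."""
    if k < 1:
        raise ValueError("k must be at least 1")
    sequence = sequence.upper()
    # Slide a window of width k along the sequence, one base at a time.
    return Counter(sequence[i:i + k] for i in range(len(sequence) - k + 1))


# With k=1 each "k-mer" is a single base, i.e. the nucleotide composition.
composition = count_kmers("ATGCA", k=1)
# With k=2 the same function gives dinucleotide counts.
dinucleotides = count_kmers("ATGCA", k=2)
print(dict(composition))
print(dict(dinucleotides))
```

A dedicated k-mer counter is likely to be much faster on very large inputs; this version is only meant to show the idea.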

written 6.4 years ago by Manu Prestat3.9k