Question: Calculating dinucleotide frequencies for each sequence in a multi fasta file
emilyc (10) wrote, 11 weeks ago:

Hello,

I am looking for an easy way to calculate the dinucleotide frequencies of every sequence in a multi-FASTA file, and to save them alongside the corresponding sequence names. Is there a simple command or tool for this in bash, R, or similar?

I can think of some complicated methods of doing this, but if something is already programmed to do this then that would be amazing.

Thank you for your time,

Emily

Tags: bash, dinucleotide, linux, R, fasta

Jellyfish is a popular tool for counting k-mers in a multi-FASTA file.

Reply by a.zielezinski (8.8k), 11 weeks ago

See the thread "Script To Calculate Dinucleotide Frequency For Many Sequences" and the Biostrings package in R (see page 72).

Reply by genomax (70k), 11 weeks ago

My understanding was that the script in that thread calculates the dinucleotide frequencies across all of the sequences combined, not for each sequence individually. Am I wrong about this? I have not tested it, but the OP there said in a comment: "I should note that I do not need to calculate each sequence's dinucleotide frequency. I need to get the di-nt frequency from all sequences. Thank you for your attention."

Thank you.

Reply by emilyc (10), 11 weeks ago

You can iterate over your sequences and compute the frequencies for each one individually. Or are you asking something else?

Reply by genomax (70k), 11 weeks ago

Yes, I can do this in a roundabout way. However, I am posting here to see whether someone knows of an existing, well-made tool that already fits my needs and produces organised output, such as a data frame with all frequencies for each sequence alongside its original ID. I can do the work myself, but it would be helpful not to have to.

Reply by emilyc (10), 11 weeks ago
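
For illustration, the kind of per-sequence table requested above (one row per sequence ID, one column per dinucleotide) could be produced along the following lines. This is a minimal Python sketch, not any of the tools mentioned in this thread. The function names (read_fasta, dinucleotide_frequencies, write_table) and the file name sequences.fa are made up for the example. It assumes DNA sequences, counts overlapping windows of length 2, and skips any window containing a character other than A, C, G or T (such as N), so the 16 frequencies of a sequence sum to 1.

```python
import csv
import io
import sys
from itertools import product

# The 16 unambiguous dinucleotides in a fixed order: AA, AC, ..., TT.
DINUCS = ["".join(p) for p in product("ACGT", repeat=2)]


def read_fasta(handle):
    """Yield (sequence_id, sequence) pairs from an iterable of FASTA lines."""
    name, chunks = None, []
    for line in handle:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                yield name, "".join(chunks)
            # Use the first whitespace-delimited token of the header as the ID.
            parts = line[1:].split()
            name, chunks = (parts[0] if parts else ""), []
        else:
            chunks.append(line.upper())
    if name is not None:
        yield name, "".join(chunks)


def dinucleotide_frequencies(seq):
    """Return {dinucleotide: frequency} over overlapping windows of length 2."""
    counts = dict.fromkeys(DINUCS, 0)
    for i in range(len(seq) - 1):
        pair = seq[i:i + 2]
        if pair in counts:  # windows with N or other symbols are skipped
            counts[pair] += 1
    total = sum(counts.values())
    return {d: (counts[d] / total if total else 0.0) for d in DINUCS}


def write_table(fasta_handle, out_handle):
    """Write a tab-separated table: one row per sequence, one column per dinucleotide."""
    writer = csv.writer(out_handle, delimiter="\t", lineterminator="\n")
    writer.writerow(["id"] + DINUCS)
    for name, seq in read_fasta(fasta_handle):
        freqs = dinucleotide_frequencies(seq)
        writer.writerow([name] + [f"{freqs[d]:.4f}" for d in DINUCS])


if __name__ == "__main__":
    # Demo on a small in-memory FASTA. For a real file, pass an open file
    # handle instead, e.g. open("sequences.fa") (hypothetical file name).
    demo = io.StringIO(">seq1 example\nACGTAC\n>seq2\nAAAA\n")
    write_table(demo, sys.stdout)
```

The tab-separated output can be loaded as a data frame, for example with read.delim in R. To get raw counts rather than frequencies, return counts directly instead of dividing by the total.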
trausch (1.4k, Germany) wrote, 11 weeks ago:

The Kent utilities from the UCSC Genome Browser include a fast tool for this:

faCount -dinuc genome.fa

Precompiled binaries are here.

jrj.healey (13k, United Kingdom) wrote, 11 weeks ago:

There is a solution in this thread that should work: "A long run time problem"
