Question: Text Histogram From Sequence Lengths Of Fasta File?
2
gravatar for 14134125465346445
7.4 years ago by
United Kingdom
141341254653464453.5k wrote:

I swear I have seen this somewhere but I can't find it again. Is there a script somewhere to produce text Stem-and-Leaf Histograms from sequence lengths of fasta file?

fasta • 3.8k views
ADD COMMENTlink modified 3.1 years ago by Biostar ♦♦ 20 • written 7.4 years ago by 141341254653464453.5k
5
gravatar for Leonor Palmeira
7.4 years ago by
Leonor Palmeira3.7k
Li├Ęge, Belgium
Leonor Palmeira3.7k wrote:

You can do this in R with a combination of the seqinR and the aplpack packages:

library(seqinr)
data=read.fasta("file.fasta")
l=getLength(x)

library(aplpack)
stem.leaf(l)

There are a number of options to suit your needs:

stem.leaf(l, unit=100)

Just have a look here for more output options and details:

?stem.leaf

This functions extends far more than the simple stem() function in the graphics package.

ADD COMMENTlink written 7.4 years ago by Leonor Palmeira3.7k
2
gravatar for Jeroen Van Goey
7.4 years ago by
Jeroen Van Goey2.2k
Ghent, Belgium
Jeroen Van Goey2.2k wrote:

My initial idea was to use Biopython for the parsing of the fasta file and matplotlib for the drawing of the stem-leaf plot. I couldn't find a function that draws a text histogram in matplotlibto my amazement, so here is my take:

from Bio import SeqIO
from collections import defaultdict

input_file = r'path\to\file.fasta'
lengths = [len(record.seq) for record in SeqIO.parse(open(input_file), 'fasta')]
d = defaultdict(list)
for length in lengths:
    d[length/10].append(length%10)

for nr in range(min(d.keys()), max(d.keys())+1):
    print "{0:>3} | {1}".format(nr, ' '.join(map(str, sorted(d[nr]))) if nr in d else '')

For this fictitious input file it produces this histogram:

  5 | 3 4 8
  6 | 0 0 6 7 9
  7 | 0
  8 | 6
  9 | 0
 10 | 9
 11 | 2 5 6
 12 | 0 0 0 0 5 8
ADD COMMENTlink modified 7.4 years ago • written 7.4 years ago by Jeroen Van Goey2.2k
1
gravatar for Madelaine Gogol
7.4 years ago by
Madelaine Gogol5.1k
Kansas City
Madelaine Gogol5.1k wrote:

There is something in the kent source tree called textHistogram. It takes a file with one number per line, which you can get from fasta using awk...

textHistogram <(awk < test.fa '$0 !~ /^>/ {print length($0)}')
ADD COMMENTlink written 7.4 years ago by Madelaine Gogol5.1k
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.3.0
Traffic: 2666 users visited in the last hour