How to prepare histogram for uniq reads count
1
1
Entering edit mode
9.8 years ago

Hello,

I have just started with sequencing data, I would like to prepare a graph of uniq reads against read counts.I have tab delimited file, as follows:

>t0000001 1243667
TGAGGTAGTAGGTTGTATAGTT
>t0000002 1036829
TGAGGTAGTAGATTGTATAGTT
>t0000003 572202
TGAGGTAGTAGGTTGTGTGGTT
>t0000004 347737
TGAGGTAGTAGGTTGTGTGGTTT
>t0000005 194555
TGAGGTAGTAGGTTGTATGGTT
>t0000006 138816
TGAGGTAGTAGGTTGTATAGT
>t0000007 115676
TGAGGTAGTAGGTTGTATAGTTT

The first column is sequence identifier and second column is read count. I want to plot read count against the frequency of that read count, i.e. how many reads have read count equal to 1, between 1-5, 6-10, 10 - 50 etc.

I do not know how to do this, could someone please guide me on how to do this?

Thank you

sequencing RNA-Seq • 2.6k views
ADD COMMENT
1
Entering edit mode
9.8 years ago
David Fredman ★ 1.1k

Have a look at this thread for how to count occurrences for each read. You can use some built-in *nix tools and bioawk to great effect for this problem, e.g. (code copied from one of Istvan Albert's posts in the thread referenced)

# create unique sequence counts
$cat sample1.fq | bioawk -c fastx ' { print substr($seq, 1, 50) } ' | sort | uniq -c | sort -rg > seqstats.txt

Then aggregate counts to see how many "1", "2" three counts you have. E.g. cut out the count from the start of the line and do another uniq -c.

# count frequency of sequence counts
cat sample1.fq |\
  bioawk -c fastx ' { print substr($seq, 1, 50) } ' |\
  sort |\
  uniq -c |\
  sort -rg |\
  cut -f4 -d " " |\
  uniq -c > readcount_stats.txt

Not exactly what you asked for, but Michele Busby's rarefaction plot might be useful to you "where the number of reads sequenced is the x axis and the number of unique reads is the y axis. For high complexity libraries that are not sequenced to saturation this will form a fairly straight line. For over-sequenced low complexity libraries the line will asymptote at the point where you stop adding new information when you add more reads".

ADD COMMENT

Login before adding your answer.

Traffic: 2208 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6