Alignment-free counting of sequence abundance
4.1 years ago
Saima ▴ 10

I have used the solution posted here in the past to count the abundance of unique sequences across multiple FASTA files. Is there a faster, more memory-efficient tool for counting with large inputs (>100 million reads)? I don't have a reference genome for my samples, so I am looking for an alignment-free approach to counting the abundance of this many reads (length <100 nt). Any suggestions will be much appreciated!

sequence • 751 views

Do you have a transcriptome? I think this is the bare minimum you'll need. Also, what is the organism?


These are sRNA-seq data, so I don't want to just map to the transcriptome. They come from a plant without a published genome; I am using a list of unique sequences and counting their abundance in different tissue samples.
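For anyone following along, the direct counting described above (a fixed list of unique sequences, tallied per tissue sample) could be sketched roughly as below. This is only a minimal illustration, not the solution referred to in the question: the function names `read_fasta`, `count_targets`, and `abundance_table` are made up for this example, and it assumes plain-text FASTA input and exact, case-insensitive matching of whole reads. Reads are streamed one at a time, so memory use should scale with the number of target sequences rather than with the number of reads.

```python
import io
from collections import Counter


def read_fasta(handle):
    """Yield upper-cased sequences from a FASTA stream, joining wrapped lines."""
    chunks = []
    for line in handle:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new header closes the previous record, if there was one.
            if chunks:
                yield "".join(chunks).upper()
                chunks = []
        else:
            chunks.append(line)
    if chunks:
        yield "".join(chunks).upper()


def count_targets(handle, targets):
    """Count reads that exactly match one of the target sequences."""
    counts = Counter()
    for seq in read_fasta(handle):
        if seq in targets:
            counts[seq] += 1
    return counts


def abundance_table(samples, targets):
    """Build {sequence: {sample_name: count}} from {sample_name: file handle}."""
    target_set = frozenset(t.upper() for t in targets)
    per_sample = {name: count_targets(h, target_set) for name, h in samples.items()}
    # Counter returns 0 for sequences never seen in a sample.
    return {
        seq: {name: per_sample[name][seq] for name in per_sample}
        for seq in sorted(target_set)
    }


if __name__ == "__main__":
    # Tiny in-memory stand-ins for two tissue samples (hypothetical data).
    leaf = io.StringIO(">r1\nACGT\n>r2\nacgt\n>r3\nTTTT\n")
    root = io.StringIO(">r1\nTTTT\n>r2\nGGGG\n")
    table = abundance_table({"leaf": leaf, "root": root}, ["ACGT", "TTTT"])
    for seq, row in table.items():
        print(seq, row)
```

With real data the `io.StringIO` objects would be replaced by opened FASTA files. Whether this is fast enough for >100 million reads is untested here; a compiled k-mer or sequence counter may well do better.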


Did you try assembling the sRNAs? I assume it won't do much, but you will have a sort of reference you can map against with Salmon.


Thanks, that's a good suggestion. I am also working on assembling the reads, in addition to direct counting.
