Alignment-free counting of sequence abundance

4.1 years ago

Saima ▴ 10

I have previously used the solution posted here for counting the abundance of unique sequences across multiple FASTA files. Is there a better tool (memory-efficient and fast) for counting very large datasets (>100 million reads)? I don't have a reference genome for my samples, so I am looking for an alignment-free approach to count the abundance of such a large number of short reads (length < 100 nt). Any suggestions will be much appreciated!
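
The thread does not say what the earlier solution was. The sketch below is not that solution. It only illustrates how alignment-free, exact-match counting can work: stream each FASTA file line by line and tally identical sequences in a hash table. Memory then grows with the number of *distinct* sequences rather than the total number of reads. The function names (`parse_fasta`, `count_unique_sequences`, `count_files`) are hypothetical, and the sketch assumes sequences should be compared case-insensitively.

```python
import collections
from typing import Dict, Iterable, Iterator


def parse_fasta(lines: Iterable[str]) -> Iterator[str]:
    """Yield one upper-cased sequence per FASTA record.

    Lines are consumed one at a time, so the whole file is never held in
    memory; records wrapped over several lines are joined back together.
    """
    chunks = []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new header closes the previous record (if it had sequence).
            if chunks:
                yield "".join(chunks).upper()
                chunks = []
        else:
            chunks.append(line)
    if chunks:
        yield "".join(chunks).upper()


def count_unique_sequences(lines: Iterable[str]) -> collections.Counter:
    """Count how many times each exact sequence occurs (hypothetical helper)."""
    return collections.Counter(parse_fasta(lines))


def count_files(paths: Iterable[str]) -> Dict[str, collections.Counter]:
    """Build one Counter per FASTA file, e.g. one per tissue sample."""
    counts = {}
    for path in paths:
        with open(path) as handle:
            counts[path] = count_unique_sequences(handle)
    return counts
```

For example, `count_files(["leaf.fa", "root.fa"])["leaf.fa"].most_common(10)` would list the ten most abundant sequences in a (hypothetical) `leaf.fa`. Because every distinct sequence becomes a dictionary key, this stays practical only while the number of distinct reads fits in RAM.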

Do you have a transcriptome? I think this is the bare minimum you'll need. Also, what is the organism?

4.1 years ago by Asaf 10k

These are sRNA-seq data, so I don't want to just map them to a transcriptome. The samples come from a plant without a published genome; I am using a list of unique sequences and counting their abundance in different tissue samples.
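
Counting a fixed list of unique sequences across tissues can be sketched as a lookup against a set: reads not on the list are skipped immediately, so memory is bounded by the size of the list rather than by the number of reads. This is a minimal illustration, not the poster's pipeline. The function names are hypothetical, only exact matches are counted, and the reader assumes each FASTA record's sequence sits on a single line (header line, then sequence line).

```python
import collections
import csv
from typing import Dict, Iterable, Iterator, List, TextIO


def sequences_from_single_line_fasta(lines: Iterable[str]) -> Iterator[str]:
    """Yield upper-cased sequences, ASSUMING unwrapped single-line records."""
    for line in lines:
        line = line.strip()
        if line and not line.startswith(">"):
            yield line.upper()


def count_listed_sequences(
    queries: Iterable[str],
    samples: Dict[str, Iterable[str]],
) -> Dict[str, collections.Counter]:
    """Count exact occurrences of each query sequence in every sample.

    Reads absent from the query set are discarded as they stream past,
    so only the query sequences are ever stored as keys.
    """
    wanted = {q.strip().upper() for q in queries}
    table: Dict[str, collections.Counter] = {}
    for sample, reads in samples.items():
        table[sample] = collections.Counter(r for r in reads if r in wanted)
    return table


def write_count_matrix(
    queries: List[str],
    table: Dict[str, collections.Counter],
    out: TextIO,
) -> None:
    """Write a tab-separated matrix: one row per query, one column per sample."""
    writer = csv.writer(out, delimiter="\t", lineterminator="\n")
    sample_names = list(table)
    writer.writerow(["sequence"] + sample_names)
    for q in queries:
        key = q.strip().upper()
        # Counter returns 0 for sequences never seen in a sample.
        writer.writerow([q] + [table[s][key] for s in sample_names])
```

To run it on real data, pass an open file handle for each tissue (for example `sequences_from_single_line_fasta(open("leaf.fa"))`, with a hypothetical file name) in the `samples` dictionary. The resulting matrix has the layout that count-based differential abundance tools usually expect.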

4.1 years ago by Saima ▴ 10

Did you try assembling the sRNAs? I assume it won't do much, but you will have a sort of reference that you can map against with Salmon.

4.1 years ago by Asaf 10k

Thanks, that's a good suggestion. I am also working on assembling the reads in addition to direct counting.

4.1 years ago by Saima ▴ 10