Comparing millions of trimmed reads to large database
12 days ago
geneticatt ▴ 120

Hi all,

I have a set of reads which I've trimmed down to 21 nt based on the sequencing experiment. I'd like to compare these 21 nt sequences against a database of 300,000 21 nt sequences to annotate each read. I tried bowtie2, building an index from the database and then mapping the reads against it, but the mapping rate was lower than expected, which suggests that bowtie2's read-mapping approach isn't well suited to this type of comparison.

Next I tried blastn, but it's apparently too slow for this scale of comparison.

Can someone please recommend a tool or approach for making so many exact comparisons?

Thanks

bowtie2 blastn
1. Try bowtie v1.x. You need it to do ungapped alignments with short reads such as these.
2. blat may work as well.
3. seqkit grep can pull out exact sequence matches.
4. bbmap.sh with the ambig=all vslow perfectmode maxsites=1000 options.
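Since every comparison here is an exact match between fixed-length 21-mers, it's worth noting that the underlying operation is just a hash lookup, which the tools above implement at scale. A minimal Python sketch of the idea (sequences and annotation labels are made-up toy data, not from the question):

```python
# Exact 21-mer annotation by hash lookup: O(1) per read, so millions of
# reads against 300k database entries is cheap in memory.
# Toy data below; in practice these would be parsed from FASTA/FASTQ.
database = {
    "ATCGATCGATCGATCGATCGA": "annot-1",
    "TTTTAAAACCCCGGGGATCGA": "annot-2",
}

reads = [
    "ATCGATCGATCGATCGATCGA",  # present in the database
    "GGGGGGGGGGGGGGGGGGGGG",  # absent from the database
]

# Look each read up directly; unmatched reads get a placeholder label.
annotations = [database.get(read, "unannotated") for read in reads]
```

A 300,000-entry dict of 21-mers fits easily in RAM, so this scales to millions of reads without an aligner at all.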
12 days ago
h.mon 32k

With sequences this short (is this microRNA?), I suspect clustering will be more efficient than mapping. First, deduplicate both the query and subject files with, e.g., VSEARCH, CD-HIT, or Dedupe.sh. The same tools can then be used to find the sequences common to both datasets.
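The dedupe-then-intersect idea above can be illustrated with a small Python sketch (VSEARCH/CD-HIT/Dedupe.sh do the same thing at scale and on disk; the sequences here are toy data):

```python
from collections import Counter

# Toy read set with duplicates, and a toy deduplicated database.
reads = ["ATCGATCGATCGATCGATCGA"] * 3 + ["TTTTAAAACCCCGGGGATCGA"]
database = {"ATCGATCGATCGATCGATCGA", "CCCCCCCCCCCCCCCCCCCCC"}

# Step 1: deduplicate the query, keeping abundances for later reporting.
read_counts = Counter(reads)

# Step 2: the "common sequences" are just the set intersection.
shared = set(read_counts) & database
```

Deduplicating first is what makes this cheap: the intersection is computed once per unique sequence rather than once per read.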