Hello everyone,
I have 18S rRNA Illumina sequence data and successfully ran Frédéric Mahé's metabarcoding pipeline on it. In the taxonomic assignment script I used an NGS database and got a lot of results. Since I am interested in protists, I would like to use the "Protist Ribosomal Reference database" (PR2) instead. However, for some reason I don't get any hits with it. I tried both pr2_UTAX.fasta.gz and pr2_taxo_long.fasta.gz (see here). The only difference I can see between the two databases is that the PR2 sequences are many times longer than those in the NGS database, where each sequence fits on a single line. Do I have to adjust something in the vsearch script? Clearly I am missing something if there are no hits whatsoever.
I see you asked this a couple of months ago already, but perhaps an answer is still useful. As far as I know, a reference database should contain only the region you expect to amplify and sequence: not the whole gene, but the specific target region, with primers and tags removed. You can extract this target region with an in-silico PCR. Did you run one on your PR2 database before using it as a reference database?
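To illustrate what I mean by in-silico PCR, here is a minimal Python sketch. It is not taken from the pipeline: the function names and the toy primers are made up for the example, and it only accepts exact primer matches (degenerate IUPAC bases are allowed, mismatches are not). Dedicated tools such as cutadapt tolerate mismatches and are probably the better choice for a real run. The sketch also joins multi-line FASTA records, so long PR2 entries are read as one sequence.

```python
import re

# IUPAC nucleotide codes expanded to regex character classes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "S": "[CG]", "W": "[AT]",
    "K": "[GT]", "M": "[AC]", "B": "[CGT]", "D": "[AGT]",
    "H": "[ACT]", "V": "[ACG]", "N": "[ACGT]",
}
COMPLEMENT = str.maketrans("ACGTRYSWKMBDHVN", "TGCAYRSWMKVHDBN")


def normalize(seq):
    """Upper-case a sequence and write RNA 'U' as DNA 'T'."""
    return seq.upper().replace("U", "T")


def reverse_complement(seq):
    return normalize(seq).translate(COMPLEMENT)[::-1]


def primer_regex(primer):
    """Turn a (possibly degenerate) primer into a regex fragment."""
    return "".join(IUPAC[base] for base in normalize(primer))


def read_fasta(lines):
    """Yield (header, sequence) pairs; multi-line sequences are joined."""
    header, chunks = None, []
    for line in lines:
        line = line.strip()
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif line:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def in_silico_pcr(seq, forward, reverse):
    """Return the region between the two primers (primers excluded),
    or None if either primer site is missing. Exact matches only."""
    pattern = re.compile(
        primer_regex(forward) + "(.+?)" + primer_regex(reverse_complement(reverse))
    )
    match = pattern.search(normalize(seq))
    return match.group(1) if match else None


# Toy primers and reference record, hypothetical and far shorter than real ones.
FORWARD, REVERSE = "ACGTR", "GGTCA"
reference = [">toy_reference", "TTTTACGTACCCC", "GGGGTGACCAAAA"]

trimmed = {
    header: in_silico_pcr(seq, FORWARD, REVERSE)
    for header, seq in read_fasta(reference)
}
print(trimmed)
```

References for which the function returns None lack one of the primer sites and would be dropped. The remaining trimmed sequences, written back out as FASTA, would form the reference database for the assignment step.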