I am supposed to analyze some metabarcoding reads. However, the forward and reverse reads cannot be merged due to lack of overlap. I was informed that this is because the sequence was too long, so the forward and reverse reads couldn't extend far enough to overlap adequately. My question is, what would be the problem with using the forward reads alone, as if they were single-end? My first thought is that they are too short to be a legitimate barcode sequence for identifying taxa. But I'm not sure. It's COI. I gather from Meusnier et al. 2008 that a 95% success rate of species identification was obtained with 250-bp mini barcodes. My forward sequences are that long. But this region by itself has not been tested for specificity. How does this impact its reliability? Thank you for your input.
Using just the forward read is a good idea. Just watch out: depending on your library preparation method, read 1 might not always correspond to the forward direction, and forward and reverse orientations may be mixed ~50:50. In that case, trimming the forward primer from both read 1 and read 2 (e.g. with Cutadapt) and then reverse complementing read 2 can help. If you are concerned about the reliability of identification from just one direction, you could also analyze read 2 the same way and compare the results, or fill in the missing bases between the reads with Ns to obtain a "full-length" sequence. If you do so, however, make sure to apply strict filtering afterward to discard reads of poor quality, especially at the read 2 ends. You can also concatenate the sequences and "reformat" the sequences in your reference database to match them. But this is maybe a bit much effort; using only the forward direction should be sufficient in most cases, in my opinion =)
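
If you want to try the N-padding option, here is a rough Python sketch of the idea. It assumes the primers are already trimmed, that read 1 is in the forward orientation, and that you know the expected amplicon length after trimming. The function names and the `amplicon_len` parameter are made up for illustration and are not from any particular tool.

```python
# Minimal sketch, plain Python with no external libraries.
# Assumptions: primers already trimmed, read 1 is forward-oriented,
# and amplicon_len (expected trimmed amplicon length) is known.

_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence (A, C, G, T, N only)."""
    return seq.translate(_COMPLEMENT)[::-1]


def join_with_ns(read1: str, read2: str, amplicon_len: int) -> str:
    """Place read 1 and the reverse complement of read 2 at the two ends
    of the amplicon and fill the unsequenced middle with Ns."""
    r2_rc = reverse_complement(read2)
    gap = amplicon_len - len(read1) - len(r2_rc)
    if gap < 0:
        # The reads would overlap, so they should be merged normally instead.
        raise ValueError("reads are longer than the amplicon; merge them instead")
    return read1 + "N" * gap + r2_rc


if __name__ == "__main__":
    # Toy example: two 4-bp reads from a 12-bp amplicon.
    print(join_with_ns("ACGT", "TTGG", 12))
```

Quality filtering is not shown here; you would still want to truncate or discard low-quality read 2 ends before padding, as mentioned above.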