Hello all,
I am building a reference database in which every sequence must have the following format:
>MG8515941;tax=k:SAR,p:Alveolata,c:Dinophyceae,o:Suessiales,f:Borghiellaceae,g:Borghiella,s:Borghiella sp
TGGCGAATGAACAGGGACAAGCTCGGCATGGAAATTGGGGCCTCTGGCCTTGAATTGTAGCCTCGAGAAG
Somewhere in the file there is at least one error that prevents the amplicon sequencing pipeline I'm using (AMPtk) from recognizing it as a FASTA file. I have searched for every error motif I can think of (using TextEdit), but I can't find the problem. I think the most likely cause is that one or more of the sequences are missing a hard return after the taxonomy string.
The file (RDPSILVA_LSUdatabase_error.fasta) is here: https://osf.io/cz3mh/
Any suggestions for how I can find the error(s) without going through the file line by line?
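One possible approach is a short script that reports the line number of anything that does not fit the format shown above. What follows is only a minimal sketch: the header pattern is an assumption inferred from the single example header, the allowed sequence characters are assumed to be the IUPAC nucleotide codes, and the names `find_fasta_problems` and `check_file` are made up for illustration. It does not reproduce whatever checks AMPtk itself runs. A missing hard return after the taxonomy string should show up either as a header with no sequence lines after it or as a header that ends in a long run of bases.

```python
import re

# Assumed header shape, inferred from the one example in the post:
# >ID;tax=k:...,p:...,...,s:...  (single-letter rank, colon, name, comma-separated)
HEADER_RE = re.compile(r">[^;\s]+;tax=[a-z]:[^,]+(,[a-z]:[^,]+)*")
# Assumed sequence alphabet: IUPAC nucleotide codes, upper or lower case
SEQ_RE = re.compile(r"[ACGTURYKMSWBDHVN]+", re.IGNORECASE)
# Heuristic: a header ending in 20 or more bases suggests a sequence glued onto it
GLUED_RE = re.compile(r"[ACGTN]{20,}$")


def find_fasta_problems(lines):
    """Return a sorted list of (line_number, message) for suspicious lines."""
    problems = []
    header_line = None  # line number of the most recent header
    has_seq = False     # whether that header has been followed by any sequence line
    for lineno, raw in enumerate(lines, start=1):
        line = raw.rstrip("\r\n")
        if line.startswith(">"):
            if header_line is not None and not has_seq:
                problems.append((header_line, "header has no sequence lines"))
            header_line, has_seq = lineno, False
            if not HEADER_RE.fullmatch(line):
                problems.append((lineno, "header does not match expected pattern"))
            elif GLUED_RE.search(line):
                problems.append((lineno, "header ends in a long run of bases"))
        elif not line.strip():
            problems.append((lineno, "blank line"))
        elif header_line is None:
            problems.append((lineno, "sequence before first header"))
        else:
            has_seq = True
            if not SEQ_RE.fullmatch(line):
                problems.append((lineno, "unexpected characters in sequence"))
    # The last record in the file also needs at least one sequence line
    if header_line is not None and not has_seq:
        problems.append((header_line, "header has no sequence lines"))
    return sorted(problems)


def check_file(path):
    """Print every suspicious line found in the file at `path`."""
    with open(path, encoding="utf-8", errors="replace") as handle:
        for lineno, message in find_fasta_problems(handle):
            print(f"line {lineno}: {message}")
```

Calling `check_file("RDPSILVA_LSUdatabase_error.fasta")` would print one line per suspicious location, which can then be inspected in a text editor. Blank lines are flagged only as a possible cause; whether the pipeline actually rejects them is not something this sketch knows.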