I have an interesting (to me) problem that I'm not sure how to approach. I have a library of randomized short inserts (21 nt) that has been sequenced on the SOLiD platform with 25 nt reads. The insert sits at the very start of each read. I want to count how many times each distinct insert sequence occurs. The straightforward way appears to be to convert the reads to FASTQ, filter on quality, and count in base space. My concern is sequencing errors: the only check I can see is the last 4 nt (positions 22-25), which should be identical in all reads. Any suggestions or interesting approaches to accomplish this? I am not experienced with NGS projects, so sorry if this is a dumb question.
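
For concreteness, here is a minimal Python sketch of the straightforward approach described above. It assumes the reads have already been converted to base-space FASTQ with four lines per record. The constant tail `ACGT`, the quality cutoff of 20, and the Phred+33 offset are placeholders chosen for illustration only, and the function names are hypothetical.

```python
from collections import Counter
from typing import Iterable, Iterator, Tuple

INSERT_LEN = 21         # insert length stated in the post
READ_LEN = 25           # read length stated in the post
CONSTANT_TAIL = "ACGT"  # hypothetical placeholder for the real 4 nt at positions 22-25
MIN_QUAL = 20           # hypothetical per-base quality cutoff
PHRED_OFFSET = 33       # assumes Sanger-style (Phred+33) quality encoding


def read_fastq(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (sequence, quality) pairs from 4-line FASTQ records."""
    it = (ln.strip() for ln in lines if ln.strip())
    # zip over the same iterator groups lines in fours and drops a truncated tail
    for _header, seq, _plus, qual in zip(it, it, it, it):
        yield seq, qual


def count_inserts(records: Iterable[Tuple[str, str]],
                  tail: str = CONSTANT_TAIL,
                  min_qual: int = MIN_QUAL) -> Counter:
    """Count distinct inserts among reads that pass the tail and quality checks."""
    counts: Counter = Counter()
    for seq, qual in records:
        if len(seq) < READ_LEN or len(qual) < READ_LEN:
            continue  # read too short to contain insert plus tail
        seq = seq[:READ_LEN].upper()
        if "N" in seq:
            continue  # ambiguous base call
        if seq[INSERT_LEN:READ_LEN] != tail:
            continue  # constant tail does not match, so the read is suspect
        if min(ord(c) - PHRED_OFFSET for c in qual[:INSERT_LEN]) < min_qual:
            continue  # at least one low-quality base inside the insert
        counts[seq[:INSERT_LEN]] += 1
    return counts


if __name__ == "__main__":
    import sys
    with open(sys.argv[1]) as handle:
        for insert, n in count_inserts(read_fastq(handle)).most_common(20):
            print(insert, n)
```

Run as a script with a FASTQ path as its only argument, it prints the 20 most frequent inserts. The fraction of reads rejected by the tail check might also serve as a rough indication of the error rate at positions 22-25, though it says nothing directly about errors within the insert itself.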