Hi! I have an alignment/mapping question about the following dataset. I have ~1 million different reference sequences averaging 110 nucleotides in length, and a sequencing run of 50 million reads, where each read completely covers one reference sequence (the theoretical Illumina read length is longer than the longest possible reference). I'd like to uniquely match every sequencing read to a reference sequence. Of course there may be sequencing errors, or experimental errors from steps before the library prep/sequencing, so a given read might contain mismatches and indels, and might match multiple references.
I was thinking about creating a bwa index with 1 million artificial chromosomes, running bwa, and processing the resulting alignments. Is there anything better, faster, or easier to parse?
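To make the "process the resulting alignments" step concrete, here is a rough sketch of what I have in mind, not a tested pipeline. It assumes the aligner writes standard SAM text, that the reads are single-end (one primary record per read), and that a mapping quality cutoff is an acceptable proxy for "uniquely matched". The function name `assign_reads` and the `min_mapq=20` default are my own placeholders, not anything bwa prescribes.

```python
from collections import Counter

# Standard SAM FLAG bits (from the SAM specification).
UNMAPPED = 0x4
SECONDARY = 0x100
SUPPLEMENTARY = 0x800


def assign_reads(sam_lines, min_mapq=20):
    """Assign each read to at most one reference.

    sam_lines: any iterable of SAM text lines (e.g. an open file handle).
    min_mapq:  hypothetical cutoff; reads below it are treated as ambiguous.
    Returns (assignments, counts), where assignments maps read name to
    reference name and counts is the number of reads per reference.
    """
    assignments = {}
    for line in sam_lines:
        if line.startswith("@"):  # header line
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 11:  # not a complete alignment record
            continue
        qname, flag = fields[0], int(fields[1])
        rname, mapq = fields[2], int(fields[4])
        # Keep only primary alignments of mapped reads.
        if flag & (UNMAPPED | SECONDARY | SUPPLEMENTARY):
            continue
        # Low mapping quality usually means the read fits several references.
        if mapq < min_mapq:
            continue
        assignments[qname] = rname
    counts = Counter(assignments.values())
    return assignments, counts


# Tiny made-up example; the records below are illustrative, not real data.
demo = [
    "@SQ\tSN:ref1\tLN:110",
    "r1\t0\tref1\t1\t60\t4M\t*\t0\t0\tACGT\tFFFF",
    "r2\t0\tref2\t1\t0\t4M\t*\t0\t0\tACGT\tFFFF",  # MAPQ 0: ambiguous
    "r3\t4\t*\t0\t0\t*\t*\t0\t0\tACGT\tFFFF",      # unmapped
]
demo_assignments, demo_counts = assign_reads(demo)
```

On a real run I would pass an open file handle for the SAM output instead of `demo`. With 50 million reads, holding every read name in a dictionary may be too memory-hungry, so keeping only the per-reference counts could be the more practical variant.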
This and this question are somewhat similar, but not really the same.