Efficient read mapping and alignment to reference set

4.0 years ago

endre.sebestyen ▴ 10

Hi! I have an alignment/mapping question about the following dataset. I have ~1 million different reference sequences averaging 110 nucleotides in length, and a sequencing run of 50 million reads, where each read completely covers a reference sequence (the theoretical Illumina read length is longer than the longest possible reference). I'd like to uniquely match every sequencing read to a reference sequence. Of course, there may be sequencing errors, or experimental errors from steps before library prep/sequencing, so a given read might contain mismatches and indels, and might match multiple references.

I was thinking about creating a bwa index with 1 million artificial chromosomes, running bwa, and processing the resulting alignments. Is there anything better, faster, or easier to parse?
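To make the "process the resulting alignments" step concrete, here is a rough sketch of what I have in mind, assuming single-end reads and plain-text SAM output from bwa. The function names and the `min_mapq=20` cutoff are placeholders of mine, not anything bwa prescribes. It relies on the aligner giving a low MAPQ (typically 0) to reads that fit several references equally well, which I would still need to verify on this kind of reference set.

```python
from collections import Counter

# Flag bits as defined in the SAM specification.
UNMAPPED = 0x4
SECONDARY = 0x100
SUPPLEMENTARY = 0x800


def assign_reads(sam_lines, min_mapq=20):
    """Return {read name: reference name} for confident primary alignments.

    min_mapq is an arbitrary illustrative cutoff, not a recommended value.
    """
    assignments = {}
    for line in sam_lines:
        # Skip blank lines and header lines (headers start with '@').
        if not line.strip() or line.startswith("@"):
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 11:  # SAM records have 11 mandatory columns
            continue
        qname, flag, rname, mapq = fields[0], int(fields[1]), fields[2], int(fields[4])
        # Drop unmapped reads and non-primary alignment records.
        if flag & (UNMAPPED | SECONDARY | SUPPLEMENTARY):
            continue
        # Drop ambiguous placements, which show up as low mapping quality.
        if mapq < min_mapq:
            continue
        assignments[qname] = rname
    return assignments


def count_per_reference(assignments):
    """Count how many reads were assigned to each reference sequence."""
    return Counter(assignments.values())
```

An open SAM file can be passed directly, e.g. `assign_reads(open("aln.sam"))` (the file name is hypothetical), since the function only iterates over lines. With 50 million reads the dictionary gets large, so streaming the counts instead of storing every assignment may be the more practical variant.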

This and this question are somewhat similar, but not really the same.

alignment sequencing • 438 views
