Hi all, I have been working on an alignment algorithm lately. I want to know what causes reads to align to multiple positions in a genome (human or any other). In particular, is it possible for the same read to occur at multiple positions, with or without actual SNPs/sequencing errors? Thanks in advance!
Reads can (and often do) map to multiple locations in a genome. A simple example would be a read that aligns to an exon belonging to a duplicated gene.
When dealing with high-throughput sequencing data, reads that map to multiple locations are often filtered out or ignored after alignment.
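One common way to ignore ambiguous reads is to filter on the mapping quality (MAPQ), which is column 5 of a SAM record. Here is a minimal sketch, assuming your aligner gives a low MAPQ to reads it cannot place uniquely (the exact convention is aligner-specific, so check its documentation). The function name `keep_confident`, the `min_mapq` threshold and the two toy records are made up for illustration.

```python
def keep_confident(sam_lines, min_mapq=1):
    """Return the names of reads whose MAPQ is at least min_mapq.

    min_mapq=1 is an illustrative threshold, not a recommendation.
    """
    kept = []
    for line in sam_lines:
        if line.startswith("@"):
            continue  # skip SAM header lines
        fields = line.rstrip("\n").split("\t")
        mapq = int(fields[4])  # SAM column 5 (index 4) holds MAPQ
        if mapq >= min_mapq:
            kept.append(fields[0])  # SAM column 1 is the read name
    return kept


# Hypothetical toy records: read1 is placed confidently, read2 is ambiguous.
toy_records = [
    "@HD\tVN:1.6\tSO:unsorted",
    "read1\t0\tchr1\t100\t60\t10M\t*\t0\t0\tACGTACGTAC\tIIIIIIIIII",
    "read2\t0\tchr1\t500\t0\t10M\t*\t0\t0\tCGTTGCAAGG\tIIIIIIIIII",
]

print(keep_confident(toy_records))
```

Whether to drop such reads, keep one placement at random, or distribute them across all candidate loci depends on the downstream analysis.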
In particular, is it possible for the same read to occur at multiple positions, with or without actual SNPs/sequencing errors?
Genomes of higher eukaryotes tend to have many repeats, and polyploidy adds another level of complexity.
A concrete example: L1 retrotransposons (aka LINE1) are present in the human genome in up to half a million copies. These copies are not identical, but large chunks of them are. So yes, there will be many reads that map to multiple positions within a genome.
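To make this concrete, here is a toy sketch (not a real aligner) that scans a made-up genome containing three copies of a repeat, one of them carrying a single substitution. The sequences and the helper `find_hits` are invented for illustration. The read matches two copies exactly and picks up the third once one mismatch is allowed, which is what a SNP or a sequencing error would look like.

```python
def find_hits(genome, read, max_mismatches=0):
    """Return every 0-based start where read matches genome
    with at most max_mismatches substitutions (no indels)."""
    hits = []
    for start in range(len(genome) - len(read) + 1):
        window = genome[start:start + len(read)]
        mismatches = sum(1 for a, b in zip(window, read) if a != b)
        if mismatches <= max_mismatches:
            hits.append(start)
    return hits


# Made-up sequences: two identical repeat copies and one diverged copy.
repeat = "ACGTTGCAAGGC"
repeat_variant = "ACGTTGCTAGGC"  # differs from repeat by one substitution
genome = ("TTTT" + repeat + "GGGGCCCC" + repeat
          + "AAAATTTT" + repeat_variant + "CCCC")

read = "CGTTGCAAGG"  # a read sampled from inside the repeat

exact_hits = find_hits(genome, read)
near_hits = find_hits(genome, read, max_mismatches=1)
print("exact:", exact_hits)
print("up to 1 mismatch:", near_hits)
```

Real aligners use indexes (for example FM-indexes or hash tables) rather than a linear scan, but the ambiguity is the same: the more mismatches you tolerate, the more repeat copies a read can be placed on.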