Hello, I am doing a K-mer base genome wide association mapping. I have some significant K-mers of which I want to find the position in the genome. For that, I have extracted the reads from which each K-mer originated. For example, for one significant K-mer, I have ~30 paired end short reads extracted from one sample fastq file (R1 and R2). Then I used the following code to map those reads against the reference genome with bowtie2:
bowtie2 --very-sensitive-local -p 2 -x ref.genome -1 R1.fastq -2 R2.fastq -S output.sam
Then I performed the necessary steps with samtools to convert and sort it and finally with sambamba remove the unmapped reads. Then I load the bam file in IGV with the reference genome.
My question is, as there are multiple reads for One K-mer, which read should I use to get the location? If there are multiple alignments of the same reads in different locations how should I choose the best one? Is there any mapping score in .bam file that I should look for?
Looking forward to getting some suggestions.
Happy Easter!
k-mers by their characteristic are going to present in more than one read. You did not start with k-mers that were unique to your dataset so how can you expect them to have only one hit. All multi-mapping reads are going to be given a poor mapping score so there is no help in that direction.
I am not sure exactly what you are trying to do here but if you were using
bbmap.shyou would have the option of randomly selecting one of the best sites by usingambig=randomoption.Actually, in the paper that I am following, there they mentioned that they choose the best alignment according to the mapping score. But I really did not understand. I can imagine that there will be multiple reads for one K-mer. That was the case in the reference paper as well. But they chose the location of the K-mer according to the best alignment. So, I was wondering how could I do that. Any idea?