Hello, I am doing a K-mer base genome wide association mapping. I have some significant K-mers of which I want to find the position in the genome. For that, I have extracted the reads from which each K-mer originated. For example, for one significant K-mer, I have ~30 paired end short reads extracted from one sample fastq file (R1 and R2). Then I used the following code to map those reads against the reference genome with bowtie2:
bowtie2 --very-sensitive-local -p 2 -x ref.genome -1 R1.fastq -2 R2.fastq -S output.sam
Then I performed the necessary steps with samtools
to convert and sort it and finally with sambamba
remove the unmapped reads. Then I load the bam file in IGV with the reference genome.
My question is, as there are multiple reads for One K-mer, which read should I use to get the location? If there are multiple alignments of the same reads in different locations how should I choose the best one? Is there any mapping score in .bam
file that I should look for?
Looking forward to getting some suggestions.
Happy Easter!
k-mers by their characteristic are going to present in more than one read. You did not start with k-mers that were unique to your dataset so how can you expect them to have only one hit. All multi-mapping reads are going to be given a poor mapping score so there is no help in that direction.
I am not sure exactly what you are trying to do here but if you were using
bbmap.sh
you would have the option of randomly selecting one of the best sites by usingambig=random
option.Actually, in the paper that I am following, there they mentioned that they choose the best alignment according to the mapping score. But I really did not understand. I can imagine that there will be multiple reads for one K-mer. That was the case in the reference paper as well. But they chose the location of the K-mer according to the best alignment. So, I was wondering how could I do that. Any idea?