Hello,
I am new to data analysis and nanopore sequencing. I have transfected a library of synonymous GFP variants into HEK293 cells and sequenced the mRNA on a nanopore sequencer. I then basecalled the data with the Guppy basecaller and filtered out the low-quality reads. After that, I mapped all the reads to the reference database of the GFP library using minimap. From the mapped reads, I kept only the primary alignments and visualised them in IGV. I used the following commands:
awk '$2==0||$2==16{print $0}' (to extract the reads with flags 0 and 16, as they are the primary alignments)
samtools view -bS (to convert the SAM to BAM)
samtools sort (to sort the BAM file)
samtools rmdup (to remove the duplicate reads and retain the reads with the best quality)
samtools index (to index the BAM file)
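To show what the flag filter in the awk step is meant to do, here is a minimal Python sketch of the same idea. It tests the individual flag bits from the SAM specification instead of comparing against the two literal values 0 and 16, and it keeps the `@` header lines, which the awk expression appears to drop because their second field is never 0 or 16. The function names `is_primary` and `keep_primary` and the toy records are hypothetical, made up for illustration only.

```python
# SAM flag bits, as defined in the SAM specification
UNMAPPED = 0x4
SECONDARY = 0x100
SUPPLEMENTARY = 0x800


def is_primary(flag: int) -> bool:
    """True for a mapped alignment that is neither secondary nor supplementary."""
    return (flag & (UNMAPPED | SECONDARY | SUPPLEMENTARY)) == 0


def keep_primary(sam_lines):
    """Yield header lines and primary alignment records from SAM text lines."""
    for line in sam_lines:
        if line.startswith("@"):
            # Header lines are passed through unchanged.
            yield line
        elif is_primary(int(line.split("\t")[1])):
            # Column 2 of an alignment record is the FLAG field.
            yield line


# Toy SAM records (hypothetical read and reference names).
toy_sam = [
    "@SQ\tSN:GFP_var1\tLN:720",
    "read1\t0\tGFP_var1\t1\t60\t4M\t*\t0\t0\tACGT\t*",
    "read2\t16\tGFP_var1\t1\t0\t4M\t*\t0\t0\tACGT\t*",
    "read2\t256\tGFP_var1\t5\t0\t4M\t*\t0\t0\tACGT\t*",
    "read3\t4\t*\t0\t0\t*\t*\t0\t0\tACGT\t*",
]

primary_only = list(keep_primary(toy_sam))
for line in primary_only:
    print(line)
```

For single-end reads such as these, the bit test selects the same records as flags 0 and 16, but it states the intent (mapped, not secondary, not supplementary) more directly.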
After this, I visualised the BAM file in IGV. Here I see some grey lines and lots of white lines. The IGV documentation says:
"Note that alignments that are displayed with light gray borders and transparent or white fill, as shown in the screenshot, have a mapping quality equal to zero. Interpretation of this mapping quality depends on the mapping aligner as some commonly used aligners use this convention to mark a read with multiple alignments. In such a case, the read also maps to another location with equally good placement. It is also possible the read could not be uniquely placed but the other placements do not necessarily give equally good quality hits."
I would like to know how to resolve this and keep only uniquely mapped reads with good mapping quality.
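To make the goal concrete, here is a minimal Python sketch of the kind of filter I have in mind: it drops every alignment whose MAPQ (column 5 of a SAM record) is below a cutoff, so the MAPQ 0 reads that IGV draws in white would be removed. The cutoff of 20, the function name `filter_by_mapq`, and the toy records are assumptions for illustration, not recommended values.

```python
def filter_by_mapq(sam_lines, min_mapq=20):
    """Yield header lines and alignments with MAPQ >= min_mapq.

    min_mapq=20 is an arbitrary, hypothetical default.
    """
    for line in sam_lines:
        if line.startswith("@"):
            # Keep the header so the output is still a complete SAM file.
            yield line
            continue
        fields = line.rstrip("\n").split("\t")
        # Column 5 (index 4) of an alignment record is the MAPQ field.
        if int(fields[4]) >= min_mapq:
            yield line


# Toy SAM records (hypothetical read and reference names).
toy_sam = [
    "@SQ\tSN:GFP_var1\tLN:720",
    "read1\t0\tGFP_var1\t1\t60\t4M\t*\t0\t0\tACGT\t*",
    "read2\t16\tGFP_var1\t1\t0\t4M\t*\t0\t0\tACGT\t*",
    "read3\t0\tGFP_var1\t1\t20\t4M\t*\t0\t0\tACGT\t*",
]

confident = list(filter_by_mapq(toy_sam))
for line in confident:
    print(line)
```

As far as I understand, samtools view has a -q option that applies the same kind of minimum-MAPQ cutoff directly to a SAM or BAM file.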
Thanks a lot.
Hi, thanks a lot. This was really helpful and it worked. However, I am looking for references that explain how minimap assigns MAPQ values and in what range. If you know something about this, it would be a great help. Thanks.