why I got "bitwise FLAG" of "unmapped" and "mapping quality" in high value of "37", in my bwa-derived sam file
1
0
Entering edit mode
9.2 years ago
liuhui ▴ 20

Dear all in Biostars,

Here, I have a question related to bwa mapping and sam format.

I did bwa (the most updated version 0.7.12-r1039) mapping. In the output sam file, I got "bitwise FLAG" (column NO. 2 in the sam file) of "segment unmapped" (in values of 4 or 20) and "mapping quality" (column NO. 5) in high value of "37", in my bwa-derived sam file.

I have few experience in NGS data analysis. I do not understand why I got "bitwise FLAG" of "unmapped" and "mapping quality" in high value of "37" in the same time for some cases. How could I explain it? Do you have any background/algorithm explain to me? Or any other instruction and advice?

Thanks a lot in advance. Looking forward to hearing from you.

Best wishes

Hui Liu

(1) some lines of the sam file I got, by executing the shell command (cat Chaetosphaeridium_globosum_1kp_PreHC.sam | awk '!/@SQ/' | awk '!/@PG/ {print $1,$2,$3,$5}' | awk '!/\*/'| awk '($4>30)' | less)

Amb_319 16 SCAFFOLD-DRGY-0010745-CHAETOSPHAERIDIUM_GLOBOSUM 37
Amb_425 0 SCAFFOLD-DRGY-0031508-CHAETOSPHAERIDIUM_GLOBOSUM 37
Amb_9129 4 SCAFFOLD-DRGY-0007394-CHAETOSPHAERIDIUM_GLOBOSUM 37
Amb_9332 16 SCAFFOLD-DRGY-0008651-CHAETOSPHAERIDIUM_GLOBOSUM 37
Amb_12330 16 SCAFFOLD-DRGY-0026127-CHAETOSPHAERIDIUM_GLOBOSUM 37
Amb_12724 0 SCAFFOLD-DRGY-0031353-CHAETOSPHAERIDIUM_GLOBOSUM 37
Amb_15140 16 SCAFFOLD-DRGY-0012013-CHAETOSPHAERIDIUM_GLOBOSUM 37
Amb_15821 0 SCAFFOLD-DRGY-0031130-CHAETOSPHAERIDIUM_GLOBOSUM 37
Amb_15822 0 SCAFFOLD-DRGY-0031130-CHAETOSPHAERIDIUM_GLOBOSUM 37
Amb_16158 0 SCAFFOLD-DRGY-0027739-CHAETOSPHAERIDIUM_GLOBOSUM 37

(2) some of the command lines I used

bwa index /media/10TB/HClncRNA/PlDB_a${s}/${i}/${i}_PreHC.fasta
bwa aln -t 60 -o 0 -i 0 -e 0 -d 0 /media/10TB/HClncRNA/PlDB_a${s}/${i}/${i}_PreHC.fasta \
/media/10TB/fasta2fastq/smRNA.fastq > /media/10TB/HClncRNA/PlDB_a${s}/${i}/${i}_PreHC.sai
bwa samse -f /media/10TB/HClncRNA/PlDB_a${s}/${i}/${i}_PreHC.sam \
/media/10TB/HClncRNA/PlDB_a${s}/${i}/${i}_PreHC.fasta \
/media/10TB/HClncRNA/PlDB_a${s}/${i}/${i}_PreHC.sai \
/media/10TB/fasta2fastq/smRNA.fastq
alignment mapping bwa next-gen • 2.8k views
ADD COMMENT
0
Entering edit mode

Can you paste the entire line for the following?

grep "^Amb_9129" Chaetosphaeridium_globosum_1kp_PreHC.sam
ADD REPLY
0
Entering edit mode

Hi, Geek_y, Thanks for your kind reply. Here is the output from your command:

Amb_9129        4       SCAFFOLD-DRGY-0007394-CHAETOSPHAERIDIUM_GLOBOSUM        524     37      19M     *       0       0       AAAAACACGATTGCCTGCC     IIIIIIIIIIIIIIIIIII     XT:A:U  NM:i:1  X0:i:1  X1:i:0       XM:i:1  XO:i:0  XG:i:0  MD:Z:0T18
ADD REPLY
3
Entering edit mode
9.2 years ago

From SAM file specification:

Bit 0x4 is the only reliable place to tell whether the read is unmapped. If 0x4 is set, no assumptions can be made about RNAME, POS, CIGAR, MAPQ, bits 0x2, 0x10, 0x100 and 0x800, and the bit 0x20 of the previous read in the template.

So assume its unmapped irrespective of MAPQ. May be also look into genome browser

ADD COMMENT
1
Entering edit mode

A lot of people ignore this, but it is an important aspect of the specification.

ADD REPLY

Login before adding your answer.

Traffic: 2996 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6