I'd like to compare the alignment quality of
bwa for my files.
- The first 2 packages output the percentage of reads aligned concordantly exactly 1 time,
bwadoes not, because does not output alignment summary.
samtools flagstat report is not enough, because it outputs only the general level of alignment, and I want to compare the other (percentage of reads aligned concordantly exactly 1 time).
I could filter the
sam file by these reads, but I'm not sure I know how to do it correctly. I learned how to filter the
sam file after
hisat2 with the command
samtools view -hf 0x2 -q 3,I have not learned how to filter the sam file after
bowtie2 - none of the solutions I found worked, BUT in these cases I can at least check myself. In the case of
bwa, I can't check myself due to the lack of alignment summary.
Could you please advise how to get a fraction of reads aligned concordantly exactly 1 time, and extract them from the
sam file after
bwa (version 0.7.17)?
samtools view -f 0x2 | grep -v -e 'XA:Z:' -e 'SA:Z:' (based on the combination of the previous link and this), but obtained unpaired reads in the final file.
XA:Z is it is put down by bwa if there are alternative alignments. But in some cases
XA:Z exist in only one string of pair, for example:
ERR3316120.48768420 99 NODE_5_length_137336913_cov_22.55 93837250 60 24S127M = 93837315 216 GTTGAACCATGGCACCCC TTGTTAAGGCTACCTTTTGCATGCCCAGAGATGCAAGCACCAAGTTCTGCTATCAATTTACATTGTGACAGTTTGCAGATGACTTCTGCACGCACCACGTGTCCTGGACACATGGAATTCTCTTTTGCTGGCC AA<A<FJJJF7-AFAJAJ JJJJJJFFFJJFFJ7JJJFFFJJJF<JFFFJJAFF<FJJJJ<JFFFJJJFJFJJ--JAAJJJJFJ-7FF-AJJJJFJA-F<FFFJJJFJFJFAFFJJJJJFJFJFFAAFJJFJJFA<-FFJFF-FAAJJAFFA NM:i:0 MD:Z:127 M C:Z:151M AS:i:127 XS:i:92 ERR3316120.48768420 147 NODE_5_length_137336913_cov_22.55 93837315 60 151M = 93837250 -216 TTTGCAGATGACTTCTGC ACGCACCACGTGTCCTGGACACATGGAATTCTCTTTTGCTGGCCTGTATCAATGTTTTGGATCTTTTTCTACGTTCAGGCTATGTAAACATTCATTTCTAACGACAAATGAATTCCTTGAGTTACATATTTAA JFFAFAFJJJJAJFJA7< A<FAAJAJJJJF7<<FFJF<FFJFFJ<FA-FA-FF7FAFJFF-JJJJJAAJJJJJ<JFJFJJFA<FF<7FF-<AF7AFA-JJJJJJA7FJJJJJJFF<JJJFAF-J-AJJJJFAAF<JFJFJFFFFJAFAAA- NM:i:0 MD:Z:151 M C:Z:24S127M AS:i:151 XS:i:123 XA:Z:NODE_5_length_137336913_cov_22.55,-93864831,151M,6;
So, in the final file I have only the first string. Should I remove such alignments manually, using smth like this?
Thank you in advance!
Best regards, Poecile