bwa: reads aligned concordantly exactly 1 time
1
0
Entering edit mode
8 weeks ago
poecile.pal ▴ 50

Good evening,

I'd like to compare the alignment quality of hisat2, bowtie2 and bwa for my files.

• The first 2 packages output the percentage of reads aligned concordantly exactly 1 time,
• bwa does not, because does not output alignment summary.

The samtools flagstat report is not enough, because it outputs only the general level of alignment, and I want to compare the other (percentage of reads aligned concordantly exactly 1 time).

I could filter the sam file by these reads, but I'm not sure I know how to do it correctly. I learned how to filter the sam file after hisat2 with the command samtools view -hf 0x2 -q 3,I have not learned how to filter the sam file after bowtie2 - none of the solutions I found worked, BUT in these cases I can at least check myself. In the case of bwa, I can't check myself due to the lack of alignment summary.

Could you please advise how to get a fraction of reads aligned concordantly exactly 1 time, and extract them from the sam file after bwa (version 0.7.17)?

EDIT:

I tried samtools view -f 0x2 | grep -v -e 'XA:Z:' -e 'SA:Z:' (based on the combination of the previous link and this), but obtained unpaired reads in the final file. XA:Z is it is put down by bwa if there are alternative alignments. But in some cases XA:Z exist in only one string of pair, for example:

ERR3316120.48768420     99      NODE_5_length_137336913_cov_22.55       93837250        60      24S127M =       93837315        216     GTTGAACCATGGCACCCC
TTGTTAAGGCTACCTTTTGCATGCCCAGAGATGCAAGCACCAAGTTCTGCTATCAATTTACATTGTGACAGTTTGCAGATGACTTCTGCACGCACCACGTGTCCTGGACACATGGAATTCTCTTTTGCTGGCC   AA<A<FJJJF7-AFAJAJ
JJJJJJFFFJJFFJ7JJJFFFJJJF<JFFFJJAFF<FJJJJ<JFFFJJJFJFJJ--JAAJJJJFJ-7FF-AJJJJFJA-F<FFFJJJFJFJFAFFJJJJJFJFJFFAAFJJFJJFA<-FFJFF-FAAJJAFFA   NM:i:0  MD:Z:127 M
C:Z:151M        AS:i:127        XS:i:92
ERR3316120.48768420     147     NODE_5_length_137336913_cov_22.55       93837315        60      151M    =       93837250        -216    TTTGCAGATGACTTCTGC
ACGCACCACGTGTCCTGGACACATGGAATTCTCTTTTGCTGGCCTGTATCAATGTTTTGGATCTTTTTCTACGTTCAGGCTATGTAAACATTCATTTCTAACGACAAATGAATTCCTTGAGTTACATATTTAA   JFFAFAFJJJJAJFJA7<
A<FAAJAJJJJF7<<FFJF<FFJFFJ<FA-FA-FF7FAFJFF-JJJJJAAJJJJJ<JFJFJJFA<FF<7FF-<AF7AFA-JJJJJJA7FJJJJJJFF<JJJFAF-J-AJJJJFAAF<JFJFJFFFFJAFAAA-   NM:i:0  MD:Z:151 M
C:Z:24S127M     AS:i:151        XS:i:123        XA:Z:NODE_5_length_137336913_cov_22.55,-93864831,151M,6;


So, in the final file I have only the first string. Should I remove such alignments manually, using smth like this?

Best regards, Poecile

bowtie2 samtools hisat2 alignment bwa • 293 views
1
Entering edit mode
4 weeks ago
poecile.pal ▴ 50

I came to the following decision:

1) samtools view -f 0x2 - remove not aligned concordantly

2) grep -v -e 'XA:Z:' -e 'SA:Z:' - remove alignments with XA (set if there are alternative ones), with SA (set if it is additional)

3) samtools sort -n | samtools view -h | awk -f script.awk - remove the remaining unpaired entries. Where do unpaired entries come from? It happens that XA is placed at one of the members of the pair. In this case, grep removes this member of the pair, and leaves the other one.

Works for version 0.7.17.