Dear Biostars,
I'm trying to extract multi-mapped and unique reads from a SAM/BAM file after doing an alignment using bowtie aligner with single end reads.
./bowtie -p 6 -v 0 -a -f --sam -- best '/hg38' '/myfasta.fa' '/results.sam'
results.sam:
URS00001E85BB_9606 0 chr19 37279510 255 26M * 0 0 TCCAATGCAGTGAGGCCGAAAGGGTC IIIIIIIIIIIIIIIIIIIIIIIIII XA:i:0 MD:Z:26 NM:i:0 XM:i:15
URS00001E85BB_9606 0 chr19 37292136 255 26M * 0 0 TCCAATGCAGTGAGGCCGAAAGGGTC IIIIIIIIIIIIIIIIIIIIIIIIII XA:i:0 MD:Z:26 NM:i:0 XM:i:15
URS00001E85BB_9606 0 chr19 37297187 255 26M * 0 0 TCCAATGCAGTGAGGCCGAAAGGGTC IIIIIIIIIIIIIIIIIIIIIIIIII XA:i:0 MD:Z:26 NM:i:0
...
URS00003A9162_9606 4 * 0 0 * * 0 0 TAATGATCTGCCTCAAACCTAGTTTTCNGCC IIIIIIIIIIIIIIIIIIIIIIIIIIIIIII XM:i:0
URS00005E6300_9606 0 chrX 99575628 255 28M * 0 0 TGCAGTAAATTGGTATTGAGGTTGTGGG IIIIIIIIIIIIIIIIIIIIIIIIIIII XA:i:0 MD:Z:28 NM:i:0 XM:i:2
URS00003E8A0A_9606 16 chr18 405642 255 28M * 0 0 AGGTATTTCAATCACGCTGATCCATTCA IIIIIIIIIIIIIIIIIIIIIIIIIIII XA:i:0 MD:Z:28 NM:i:0 XM:i:2
URS00002B3810_9606 4 * 0 0 * * 0 0 TGCTTGTGAACTGCTTGTGAATGGGTGTGT IIIIIIIIIIIIIIIIIIIIIIIIIIIIII XM:i:0
URS00000E8FF1_9606 16 chr3 12851709 255 29M * 0 0 TCACCCTAGAGCTTTGCACACTGTTACCA IIIIIIIIIIIIIIIIIIIIIIIIIIIII XA:i:0 MD:Z:29 NM:i:0 XM:i:2
After that, I convert it to bam, then extracted only mapped reads:
samtools view -u -F 4 'results.bam' > mappedResults.bam
and in order to get only unique mapped reads I did:
samtools view mappedResults.bam | awk '{print 10}' | awk 'NF > 0' | sort | uniq -u > uniqResultsSeq.txt
"Extract with awk each sequence, take into account only non-zero field lines, sort them and extract only unique sequences."
If I want to extract unique and multi-mapped reads but "collapsed", then I used:
samtools view mappedResults.bam | awk '{print 10}' | awk 'NF > 0' | sort -u > uniqResultsSeq.txt
Still, I haven't found a way to extract only the multi-mapped reads. Maybe I should use XM> 1? In this answer they use XS flag to find the uniquely mapped reads but it was used on bowtie2. Is there another straightforward way to get unique/muti-mapped reads using samtools from bowtie results?
Thank you for your time
Konstantinos