Closed:Extract unique and multimapped reads from bowtie 1 sam/bam output
0
0
Entering edit mode
5.5 years ago

Dear Biostars,

I'm trying to extract multi-mapped and unique reads from a SAM/BAM file after doing an alignment using bowtie aligner with single end reads.

./bowtie -p 6 -v 0 -a -f --sam -- best  '/hg38' '/myfasta.fa' '/results.sam'

results.sam:

URS00001E85BB_9606  0   chr19   37279510    255 26M *   0   0   TCCAATGCAGTGAGGCCGAAAGGGTC  IIIIIIIIIIIIIIIIIIIIIIIIII  XA:i:0  MD:Z:26 NM:i:0  XM:i:15
URS00001E85BB_9606  0   chr19   37292136    255 26M *   0   0   TCCAATGCAGTGAGGCCGAAAGGGTC  IIIIIIIIIIIIIIIIIIIIIIIIII  XA:i:0  MD:Z:26 NM:i:0  XM:i:15
URS00001E85BB_9606  0   chr19   37297187    255 26M *   0   0   TCCAATGCAGTGAGGCCGAAAGGGTC  IIIIIIIIIIIIIIIIIIIIIIIIII  XA:i:0  MD:Z:26 NM:i:0 
...
URS00003A9162_9606  4   *   0   0   *   *   0   0   TAATGATCTGCCTCAAACCTAGTTTTCNGCC IIIIIIIIIIIIIIIIIIIIIIIIIIIIIII XM:i:0
URS00005E6300_9606  0   chrX    99575628    255 28M *   0   0   TGCAGTAAATTGGTATTGAGGTTGTGGG    IIIIIIIIIIIIIIIIIIIIIIIIIIII    XA:i:0  MD:Z:28 NM:i:0  XM:i:2
URS00003E8A0A_9606  16  chr18   405642  255 28M *   0   0   AGGTATTTCAATCACGCTGATCCATTCA    IIIIIIIIIIIIIIIIIIIIIIIIIIII    XA:i:0  MD:Z:28 NM:i:0  XM:i:2
URS00002B3810_9606  4   *   0   0   *   *   0   0   TGCTTGTGAACTGCTTGTGAATGGGTGTGT  IIIIIIIIIIIIIIIIIIIIIIIIIIIIII  XM:i:0
URS00000E8FF1_9606  16  chr3    12851709    255 29M *   0   0   TCACCCTAGAGCTTTGCACACTGTTACCA   IIIIIIIIIIIIIIIIIIIIIIIIIIIII   XA:i:0  MD:Z:29 NM:i:0  XM:i:2

After that, I convert it to bam, then extracted only mapped reads:

samtools view -u  -F 4 'results.bam' > mappedResults.bam

and in order to get only unique mapped reads I did:

samtools view mappedResults.bam | awk '{print 10}' | awk 'NF > 0' | sort | uniq -u > uniqResultsSeq.txt

"Extract with awk each sequence, take into account only non-zero field lines, sort them and extract only unique sequences."

If I want to extract unique and multi-mapped reads but "collapsed", then I used:

samtools view mappedResults.bam | awk '{print 10}' | awk 'NF > 0' | sort -u > uniqResultsSeq.txt

Still, I haven't found a way to extract only the multi-mapped reads. Maybe I should use XM> 1? In this answer they use XS flag to find the uniquely mapped reads but it was used on bowtie2. Is there another straightforward way to get unique/muti-mapped reads using samtools from bowtie results?

Thank you for your time

Konstantinos

bowtie samtools multimapped reads RNA-Seq sRNA • 247 views
ADD COMMENT
This thread is not open. No new answers may be added
Traffic: 2554 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6