Question: How To Find Unique Reads?
1
gravatar for alok.helix
5.1 years ago by
alok.helix70
India
alok.helix70 wrote:

i have am working on a transcriptomics project of a non-model plant and i want to know from you how to count the uniquely mapped reads in sam format or the bam format. I have done the indexing of my file with BWA-mem algo for the paired end illumina data and generated the sam and bam file using samtools. I used the command $ cut -f1 myfile.sam | sort | uniq -u | wc -l but i got zero for all the input file. After closely reading the BWA manual i noticed a TAG of XT which i guess will be used to distinguish unique/repeat/Mate -sw etc. but after using the grep in following command:

$ samtools view bwa.bam | grep "XT:A:U" it yielded zero...

Now i am seriously doubting my commands and getting caught in the vicious circle of various commands including this one:

$ samtools view accepted_hits.bam | awk '$5==255{print $0}' > uniq_mapped_reads.sam

The samtools manual says NO alignment should be assigned a mapping quality of 255. I request you to look into the above query and throw some light on uniquely mapped reads and its searching?? I request you to mail the explanation or links on alok.helix@gmail.com. Thanking you Alok

rnaseq reads • 9.6k views
ADD COMMENTlink modified 5.1 years ago by KCC3.9k • written 5.1 years ago by alok.helix70
3
gravatar for Istvan Albert
5.1 years ago by
Istvan Albert ♦♦ 78k
University Park, USA
Istvan Albert ♦♦ 78k wrote:

Note that the SAM specification does not require that any of the tags be present.

bwa-mem is a preeliminary release, as far as I know will not fill in the XT tag (or perhaps it won't fill the XA tag, one or both I can't recall which)

use the bwa aln method or a different aligner

ADD COMMENTlink modified 5.1 years ago • written 5.1 years ago by Istvan Albert ♦♦ 78k
3
gravatar for KCC
5.1 years ago by
KCC3.9k
Cambridge, MA
KCC3.9k wrote:

I once looked at this problem as well. I recall three ways of identifying a read that was mapped to multiple positions with BWA:

  1. the presence of the characters "XT:A:R"

  2. the presence of the characters "XA:Z:"

  3. mapping quality (MAPQ) of 0.

Based on my notes, these results all gave similar, but slightly different answers. I have found that people in the bioinformatics community tend to favor option 3.

You can just do simple grep searches on the SAM file for lines with the "XT:A:R" or "XA:Z:" present for the first two. You can use `samtools view -q1 my.bam' to get the last one.

ADD COMMENTlink modified 5.1 years ago • written 5.1 years ago by KCC3.9k
1
gravatar for ancient_learner
5.1 years ago by
India
ancient_learner610 wrote:

If your input is in sam format check XS:i:<N> it will only be present if the SAM record is for an aligned read and more than one alignment was found for the read. so grep -v XS filename would solve your problem

ADD COMMENTlink modified 5.1 years ago • written 5.1 years ago by ancient_learner610

hi i have used the following command

$ samtools view .bam | fgrep "XS:i:0" | wc -l
46,953,719.

This definately cannot be uniquley mapped. then i used

$samtools view .bam |fgrep "NM:i:1" | wc -l
5,865,151
$ samtools view .bam | fgrep "XS:i:1" | wc -l
1,862,839

but all are in vain i guess....

ADD REPLYlink modified 5.1 years ago by Istvan Albert ♦♦ 78k • written 5.1 years ago by alok.helix70

kindly read my reply properly "it will only be present if the SAM record is for an aligned read and more than one alignment was found for the read".

ADD REPLYlink written 5.1 years ago by ancient_learner610
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.3.0
Traffic: 1075 users visited in the last hour