Is correct finding unique reads with the flag -F 0x104?
2
0
Entering edit mode
9.7 years ago
miquinhap • 0

Hello,

I read a lot of topics about how to find uniquely mapped reads. I mapped my PE RNA-Seq data with Bowtie2 and BWA to the genome at different condition (-a, -k, --local for Bowtie2 and BWA-mem and BWA-sampe) to analyze which one is better. For Bowtie2, I saw that some people used to filter the alignment with the flag -F 0x100 and others with the MAPQ 30. For BWA, the most filters the alignment by using the MAPQ 1.

However, I was doing some tests and I noticed that I can't use only the flag -F 0x100, because the mapper attribute this flag to unmapped reads too. I conclude that to get only the unique reads I should use the -F 0x104. Does anyone agree with me or noticed the same?

For example:

$ samtools view -bf 0x4 file.bam > unmapped.bam
$ samtools view -c unmapped.bam
15678472

$ samtools view -cF 0x100 unmapped.bam
15678472

I also filtered my reads with the -q 30, but I realize that I had more uniquely reads using the flag -F 0x104 than the -q 30. Could this be due to false positives? And how is the best way to filter the reads with this too aligners? I'm intended to use the flag -F 0x104 for both.

Thanks, Michele

bwa RNA-Seq bowtie2 mapq flag • 3.7k views
ADD COMMENT
2
Entering edit mode
9.7 years ago
David Fredman ★ 1.1k

To retain only uniquely mapped reads mapped with BWA one can filter by the BWA XT flag value U for unique. I did not find a simple way to do this with samtools or bamtools, so grep to the rescue:

samtools view reads.bam | grep 'XT:A:U' | \
samtools view -bS -T referenceSequence.fa - > reads.uniqueMap.bam

You could do a diff of sam output from this, and that produced by your samtools filter, to check if it's correct.

ADD COMMENT
0
Entering edit mode
9.7 years ago
Ian 6.0k

If your aim is to perform RNA-seq analysis, perhaps you should be using Tophat or RNA-STAR? A certain amount of mapping ambiguity is expected in RNA-seq analysis. A colleague of mine uses uniquely mapped reads when looking for novel transcripts. RNA-STAR can be run to explicitly find uniquely mapped reads.

Also something to consider is whether you final set of mapped reads contains properly-paired reads, which can be achieved using:

samtools view -f 2
ADD COMMENT
0
Entering edit mode

I don't use Tophat because I'm studying trypanosomatids and they don't have introns. About RNA-Star I'm going to have a look. I saw some comparison studies saying that the best one for PE is BWA-mem. But I still don't know the best way to filter my alignment. I'm already not convinced about it, but I'm using the flag -F 0x104.

ADD REPLY

Login before adding your answer.

Traffic: 2637 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6