Question

Is correct finding unique reads with the flag -F 0x104?

0

Entering edit mode

11.0 years ago

miquinhap • 0

Hello,

I read a lot of topics about how to find uniquely mapped reads. I mapped my PE RNA-Seq data with Bowtie2 and BWA to the genome at different condition (-a, -k, --local for Bowtie2 and BWA-mem and BWA-sampe) to analyze which one is better. For Bowtie2, I saw that some people used to filter the alignment with the flag -F 0x100 and others with the MAPQ 30. For BWA, the most filters the alignment by using the MAPQ 1.

However, I was doing some tests and I noticed that I can't use only the flag -F 0x100, because the mapper attribute this flag to unmapped reads too. I conclude that to get only the unique reads I should use the -F 0x104. Does anyone agree with me or noticed the same?

For example:

$ samtools view -bf 0x4 file.bam > unmapped.bam
$ samtools view -c unmapped.bam
15678472

$ samtools view -cF 0x100 unmapped.bam
15678472

I also filtered my reads with the -q 30, but I realize that I had more uniquely reads using the flag -F 0x104 than the -q 30. Could this be due to false positives? And how is the best way to filter the reads with this too aligners? I'm intended to use the flag -F 0x104 for both.

Thanks, Michele

bwa RNA-Seq bowtie2 mapq flag • 4.1k views

ADD COMMENT • link updated 3.6 years ago by Ram 45k • written 11.0 years ago by miquinhap • 0

Ram · Answer 1 · 2014-08-04

To retain only uniquely mapped reads mapped with BWA one can filter by the BWA XT flag value U for unique. I did not find a simple way to do this with samtools or bamtools, so grep to the rescue:

samtools view reads.bam | grep 'XT:A:U' | \
samtools view -bS -T referenceSequence.fa - > reads.uniqueMap.bam

You could do a diff of sam output from this, and that produced by your samtools filter, to check if it's correct.

Ram · Answer 2 · 2014-07-30

0

Entering edit mode

11.0 years ago

Ian 6.1k

If your aim is to perform RNA-seq analysis, perhaps you should be using Tophat or RNA-STAR? A certain amount of mapping ambiguity is expected in RNA-seq analysis. A colleague of mine uses uniquely mapped reads when looking for novel transcripts. RNA-STAR can be run to explicitly find uniquely mapped reads.

Also something to consider is whether you final set of mapped reads contains properly-paired reads, which can be achieved using:

samtools view -f 2

ADD COMMENT • link updated 3.6 years ago by Ram 45k • written 11.0 years ago by Ian 6.1k

0

Entering edit mode

I don't use Tophat because I'm studying trypanosomatids and they don't have introns. About RNA-Star I'm going to have a look. I saw some comparison studies saying that the best one for PE is BWA-mem. But I still don't know the best way to filter my alignment. I'm already not convinced about it, but I'm using the flag -F 0x104.

ADD REPLY • link updated 3.6 years ago by Ram 45k • written 10.9 years ago by miquinhap • 0