How to get unique mapped reads?
3
0
Entering edit mode
5.8 years ago
biolab ★ 1.3k

Dear all,

Previously I thought when the -g/--max-multihits option in tophat was set to 1, then I acquired the unique mapped reads.  However, after careful thinking, I am probably wrong.

Now my idea is: when doing mapping, set the -g/--max-multihits option to 2 or above 2, and then write a script to eliminate those reads occurring twice or more times.  Am I right?  Are there any tools or simple approach to obtain unique mapped reads?   I appreciate any of your comments and answers.  THANKS A LOT!

bam mapping • 5.7k views
1
Entering edit mode

Using BBMap, you can set the flag "ambig=toss" which will ensure all mapped reads have unique mappings.  Reads that multimap will instead be marked as unmapped (and discarded, if you use "outm" instead of "out").

1
Entering edit mode

You can fgrep all the reads with tag NH:i:1, which filter outs only unique mapped reads.

sam/bam | fgrep -w NH:i:1

3
Entering edit mode
5.8 years ago

This has been asked before a few times - here's one discussion about bowtie2 (since tophat uses bowtie2 internally you could use the same points)

How to extract unique mapped results from Bowtie2 bam results?

Most notably:

>You're better served by simply filtering on a meaningful MAPQ (5 or 10 are often reasonable choices), which has the benefit of actually doing what you want, namely filtering according the the likelihood that an alignment is correct.

So using "samtools view -q 10 input > filtered_output" (or even higher? check your distribution of scores)

2
Entering edit mode
5.8 years ago
    \samtools view -f 0x02 in.bam | sort | uniq >unique_mapped_reads.txt

Obviously you would want to specify a region

    \samtools view -f 0x02 in.bam 15:20000000-30000000 | sort | uniq >unique_mapped_praderWilliLocus_reads.txt

for example.

1
Entering edit mode

0
Entering edit mode

you work on CNVs too mate ;)

1
Entering edit mode
5.8 years ago
seidel 7.7k

On thing you could do is to examine the NH tag in the tophat output file (accepted_hits.bam). It lists the number of alignments for that read, and thus you could filter out reads with NH > 1.