How to get unique mapped reads?
3
0
Entering edit mode
5.8 years ago
biolab ★ 1.3k

Dear all,

Previously I thought when the -g/--max-multihits option in tophat was set to 1, then I acquired the unique mapped reads.  However, after careful thinking, I am probably wrong. 

Now my idea is: when doing mapping, set the -g/--max-multihits option to 2 or above 2, and then write a script to eliminate those reads occurring twice or more times.  Am I right?  Are there any tools or simple approach to obtain unique mapped reads?   I appreciate any of your comments and answers.  THANKS A LOT!

bam mapping • 5.7k views
ADD COMMENT
1
Entering edit mode

Using BBMap, you can set the flag "ambig=toss" which will ensure all mapped reads have unique mappings.  Reads that multimap will instead be marked as unmapped (and discarded, if you use "outm" instead of "out").

ADD REPLY
1
Entering edit mode

You can fgrep all the reads with tag NH:i:1, which filter outs only unique mapped reads.

sam/bam | fgrep -w NH:i:1

ADD REPLY
3
Entering edit mode
5.8 years ago

This has been asked before a few times - here's one discussion about bowtie2 (since tophat uses bowtie2 internally you could use the same points)

How to extract unique mapped results from Bowtie2 bam results?

Most notably:

>You're better served by simply filtering on a meaningful MAPQ (5 or 10 are often reasonable choices), which has the benefit of actually doing what you want, namely filtering according the the likelihood that an alignment is correct.

So using "samtools view -q 10 input > filtered_output" (or even higher? check your distribution of scores)

ADD COMMENT
2
Entering edit mode
5.8 years ago
    \samtools view -f 0x02 in.bam | sort | uniq >unique_mapped_reads.txt

Obviously you would want to specify a region

    \samtools view -f 0x02 in.bam 15:20000000-30000000 | sort | uniq >unique_mapped_praderWilliLocus_reads.txt

for example.

ADD COMMENT
1
Entering edit mode

Thank you all for your comments and answers.  They are definitely helpful for my work.

ADD REPLY
0
Entering edit mode

you work on CNVs too mate ;)

ADD REPLY
1
Entering edit mode
5.8 years ago
seidel 7.7k

On thing you could do is to examine the NH tag in the tophat output file (accepted_hits.bam). It lists the number of alignments for that read, and thus you could filter out reads with NH > 1.

ADD COMMENT

Login before adding your answer.

Traffic: 1757 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6