Help with Samtools flags to get only primary, unique, non duplicate aligned reads
1
2
Entering edit mode
5.2 years ago
kerem.wainer ▴ 20

I am trying to use samtools (samtools 1.7Using htslib 1.7-2) to extract only reads that were mapped once and were not marked as duplicates by Picard.

As I understand samtools view -F -"flag" excludes following flags from the bam file.

The flags I want to exlude are: flag 4 is unmapped, flag 256 is secondary alignment. So using flag 260 I am getting:

samtools view -F -260 $f | grep 'NH:i:1\b' | wc -l

I get 9104 reads (It is a single cell).

This includes only reads that have flag 0. (Although I was hoping to get also 16 which is on the reverse strand).

When I want to exclude also the duplicates (flag 1024 but also 1004?) I try using flag -1284 :

samtools view -F -1284 $f | grep 'NH:i:1\b' | wc -l

but then I get much more reads: 21411

When I look at the reads of the second option they give reads of flag 0 and flag 1024 (which I was hoping to exclude) and it doesn't contain flag 16 which I was hoping to include.

Could somebody explain me this result?

What is your way to get only primary, unique, non duplicate aligned reads?

Thanks in advance

software error alignment sequencing • 6.1k views
ADD COMMENT
0
Entering edit mode

Please use the formatting bar (especially the code option) to present your post better. You can use backticks for inline code (`text` becomes text), or select a chunk of text and use the highlighted button to format it as a code block. I've done it for you this time.
code_formatting

ADD REPLY
0
Entering edit mode

Thanks I didn't know these features.

ADD REPLY
2
Entering edit mode
5.2 years ago
ATpoint 82k

I like using -F 1284 to remove non-primary alignments, duplicates and unmapped reads in combination with a MAPQ > 20 (-q 20) for this purpose plus removing all non-primary chromosomes like unplaced contigs etc. The definition of an unique alignment depends on the aligner and has been been discussed many times before, please use the search function and google.

Suggestion given a sorted and indexed BAM file (assuming hg38):

samtools idxstats in.bam | cut -f 1 | grep -vE 'chrM|_random|chrU|chrEBV|\*' | \
xargs samtools view -f 1 -F 1284 -q 20 -o out.bam in.bam
ADD COMMENT
0
Entering edit mode

Thank you for your answer. Did the -F 1284 remove the reads flagged 1024 for you?

ADD REPLY
0
Entering edit mode

Yes given they are appropriately flagged.

ADD REPLY
0
Entering edit mode

Thanks. For some reason, it doesn't work for me. -F over 0x400 was eventually what worked. Thank you very much!

ADD REPLY

Login before adding your answer.

Traffic: 3373 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6