Question: Help with Samtools flags to get only primary, unique, non duplicate aligned reads
0
gravatar for kerem.wainer
6 months ago by
kerem.wainer0 wrote:

I am trying to use samtools (samtools 1.7Using htslib 1.7-2) to extract only reads that were mapped once and were not marked as duplicates by Picard.

As I understand samtools view -F -"flag" excludes following flags from the bam file.

The flags I want to exlude are: flag 4 is unmapped, flag 256 is secondary alignment. So using flag 260 I am getting:

samtools view -F -260 $f | grep 'NH:i:1\b' | wc -l

I get 9104 reads (It is a single cell).

This includes only reads that have flag 0. (Although I was hoping to get also 16 which is on the reverse strand).

When I want to exclude also the duplicates (flag 1024 but also 1004?) I try using flag -1284 :

samtools view -F -1284 $f | grep 'NH:i:1\b' | wc -l

but then I get much more reads: 21411

When I look at the reads of the second option they give reads of flag 0 and flag 1024 (which I was hoping to exclude) and it doesn't contain flag 16 which I was hoping to include.

Could somebody explain me this result?

What is your way to get only primary, unique, non duplicate aligned reads?

Thanks in advance

ADD COMMENTlink modified 6 months ago by ATpoint21k • written 6 months ago by kerem.wainer0

Please use the formatting bar (especially the code option) to present your post better. You can use backticks for inline code (`text` becomes text), or select a chunk of text and use the highlighted button to format it as a code block. I've done it for you this time.
code_formatting

ADD REPLYlink written 6 months ago by RamRS23k

Thanks I didn't know these features.

ADD REPLYlink written 6 months ago by kerem.wainer0
0
gravatar for ATpoint
6 months ago by
ATpoint21k
Germany
ATpoint21k wrote:

I like using -F 1284 to remove non-primary alignments, duplicates and unmapped reads in combination with a MAPQ > 20 (-q 20) for this purpose plus removing all non-primary chromosomes like unplaced contigs etc. The definition of an unique alignment depends on the aligner and has been been discussed many times before, please use the search function and google.

Suggestion given a sorted and indexed BAM file (assuming hg38):

samtools idxstats in.bam | cut -f 1 | grep -vE 'chrM|_random|chrU|chrEBV|\*' | \
xargs samtools view -f 1 -F 1284 -q 20 -o out.bam in.bam
ADD COMMENTlink modified 6 months ago • written 6 months ago by ATpoint21k

Thank you for your answer. Did the -F 1284 remove the reads flagged 1024 for you?

ADD REPLYlink written 6 months ago by kerem.wainer0

Yes given they are appropriately flagged.

ADD REPLYlink written 6 months ago by ATpoint21k

Thanks. For some reason, it doesn't work for me. -F over 0x400 was eventually what worked. Thank you very much!

ADD REPLYlink written 6 months ago by kerem.wainer0
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.3.0
Traffic: 1394 users visited in the last hour