Filtering for primary and secondary reads using sam flags (0 properly paired reads in alignment step)
1
0
Entering edit mode
6 months ago

Hey everybody,

I have just performed the alignment of paired-end reads to a reference using bwa mem with the -M flag, ran samtools markdup and flagstat. Flagstat produced the following output:

4985084 + 0 in total (QC-passed reads + QC-failed reads)
1806492 + 0 primary
3178592 + 0 secondary
0 + 0 supplementary
1102030 + 0 duplicates
1102030 + 0 primary duplicates
4502645 + 0 mapped (90.32% : N/A)
1324053 + 0 primary mapped (73.29% : N/A)
1806492 + 0 paired in sequencing
903246 + 0 read1
903246 + 0 read2
0 + 0 properly paired (0.00% : N/A)
1274254 + 0 with itself and mate mapped
49799 + 0 singletons (2.76% : N/A)
236204 + 0 with mate mapped to a different chr
51688 + 0 with mate mapped to a different chr (mapQ>=5)

The output shows that the paired-end data was successfully detected and processed. Noteworthy, zero reads are marked as properly paired. This is caused by aligning reads to many very short references (more specifically, consensus loci produced by Stacks denovo_map), which can result in the overlapping of reads, their placement on different consensus loci/'chromosomes' , etc. I would like to filter for primary and secondary alignments, respectively. Primary alignments will later be used as input for freebayes. However, I am struggling to find reliable information on valid filtering approaches.

For instance, here I am told that the filtering for paired-end data can only be done by samtools view -F 0x4 and not by samtools view -F 0x904, as the latter cannot be used on paired-end data. However, this tutorial on filtering-paired end reads uses a very similar approach to the criticized one by filtering for primary alignments using samtools view -F 0x104.

After the thorough inspection of sam flags, I would have used samtools view -F 0x904 to obtain uniquely mapped primary reads (to be used as input for freebayes) and samtools view -f 0x100 -F 0x4 for obtaining secondary reads. Both commands feature -F 0x4, as selecting for secondary/supplementary alignments is only valid if this flag is not set ("If 0x4 is set, no assumptions can be made about RNAME, POS, CIGAR, MAPQ, and bits 0x2, 0x100, and 0x800"; source). Unfortunately, filtering flags are seldom reported in publications ...

Given the combination of paired-end sequencing data with no properly paired reads in the alignment, what is the proper approach for obtaining high-quality primary reads (to be used in freebayes) and secondary ones (to be visualized)?

Thanks

Marcel

freebayes samtools samflags sam paired-end • 486 views
ADD COMMENT
0
Entering edit mode
6 months ago
aw7 ▴ 270

I think you are doing the right thing. I wrote a brief explainer on the flags here (which may or may not be helpful).

The tutorial you mentioned that uses 0x4 (UNMAP) is only concerned with filtering out the unmapped reads and is quite happy to use the secondary and supplementary reads.

Since you have no supplementary reads you could just filter on 0x104 (UNMAP, SECONDARY) but there is no significant computational cost of filtering with 0x904 (UNMAP, SECONDARY, SUPPLEMENTARY) instead.

ADD COMMENT

Login before adding your answer.

Traffic: 1611 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6