Flagstat vs stats Samtools
2
0
Entering edit mode
4.3 years ago

Hello, I have a bam file and the output of flagstat is

**2081062 + 0 in total (QC-passed reads + QC-failed reads)**
**1434239 + 0 secondary**
13883 + 0 supplementary
0 + 0 duplicates
2057380 + 0 mapped (98.86% : N/A)
0 + 0 paired in sequencing
0 + 0 read1
0 + 0 read2
0 + 0 properly paired (N/A : N/A)
0 + 0 with itself and mate mapped
0 + 0 singletons (N/A : N/A)
0 + 0 with mate mapped to a different chr
0 + 0 with mate mapped to a different chr (mapQ>=5)

stats:

**raw total sequences:  632940**
filtered sequences: 0
sequences:  632940
is sorted:  0
1st fragments:  632940
last fragments: 0
reads mapped:   609258
reads mapped and paired:    0   # paired-end technology bit set + both mates mapped
reads unmapped: 23682
reads properly paired:  0   # proper-pair bit set
reads paired:   0   # paired-end technology bit set
reads duplicated:   0   # PCR or optical duplicate bit set
reads MQ0:  246308  # mapped and MQ=0
reads QC failed:    0
**non-primary alignments:   1434239**
total length:   580845355   # ignores clipping
total first fragment length:    580845355   # ignores clipping
total last fragment length: 0   # ignores clipping
bases mapped:   569218429   # ignores clipping
bases mapped (cigar):   536472283   # more accurate
bases trimmed:  0
bases duplicated:   0
mismatches: 71417634    # from NM fields
error rate: 1.331245e-01    # mismatches / bases mapped (cigar)
average length: 917
average first fragment length:  918
average last fragment length:   0
maximum length: 18968
maximum first fragment length:  0
maximum last fragment length:   0
average quality:    16.3
insert size average:    0.0
insert size standard deviation: 0.0
inward oriented pairs:  0
outward oriented pairs: 0
pairs with other orientation:   0
pairs on different chromosomes: 0
percentage of properly paired reads (%):    0.0

Really I have 632940 reads. I don't understand the number of 2 millions in flagstat, how could it be so? I'd like to find the reads with secondary alignment. Firstly, I'd like to know how many reads I have with secondary alignment.

Thank you a lot.

P.s I've read about 2081062 - 1434239 - 13883 = 632940. But it didn't help me with understanding the number 2081062. Also idxstats sorted.bam | awk '{s+=$3+$4} END {print s}' gives me 2081062. Alignment with human transcriptome.

rna-seq samtools • 4.8k views
ADD COMMENT
6
Entering edit mode
4.3 years ago

2081062 - 1434239 - 13883 = 632940, since you have a bunch of secondary and supplementary alignments. The total number of reads you started with is equal to the number of primary alignments (i.e., non-secondary and non-supplementary) alignments in the file.

ADD COMMENT
0
Entering edit mode

2081062 - 1434239 - 13883 = 632940 are these numbers for # of alignments but not reads, right? so for 632940 reads I have total 2.081.062 alignments 1.430.239 alignments secondary 13.883 supp 632.940 primary alignments right? if I'd like to have only these 632.940 primary alignments, should i do samtools view -F 260? but it gives 623.141 -F 256 646.823

how could I check how many reads from 632.940 have the unique mappings and how many multi mappings?

in description of idxstats it is written about #reads, but not alignments.

ADD REPLY
0
Entering edit mode

If you want the primary entries, then -F 2304 is what you want. Remember that reads don't need to map to have a primary entry (we normally call these "alignments", regardless of whether the entry is actually aligned, which is why idxstats calls these reads).

ADD REPLY
0
Entering edit mode

Thank you for your explanation, seems that at last I understood. I have 2 mln entries /lines in the bam file and 632 thousands are for primary alignment, 1,4 mln for secondary and 13 thousands for supplementary. -F 2304 gives me 623141. so 9799(632.940 primary alignments) entries are out of this subset. What category are they?

ADD REPLY
0
Entering edit mode

I'm not sure how you got that, according to both samtools flagstat and samtools idxstats you shouldn't have. samtools flagstat is literally just going through the entries and counting things up, so it will never be wrong.

ADD REPLY

Login before adding your answer.

Traffic: 1923 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6