Question: Getting Confused With The Flagstat After PCR Duplicates Removed
KJ Lim wrote (7.3 years ago):

Good day.

I encountered the following situation:

The flagstat output for the paired-end mapped reads before PCR duplicates were removed:

:::::::::::::: 
0H.flagstat.txt
::::::::::::::
173146136 + 0 in total 
0 + 0 duplicates
130510023 + 0 mapped (75.38%:nan%)
173146136 + 0 paired in sequencing
86573068 + 0 read1  <--
86573068 + 0 read2  <--
87873910 + 0 properly paired (50.75%:nan%)
87873910 + 0 with itself and mate mapped
42636113 + 0 singletons (24.62%:nan%)

The flagstat output for the paired-end mapped reads after PCR duplicates were removed with the Picard MarkDuplicates tool:

::::::::::::::
0H.ptFlagstat.txt
::::::::::::::
49080460 + 0 in total 
0 + 0 duplicates
6444347 + 0 mapped (13.13%:nan%)
49080460 + 0 paired in sequencing
45547041 + 0 read1  <--
3533419 + 0 read2   <--
5822436 + 0 properly paired (11.86%:nan%)
5822436 + 0 with itself and mate mapped
621911 + 0 singletons (1.27%:nan%)

After the PCR duplicates were removed, the read1 and read2 counts are very different. Has anyone here run into the same situation?

I'm also confused by the phrases "paired in sequencing" and "properly paired", since the numbers shown for them are different. Could anyone kindly share their thoughts?
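For what it's worth, both reports appear internally consistent. The quick arithmetic check below assumes the counting rules of samtools flagstat as I understand them: read1 and read2 are tallied over every read with the paired flag set, mapped or not, and, when every read in the file is paired, each mapped read either has a mapped mate ("with itself and mate mapped") or is a singleton. The dictionary keys are hypothetical labels for the flagstat lines; the values are copied from the two reports above.

```python
# Counts copied from the two flagstat reports; keys are hypothetical labels.
before = {
    "total": 173146136,
    "mapped": 130510023,
    "paired": 173146136,       # "paired in sequencing"
    "read1": 86573068,
    "read2": 86573068,
    "both_mapped": 87873910,   # "with itself and mate mapped"
    "singletons": 42636113,
}
after = {
    "total": 49080460,
    "mapped": 6444347,
    "paired": 49080460,
    "read1": 45547041,
    "read2": 3533419,
    "both_mapped": 5822436,
    "singletons": 621911,
}


def check_report(r):
    """Return (read1 + read2 == paired, both_mapped + singletons == mapped)."""
    return (
        r["read1"] + r["read2"] == r["paired"],
        r["both_mapped"] + r["singletons"] == r["mapped"],
    )


for name, report in (("before", before), ("after", after)):
    print(name, check_report(report))
# prints:
# before (True, True)
# after (True, True)
```

If those assumptions hold, the read1/read2 imbalance is not a reporting glitch: the filtered file really contains about 45.5 million read1 records but only 3.5 million read2 records, and only 6,444,347 of its 49,080,460 records are mapped.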

Mikael Huss (Stockholm) wrote:

Your results do look a bit strange. As far as I know, the "read1" plus the "read2" value should always equal the "mapped" value; in your case, the sum equals the "paired in sequencing" value instead. By the way, the read1 and read2 values do not need to be equal; in fact, I have never seen them equal. (Usually the number of read 1s that align is not exactly the same as the number of read 2s.)

"Paired in sequencing" is the number of reads, out of the total, that come from paired-end sequencing (that is, reads with the paired flag set). It is usually equal to the total, although in principle a BAM/SAM file can contain a mix of paired-end and single-end reads. "Properly paired" is the number of alignments with the "properly paired" SAM flag set. The aligner sets this flag, so its exact definition depends on the aligner. Generally, it means that read 1 and read 2 align within some maximum distance of each other and in the expected orientation (if applicable).
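To make these definitions concrete, here is a minimal sketch that tallies flagstat-style categories from SAM FLAG values alone. It assumes the counting rules I believe samtools flagstat uses, and it ignores flagstat's QC-failed column and the "mate mapped to a different chr" lines. Note that under these rules read1 and read2 are counted over every read with the paired flag (0x1) set, whether or not the read itself mapped, so their sum normally equals "paired in sequencing" rather than "mapped". The function name `flagstat_counts` is hypothetical and not part of samtools or any library.

```python
from collections import Counter

# SAM FLAG bits (SAM format specification)
PAIRED = 0x1         # template has multiple segments in sequencing
PROPER_PAIR = 0x2    # aligner considers the pair properly aligned
UNMAPPED = 0x4       # this read is unmapped
MATE_UNMAPPED = 0x8  # the mate is unmapped
READ1 = 0x40         # first read of the pair
READ2 = 0x80         # second read of the pair
DUPLICATE = 0x400    # PCR or optical duplicate


def flagstat_counts(flags):
    """Return a Counter of flagstat-like categories for an iterable of FLAG ints."""
    c = Counter()
    for flag in flags:
        c["in total"] += 1
        if flag & DUPLICATE:
            c["duplicates"] += 1
        if not (flag & UNMAPPED):
            c["mapped"] += 1
        if flag & PAIRED:
            c["paired in sequencing"] += 1
            # read1/read2 are counted whether or not this read mapped
            if flag & READ1:
                c["read1"] += 1
            if flag & READ2:
                c["read2"] += 1
            if (flag & PROPER_PAIR) and not (flag & UNMAPPED):
                c["properly paired"] += 1
            if not (flag & UNMAPPED) and not (flag & MATE_UNMAPPED):
                c["with itself and mate mapped"] += 1
            if not (flag & UNMAPPED) and (flag & MATE_UNMAPPED):
                c["singletons"] += 1
    return c


# Hypothetical input: a properly paired pair (FLAG 99/147) and a pair
# where only read 1 mapped (FLAG 73/133).
counts = flagstat_counts([99, 147, 73, 133])
print(counts["mapped"], counts["read1"], counts["read2"], counts["singletons"])
# prints: 3 2 2 1

# Dropping the read 2 records from the file makes read1 and read2 diverge.
print(flagstat_counts([99, 73])["read2"])
# prints: 0
```

In a complete, unfiltered paired-end file every read 1 record has a read 2 partner, so under this sketch read1 and read2 can only differ once some records have been dropped from the file.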


KJ Lim replied:

Thanks, Mikael, for the explanation.

I mapped the SOLiD csfasta reads against a pseudogenome (a collection of EST sequences from the genus), as no complete genome is available for this non-model plant species. I used SHRiMP2 for the mapping, with the --half-paired option on (it is on by default as of v2.2.0).
swbarnes2 (United States) wrote:

I'm not entirely clear on what MarkDuplicates does with reads and read pairs where one or both ends don't map.

Perhaps Read 2 mapped much better than Read 1, so MarkDuplicates removed much more of it, while your Read 1 data is full of unmapped reads that MarkDuplicates left alone.

You can use samtools view to dissect how many Read 1s and Read 2s are properly paired versus merely mapped versus unmapped.
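As a hedged illustration of that dissection: in samtools view, `-c` prints a count instead of the records, `-f` keeps only reads that have all of the given FLAG bits set, and `-F` drops reads that have any of them set. So, for example, `samtools view -c -f 0x42 in.bam` would count read 1 records flagged as properly paired, `-c -f 0x40 -F 0x6` read 1 records that mapped but are not properly paired, and `-c -f 0x44` unmapped read 1 records (`in.bam` is a placeholder file name). The same breakdown can be sketched in Python from FLAG values; `mate_breakdown` is a hypothetical helper, not a samtools command.

```python
from collections import Counter

# SAM FLAG bits used for the breakdown (SAM format specification)
PROPER_PAIR = 0x2  # aligner considers the pair properly aligned
UNMAPPED = 0x4     # this read is unmapped
READ1 = 0x40       # first read of the pair
READ2 = 0x80       # second read of the pair


def mate_breakdown(flags):
    """Count (mate, mapping status) combinations for an iterable of FLAG ints."""
    table = Counter()
    for flag in flags:
        if flag & READ1:
            mate = "read1"
        elif flag & READ2:
            mate = "read2"
        else:
            mate = "unpaired/other"
        # Check UNMAPPED first so an unmapped read is never reported as paired.
        if flag & UNMAPPED:
            status = "unmapped"
        elif flag & PROPER_PAIR:
            status = "properly paired"
        else:
            status = "mapped, not properly paired"
        table[(mate, status)] += 1
    return table


if __name__ == "__main__":
    # Hypothetical FLAG values: a proper pair (99/147), a pair where only
    # read 1 mapped (73/133), and a pair where neither read mapped (77/141).
    example = [99, 147, 73, 133, 77, 141]
    for (mate, status), n in sorted(mate_breakdown(example).items()):
        print(f"{mate}\t{status}\t{n}")
```

Running such a breakdown on the files before and after MarkDuplicates would show whether the missing read 2 records were mostly mapped or unmapped ones.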


KJ Lim replied:

Thanks, swbarnes2, for your answer.

Could you kindly elaborate on "You can use samtools view to dissect how many Read 1s and Read 2s are properly paired versus merely mapped versus unmapped"? Thanks.

I'm still learning how to use samtools.