Question: Where to find PCR duplicate reads in bam file?
2.3 years ago by
United States
bioinforesearchquestions160 wrote:

Dear All,

I am trying to find the PCR duplicate reads in a BAM/SAM file. With Picard we can mark duplicates in the BAM/SAM file.

How can I see the marked duplicates in the BAM file? I tried checking for the SAM flag "1024", which decodes to "read is PCR or optical duplicate".

$ samtools flagstat Sample_WES01/WES01.clean.dedup.recal.bam

71753231 + 0 in total (QC-passed reads + QC-failed reads)

12384215 + 0 duplicates

71185962 + 0 mapped (99.21%:-nan%)

71753231 + 0 paired in sequencing

35881695 + 0 read1

35871536 + 0 read2

70159253 + 0 properly paired (97.78%:-nan%)

70654767 + 0 with itself and mate mapped

531195 + 0 singletons (0.74%:-nan%)

425872 + 0 with mate mapped to a different chr

287474 + 0 with mate mapped to a different chr (mapQ>=5)

$

I extracted the flag column from the BAM file and tried grepping for "1024", but I couldn't see any matches.

Will I be able to see duplicate reads in IGV?

2.3 years ago by
France/Nantes/Institut du Thorax - INSERM UMR1087
Pierre Lindenbaum112k wrote:

How to see the marked duplicates in the bam file. I tried checking the sam flag "1024" which decodes to "read is PCR or optical duplicate".

$ samtools flagstat Sample_WES01/WES01.clean.dedup.recal.bam

Here, in the flagstat output:

12384215 (QC-passed reads) + 0 (QC-failed reads) duplicates

I extracted the flag column from bam file and tried grep'ing "1024". I couldn't see any matches.

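Because it's a bit field: https://en.wikipedia.org/wiki/Bit_field. The FLAG column holds the sum of all the bits set for a read, so a duplicate read shows a combined value such as 1123 rather than a literal 1024.

You can get those reads using the -f (required flag) option of samtools view: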

samtools view -f 1024 in.bam
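To illustrate why grepping for "1024" finds nothing, here is a minimal sketch that tests the duplicate bit with plain shell arithmetic. The function name is_duplicate is made up for this example, and the value 99 is just an arbitrary non-duplicate flag chosen for contrast; 1123 and 1171 are flags that show up on duplicate reads in this thread.

```shell
# Succeed (exit status 0) if the duplicate bit (1024, i.e. 0x400) is set
# in the SAM FLAG value passed as the first argument.
# is_duplicate is a hypothetical helper name, not a samtools command.
is_duplicate() {
    [ $(( $1 & 1024 )) -ne 0 ]
}

# 1123 = 1024 + 64 + 32 + 2 + 1, so the text "1024" never appears in the
# FLAG column even though the duplicate bit is set.
for flag in 1123 1171 99 1024; do
    if is_duplicate "$flag"; then
        echo "$flag duplicate"
    else
        echo "$flag not_duplicate"
    fi
done
```

This is the same kind of test that -f 1024 applies to each read. If only the number of duplicates is needed, samtools view -c -f 1024 in.bam should print a count instead of the reads.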

Will I be able to see duplicate reads in IGV?

There is an option in the IGV preferences to show the duplicate reads:

"Filter duplicate reads: Clear to display alignments marked as duplicate reads. In DNA-Seq alignments these PCR or optical duplicates are often marked and filtered. In RNA-Seq alignments considerations differ."

http://www.broadinstitute.org/igv/Preferences


Awesome Pierre :).

I tried "samtools view -f 1024 in.bam" and then extracted the unique flags from the output. I got the following 16 flags:

"1089,1097,1105,1107,1121,1123,1137,1145,1153,1161,1169,1171,1185,1187,1201,1209"

I checked all of the above flags, and each one decodes to "read is PCR or optical duplicate" in addition to other properties.

HISEQ:137:C6W39ACXX:7:1314:15234:3404 1123 chrM 1 15 57S44M = 41 141 TCAGGGCCATAAAG
HISEQ:137:C6W39ACXX:7:1314:15234:3404 1171 chrM 41 60 101M = 1 -141 CTCCATGCATTTGGT
HISEQ:137:C6W39ACXX:7:2102:20584:10431 1187 chrM 1 60 11S90M = 112 212 ACATCACGATGGATCA
HISEQ:137:C6W39ACXX:7:1113:11949:62990 1209 chrM 10 60 101M = 10 0 TCTATCACCCTATTAAC
HISEQ:137:C6W39ACXX:7:2311:11970:3501 1169 chrM 15 60 101M = 16193 16079 CACCCTATTAACCAC
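For anyone who wants to double-check flags like these by hand, a rough sketch of a decoder in plain shell follows. The function name decode_flag and the short labels are informal choices for this example, not samtools output; the bit values are the standard SAM FLAG bits.

```shell
# Print the comma-separated names of the bits set in a SAM FLAG value.
# decode_flag and the label names are hypothetical, chosen for illustration.
decode_flag() {
    flag=$1
    out=""
    for pair in 1:paired 2:proper_pair 4:unmapped 8:mate_unmapped \
                16:reverse 32:mate_reverse 64:read1 128:read2 \
                256:secondary 512:qc_fail 1024:duplicate 2048:supplementary; do
        bit=${pair%%:*}    # numeric bit value before the colon
        name=${pair#*:}    # label after the colon
        if [ $(( flag & bit )) -ne 0 ]; then
            out="${out:+$out,}$name"
        fi
    done
    echo "$flag $out"
}

# Decode the 16 flags reported above; each line should include "duplicate".
for f in 1089 1097 1105 1107 1121 1123 1137 1145 \
         1153 1161 1169 1171 1185 1187 1201 1209; do
    decode_flag "$f"
done
```

For example, 1123 decodes to paired, proper_pair, mate_reverse, read1 and duplicate, which matches the first record shown above.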

modified 2.3 years ago • written 2.3 years ago by bioinforesearchquestions160