Question: How to generate coverage counts for duplicate reads
gravatar for axolotl
2.0 years ago by
axolotl0 wrote:


I have an RNA-seq dataset (HiSeq 3000, poly-A selected, 100 bp PE reads) with very high duplication, and I want to create coverage plots of the duplicates to see if they are evenly distributed or represent only a few genes. After aligning to the genome with HISAT2, I've flagged the duplicates with Picard and used Samtools depth to get the counts (of non-duplicates). I then plot them in R.

My problem is that I cannot get counts for just the duplicate reads. Here is what I've tried so far:

Extract duplicates from Picard MarkDuplicates output.

samtools view -h -f 0x400 input_markdups.bam > dups.sam

Calculate counts. Produces empty file with no errors.

samtools depth dups.sam > dups_depth.txt

Thinking that depth may be ignoring the duplicates (i.e. all reads), I removed the flags.

picard RevertSam I=dups.sam O=dups_revsam.sam REMOVE_DUPLICATE_INFORMATION=true

Once again, samtools depth runs for a minute or two and then ends with a blank file and no errors.

What am I missing? Is there an easier way to get counts for duplicates? Any help will be greatly appreciated.

Note: This is my first post here, so please let me know if I have deviated from standard procedures. Thank you!

rna-seq • 917 views
ADD COMMENTlink written 2.0 years ago by axolotl0
Please log in to add an answer.


Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.3.0
Traffic: 886 users visited in the last hour