How to generate coverage counts for duplicate reads
0
0
Entering edit mode
5.9 years ago
axolotl • 0

Greetings,

I have an RNA-seq dataset (HiSeq 3000, poly-A selected, 100 bp PE reads) with very high duplication, and I want to create coverage plots of the duplicates to see if they are evenly distributed or represent only a few genes. After aligning to the genome with HISAT2, I've flagged the duplicates with Picard and used Samtools depth to get the counts (of non-duplicates). I then plot them in R.

My problem is that I cannot get counts for just the duplicate reads. Here is what I've tried so far:

Extract duplicates from Picard MarkDuplicates output.

samtools view -h -f 0x400 input_markdups.bam > dups.sam

Calculate counts. Produces empty file with no errors.

samtools depth dups.sam > dups_depth.txt

Thinking that depth may be ignoring the duplicates (i.e. all reads), I removed the flags.

picard RevertSam I=dups.sam O=dups_revsam.sam REMOVE_DUPLICATE_INFORMATION=true

Once again, samtools depth runs for a minute or two and then ends with a blank file and no errors.

What am I missing? Is there an easier way to get counts for duplicates? Any help will be greatly appreciated.

Note: This is my first post here, so please let me know if I have deviated from standard procedures. Thank you!

RNA-Seq • 1.9k views
ADD COMMENT

Login before adding your answer.

Traffic: 2640 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6