Question

Counting number of duplicated reads for each gene or exon

0

6.8 years ago

arta ▴ 670

Hey,

I am analyzing RNA-seq data and I am interested in duplicated reads. I know I can count the overall number of duplicated reads after marking them with Picard MarkDuplicates:

samtools view -f 1024 dedup_reads.bam | wc -l

But I am interested in the distribution of these duplicates across genes or exons. I have both SAM and BAM files.

Here is a simple example of what I would like to have:

                 # total reads    # duplicated reads
   Gene1         30               10
   Gene2         100              20
   Gene3         20               0

I googled but couldn't find any tools for this. Are there any tools, software or packages that do it, or should I implement it myself?

RNA-Seq • 2.2k views

updated 6.8 years ago by Biostar 20 • written 6.8 years ago by arta ▴ 670

score 3 · Accepted Answer · 2017-06-30

3

6.8 years ago

Pierre Lindenbaum 161k

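# For each BED interval print: region, mapped reads, mapped reads flagged as duplicates.
# BED starts are 0-based, so add 1 for the samtools region; input.bam must be indexed.
awk '{printf("%s:%d-%d\n",$1,int($2)+1,$3);}' input.bed | while read -r F
do
  echo "$F $(samtools view -F 4 -c input.bam "$F") $(samtools view -f 1024 -F 4 -c input.bam "$F")"
done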

Update: oops, added $F after each "samtools view".
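
If the BED file has gene names in column 4 and you would rather get the named table from the question in one pass, here is a rough sketch of a different approach: read the SAM text (e.g. from "samtools view -h") with awk instead of querying the index once per region. The function name count_dups_per_region and the file genes.bed are made up for illustration. It assumes a 4-column BED and compares every read against every interval, so it is only meant for small inputs.

```shell
# Hypothetical helper (not part of samtools): usage is
#   samtools view -h dedup_reads.bam | count_dups_per_region genes.bed
# Prints "name <TAB> total mapped reads <TAB> duplicated reads" per BED interval.
count_dups_per_region() {
  awk -v OFS='\t' '
    # First file: BED intervals (0-based start, end-exclusive); column 4 = name.
    NR == FNR {
      chrom[FNR] = $1; start[FNR] = $2 + 1; end[FNR] = $3
      name[FNR] = ($4 != "" ? $4 : $1 ":" start[FNR] "-" $3)
      n = FNR; next
    }
    /^@/ { next }                    # skip SAM header lines
    int($2 / 4) % 2 == 1 { next }    # flag 4 set: read is unmapped
    {
      # Reference span of the alignment: M, D, N, = and X consume reference.
      len = 0; cigar = $6
      while (match(cigar, /^[0-9]+[MIDNSHP=X]/)) {
        op = substr(cigar, RLENGTH, 1)
        if (op ~ /[MDN=X]/) len += substr(cigar, 1, RLENGTH - 1)
        cigar = substr(cigar, RLENGTH + 1)
      }
      rs = $4; re = $4 + (len > 0 ? len - 1 : 0)
      dup = int($2 / 1024) % 2       # flag 1024 set: marked as duplicate
      for (i = 1; i <= n; i++)
        if ($3 == chrom[i] && rs <= end[i] && re >= start[i]) {
          total[i]++; dups[i] += dup
        }
    }
    END { for (i = 1; i <= n; i++) print name[i], total[i] + 0, dups[i] + 0 }
  ' "$1" -
}
```

A read is counted for every interval its alignment span overlaps, so a read spanning two genes shows up in both rows. For whole-transcriptome annotations the per-region samtools loop above (which uses the BAM index) will scale better than this sketch.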

0

Great, thanks Pierre !!!

6.8 years ago by arta ▴ 670