How To Extract And Quantify Duplicated RNA-Seq Reads?
11.2 years ago
Pals ★ 1.3k

I am working on an RNA-Seq dataset. FastQC reports a duplication level of >65%. Can anyone tell me how I can extract the duplicated reads and quantify them? I would also like to rank the duplicated reads by how often they occur.

Thanks!

rna-seq • 3.5k views

Why do you want to extract and quantify them? They are already quantified by FastQC, right? Are you worried about the big number? Then you should read "Duplicated Reads In Rna-Seq Experiment", where it is explained that in RNA-seq many of the reads come from ribosomal RNA and highly expressed genes, resulting in a high duplication level in your reads.


Just curious to see the actual reads that are duplicated thousands of times.


gunzip -dc yourFastq.gz | awk '{if(NR%4==2)print $0}' | sort | uniq -c


Thank you very much, Irsan. I had to slightly modify your trick: awk '{if((NR-2)%4==0)print $0}' :-)


Cool, I have updated my comment.


Unfortunately, the sorting did not go well. I want the sequences ranked by how often they are repeated, whereas this command sorts them alphabetically.


Print the reads and then pipe them through | sort | uniq -c


Thanks again, it is finally done. Adding a second sort on the counts ranks the sequences by frequency: gunzip -dc yourFastq.gz | awk '{if(NR%4==2)print $1}' | sort | uniq -c | sort -g


Glad you got it working.
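
The pipeline worked out in the comments above could be wrapped up roughly as follows. This is only a sketch: the function name `rank_duplicated_reads` and its arguments are made up for illustration, and it assumes a gzipped FASTQ with exactly four lines per record (no wrapped sequence lines). It differs from the command above in two ways: sequences seen only once are dropped, and the most frequent sequence is listed first.

```shell
# Hypothetical helper: list the N most frequent read sequences in a gzipped FASTQ.
# Assumes 4-line FASTQ records (sequence on the 2nd line of each record).
rank_duplicated_reads() {
    fastq_gz=$1        # path to a gzipped FASTQ file
    top_n=${2:-20}     # how many sequences to report (default: 20)

    gzip -dc "$fastq_gz" |
        awk 'NR % 4 == 2' |    # keep only the sequence line of each record
        sort |                 # bring identical sequences next to each other
        uniq -c |              # prefix each distinct sequence with its count
        awk '$1 > 1' |         # drop sequences that occur only once
        sort -k1,1nr |         # highest count first
        head -n "$top_n"
}
```

For example, `rank_duplicated_reads yourFastq.gz 50` would print the 50 most duplicated sequences, each preceded by its count. Note that this compares exact sequences only, so reads that differ by a single sequencing error are counted separately.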

11.2 years ago

After marking duplicates (with your favorite program), you can use the samtools flag filter to pull out the reads you want.

Just set the include flag, samtools view -f 0x400, which selects reads carrying this flag bit:

0x400 PCR or optical duplicate
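
To make concrete what the `-f 0x400` filter tests, here is a small sketch that does the same check on SAM text with awk. The FLAG is the second tab-separated column of each alignment record, and 0x400 is 1024 in decimal. The function name `count_duplicate_flagged` is made up for illustration, and the input is assumed to come from a file in which duplicates have already been marked.

```shell
# Hypothetical helper: count SAM records whose FLAG has the 0x400 (1024) bit set.
# Reads SAM text on stdin; header lines (starting with "@") are skipped.
count_duplicate_flagged() {
    awk -F'\t' '
        /^@/ { next }                        # skip header lines
        { total++ }                          # every remaining line is one record
        int($2 / 1024) % 2 == 1 { dup++ }    # bit 0x400 is set in the FLAG column
        END { printf "%d of %d records flagged as duplicate\n", dup, total }
    '
}
```

It could be fed with something like `samtools view marked.bam | count_duplicate_flagged`, where `marked.bam` is a placeholder name. For the count alone, `samtools view -c -f 0x400 marked.bam` should give the same number of flagged records directly.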
