fastqc duplicate reads
4.1 years ago
evelyn ▴ 230

Hello everyone,

I have run FastQC on paired-end FASTQ files from RNA sequencing. Most of the files failed the sequence duplication levels check. I want to filter the duplicate reads out of these FASTQ files and BLAST them to see what they actually are. I have seen some posts discussing duplicate reads, but I am still looking for a way to pull the duplicate reads out of the FASTQ files themselves.

I have also got a warning for overrepresented sequences in some of the files. Is there a way to deal with such sequences? There was no warning for adapters, but I am still removing the adapter sequences, since I ran FastQC before trimming. Thank you for the help!
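
To give an idea of what I mean, below is roughly how I was planning to pull the overrepresented sequences out of the FastQC report so I can BLAST them. It is only a sketch in Python: the file names are placeholders for my own data, and it assumes the usual fastqc_data.txt layout (the table between ">>Overrepresented sequences" and ">>END_MODULE" inside the unzipped FastQC output).

    # Sketch: extract the "Overrepresented sequences" table from a FastQC
    # report and write the sequences to a FASTA file for BLAST.
    # File names below are placeholders.
    in_report = "sample_R1_fastqc/fastqc_data.txt"
    out_fasta = "overrepresented.fa"

    with open(in_report) as report, open(out_fasta, "w") as fasta:
        in_module = False
        n = 0
        for line in report:
            if line.startswith(">>Overrepresented sequences"):
                in_module = True
                continue
            if in_module:
                if line.startswith(">>END_MODULE"):
                    break
                if line.startswith("#"):  # column header line
                    continue
                seq, count, pct, source = line.rstrip("\n").split("\t")
                n += 1
                fasta.write(f">overrep_{n} count={count} pct={pct} hint={source}\n{seq}\n")

    print(f"wrote {n} sequences to {out_fasta}")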


Both are normal for RNA-seq. I suggest you proceed with your analysis as usual.


I see, thank you! Is there still a way to just filter the duplicates from the FASTQ files? I just want to look at them.


Look at them in what way? It is not recommended to remove them in RNA-seq.


I want to see if they are primer dimers or just highly expressed. I won't remove them in either case.


No point in keeping them if they are primer dimers. They are going to waste CPU cycles.

You may also want to read this: https://sequencing.qcfail.com/articles/libraries-can-contain-technical-duplication/

4.1 years ago

The easiest way would probably be to align the reads to the genome and pick out the duplicates based on identical mapping coordinates.

It's probably rRNA, which might not be in your reference genome though.
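
If you want to eyeball the duplication yourself rather than rely on a dedicated tool, a minimal sketch of that idea in Python is below. It assumes a BAM produced by your aligner and the pysam library; the BAM file name is a placeholder. It only counts reads stacking on the same (reference, start, strand) position, which is the same signal that samtools markdup or Picard MarkDuplicates key on.

    from collections import Counter
    import pysam

    bam_path = "sample.sorted.bam"  # placeholder for your aligned reads

    positions = Counter()
    with pysam.AlignmentFile(bam_path, "rb") as bam:
        for read in bam:
            # skip reads that duplicate marking would not key on
            if read.is_unmapped or read.is_secondary or read.is_supplementary:
                continue
            key = (read.reference_name, read.reference_start, read.is_reverse)
            positions[key] += 1

    # positions covered by more than one read start are candidate duplicates
    dup_reads = sum(v - 1 for v in positions.values() if v > 1)
    print(f"{dup_reads} reads share a start position and strand with another read")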


I agree it might not be in the reference genome. That's why I just want to work from the FASTQ files instead.
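
For what it's worth, this is roughly what I had in mind at the FASTQ level, just to see what the duplicated sequences are. It is a sketch only: the file names are placeholders, it assumes a gzipped R1 file, and it keys on exact sequence identity, so it will pick up highly expressed transcripts as well as technical artefacts.

    import gzip
    from collections import Counter

    fastq_path = "sample_R1.fastq.gz"  # placeholder for one of my files

    # count identical read sequences (line 2 of every 4-line FASTQ record)
    counts = Counter()
    with gzip.open(fastq_path, "rt") as fq:
        for i, line in enumerate(fq):
            if i % 4 == 1:
                counts[line.strip()] += 1

    # write the 50 most duplicated sequences as FASTA so I can BLAST them
    with open("top_duplicates.fa", "w") as out:
        for rank, (seq, n) in enumerate(counts.most_common(50), start=1):
            if n < 2:
                break
            out.write(f">dup_{rank} copies={n}\n{seq}\n")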
