Question

FastQC duplicate reads

4.1 years ago

evelyn ▴ 230

Hello everyone,

I have run FastQC on paired-end RNA-seq FASTQ files, and most of the files fail the duplicate-read check (the Sequence Duplication Levels module). I want to pull the duplicate reads out of these FASTQ files and BLAST them to see what they actually are. I have seen some posts discussing duplicate reads, but I am looking for a way to extract the duplicates from the FASTQ files themselves.

I also got warnings for some overrepresented sequences. Is there a way to deal with such sequences? There was no error for adapter content, but I am removing the adapter sequences anyway, because I ran FastQC before trimming. Thank you for the help!

Asked by evelyn, 4.1 years ago; last updated by WouterDeCoster

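Both are normal for RNA-seq: highly expressed transcripts produce many identical reads, which shows up as both high duplication and overrepresented sequences. I suggest you proceed with your analysis as usual.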
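
If you want to see which sequences are being flagged before deciding anything, a minimal Python sketch like the one below counts exact sequences in a FASTQ file and reports those above a given fraction of all reads. The 0.1% default mirrors the warning threshold documented for FastQC's Overrepresented Sequences module, but FastQC takes shortcuts of its own (as I understand it, it only tracks sequences first seen in roughly the first 100,000 reads), so the numbers will not match its report exactly. The function names are hypothetical, not part of any tool.

```python
from __future__ import annotations

from collections import Counter
from typing import Iterable, Iterator


def fastq_sequences(lines: Iterable[str]) -> Iterator[str]:
    """Yield the sequence (2nd line) of each record; assumes strict 4-line FASTQ records."""
    for i, line in enumerate(lines):
        if i % 4 == 1:
            yield line.strip()


def overrepresented(lines: Iterable[str], min_fraction: float = 0.001) -> list[tuple[str, int, float]]:
    """Return (sequence, count, fraction of all reads) for exact sequences above min_fraction.

    min_fraction=0.001 (0.1%) is the level at which FastQC's
    Overrepresented Sequences module issues a warning.
    """
    counts = Counter(fastq_sequences(lines))
    total = sum(counts.values())
    # most_common() sorts by count, so the most frequent sequences come first
    return [
        (seq, n, n / total)
        for seq, n in counts.most_common()
        if n / total > min_fraction
    ]
```

For a gzipped file you could pass `gzip.open(path, "rt")` as `lines`. Every distinct sequence is held in memory, so on a large library it may be worth trying it on a subset of reads first.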

Reply by ATpoint, 4.1 years ago

I see, thank you! Is there still a way to extract just the duplicates from a FASTQ file? I only want to look at them.

Reply by evelyn, 4.1 years ago

Look at them in what way? Removing them is not recommended for RNA-seq.

Reply by ATpoint, 4.1 years ago

I want to see whether they are primer dimers or just highly expressed transcripts. I won't remove them in either case.
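
One quick heuristic for telling dimers apart from genuine highly expressed fragments: a dimer has (almost) no insert, so the adapter sequence starts at or near the beginning of the read. The sketch below assumes TruSeq-style Illumina adapters, which begin with `AGATCGGAAGAGC`; the `max_insert` cut-off of 10 bases and the function names are hypothetical choices, not values from any tool.

```python
from __future__ import annotations

# Assumption: TruSeq-style Illumina adapters, which begin with this 13-mer.
# Swap in your own kit's adapter or primer sequence if it differs.
ADAPTER_PREFIX = "AGATCGGAAGAGC"


def looks_like_dimer(seq: str, adapter: str = ADAPTER_PREFIX, max_insert: int = 10) -> bool:
    """True if the adapter starts within the first max_insert bases (near-empty insert)."""
    pos = seq.upper().find(adapter)  # -1 when the adapter is absent
    return 0 <= pos <= max_insert


def classify(sequences: list[str], **kwargs) -> dict[str, str]:
    """Label each (e.g. duplicated) sequence as a likely dimer or a BLAST candidate."""
    return {
        s: "likely dimer" if looks_like_dimer(s, **kwargs) else "BLAST candidate"
        for s in sequences
    }
```

This only makes sense on untrimmed reads: after adapter trimming, a dimer shows up instead as a read trimmed to (nearly) zero length.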

Reply by evelyn, 4.1 years ago

There is no point in keeping them if they are primer dimers; they will only waste CPU cycles.

You may also want to read this: https://sequencing.qcfail.com/articles/libraries-can-contain-technical-duplication/

Reply by GenoMax, 4.1 years ago

Answer (2020-03-24)

4.1 years ago

WouterDeCoster 47k

The easiest way would probably be to align the reads to the genome and identify duplicates as reads with identical mapping coordinates.

They are probably rRNA, which might not be in your reference genome, though.
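
A minimal sketch of the coordinate-based approach, assuming the alignments are available as SAM text (for example the output of `samtools view` on your BAM). It groups primary, mapped reads by reference name, leftmost position (the SAM `POS` field) and strand, treating each mate of a pair independently. This is a simplification: dedicated tools such as Picard MarkDuplicates or `samtools markdup` also account for soft clipping, the 5' end of reverse-strand reads and mate positions. Reads from sequences missing from the reference, such as the rRNA mentioned above, are unmapped and simply skipped here. The function name is hypothetical.

```python
from __future__ import annotations

from collections import defaultdict
from typing import Iterable

# SAM FLAG bits (from the SAM format specification)
UNMAPPED = 0x4
REVERSE = 0x10
SECONDARY = 0x100
SUPPLEMENTARY = 0x800


def duplicate_groups(sam_lines: Iterable[str], min_copies: int = 2) -> dict[tuple[str, int, str], list[str]]:
    """Map (reference, leftmost POS, strand) -> read names, for positions hit by >= min_copies reads."""
    groups = defaultdict(list)
    for line in sam_lines:
        if line.startswith("@") or not line.strip():
            continue  # header or blank line
        fields = line.rstrip("\n").split("\t")
        flag = int(fields[1])
        if flag & (UNMAPPED | SECONDARY | SUPPLEMENTARY):
            continue  # keep only primary, mapped alignments
        strand = "-" if flag & REVERSE else "+"
        groups[(fields[2], int(fields[3]), strand)].append(fields[0])
    return {key: names for key, names in groups.items() if len(names) >= min_copies}
```

With a real file you could call `duplicate_groups(open("aligned.sam"))` (the file name is hypothetical) and sort the result by group size to find the most heavily duplicated loci.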

Answer by WouterDeCoster, 4.1 years ago

I agree it might not be in the reference genome. That's why I want to work with the FASTQ files directly.
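
Working from the FASTQ files alone, one option is to treat a read pair as a duplicate when both mates have exactly the same sequence, then write the most frequent pairs to FASTA for BLAST. A minimal Python sketch, assuming strict 4-line FASTQ records, R1 and R2 files listing mates in the same order, and exact matching only (a single sequencing error splits a duplicate group). The function names and the `dupN_R1`/`dupN_R2` FASTA headers are hypothetical.

```python
from __future__ import annotations

import gzip
from collections import Counter
from typing import Iterable, Iterator, TextIO


def open_text(path: str) -> TextIO:
    """Open a plain or gzip-compressed FASTQ file in text mode."""
    return gzip.open(path, "rt") if path.endswith(".gz") else open(path)


def fastq_sequences(lines: Iterable[str]) -> Iterator[str]:
    """Yield the sequence (2nd line) of each 4-line FASTQ record."""
    for i, line in enumerate(lines):
        if i % 4 == 1:
            yield line.strip()


def count_pairs(r1: Iterable[str], r2: Iterable[str]) -> Counter:
    """Count identical (R1, R2) sequence pairs; assumes mates are in the same order in both files."""
    return Counter(zip(fastq_sequences(r1), fastq_sequences(r2)))


def write_duplicates_fasta(counts: Counter, out: TextIO, min_copies: int = 2, top: int = 50) -> int:
    """Write up to `top` pairs seen at least `min_copies` times as FASTA; return the number written."""
    written = 0
    for i, ((s1, s2), n) in enumerate(counts.most_common(top), start=1):
        if n < min_copies:
            break  # most_common() is sorted by count, so no later pair qualifies
        out.write(f">dup{i}_R1 count={n}\n{s1}\n>dup{i}_R2 count={n}\n{s2}\n")
        written += 1
    return written
```

Usage might look like `with open_text("sample_R1.fastq.gz") as r1, open_text("sample_R2.fastq.gz") as r2, open("duplicates.fa", "w") as out: write_duplicates_fasta(count_pairs(r1, r2), out)` (file names hypothetical). All distinct pairs are counted in memory, so for a large library it may be more practical to run it on a subset of reads first.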

Reply by evelyn, 4.1 years ago