Hello everyone,
I have run FastQC on paired-end RNA-seq FASTQ files. Most of the files failed the duplicate-reads check. I want to filter the duplicate reads out of these FASTQ files and BLAST them to see what they actually are. I have seen some posts discussing duplicate reads, but I am looking for a way to extract the duplicate reads from FASTQ files.
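One way to pull the duplicated reads out for inspection is a short script like the one below. It is a minimal sketch, assuming standard 4-line FASTQ records (no wrapped sequence lines) and treating "duplicate" as an exact match of the whole read sequence within one file; the function names are hypothetical. R1 and R2 are counted separately here, so it does not detect pair-level duplicates, and it keeps every distinct sequence in memory, so a large library needs plenty of RAM.

```python
import gzip
import sys
from collections import Counter
from typing import Iterable, Iterator, TextIO


def read_fastq_sequences(lines: Iterable[str]) -> Iterator[str]:
    """Yield the sequence line of each 4-line FASTQ record (assumes no line wrapping)."""
    for i, line in enumerate(lines):
        if i % 4 == 1:  # record layout: @header, sequence, +, quality
            yield line.strip()


def duplicate_counts(lines: Iterable[str], min_count: int = 2) -> list[tuple[str, int]]:
    """Return (sequence, count) for sequences seen at least min_count times, most frequent first."""
    counts = Counter(read_fastq_sequences(lines))
    return [(seq, n) for seq, n in counts.most_common() if n >= min_count]


def write_fasta(dups: list[tuple[str, int]], out: TextIO) -> None:
    """Write duplicated sequences as FASTA, putting the copy number in each header."""
    for rank, (seq, n) in enumerate(dups, start=1):
        out.write(f">dup{rank}_count{n}\n{seq}\n")


def open_fastq(path: str) -> TextIO:
    """Open a plain or gzip-compressed FASTQ file as text."""
    return gzip.open(path, "rt") if path.endswith(".gz") else open(path)


if __name__ == "__main__":
    # Usage (hypothetical file names): python fastq_dups.py sample_R1.fastq.gz > sample_R1_dups.fa
    with open_fastq(sys.argv[1]) as fh:
        write_fasta(duplicate_counts(fh), sys.stdout)
```

The FASTA output is sorted by copy number, so for BLAST you will probably only want the first few entries rather than the whole file.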
I also got a warning for overrepresented sequences in some of the files. Is there a way to deal with such sequences? There was no error for adapter content, but I am still removing the adapter sequences, since I ran FastQC before trimming. Thank you for the help!
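To see which sequences are overrepresented and how much of the library they take up, you can count exact sequences and report those above a fraction of all reads. This is a minimal sketch; the function name is hypothetical and the input is assumed to be uncompressed 4-line FASTQ. The default `min_fraction=0.001` (0.1% of reads) is meant to mirror what I understand to be FastQC's warning threshold for this module, but because this counts full-length exact sequences over the whole file, the numbers will not match FastQC's report exactly.

```python
import sys
from collections import Counter
from typing import Iterable


def overrepresented(sequences: Iterable[str], min_fraction: float = 0.001) -> list[tuple[str, int, float]]:
    """Return (sequence, count, fraction) for sequences making up >= min_fraction of all reads."""
    counts = Counter(sequences)
    total = sum(counts.values())
    if total == 0:
        return []
    hits = []
    for seq, n in counts.most_common():
        frac = n / total
        if frac < min_fraction:
            break  # most_common() is sorted by count, so no later sequence can qualify
        hits.append((seq, n, frac))
    return hits


if __name__ == "__main__":
    # Assumption: uncompressed FASTQ with 4 lines per record; the sequence is every 4th line from line 2.
    with open(sys.argv[1]) as fh:
        seqs = (line.strip() for i, line in enumerate(fh) if i % 4 == 1)
        for seq, n, frac in overrepresented(seqs):
            print(f"{seq}\t{n}\t{frac:.4%}")
```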
Both are normal for RNA-seq. I suggest you proceed with your analysis as usual.
I see, thank you! Is there still a way to just extract the duplicates from the FASTQ files? I just want to look at them.
Look at them in what way? It is not recommended to remove them in RNA-seq.
I want to see whether they are primer dimers or just highly expressed transcripts. I won't remove them in either case.
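A rough way to separate adapter/primer dimers from highly expressed transcripts is to check where the adapter starts in each duplicated or overrepresented sequence: a dimer has little or no insert, so the adapter appears at or near position 0. The sketch below assumes Illumina TruSeq-style adapters beginning with `AGATCGGAAGAGC` and uses an arbitrary `max_insert` of 10 bases; both are assumptions to adjust for your library kit, and the function names are hypothetical. Sequences that do not look like dimers are the ones worth BLASTing.

```python
from typing import Optional

# Assumption: adapters start with the common Illumina TruSeq prefix; change this for other kits.
ADAPTER_PREFIX = "AGATCGGAAGAGC"


def insert_length(seq: str, adapter: str = ADAPTER_PREFIX) -> Optional[int]:
    """Return the position where the adapter starts (the insert length), or None if it is absent."""
    pos = seq.find(adapter)
    return pos if pos >= 0 else None


def looks_like_dimer(seq: str, max_insert: int = 10, adapter: str = ADAPTER_PREFIX) -> bool:
    """Heuristic: the adapter begins within the first max_insert bases, i.e. almost no insert."""
    ins = insert_length(seq, adapter)
    return ins is not None and ins <= max_insert
```

Reads with no adapter at all (`insert_length` returns `None`) are more likely genuine, highly expressed fragments than dimers.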
No point in keeping them if they are primer dimers. They are going to waste CPU cycles.
You may also want to read this: https://sequencing.qcfail.com/articles/libraries-can-contain-technical-duplication/