What is the stance on optical duplicates in RNASeq?
27 days ago
Davor • 0

Hello, I ran MarkDuplicates on my STAR output and it flagged some optical duplicates. That got me reading, and while there is a lot of discussion about duplicates in general, it is difficult to sift out information on RNASeq and optical duplicates specifically.

We're interested in duplicates in RNASeq, and optical ones are a technical artifact, yet I got the feeling that deduplication of any kind usually isn't done at all for RNASeq. I suppose the main deciding factor is how accurately we can tell, with the tools at our disposal, whether a given duplicate is a technical artifact or a real duplication.

Are my views and impressions correct? Is there a current consensus/best practice opinion on this?

duplicates rnaseq optical-duplicates • 383 views

but optical ones are a technical artifact

Generally these are applicable only if your data was run on patterned flowcells (which may be the norm of late). As long as the loading was properly optimized, the occurrence of optical dups should be minimal: https://knowledge.illumina.com/instrumentation/novaseq-x-x-plus/instrumentation-novaseq-x-x-plus-reference_material-list/000008911

clumpify.sh will allow you to identify optical duplicates and remove them; see the post "Introducing Clumpify: Create 30% Smaller, Faster Gzipped Fastq Files".
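To make the notion of an optical duplicate concrete, here is a minimal sketch of the kind of position check involved. It is not how clumpify.sh or Picard is implemented. It assumes the standard Illumina read-name layout `instrument:run:flowcell:lane:tile:x:y`, and the function names and the `max_dist` default are hypothetical placeholders. Real tools also require the sequences (or alignment positions) to match and differ in how they measure distance.

```python
from math import hypot


def parse_coords(read_name):
    """Return (flowcell, lane, tile, x, y) from an Illumina-style read name.

    Assumes the layout instrument:run:flowcell:lane:tile:x:y, optionally
    preceded by '@' and followed by a space-separated comment.
    """
    fields = read_name.lstrip("@").split()[0].split(":")
    flowcell, lane, tile, x, y = fields[2:7]
    return flowcell, int(lane), int(tile), int(x), int(y)


def looks_optical(name_a, name_b, max_dist=2500):
    """True if two reads sit on the same tile within max_dist of each other.

    max_dist is an illustrative threshold (an assumption); the appropriate
    value depends on the instrument and on the tool's own distance metric.
    This only checks position; callers must already know the two reads are
    sequence duplicates.
    """
    fc_a, lane_a, tile_a, x_a, y_a = parse_coords(name_a)
    fc_b, lane_b, tile_b, x_b, y_b = parse_coords(name_b)
    if (fc_a, lane_a, tile_a) != (fc_b, lane_b, tile_b):
        return False
    return hypot(x_a - x_b, y_a - y_b) <= max_dist
```

Two duplicate reads that fail this check are far apart on the flowcell, so they are more likely to be PCR duplicates or genuinely independent fragments than an imaging or clustering artifact.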


Thanks. So far I have run a Picard MarkDuplicates analysis and it did identify some; I posted the report in this post. The machine did indeed use patterned flowcells. I'll try Clumpify too. Would it make sense to remove them in this specific case?
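One way to put a number on "some" is to read the optical-duplicate fraction straight from the MarkDuplicates metrics file. The sketch below assumes the usual Picard layout (a `## METRICS CLASS` line, a tab-separated header, one row per library, then a blank line) and the column names `READ_PAIRS_EXAMINED` and `READ_PAIR_OPTICAL_DUPLICATES`. The function names are hypothetical, and it only covers paired-end counts.

```python
import csv
import io


def read_dup_metrics(text):
    """Parse the per-library table from a Picard MarkDuplicates metrics file.

    Takes the file contents as a string and returns a list of dicts,
    one per library row.
    """
    lines = text.splitlines()
    # The table header is the line right after '## METRICS CLASS ...'.
    start = next(
        i for i, line in enumerate(lines) if line.startswith("## METRICS CLASS")
    )
    table = []
    for line in lines[start + 1:]:
        if not line.strip():  # a blank line ends the table
            break
        table.append(line)
    return list(csv.DictReader(io.StringIO("\n".join(table)), delimiter="\t"))


def optical_fraction(row):
    """Optical duplicate pairs as a fraction of all read pairs examined."""
    pairs = int(row["READ_PAIRS_EXAMINED"])
    optical = int(row["READ_PAIR_OPTICAL_DUPLICATES"])
    return optical / pairs if pairs else 0.0
```

Calling `optical_fraction` on each row returned by `read_dup_metrics(open(path).read())`, where `path` is your own metrics file, gives a per-library figure to weigh against the effort of removing them.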
