Does the order of SplitNCigarReads and MarkDuplicates affect RNA-seq variant calling results?
12 weeks ago
iamsmor • 0

Hi all,

I’m working on a human RNA-seq variant calling pipeline using GATK (v4.3), and I recently realized that I may have swapped two key steps in the preprocessing stage. Here's what I did:

Alignment with HISAT2

Conversion to sorted BAM

Step 1: SplitNCigarReads

Step 2: MarkDuplicates (Picard)

Then followed with BQSR, HaplotypeCaller, and filtering

However, I now see that several GATK tutorials and forum posts suggest running MarkDuplicates before SplitNCigarReads. I’m concerned that my current pipeline (with the reverse order) may lead to incorrect or biased variant calls.

Would this have a significant impact on the results (e.g., duplicate marking failing, false positives, coverage distortion, etc.)?

Has anyone compared results from both orderings or found issues when SplitNCigarReads comes first?

Thanks in advance for your insights!

variantcalling. rnaseq gatk • 546 views
12 weeks ago
rfran010 ★ 1.6k

I have not compared results as you suggest, but logically, there is a functional difference. Whether this has a great effect depends on the nature of your data.

MarkDuplicates generally works by marking reads with the same sequence and start position, whereas SplitNCigarReads splits one read into multiple reads. Splitting first could therefore, in theory, affect duplicate marking. For example, suppose you have two reads that start at two different positions (so they are not duplicates). After splitting, the split reads may map to the same position with the same sequence, so they may be marked as duplicates even though they probably are not.

Rough example:

Before splitting (not duplicates)
readA: ----ATGCGNNNNNNNNNNNNNNATTCGCGGGC
readB: CTAGATGCGNNNNNNNNNNNNNNATTCGCGGGC

After splitting (readC and readD look like duplicates)
readA: ----ATGCG    readC: ATTCGCGGGC
readB: CTAGATGCG    readD: ATTCGCGGGC
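
The rough example above can also be written as a small Python sketch. This is only a toy model: the duplicate rule here (same start position and same sequence) follows the simplified description above and is not Picard's actual algorithm, and the coordinates, CIGAR strings, and the helper names `split_at_n` and `duplicate_groups` are made up for illustration.

```python
import re
from collections import defaultdict

# Toy CIGAR parser: only M (aligned) and N (skipped region, e.g. an intron).
CIGAR_OP = re.compile(r"(\d+)([MN])")


def split_at_n(name, start, cigar, seq):
    """Split a toy read at each N; return (name, start, seq) per aligned segment."""
    segments = []
    ref_pos = start  # current position on the reference
    seq_pos = 0      # current position in the read sequence
    for length, op in CIGAR_OP.findall(cigar):
        length = int(length)
        if op == "M":
            segments.append((name, ref_pos, seq[seq_pos:seq_pos + length]))
            seq_pos += length
        ref_pos += length  # both M and N advance along the reference
    return segments


def duplicate_groups(records):
    """Group records sharing (start, sequence); a simplification, not Picard's logic."""
    groups = defaultdict(list)
    for name, start, seq in records:
        groups[(start, seq)].append(name)
    return [sorted(names) for names in groups.values() if len(names) > 1]


# Hypothetical coordinates matching the diagram: readA starts 4 bases after readB.
reads = [
    ("readA", 5, "5M14N10M", "ATGCG" + "ATTCGCGGGC"),
    ("readB", 1, "9M14N10M", "CTAGATGCG" + "ATTCGCGGGC"),
]

# Duplicate check on the intact reads (MarkDuplicates first).
before = duplicate_groups([(n, s, q) for n, s, _, q in reads])

# Duplicate check on the split segments (SplitNCigarReads first).
split_records = [seg for n, s, c, q in reads for seg in split_at_n(n, s, c, q)]
after = duplicate_groups(split_records)

print("before splitting:", before)
print("after splitting:", after)
```

In this toy model no duplicates are found among the intact reads, while the second segments of readA and readB end up in one group after splitting, because both start at the same position with the same sequence.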