Question

Benchmarking RNASeq Variant Calling Pipeline (Short Reads)

0

10 weeks ago

Esraa • 0

Hello, I am currently working on optimizing a variant calling pipeline for short-read RNA-Seq data. I have been searching for a gold-standard benchmarking dataset that comes with VCF results, but could not find any.

I know the GIAB project provides Google-Illumina short-read RNA-Seq datasets, but there is no curated VCF for that data that I can compare my final results against. If anyone has an idea of what I can do, it would be really helpful.

Thank you all in advance.

rna-seq vcf variant-calling • 460 views

ADD COMMENT • link 10 weeks ago by Esraa • 0

2

As long as it is the same GIAB sample, you could compare your SNPs with the SNPs available for the whole-genome set.
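A minimal sketch of what such a comparison amounts to, assuming exact matching on chromosome, position, REF and ALT. This is a simplification: dedicated tools such as hap.py do more careful, haplotype-aware matching. The `Variant` type, the `compare_calls` helper, the region list and all values below are hypothetical toy data, not GIAB content. The `regions` argument stands in for whatever intervals you decide are fair to evaluate (for RNA-Seq, presumably only regions that are both high-confidence and actually covered by reads).

```python
from typing import NamedTuple


class Variant(NamedTuple):
    chrom: str
    pos: int  # 1-based position
    ref: str
    alt: str


def in_regions(v, regions):
    """True if the variant lies in any (chrom, start, end) region, 1-based inclusive."""
    return any(c == v.chrom and s <= v.pos <= e for c, s, e in regions)


def compare_calls(calls, truth, regions):
    """Count TP/FP/FN by exact match, restricted to the given regions."""
    calls_r = {v for v in calls if in_regions(v, regions)}
    truth_r = {v for v in truth if in_regions(v, regions)}
    tp = len(calls_r & truth_r)  # called and in the truth set
    fp = len(calls_r - truth_r)  # called but not in the truth set
    fn = len(truth_r - calls_r)  # in the truth set but not called
    return tp, fp, fn


# Hypothetical toy data for illustration only.
truth = {
    Variant("chr1", 100, "A", "G"),
    Variant("chr1", 250, "C", "T"),
    Variant("chr2", 500, "G", "A"),
}
calls = {
    Variant("chr1", 100, "A", "G"),
    Variant("chr1", 300, "T", "C"),
}
regions = [("chr1", 50, 400)]  # assumed evaluation interval

tp, fp, fn = compare_calls(calls, truth, regions)
print(f"TP={tp} FP={fp} FN={fn}")
```

Note that the truth variant on chr2 is not counted as a false negative here because it falls outside the evaluation regions; widening the regions to the whole genome would add it to FN.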

ADD REPLY • link 10 weeks ago by GenoMax 144k

0

Thank you so much for answering! I actually found some studies doing it the way you mentioned.

I ran the GATK best practices pipeline on the RNA-Seq reads and compared the calls to the high-confidence variants using hap.py, but the results do not make sense: the F1 scores were about 0.04, which suggests I am doing something wrong in my analysis.

I tried every troubleshooting step I could think of, such as checking my references and tool parameters, but could not find the cause of the problem. Do you have any idea what I could be doing wrong?
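For reference, a small sketch of how F1 follows from true-positive, false-positive and false-negative counts, using the standard definitions of precision and recall. The function name and the counts below are made up purely for illustration and are not results from this analysis. Because F1 is the harmonic mean of precision and recall, it is pulled toward whichever of the two is lower, so it is worth looking at precision and recall separately before concluding what went wrong.

```python
def precision_recall_f1(tp, fp, fn):
    """Return (precision, recall, F1) from raw counts, guarding against division by zero."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


# Illustrative counts only: most calls are correct, but most truth variants are missed.
p, r, f1 = precision_recall_f1(tp=80, fp=20, fn=3920)
print(f"precision={p:.3f} recall={r:.3f} F1={f1:.3f}")
```

With these toy counts precision is high while recall is very low, and F1 ends up close to the recall value rather than anywhere near the precision.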

ADD REPLY • link 10 weeks ago by Esraa • 0

score 0 · Answer 1 · 2024-05-13

0

10 weeks ago

lagartija ▴ 160

I don't know, but another way of doing it would be to combine datasets from different strains that you know are clonal. Then you know where the true variants should be, based on the alignments of their genomes.

ADD COMMENT • link 10 weeks ago by lagartija ▴ 160

0

Thank you! I will try searching for this more and see if it would fit my analysis purposes.

ADD REPLY • link 10 weeks ago by Esraa • 0