Question

PCR duplicates in FFPE RNASeq

22 months ago

Gama313 ▴ 120

Dear all,

I am working with 100 RNA-Seq samples generated with a stranded protocol on a NovaSeq run.

I need to perform variant calling on these samples, but I am running into some problems.

I do not have access to DNA, so exome or targeted sequencing is not possible.

For variant calling, marking duplicates is usually recommended as a first step. I did this with Picard MarkDuplicates (taking both lane and distance into account for optical duplicates).

That said, I think duplicate detection could be affected by sample degradation. In particular, I suspect that FFPE degradation limits the RNA regions that can be amplified, 'falsely' inflating the number of PCR duplicates. Is this assumption correct?

Moreover, I am wondering whether the duplication rate of a particular gene could be used as a metric to give more/less confidence to specific variants.

Regards

PCR-duplicates RNA-Seq FFPE Variant-Calling • 701 views

ADD COMMENT • link updated 29 days ago by Ram 43k • written 22 months ago by Gama313 ▴ 120

"That said, I think duplicate detection could be affected by sample degradation. In particular, I suspect that FFPE degradation limits the RNA regions that can be amplified, 'falsely' inflating the number of PCR duplicates. Is this assumption correct?"

That is certainly logical. Not much you can do about that.
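To illustrate why this is plausible, here is a minimal sketch under a simplifying assumption: reads are drawn uniformly, with replacement, from a fixed pool of distinct amplifiable fragments. The function name and the pool sizes are hypothetical, and real libraries are not uniform, so treat the numbers as qualitative only.

```python
def expected_duplicate_fraction(distinct_molecules: int, reads: int) -> float:
    """Expected fraction of reads that repeat an already-seen fragment.

    Assumes uniform sampling with replacement from `distinct_molecules`
    fragments (a simple occupancy model, not what Picard computes).
    """
    if distinct_molecules <= 0 or reads <= 0:
        raise ValueError("both arguments must be positive")
    # Expected number of distinct fragments observed after `reads` draws.
    expected_distinct = distinct_molecules * (
        1.0 - (1.0 - 1.0 / distinct_molecules) ** reads
    )
    # Every read beyond the first copy of a fragment counts as a duplicate.
    return 1.0 - expected_distinct / reads


if __name__ == "__main__":
    depth = 10_000  # hypothetical number of reads over a gene
    for label, pool in (("intact", 100_000), ("degraded", 2_000)):
        frac = expected_duplicate_fraction(pool, depth)
        print(f"{label}: {pool} distinct fragments -> {frac:.1%} duplicates")
```

At the same depth, a smaller pool of amplifiable fragments gives a higher expected duplication fraction, which is the effect suspected for degraded FFPE material.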

"I am wondering whether the duplication rate of a particular gene could be used as a metric to give more/less confidence to specific variants."

I doubt that. There is no specific reason why a particular gene can be used as a control.

You probably have no choice in the matter, but consider the limitations noted in Kevin's answer here: Inferring genotype based on RNA sequences (RNA-seq variant calling)

ADD REPLY • link 22 months ago by GenoMax 141k

I forgot to add that I called variants from reads without duplicate marking, since I think I can calculate metrics (Mann-Whitney rank sum test) to discriminate systematic errors (e.g. drops in Phred quality). Given that, I would use the duplication level of the specific gene as additional information when hard-filtering variants.

For example: a variant seen 30 times in a gene with a duplication level of 30% would be more reliable than a variant seen 30 times in a gene with a duplication level of 70%. Does that make sense?
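The arithmetic behind this example can be sketched as follows, assuming (as a rough approximation) that duplicates are spread evenly over a gene, so the gene-level duplication rate also applies to the reads supporting the variant. The function name is hypothetical; under that assumption, 30 supporting reads correspond to about 21 independent fragments at 30% duplication and about 9 at 70%.

```python
def effective_support(alt_reads: int, duplication_rate: float) -> float:
    """Estimate independent fragments behind `alt_reads` supporting reads.

    Assumes the gene-level duplication rate applies uniformly to the
    reads at the variant site, which may not hold in practice.
    """
    if alt_reads < 0:
        raise ValueError("alt_reads must be non-negative")
    if not 0.0 <= duplication_rate < 1.0:
        raise ValueError("duplication_rate must be in [0, 1)")
    return alt_reads * (1.0 - duplication_rate)


if __name__ == "__main__":
    for rate in (0.30, 0.70):
        support = effective_support(30, rate)
        print(f"30 reads at {rate:.0%} duplication -> ~{support:.0f} fragments")
```

Counting distinct fragments directly at the variant position would avoid the uniformity assumption, since duplicates in RNA-Seq tend to concentrate in highly expressed regions.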

Thanks again for your willingness to help.

ADD REPLY • link 22 months ago by Gama313 ▴ 120

It may come down to what you want to do with the results and their ultimate application. As you are well aware, FFPE samples are compromised because of the nature of the input material, so any conclusions may need to be independently confirmed at a minimum. I am not a statistician, so I can't comment on your approach. Hopefully someone else will.

ADD REPLY • link 22 months ago by GenoMax 141k