Question: Artifactual mutations in CCLE data
monukmr9860 (United States) wrote, 4.7 years ago:

Hi there,

I am unsure whether some of the mutations reported in large studies such as CCLE are artifactual. I have been asked this question many times, but I don't understand what artifactual mutations are.

Are artifactual mutations related to:

platform related, e.g. Illumina, 454, etc.

protocol related, e.g. library preparation for RNA-seq or exome-seq

variant-calling algorithm related, e.g. GATK, SAMtools mpileup

or something else that I might be missing

Please shed some light on this topic.



written 4.7 years ago by monukmr9860

All of the above. In addition, a number of them are germline variants that could not be identified as such, because no non-transformed sample is available for the vast majority of cell lines.

written 4.7 years ago by russhh5.2k

Thanks russ

In their preferred dataset, CCLE people have mentioned that the following variants were filtered out:

  • common polymorphisms,
  • allelic fraction < 10%,
  • putative neutral variants (missenses present in less than 2 warm-blooded vertebrates),
  • located outside of the CDS for all transcripts.

This means that germline variants (other than common polymorphisms) have not been filtered out.
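The filters listed above can be sketched roughly as follows. This is only an illustration and not the CCLE pipeline: the `Variant` fields, the gene names and the example values are all hypothetical, and the two boolean annotations are assumed to come from some upstream annotation step. It shows why a rare germline variant would pass every filter.

```python
from dataclasses import dataclass


@dataclass
class Variant:
    gene: str                     # hypothetical gene label
    allelic_fraction: float       # fraction of reads supporting the variant allele
    is_common_polymorphism: bool  # assumed annotation from a population database
    is_putative_neutral: bool     # assumed conservation-based annotation
    in_cds: bool                  # True if inside the CDS of at least one transcript


def passes_filters(v: Variant, min_af: float = 0.10) -> bool:
    """Return True if the variant survives all four filters from the list."""
    return (
        not v.is_common_polymorphism
        and v.allelic_fraction >= min_af
        and not v.is_putative_neutral
        and v.in_cds
    )


# Made-up example calls, one per filter outcome.
variants = [
    Variant("GENE_A", 0.45, False, False, True),   # somatic-looking call, kept
    Variant("GENE_B", 0.05, False, False, True),   # allelic fraction < 10%, dropped
    Variant("GENE_C", 0.50, True, False, True),    # common polymorphism, dropped
    Variant("GENE_D", 0.30, False, False, False),  # outside the CDS, dropped
    Variant("GENE_E", 0.50, False, False, True),   # rare germline variant, kept
]

kept = [v.gene for v in variants if passes_filters(v)]
print(kept)
```

Nothing in these filters distinguishes `GENE_E` from `GENE_A`, which is the point made above: a germline variant that is not a common polymorphism stays in the dataset.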

Also, I would like to know whether it is possible to filter out germline variants from cell line samples, or whether that can only be done for patient samples. If it is possible, what software or pipelines are available?


written 4.7 years ago by monukmr9860

To be honest, I don't think it would be possible to predict the germline variants for cell lines that don't have a comparator. It's hard enough in tumours, where untransformed cells are present, to distinguish the somatic variants from the germline variants unless you have some blood DNA or something similar.
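To make the role of the comparator concrete, here is a deliberately simplified sketch. It assumes variant calls are reduced to `(chrom, pos, ref, alt)` tuples and that exact set membership is enough to match them; the function name and all coordinates are made up. Real somatic callers work on read-level evidence rather than a plain set difference.

```python
def split_by_normal(tumour_calls, normal_calls):
    """Split tumour calls into (somatic, germline) using a matched normal.

    Calls seen in both samples are labelled germline; calls seen only in
    the tumour are labelled somatic.
    """
    germline = tumour_calls & normal_calls
    somatic = tumour_calls - normal_calls
    return somatic, germline


# Hypothetical calls, for illustration only.
tumour_calls = {
    ("chr1", 1000, "A", "T"),
    ("chr2", 2000, "G", "C"),
    ("chr3", 3000, "C", "T"),
}
normal_calls = {("chr1", 1000, "A", "T")}

somatic, germline = split_by_normal(tumour_calls, normal_calls)

# Without a comparator (the usual cell line situation) nothing can be
# labelled germline, so every call ends up in the "somatic" set.
no_normal_somatic, no_normal_germline = split_by_normal(tumour_calls, set())
```

In the second call the germline set is empty and all three variants are reported as somatic, which is the situation for most cell lines.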

written 4.7 years ago by russhh5.2k
