Using whole exome data from different protocols
6.9 years ago
haiying.kong

We are doing whole exome sequencing on our samples to identify somatic mutations associated with a phenotype of interest.

We have used Agilent SureSelect Human All Exon V5+UTRs for some of our samples, and we are planning to use Agilent SureSelect Human All Exon V6+UTR for the rest because that protocol requires less input DNA.

One of our collaborators is planning to use Agilent SureSelect Human All Exon V4+UTRs because of contractual requirements on their side.

Question: Is it fine to combine sequencing data generated with these different protocols in a single analysis?

6.9 years ago

It depends on how much V4, V5, and V6 differ. If the sample prep is the same, and each SureSelect version includes the same sample types, then in theory you should be able to estimate the variance between SureSelect versions and account for it with an additive model design in DESeq2 or Sleuth. As an exploratory step (provided my assumptions hold for your case), I'd align all your samples, count reads with htseq-count, and run them through DESeq2 to look at a PCA of the samples.
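If you want to try the PCA check outside R, here is a minimal sketch in Python/NumPy. It is not DESeq2: it assumes a features-by-samples matrix of raw read counts (for example from htseq-count), uses a plain log2(count + 1) transform as a rough stand-in for DESeq2's variance-stabilising transforms, and runs on made-up data in which the hypothetical "V6" samples get an artificial eightfold boost in half of the features. If real samples separate by kit on PC1 like this, kit version is acting as a batch effect that the model has to account for.

```python
import numpy as np


def pca_scores(counts, n_components=2):
    """Project samples onto their top principal components.

    counts: array of shape (n_features, n_samples) with raw read counts
    (e.g. per exon or per target region).
    Returns an array of shape (n_samples, n_components).
    """
    # Crude variance stabilisation: log2 with a pseudocount of 1
    logc = np.log2(np.asarray(counts, dtype=float) + 1.0)
    # Center each feature across samples
    centered = logc - logc.mean(axis=1, keepdims=True)
    # SVD with samples as rows; scores are U * singular values
    u, s, _ = np.linalg.svd(centered.T, full_matrices=False)
    return u[:, :n_components] * s[:n_components]


if __name__ == "__main__":
    # Synthetic example: 4 hypothetical "V4" samples and 4 "V6" samples
    rng = np.random.default_rng(0)
    kits = np.array(["V4"] * 4 + ["V6"] * 4)
    counts = rng.poisson(100, size=(200, 8)).astype(float)
    # Simulated kit effect: V6 captures the first 100 features much more efficiently
    counts[:100, kits == "V6"] *= 8
    scores = pca_scores(counts)
    for kit in ("V4", "V6"):
        print(kit, "PC1:", scores[kits == kit, 0].round(1))
```

With real data, replace the synthetic matrix with your own count table and plot the two score columns coloured by kit version.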

I have a similar question about the capture kits used for sequencing. I'm in the process of calculating allele frequencies for a group of samples. I have the VCF files, but they were produced using different kit versions: Agilent SureSelect Human All Exon V4, V5, and V6. For calculating the AF, can I use all the files regardless of kit version, or should I separate them by version? Thank you.
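One way to see why the kit version matters for AF: a site outside a kit's target regions will simply be absent from that kit's VCFs, and counting those samples as reference deflates the AF. The sketch below only counts a sample at a site if its kit's targets include that site. All names and intervals are hypothetical. It assumes diploid genotypes coded as alt-allele counts (0, 1, 2), BED-style 0-based half-open intervals, and, importantly, that a targeted site missing from a sample's VCF is homozygous reference. With variant-only VCFs that last assumption is not safe unless you check coverage (e.g. from gVCFs or per-sample depth).

```python
def is_targeted(intervals, chrom, pos):
    """True if 0-based position pos on chrom falls in any (chrom, start, end) interval."""
    return any(c == chrom and start <= pos < end for c, start, end in intervals)


def allele_frequency(chrom, pos, genotypes, sample_kit, kit_targets):
    """Alt allele frequency at one site, counting only samples whose kit targets it.

    genotypes:   dict sample -> alt allele count (0, 1, 2) for samples with a call here
    sample_kit:  dict sample -> kit name
    kit_targets: dict kit name -> list of (chrom, start, end) target intervals
    Returns (AF, number of samples in the denominator), or (None, 0).
    """
    alt_alleles = 0
    n_samples = 0
    for sample, kit in sample_kit.items():
        if not is_targeted(kit_targets[kit], chrom, pos):
            continue  # kit did not capture this site: exclude it, don't call it reference
        # ASSUMPTION: a targeted site missing from the sample's VCF is homozygous reference
        alt_alleles += genotypes.get(sample, 0)
        n_samples += 1
    if n_samples == 0:
        return None, 0
    return alt_alleles / (2 * n_samples), n_samples  # diploid: 2 alleles per sample


if __name__ == "__main__":
    # Hypothetical targets: V6 covers a region that V4 does not
    kit_targets = {"V4": [("chr1", 0, 100)], "V6": [("chr1", 0, 200)]}
    sample_kit = {"s1": "V4", "s2": "V4", "s3": "V6", "s4": "V6"}
    genotypes = {"s3": 1}  # one heterozygous call
    print(allele_frequency("chr1", 150, genotypes, sample_kit, kit_targets))  # (0.25, 2)
    print(allele_frequency("chr1", 50, genotypes, sample_kit, kit_targets))   # (0.125, 4)
```

At chr1:150 only the two V6 samples are informative, so the AF is 0.25 rather than the 0.125 you would get by treating the V4 samples as reference. A simpler alternative is to restrict the analysis to regions targeted by every kit.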