Question: Large number of overlapped genes/transcripts reported by VEP
2.6 years ago by
newbio17320 wrote:

I'm currently analyzing whole-exome and RNA sequencing data from a cancer cell line and trying to determine how many genes carry deleterious mutations.

I have performed quality control, alignment/mapping (BWA for WES and STAR for RNA-Seq), and variant calling (VarScan).

The resulting VCF file was given as input to Ensembl's Variant Effect Predictor (VEP), and I plan to filter the output so that it contains only SNPs annotated as deleterious.
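For the filtering step, here is a minimal Python sketch. It assumes VEP was run with its default tab-delimited output and with SIFT predictions enabled (for example `--sift b`), so that the last `Extra` column carries an entry such as `SIFT=deleterious(0.01)`. The function names and the example rows are made up for illustration; VEP also ships a `filter_vep` script that may be a better fit for real data.

```python
def parse_extra(extra):
    """Turn the Extra column ('KEY=value;KEY2=value2') into a dict."""
    fields = {}
    for item in extra.split(";"):
        if "=" in item:
            key, value = item.split("=", 1)
            fields[key] = value
    return fields


def deleterious_rows(lines):
    """Yield the columns of rows whose SIFT prediction starts with 'deleterious'.

    Note: this also keeps 'deleterious_low_confidence' calls.
    """
    for line in lines:
        if line.startswith("#") or not line.strip():
            continue  # skip header/comment lines and blank lines
        cols = line.rstrip("\n").split("\t")
        extra = parse_extra(cols[-1])  # Extra is the last column
        if extra.get("SIFT", "").startswith("deleterious"):
            yield cols


# Hypothetical rows in the default column order (IDs are invented).
header = "#Uploaded_variation\tLocation\tAllele\tGene\tFeature\tFeature_type\tConsequence\tcDNA_position\tCDS_position\tProtein_position\tAmino_acids\tCodons\tExisting_variation\tExtra"
example = [
    header,
    "\t".join(["var1", "1:100", "A", "GENE_A", "TX_A1", "Transcript", "missense_variant",
               "-", "-", "-", "-", "-", "-", "IMPACT=MODERATE;SIFT=deleterious(0.01)"]),
    "\t".join(["var2", "1:200", "T", "GENE_A", "TX_A1", "Transcript", "missense_variant",
               "-", "-", "-", "-", "-", "-", "IMPACT=MODERATE;SIFT=tolerated(0.35)"]),
    "\t".join(["var3", "2:300", "G", "GENE_B", "TX_B1", "Transcript", "synonymous_variant",
               "-", "-", "-", "-", "-", "-", "IMPACT=LOW"]),
]

hits = list(deleterious_rows(example))
genes_with_hits = sorted({cols[3] for cols in hits})  # column 4 is Gene
print(genes_with_hits)
```

Collecting the `Gene` column of the surviving rows into a set gives the number of genes with at least one deleterious call, which is the figure asked about in the question.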

I quickly examined the HTML file containing summary statistics (default output provided by VEP), and noticed that the tool reported a large number of overlapped genes/transcripts.

Should I be concerned with such large numbers? Is there something I am missing or should be looking out for? Any input would be greatly appreciated.

Thank you.

snp rna-seq vep • 1.0k views

Hello newbio17,

What do you mean by "large numbers", and why does this worry you? When doing WES and RNA sequencing, I would expect (nearly) all variants to overlap a transcript of a gene.

Furthermore, AFAIK VEP reports a result for every transcript that overlaps the variant, and one gene can have multiple transcripts.
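To see the one-line-per-transcript behaviour in your own output, a small sketch like the following may help. It assumes VEP's default tab-delimited output, where the first column is the variant identifier and columns 4 to 6 are `Gene`, `Feature` and `Feature_type`. The helper name and the example rows (including all IDs) are hypothetical.

```python
from collections import defaultdict


def summarize(lines):
    """Collect distinct genes, distinct transcripts, and transcripts per variant."""
    genes = set()
    transcripts = set()
    per_variant = defaultdict(set)
    for line in lines:
        if line.startswith("#") or not line.strip():
            continue  # skip header/comment lines and blank lines
        cols = line.rstrip("\n").split("\t")
        variant, gene, feature, feature_type = cols[0], cols[3], cols[4], cols[5]
        if feature_type != "Transcript":
            continue  # ignore regulatory or motif features
        genes.add(gene)
        transcripts.add(feature)
        per_variant[variant].add(feature)
    return genes, transcripts, per_variant


# Invented example: var1 hits two transcripts of the same gene.
rows = [
    "#Uploaded_variation\tLocation\tAllele\tGene\tFeature\tFeature_type\tConsequence",
    "var1\t1:100\tA\tGENE_A\tTX_A1\tTranscript\tmissense_variant",
    "var1\t1:100\tA\tGENE_A\tTX_A2\tTranscript\tintron_variant",
    "var2\t2:300\tG\tGENE_B\tTX_B1\tTranscript\tsynonymous_variant",
]

genes, transcripts, per_variant = summarize(rows)
print(len(genes), len(transcripts), len(per_variant["var1"]))
```

In this toy input, two variants produce three output lines, two overlapped genes and three overlapped transcripts, so line counts and gene counts are not expected to match the number of variants.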

fin swimmer

written 2.6 years ago by finswimmer (14k)

Hi finswimmer,

Thank you for your input.

It's my first time working with WES and RNA-Seq data, so everything is new to me. For reference, below are the statistics VEP reported for the run. To clarify, the number of overlapped genes seemed a little high to me relative to the number of variants processed.

General statistics

  • Lines of input read: 27714
  • Variants processed: 26392
  • Variants filtered out: 0
  • Novel / existing variants: 0 (0.0) / 26392 (100.0)
  • Overlapped genes: 9564
  • Overlapped transcripts: 9603
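As a quick sanity check on these figures, the ratios can be worked out directly. This is only arithmetic on the numbers reported above; the variable names are my own.

```python
# Figures copied from the VEP summary above.
variants_processed = 26392
overlapped_genes = 9564
overlapped_transcripts = 9603

# Average number of variants per overlapped gene.
variants_per_gene = variants_processed / overlapped_genes

# Average number of overlapped transcripts per overlapped gene.
transcripts_per_gene = overlapped_transcripts / overlapped_genes

print(f"variants per overlapped gene:    {variants_per_gene:.2f}")
print(f"transcripts per overlapped gene: {transcripts_per_gene:.3f}")
```

This works out to roughly 2.76 variants per overlapped gene and about 1.004 overlapped transcripts per gene.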
modified 2.6 years ago • written 2.6 years ago by newbio17 (320)

Honestly, I'm surprised that you sequenced a whole exome and only identified variants in 9564 genes. Given the frequency of variants in any individual, I would have thought you'd have variants in every gene.

written 2.6 years ago by Emily_Ensembl (21k)