Question: Process the VCF files for a huge number of samples
9 months ago by wangdp123 (Oxford)
wangdp123 wrote:

Dear Colleague,

I am processing VCF files from thousands of exome-sequencing samples.

I would like to carry out the following three steps:

  1. Identify the homozygous SNPs.
  2. Identify the minor allele SNPs.
  3. Identify the SNPs/InDels that have deleterious effects on protein function, such as nonsynonymous mutations, frameshifts, indels, etc. Ideally, find all the SNPs and InDels that change the protein sequence.

Is there any well-established software, tool, or guideline for achieving these three goals one by one?

Many thanks,

Regards,

Tom

snp exome-seq indel vcf • 305 views

Identify the homozygous SNPs.

What does that mean? A VCF with x samples contains y variants. Each variant carries x genotypes, and some of those genotypes are homozygous.
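To make the point about per-sample genotypes concrete, here is a minimal sketch in plain Python. It assumes a tab-separated VCF data line with a FORMAT column that contains GT, and it reads "homozygous SNP" as "homozygous for a non-reference allele", which is only one possible interpretation of the question. The function names are made up for illustration and do not come from any library.

```python
def split_gt(gt):
    """Split a GT string such as '0/1' or '1|1' into its allele indices."""
    return gt.replace("|", "/").split("/")


def is_homozygous(gt):
    """True if GT has at least two alleles, none missing, all identical."""
    alleles = split_gt(gt)
    return len(alleles) >= 2 and "." not in alleles and len(set(alleles)) == 1


def homozygous_alt_samples(vcf_line, sample_names):
    """Return the samples that are homozygous for a non-reference allele."""
    fields = vcf_line.rstrip("\n").split("\t")
    # Column 9 (index 8) is FORMAT; sample columns start at index 9.
    gt_index = fields[8].split(":").index("GT")
    hits = []
    for name, sample in zip(sample_names, fields[9:]):
        gt = sample.split(":")[gt_index]
        if is_homozygous(gt) and split_gt(gt)[0] != "0":
            hits.append(name)
    return hits
```

For thousands of samples you would stream the file line by line rather than load it, and a dedicated VCF toolkit will be much faster than hand-rolled parsing.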

Identify the minor allele SNPs.

Usually, use the AF field in the INFO column.
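As a rough sketch of reading AF from INFO, the plain-Python helpers below parse the INFO string and derive a minor allele frequency for biallelic sites. This assumes the variant caller actually wrote an AF entry, and that AF describes the ALT allele (so the minor allele is whichever of ALT or REF is rarer). The function names are hypothetical.

```python
def parse_info(info):
    """Turn an INFO string like 'DP=100;AF=0.25;DB' into a dict."""
    out = {}
    for item in info.split(";"):
        if not item or item == ".":
            continue
        key, sep, value = item.partition("=")
        out[key] = value if sep else True  # flags such as DB carry no value
    return out


def alt_allele_frequencies(info):
    """Return one AF value per ALT allele (None where AF is missing)."""
    raw = parse_info(info).get("AF")
    if raw is None or raw is True:
        return []
    return [None if x == "." else float(x) for x in raw.split(",")]


def minor_allele_frequency(info):
    """MAF for a biallelic site, or None if AF is absent or multi-allelic."""
    afs = alt_allele_frequencies(info)
    if len(afs) != 1 or afs[0] is None:
        return None
    return min(afs[0], 1.0 - afs[0])
```

Multi-allelic sites are deliberately left out here, because what counts as the "minor" allele is ambiguous once there are more than two alleles.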

identify the SNP/InDel

Search for SnpEff, VEP, and ANNOVAR on biostars.org ....
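Once a VCF has been annotated, picking out the protein-altering variants is mostly a matter of reading the annotation field. The sketch below assumes SnpEff-style output, where INFO holds an ANN entry whose pipe-separated subfields start with allele, effect, impact, and gene name. VEP (CSQ field) and ANNOVAR use different layouts, so this would need adapting. Treating HIGH and MODERATE impact as "likely protein-altering" is a simplification chosen for illustration, and the function names are hypothetical.

```python
# Impact classes assumed here to cover frameshifts, stop gains, missense, etc.
LIKELY_PROTEIN_ALTERING = {"HIGH", "MODERATE"}


def parse_ann(info):
    """Yield (allele, effect, impact, gene) tuples from a SnpEff-style ANN entry."""
    for item in info.split(";"):
        if item.startswith("ANN="):
            # Annotations for different transcripts/alleles are comma-separated.
            for entry in item[len("ANN="):].split(","):
                parts = entry.split("|")
                if len(parts) >= 4:
                    yield parts[0], parts[1], parts[2], parts[3]


def likely_protein_altering(info):
    """Keep only annotations whose impact is HIGH or MODERATE."""
    return [a for a in parse_ann(info) if a[2] in LIKELY_PROTEIN_ALTERING]
```

In practice the annotation tools ship their own filtering utilities, which are a better fit for thousands of samples than a hand-written parser.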

written 9 months ago by Pierre Lindenbaum