Extracting variations in gene regions and within 100 bp of gene boundaries from multiple VCF files
9 weeks ago
VenGeno ▴ 60

Hi,

I sincerely hope that I am not repeating an already answered question. I couldn't find the answer to my exact problem.

I have three VCF files produced with bcftools (isec). The three files contain similar variations relative to the reference sequence. In the end, I have:

  • Three VCF files representing three varieties (containing only the common variations)
  • Reference FASTA file
  • Annotation (gff3) file for reference.

What I want to do is extract the variations found in:

  1. Gene region
  2. The 100 bp from the TSS (+1) and from the stop codon

Please note that this is a 5 Mb region (not a whole genome, so there are no chromosomes).

I would appreciate it if someone could help me with this. Thank you!

VCF Variations • 192 views
9 weeks ago
Tm ★ 1.0k

You can try a variant annotation tool such as SnpEff. It will add gene-related information (exonic, intronic, intergenic, etc.) to your VCF file.
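If an annotation tool is more than is needed, the two regions in the question can also be worked out directly from the GFF3 coordinates. Below is a minimal Python sketch of that idea, not a tested pipeline. It assumes the GFF3 has `gene` features whose start and end can stand in for the TSS and the stop codon (often not exactly true when UTRs are annotated), that the sequence name in the VCF CHROM column matches the GFF3 seqid, and that both files are uncompressed. The function names, the `FLANK` constant, and the file names in the usage line are made up for illustration. Only the POS of each record is checked, so an indel that starts outside a window but runs into it will be missed.

```python
import sys
from typing import Iterable, Iterator, List, Tuple

FLANK = 100  # bp on each side of a gene (assumed from the question)

Gene = Tuple[str, int, int, str]  # (seqid, start, end, gene_id), 1-based inclusive


def read_genes(gff_lines: Iterable[str]) -> List[Gene]:
    """Collect the coordinates of every 'gene' feature in a GFF3 file."""
    genes = []
    for line in gff_lines:
        if line.startswith("#") or not line.strip():
            continue
        cols = line.rstrip("\n").split("\t")
        if len(cols) < 9 or cols[2] != "gene":
            continue
        # Column 9 holds key=value pairs separated by ';'
        attrs = dict(kv.split("=", 1) for kv in cols[8].split(";") if "=" in kv)
        genes.append((cols[0], int(cols[3]), int(cols[4]), attrs.get("ID", ".")))
    return genes


def variants_near_genes(
    vcf_lines: Iterable[str], genes: List[Gene], flank: int = FLANK
) -> Iterator[Tuple[str, str, str]]:
    """Yield (vcf_record, gene_id, label) for records in a gene or its flank.

    label is 'gene' when POS lies inside the gene, 'flank' when it lies
    within `flank` bp outside either gene boundary.
    """
    for line in vcf_lines:
        if line.startswith("#") or not line.strip():
            continue
        record = line.rstrip("\n")
        cols = record.split("\t")
        chrom, pos = cols[0], int(cols[1])
        for seqid, start, end, gene_id in genes:
            if chrom != seqid:
                continue
            if start <= pos <= end:
                yield record, gene_id, "gene"
            elif start - flank <= pos <= end + flank:
                yield record, gene_id, "flank"


if __name__ == "__main__":
    # Usage (hypothetical file names): python near_genes.py ref.gff3 variety1.vcf
    gff_path, vcf_path = sys.argv[1], sys.argv[2]
    with open(gff_path) as gff:
        gene_list = read_genes(gff)
    with open(vcf_path) as vcf:
        for rec, gid, label in variants_near_genes(vcf, gene_list):
            print(f"{gid}\t{label}\t{rec}")
```

Running it once per VCF file gives one table per variety. A variant that falls near two genes is reported once for each gene, which may or may not be what you want.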
