Gene names with their number of mutations
5.6 years ago
LimMo

Hello all,

I have many VCF files, and I need a tool that can annotate them and give me the gene names and the number of mutations within each gene in those VCF files.

Any ideas or suggestions will be appreciated.

And googling "annotation vcf" did not bring up any tools?

Yes, I tried some of them, but the main problem is that they can't provide the number of mutations per gene.

You can annotate with any of those, and then you just need some Unix magic to count the variants per gene, finally piping to uniq -c. Something like this:

grep -v '^#' annotated.vcf | cut -f8 | tr ';' '\n' | grep '^GENE=' | cut -f2 -d'=' | sort | uniq -c

I guess just the grep has to be adapted to suit the annotation format you got.
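
A minimal sketch of the same idea as a reusable shell function, assuming the gene name sits in a plain KEY=value INFO entry such as GENE=TP53 (count_per_gene is a hypothetical name, not part of any tool). It skips header lines and, when a value lists several comma-separated genes, counts the record once for each of them. Annotators that pack the gene name into a pipe-delimited field (snpeff's ANN, VEP's CSQ) would need an extra split on |, so treat this as a starting point only.

```shell
# count_per_gene FILE KEY
# Counts VCF records per gene, where the gene is stored in the INFO column
# (field 8) as KEY=value, e.g. GENE=TP53. Assumption: plain KEY=value entry;
# comma-separated values are split and each gene is counted once per record.
count_per_gene() {
    awk -F'\t' -v key="$2" '
        /^#/ { next }                          # skip meta-information and header lines
        {
            n = split($8, info, ";")           # INFO entries are separated by ;
            for (i = 1; i <= n; i++) {
                if (index(info[i], key "=") != 1) continue
                split("", seen)                # portable way to clear an array
                m = split(substr(info[i], length(key) + 2), genes, ",")
                for (j = 1; j <= m; j++) {
                    if (!(genes[j] in seen)) { # count each gene once per record
                        seen[genes[j]] = 1
                        count[genes[j]]++
                    }
                }
            }
        }
        END { for (g in count) printf "%s\t%d\n", g, count[g] }
    ' "$1" | sort
}
```

Usage: count_per_gene annotated.vcf GENE prints one gene<TAB>count line per gene; append | sort -k2,2nr to rank genes by count instead of by name.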

Have you googled VCF annotation? That turns up snpeff, annovar, vcfanno, and VEP from Ensembl. I suggest you use the last one (VEP); it is quite powerful.

I tried snpeff, annovar and VEP. They can all annotate the VCF files and give me the gene names, but none of them reports the number of mutations per gene.

None of them does, AFAIK. Probably because "gene" is quite a flexible term: depending on the goal, it can mean only the exons, only the coding exons, both introns and exons, etc. You'll need to do some custom intersections with the coordinates that are of interest to you. Check out bedtools intersect.
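
Which definition of "gene" you pick mostly decides which BED file you intersect with. As a sketch, assuming you start from a GTF annotation (e.g. from Ensembl or GENCODE) whose attributes contain gene_id and possibly gene_name, the hypothetical function below extracts either whole genes or exons into BED. Note the coordinate shift: GTF is 1-based and end-inclusive, while BED is 0-based and half-open, so only the start changes.

```shell
# gtf_to_bed FILE FEATURE
# Writes one BED6 line per GTF record whose feature type (column 3) equals
# FEATURE, e.g. "gene" or "exon". Assumption: the name column comes from the
# gene_name attribute when present, otherwise from gene_id.
gtf_to_bed() {
    awk -F'\t' -v OFS='\t' -v feat="$2" '
        /^#/ || $3 != feat { next }            # skip comments and other feature types
        {
            name = "."
            if (match($9, /gene_id "[^"]*"/))   name = substr($9, RSTART + 9, RLENGTH - 10)
            if (match($9, /gene_name "[^"]*"/)) name = substr($9, RSTART + 11, RLENGTH - 12)
            # GTF start is 1-based, BED start is 0-based; the end stays the same
            print $1, $4 - 1, $5, name, 0, $7
        }
    ' "$1"
}
```

Usage: gtf_to_bed annotation.gtf gene | sort -k1,1 -k2,2n > genes.bed. If you use -sorted with bedtools, the chromosome order must match that of the VCFs. Exon records repeat across transcripts, so you would want to merge them (e.g. with bedtools merge) before counting, or a variant may be counted several times.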

OK, I understand. I had a quick look at bedtools intersect, and as you said, it will help if I'm interested in a specific region such as exons, i.e. it helps to process the file. But the question remains: how can I get the number of mutations within that region?

Given that you have a file with the start and end of your genes (genes.bed) and your VCF files:

## Get the number of mutations across all VCFs (assuming both genes.bed and VCFs are sorted by chr and start):
bedtools intersect -a genes.bed -b *.vcf -sorted -c > mutationCountPerGene.bed
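
To make explicit what the -c count means, here is a naive plain-awk sketch (count_overlaps is a hypothetical name) that appends to each BED line the number of VCF records overlapping it, pooled across all VCF files given. It assumes a non-empty, tab-separated BED file without header lines, treats each VCF record as the span of its REF allele, and does not special-case symbolic structural-variant alleles. It loops over every gene for every variant, so it is only meant for cross-checking results on small files, not as a replacement for bedtools.

```shell
# count_overlaps GENES_BED VCF...
# For every BED interval, count the VCF records (from all files) overlapping it.
# A record is treated as the 0-based half-open interval [POS-1, POS-1+length(REF)).
count_overlaps() {
    _genes=$1; shift
    awk -F'\t' -v OFS='\t' '
        FNR == NR {                            # first file: the BED intervals
            n++; line[n] = $0; chr[n] = $1; s[n] = $2; e[n] = $3; cnt[n] = 0
            next
        }
        /^#/ { next }                          # VCF header lines
        {
            vs = $2 - 1; ve = vs + length($4)  # variant span from POS and REF
            for (i = 1; i <= n; i++)
                if (chr[i] == $1 && vs < e[i] && ve > s[i]) cnt[i]++
        }
        END { for (i = 1; i <= n; i++) print line[i], cnt[i] }
    ' "$_genes" "$@"
}
```

If you want one count per VCF file rather than a pooled total, bedtools intersect also has a -C option for that.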

I tried what you proposed, but I got this error:

Error: Type checker found wrong number of fields while tokenizing data line.

Please post examples of all the files you used. AFAIK, the error indicates malformatted files. Do the VCFs have headers?
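
As far as I can tell, that error means some data line has a different number of columns from the lines before it. A quick way to find such lines, sketched below with a hypothetical check_field_counts function, is to compare the tab-separated field count of every non-header line against the first data line. Lines whose columns are separated by spaces instead of tabs, or that end with a stray tab, are the kind of thing this catches.

```shell
# check_field_counts FILE
# Reports data lines whose number of tab-separated fields differs from the
# first data line. Lines starting with "#" and empty lines are ignored.
# Prints nothing and returns 0 when every data line is consistent.
check_field_counts() {
    awk -F'\t' '
        /^#/ || NF == 0 { next }
        expected == "" { expected = NF; next }  # first data line sets the reference
        NF != expected {
            printf "line %d: %d fields (expected %d)\n", FNR, NF, expected
            bad = 1
        }
        END { exit bad }
    ' "$1"
}
```

Usage: run check_field_counts genes.bed and check_field_counts on each VCF; a non-zero exit status means at least one inconsistent line was found.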

5.6 years ago
$ sort-bed genes.bed > genes.sorted.bed
$ vcf2bed < variants.vcf > variants.bed
$ bedmap --echo --count --echo-map-id genes.sorted.bed variants.bed > answer.bed

The file answer.bed will contain each gene, followed by the number of variants overlapping it and their IDs (by default bedmap separates these fields with |).
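
If you only need a gene/count table from answer.bed, a small sketch like the following splits each line on bedmap's default | delimiter. It assumes genes.bed has the gene name in its 4th column; bedmap_counts is a hypothetical helper, not part of BEDOPS.

```shell
# bedmap_counts ANSWER_BED
# Turns lines of the form  <gene BED line>|<count>|<id1;id2;...>  into
# "<gene name><TAB><count>". Assumes the default "|" delimiter and that the
# gene name is the 4th tab-separated column of the gene BED line.
bedmap_counts() {
    awk -F'|' '{
        split($1, bed, "\t")                   # the echoed gene BED line
        printf "%s\t%d\n", bed[4], $2          # gene name and variant count
    }' "$1"
}
```

Usage: bedmap_counts answer.bed | sort -k2,2nr | head lists the genes with the most variants first.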
