Question

SNP annotation using gff3

0

Entering edit mode

7.3 years ago

reza ▴ 300

hi everyone

i am trying to annotate of identified SNPs and INDELs (vcf format) using software such as snpEff. i downloaded gff3 file correspond to my reference genome. i have to build database in snpEff because my desired database is not present in snpEff.

problem: scaffold names in gff3 file and VCF file are different and i must to convert scaffold names in gff3 file to scaffolds name in VCF file and then try to building new database in snpEff using name-converted gff3 file. anybody here know how i can do it?

example of differences in scoffold names

name in gff3 * name in VCF

NW_011590949.1 * KN271049.1

NW_011590950.1 * KN271050.1

...

are there alternative ways to annotate new SNPs? it is possible to merge gff and VCF to annotate VCF file?

thanks in advance

SNP sequencing next-gen • 4.5k views

ADD COMMENT • link updated 7.3 years ago by harold.smith.tarheel ★ 4.9k • written 7.3 years ago by reza ▴ 300

0

Entering edit mode

It's weird that you have different scaffolds names taking in to account that it is the same genome. Are you sure that the genome you have used for variant calling is the same genome that it is annotated in the gff3 file? same genome and same version? If the answer is yes, if you have list with gff3 - vcf scaffold name correlation, let's say NW_011590949.1 gff3 scaffold correspond to KN271049.1 vcf scaffold, it shouldn't be so difficult to change the names using a little script or awk command.

ADD REPLY • link 7.3 years ago by iraun 6.2k

score 0 · Answer 1 · 2017-01-15

0

Entering edit mode

7.3 years ago

reza ▴ 300

it is same genome. anybody can help me to do this with awk command?

ADD COMMENT • link 7.3 years ago by reza ▴ 300

0

Entering edit mode

Please use ADD REPLY to answer to earlier comments, as such this thread remains logically structured and easy to follow.

ADD REPLY • link 7.3 years ago by WouterDeCoster 47k

score 0 · Answer 2 · 2017-01-17

Hi

This is an example of what I used. It worked for me.

awk '{gsub(/Ld01_v01s1/, "1"); gsub(/Ld02_v01s1/, "2"); gsub(/Ld03_v01s1/, "3"); gsub(/Ld04_v01s1/, "4"); gsub(/Ld05_v01s1/, "5"); gsub(/Ld06_v01s1/, "6"); gsub(/Ld07_v01s1/, "7"); gsub(/Ld08_v01s1/, "8"); gsub(/Ld09_v01s1/, "9"); gsub(/Ld10_v01s1/, "10"); gsub(/Ld11_v01s1/, "11"); gsub(/Ld12_v01s1/, "12"); gsub(/Ld13_v01s1/, "13"); gsub(/Ld14_v01s1/, "14"); gsub(/Ld15_v01s1/, "15"); gsub(/Ld16_v01s1/, "16"); gsub(/Ld17_v01s1/, "17"); gsub(/Ld18_v01s1/, "18"); gsub(/Ld19_v01s1/, "19"); gsub(/Ld20_v01s1/, "20"); gsub(/Ld21_v01s1/, "21"); gsub(/Ld22_v01s1/, "22"); gsub(/Ld23_v01s1/, "23"); gsub(/Ld24_v01s1/, "24"); gsub(/Ld25_v01s1/, "25"); gsub(/Ld26_v01s1/, "26"); gsub(/Ld27_v01s1/, "27"); gsub(/Ld28_v01s1/, "28"); gsub(/Ld29_v01s1/, "29"); gsub(/Ld30_v01s1/, "30"); gsub(/Ld31_v01s1/, "31"); gsub(/Ld32_v01s1/, "32"); gsub(/Ld33_v01s1/, "33"); gsub(/Ld34_v01s1/, "34"); gsub(/Ld35_v01s1/, "35"); gsub(/Ld36_v01s1/, "36"); print;}' ~/Documents/LeishGenomeInfo/Ldonovani_BPK282A1.noseq.gff3 > ~/Documents/LeishGenomeInfo/Ldonovani_BPK282A1.noseq-new.gff3

Rangi

score 0 · Answer 3 · 2017-01-17

0

Entering edit mode

7.3 years ago

harold.smith.tarheel ★ 4.9k

Instead of modifying the GFF, it's probably easier to add the new scaffold names to your VCF using VCFtools or BCFtools 'annotate' and then edit the VCF to replace the scaffold names.

EDIT: Even easier, assuming you have a file with OLD_NAME and NEW_NAME fields, is the Linux 'join' command:

join -o 1.2,2.2,2.3,2.4,2.5,2.6,2.7,2.8 OLD_NEW_NAMES.FILE OLD.VCF > NEW.VCF

Note that, for any old scaffold name without a corresponding new name, those variants will be lost. Also, you will need to copy the OLD.VCF header to the new file.

ADD COMMENT • link 7.3 years ago by harold.smith.tarheel ★ 4.9k

0

Entering edit mode

should OLD_NEW_NAMES.FILE have two column (oldname newname)? is it possible for you to clarify it for me?

thanks a lot in advance

ADD REPLY • link 7.3 years ago by reza ▴ 300