Question: SNP annotation using gff3
0
gravatar for reza
2.3 years ago by
reza210
Iran
reza210 wrote:

hi everyone

i am trying to annotate of identified SNPs and INDELs (vcf format) using software such as snpEff. i downloaded gff3 file correspond to my reference genome. i have to build database in snpEff because my desired database is not present in snpEff.

problem: scaffold names in gff3 file and VCF file are different and i must to convert scaffold names in gff3 file to scaffolds name in VCF file and then try to building new database in snpEff using name-converted gff3 file. anybody here know how i can do it?

example of differences in scoffold names

name in gff3 * name in VCF

NW_011590949.1 * KN271049.1

NW_011590950.1 * KN271050.1

...

are there alternative ways to annotate new SNPs? it is possible to merge gff and VCF to annotate VCF file?

thanks in advance

sequencing snp next-gen • 1.7k views
ADD COMMENTlink modified 2.3 years ago by harold.smith.tarheel4.3k • written 2.3 years ago by reza210

It's weird that you have different scaffolds names taking in to account that it is the same genome. Are you sure that the genome you have used for variant calling is the same genome that it is annotated in the gff3 file? same genome and same version? If the answer is yes, if you have list with gff3 - vcf scaffold name correlation, let's say NW_011590949.1 gff3 scaffold correspond to KN271049.1 vcf scaffold, it shouldn't be so difficult to change the names using a little script or awk command.

ADD REPLYlink written 2.3 years ago by iraun3.5k
0
gravatar for reza
2.3 years ago by
reza210
Iran
reza210 wrote:

it is same genome. anybody can help me to do this with awk command?

ADD COMMENTlink written 2.3 years ago by reza210

Please use ADD REPLY to answer to earlier comments, as such this thread remains logically structured and easy to follow.

ADD REPLYlink written 2.3 years ago by WouterDeCoster38k
0
gravatar for sumudu_rangika
2.3 years ago by
sumudu_rangika30 wrote:

Hi

This is an example of what I used. It worked for me.

awk '{gsub(/Ld01_v01s1/, "1"); gsub(/Ld02_v01s1/, "2"); gsub(/Ld03_v01s1/, "3"); gsub(/Ld04_v01s1/, "4"); gsub(/Ld05_v01s1/, "5"); gsub(/Ld06_v01s1/, "6"); gsub(/Ld07_v01s1/, "7"); gsub(/Ld08_v01s1/, "8"); gsub(/Ld09_v01s1/, "9"); gsub(/Ld10_v01s1/, "10"); gsub(/Ld11_v01s1/, "11"); gsub(/Ld12_v01s1/, "12"); gsub(/Ld13_v01s1/, "13"); gsub(/Ld14_v01s1/, "14"); gsub(/Ld15_v01s1/, "15"); gsub(/Ld16_v01s1/, "16"); gsub(/Ld17_v01s1/, "17"); gsub(/Ld18_v01s1/, "18"); gsub(/Ld19_v01s1/, "19"); gsub(/Ld20_v01s1/, "20"); gsub(/Ld21_v01s1/, "21"); gsub(/Ld22_v01s1/, "22"); gsub(/Ld23_v01s1/, "23"); gsub(/Ld24_v01s1/, "24"); gsub(/Ld25_v01s1/, "25"); gsub(/Ld26_v01s1/, "26"); gsub(/Ld27_v01s1/, "27"); gsub(/Ld28_v01s1/, "28"); gsub(/Ld29_v01s1/, "29"); gsub(/Ld30_v01s1/, "30"); gsub(/Ld31_v01s1/, "31"); gsub(/Ld32_v01s1/, "32"); gsub(/Ld33_v01s1/, "33"); gsub(/Ld34_v01s1/, "34"); gsub(/Ld35_v01s1/, "35"); gsub(/Ld36_v01s1/, "36"); print;}' ~/Documents/LeishGenomeInfo/Ldonovani_BPK282A1.noseq.gff3 > ~/Documents/LeishGenomeInfo/Ldonovani_BPK282A1.noseq-new.gff3

Rangi

ADD COMMENTlink written 2.3 years ago by sumudu_rangika30

Instead of using this method, it may be more efficient to 1. sort your files according to the chromosome names and then 2. use the unix join command, which performs an SQL style join on two files and 3. cut the columns from your resulting joined file using the unix cut command to get the final file that you want to have.

ADD REPLYlink written 2.3 years ago by cmdcolin1.2k

thank you for your suggestion but my vcf file have ~32000 scaffold id. no shortcut to do this?

ADD REPLYlink written 2.3 years ago by reza210
0
gravatar for harold.smith.tarheel
2.3 years ago by
United States
harold.smith.tarheel4.3k wrote:

Instead of modifying the GFF, it's probably easier to add the new scaffold names to your VCF using VCFtools or BCFtools 'annotate' and then edit the VCF to replace the scaffold names.

EDIT: Even easier, assuming you have a file with OLD_NAME and NEW_NAME fields, is the Linux 'join' command:

join -o 1.2,2.2,2.3,2.4,2.5,2.6,2.7,2.8 OLD_NEW_NAMES.FILE OLD.VCF > NEW.VCF

Note that, for any old scaffold name without a corresponding new name, those variants will be lost. Also, you will need to copy the OLD.VCF header to the new file.

ADD COMMENTlink modified 2.3 years ago • written 2.3 years ago by harold.smith.tarheel4.3k

should OLD_NEW_NAMES.FILE have two column (oldname newname)? is it possible for you to clarify it for me?

thanks a lot in advance

ADD REPLYlink written 2.2 years ago by reza210
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.3.0
Traffic: 1095 users visited in the last hour