Question: Mapping SNP position to gff file
gravatar for shachigahoimbi
14 days ago by
shachigahoimbi10 wrote:

I want to map my SNP positions to gff file and extract the mapped gene name, gene coordinates.

I have two files, In one file, I have chromosome number and SNP position and another file which is gff file. I want to find if the SNPs are located inside genes and extract gene name and gene coordinates from gff file. Is there some CLI-tool where you can pass a SNP position with chromosome number and a gff file and it returns the name of the genes with coordinates where SNP mapped?

1. file1 (SNP informtion)
Chrm_num   SNP position
1   45106203
1   143047631
1   146650701
1   149261376
2   47602826
2   134579820
3   21063116
3   24559129
3   28000146
3   34244849
3   56395014
3   66228111
3   88703762
3   89967513
2. file 2 (gff file)
1       RefSeq  region  1       161428367       .       +       .       ID=id0;Dbxref=taxon:9913;Name=1;breed=Hereford;chromosome=1;gbkey=Src;genome=chromosome;mol_type=genomic DNA
1       Gnomon  gene    26068   28997   .       +       .       ID=gene0;Dbxref=GeneID:104976623;Name=LOC104976623;gbkey=Gene;gene=LOC104976623;gene_biotype=pseudogene;pseudo=true
1       Gnomon  exon    26068   28997   .       +       .       ID=id1;Parent=gene0;Dbxref=GeneID:104976623;gbkey=exon;gene=LOC104976623

Please let me know, I tried it very hard but I am not getting it.

snp gene • 111 views
ADD COMMENTlink modified 14 days ago by husensofteng70 • written 14 days ago by shachigahoimbi10

Welcome to biostars. I have two suggestions :-)

Please let me know, I tried it very hard but I am not getting it.

We like to put you on the right track, but then it is easier if you show us what you already tried. Maybe you are almost there and we just have to point you in the right direction for the final step.

Second, it would probably be easier if you have your variants in vcf format, since for those files many tools are developed. Do you?

ADD REPLYlink modified 14 days ago • written 14 days ago by WouterDeCoster39k
gravatar for husensofteng
14 days ago by
husensofteng70 wrote:

one way is to use the intersect function from BedTools.

awk 'BEGIN{FS=OFS="\t"}{if(NR>1){ print $1,$2-1,$2 }}' snps.txt |
bedtools intersect -wb -a genes.gtf -b stdin | awk '$3=="gene"'
  1. first converts your SNP list into a BED file (chr, start, stop) and removes the header line
  2. intersects with the gtf file to get those entries that overlap the snps
  3. only keeps records that are genes (you can simply change the gene to exon/transcript etc).

From the output you can decide what columns to keep.

Also, if you want to get the entire gene coordinates instead of just the overlapping bases then change '-wb' to -'wo'

ADD COMMENTlink modified 14 days ago by ATpoint16k • written 14 days ago by husensofteng70

Thank you so much for telling it in detail. I got the result exactly what I want. Thanks once again!

ADD REPLYlink written 10 days ago by shachigahoimbi10

If an answer was helpful you should upvote it, if the answer resolved your question you should mark it as accepted.

ADD REPLYlink written 10 days ago by WouterDeCoster39k
Please log in to add an answer.


Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.3.0
Traffic: 1144 users visited in the last hour