Mapping SNP position to gff file
Entering edit mode
3.9 years ago

I want to map my SNP positions to gff file and extract the mapped gene name, gene coordinates.

I have two files, In one file, I have chromosome number and SNP position and another file which is gff file. I want to find if the SNPs are located inside genes and extract gene name and gene coordinates from gff file. Is there some CLI-tool where you can pass a SNP position with chromosome number and a gff file and it returns the name of the genes with coordinates where SNP mapped?

1. file1 (SNP informtion)
Chrm_num   SNP position
1   45106203
1   143047631
1   146650701
1   149261376
2   47602826
2   134579820
3   21063116
3   24559129
3   28000146
3   34244849
3   56395014
3   66228111
3   88703762
3   89967513
2. file 2 (gff file)
1       RefSeq  region  1       161428367       .       +       .       ID=id0;Dbxref=taxon:9913;Name=1;breed=Hereford;chromosome=1;gbkey=Src;genome=chromosome;mol_type=genomic DNA
1       Gnomon  gene    26068   28997   .       +       .       ID=gene0;Dbxref=GeneID:104976623;Name=LOC104976623;gbkey=Gene;gene=LOC104976623;gene_biotype=pseudogene;pseudo=true
1       Gnomon  exon    26068   28997   .       +       .       ID=id1;Parent=gene0;Dbxref=GeneID:104976623;gbkey=exon;gene=LOC104976623

Please let me know, I tried it very hard but I am not getting it.

SNP gene • 2.8k views
Entering edit mode

Welcome to biostars. I have two suggestions :-)

Please let me know, I tried it very hard but I am not getting it.

We like to put you on the right track, but then it is easier if you show us what you already tried. Maybe you are almost there and we just have to point you in the right direction for the final step.

Second, it would probably be easier if you have your variants in vcf format, since for those files many tools are developed. Do you?

Entering edit mode
3.9 years ago
husensofteng ▴ 380

one way is to use the intersect function from BedTools.

awk 'BEGIN{FS=OFS="\t"}{if(NR>1){ print $1,$2-1,$2 }}' snps.txt |
bedtools intersect -wb -a genes.gtf -b stdin | awk '$3=="gene"'
  1. first converts your SNP list into a BED file (chr, start, stop) and removes the header line
  2. intersects with the gtf file to get those entries that overlap the snps
  3. only keeps records that are genes (you can simply change the gene to exon/transcript etc).

From the output you can decide what columns to keep.

Also, if you want to get the entire gene coordinates instead of just the overlapping bases then change '-wb' to -'wo'

Entering edit mode

Thank you so much for telling it in detail. I got the result exactly what I want. Thanks once again!

Entering edit mode

If an answer was helpful you should upvote it, if the answer resolved your question you should mark it as accepted.


Login before adding your answer.

Traffic: 1287 users visited in the last hour
Help About
Access RSS

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6