How to find gene annotation given a position in a bacterial genome
12 months ago

I have a .bed file and a .gff file for the bacterial genome of Gardnerella vaginalis. I also have a CSV file that lists positions where there are mutations, e.g. 10401, 224444.

I want to feed in a position and find out which gene or intergenic region the mutation falls in. For example, given 10401, a nucleotide position in the genome, the output should be the gene annotation or region at that position.

How do I do this? Are there tools available?

position annotation snp intergenic • 489 views
12 months ago

To do an ad-hoc search via the BEDOPS kit:

$ gff2bed < genes.gff > genes.bed
$ echo -e 'chrZ\t10400\t10401' | bedops -e 1 genes.bed -


Replace chrZ with the name of your contig.

Once you have a feel for this, put your zero-indexed positions of interest into a tab-delimited, sort-bed-sorted BED file to run a full search:

$ bedops -e 1 genes.bed positions.bed

Or to get associations, you can use bedmap:

$ bedmap --echo --echo-map positions.bed genes.bed


This will report each position, along with any genes that associate with that position, where there are overlaps.

Make sure contig names are consistent between gene annotations and positions, and that BED files are sorted properly, per sort-bed.


Thank you so much! I was able to pull out the gene associated with a manually entered position. However, I am stuck on how to create a .bed file from a .txt file of positions. Each position is a single number because these are SNPs, so my .txt file looks like this:

40136
47092
136648
165946
219134

Thank you, I really appreciate the help!
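
One possible way to build such a BED file from a single-column list is sketched below. It follows the convention of the manual example above, where position 10401 was written as start 10400 and end 10401, so each 1-based position P becomes the interval P-1 to P. The contig name chrZ and the file name snp_positions_demo.txt are placeholders and not part of the original answer. Plain sort is used here only as a stand-in so the sketch runs without BEDOPS; if BEDOPS is installed, piping through sort-bed instead would match the advice above.

```shell
#!/bin/sh
# Sketch: turn a one-column list of 1-based SNP positions into a BED file.
# Assumptions (placeholders): contig name "chrZ", input file
# "snp_positions_demo.txt", output file "positions.bed".
contig="chrZ"
infile="snp_positions_demo.txt"

# Demo input with blank lines between values; replace with your own file.
printf '40136\n\n47092\n\n136648\n' > "$infile"

# BED intervals are zero-based and half-open, so a 1-based position P
# becomes start = P - 1 and end = P.
# tr removes Windows carriage returns; awk skips blank or non-numeric lines.
tr -d '\r' < "$infile" \
  | awk -v c="$contig" 'BEGIN { OFS = "\t" } $1 ~ /^[0-9]+$/ { print c, $1 - 1, $1 }' \
  | LC_ALL=C sort -k1,1 -k2,2n -k3,3n > positions.bed

cat positions.bed
```

The resulting positions.bed can then be passed to the bedops or bedmap commands shown earlier, provided the contig name matches the one used in genes.bed.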