How to convert new locus tags to old locus tags?
7.0 years ago
Paul ▴ 80

I have a few new Locus tags as follows:

RS09560
RS10020
RS10595

and I need to convert them to their old locus tags, which are:

RS09560     NA
RS10020     1984
RS10595     2097

NA marks the tags that have no old locus tag available. I couldn't find any database that does this conversion. The information on both the old and new locus tags is available on the NCBI website:

https://www.ncbi.nlm.nih.gov/nuccore/NC_000962.3

I tried to scrape the data from the website using R (rvest) and wrote a few lines, but I am not sure which HTML node I should extract from the web page to get the old and new locus tags.

> library(xml2)
> library(rvest)
> url <- 'https://www.ncbi.nlm.nih.gov/nuccore/NC_000962.3'
> webpage <- read_html(url)
> sb_table <- html_nodes(webpage, ' ')

Please suggest a way to do this, and also which node could be used to fetch the locus tags.

Locus-tag R web-scraping HTML Locus-tags • 4.0k views
7.0 years ago
Jenez ▴ 540

I'm a Python type of guy, and what I would do is set up a script to parse the GenBank file using Biopython.

I wouldn't expect any other source to have old locus tags that are missing from the NCBI GenBank files. Gene annotations are constantly updated, and it's not unusual for 'novel' genes to pop up, which will not have an old locus tag.

Assuming that what you have is a neat list of locus tags of interest, it is just a matter of looping through the 'Features', looking for those tags, grabbing the old locus tags if available, and printing a simple list of new vs. old tags.
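
The loop described above could be sketched roughly as follows. This is only a minimal illustration, not a tested pipeline: it assumes the record has been downloaded from NCBI as a full GenBank flat file, that the tags in the list are suffixes of the full locus_tag values in that file (the prefix is not given in the question, so the match is done with endswith), and that reading the 'gene' features is enough. The function names and the file name mentioned afterwards are hypothetical.

```python
def map_old_locus_tags(features, wanted_suffixes):
    """Map each wanted new-locus-tag suffix to its old locus tag, or "NA".

    `features` is any iterable of objects with a `.type` string and a
    `.qualifiers` dict of lists, which is what Biopython's SeqFeature provides.
    """
    # Start with "NA" so tags without an old locus tag are still reported.
    mapping = {suffix: "NA" for suffix in wanted_suffixes}
    for feature in features:
        # Assumption: 'gene' features carry both qualifiers, so other
        # feature types (CDS, tRNA, ...) are skipped to avoid duplicates.
        if feature.type != "gene":
            continue
        new_tag = feature.qualifiers.get("locus_tag", [""])[0]
        old_tags = feature.qualifiers.get("old_locus_tag", [])
        if not old_tags:
            continue
        for suffix in wanted_suffixes:
            # Assumption: the short tags from the question are suffixes
            # of the full locus_tag stored in the GenBank file.
            if new_tag.endswith(suffix):
                mapping[suffix] = ",".join(old_tags)
    return mapping


def load_features(genbank_path):
    """Read a single-record GenBank file and return its features."""
    from Bio import SeqIO  # requires Biopython to be installed

    record = SeqIO.read(genbank_path, "genbank")
    return record.features


def print_mapping(genbank_path, wanted_suffixes):
    """Print one 'new_tag<TAB>old_tag' line per wanted tag."""
    mapping = map_old_locus_tags(load_features(genbank_path), wanted_suffixes)
    for suffix in wanted_suffixes:
        print(f"{suffix}\t{mapping[suffix]}")
```

With the record saved locally under a name such as NC_000962.3.gb (a hypothetical file name), calling print_mapping("NC_000962.3.gb", ["RS09560", "RS10020", "RS10595"]) would print one line per tag, with NA for any tag that has no old_locus_tag qualifier.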

Even if you have not touched Python yet, this is a fairly easy and decent introduction to it if you are willing to give it a go.
