Question: (Closed) Is there any way to retrieve a list of start and end positions of tuberculosis genome genes?
bioinformatics_bel (20), US, Alaska, wrote 16 months ago:

Is there any way to retrieve a list of start and end positions of tuberculosis genome genes automatically?

I have a list of TB drug resistance genes:

ebmB
embR
Rv3124
Rv3125c
Rv3126
Rv0340
iniA
iniB
iniC
rmlA2
rmlD
inhA
ethA
gyrA
gyrB
tlyA
thyA
rss
katG
kasA
ndh
oxyR
ahpC
mabA
inhA
furA
Rv0340
Rv1592c
Rv1772
srmR
fabD
accD6
fbpC
fadE24
efpA
nhoA
gid
rpsL
pncA
embC
embB
embA
rpoB
katG

I want to get the corresponding Start and End nucleotide positions as columns, automatically.

Thx

modified 16 months ago by genomax (63k) • written 16 months ago by bioinformatics_bel (20)

Are you after a specific strain or genome? I could suggest efetch/eutils solution if you could please provide more info.

written 16 months ago by Sej Modha (4.1k)

Hello bioinformatics_bel!

Questions similar to yours can already be found at:

We have closed your question to allow us to keep similar content in the same thread.

If you disagree with this please tell us why in a reply below. We'll be happy to talk about it.

Cheers!

written 16 months ago by Michael Dondrup (45k)
genomax (63k), United States, wrote 16 months ago:
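for i in $(cat ./gene); do esearch -db gene -query "${i}[gene] AND Mycobacterium tuberculosis[orgn]" | efetch -format docsum | xtract -pattern DocumentSummary -element ScientificName -element Name -element ChrStart -element ChrStop | sed 's/999999999//g'; done

This should produce output like the sample below. Note that the sample shows the start coordinate in both columns; with ChrStop as the last element, the last column holds the end coordinate.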

Mycobacterium tuberculosis H37Rv        embR    1417346         1417346
Mycobacterium tuberculosis str. Erdman = ATCC 35801     embR    1415864         1415864
Mycobacterium tuberculosis UT205        embR    1420011         1420011
Mycobacterium tuberculosis CTRI-2       embR    1417710         1417710
Mycobacterium tuberculosis H37Ra        embR    1418656         1418656
Mycobacterium tuberculosis H37Rv        moaR1   3489505         3489505
Mycobacterium tuberculosis H37Rv        PPE49   3491650         3491650
Mycobacterium tuberculosis H37Rv        Rv0340  408633          408633
Mycobacterium tuberculosis H37Rv        iniA    410837          410837
Mycobacterium tuberculosis CCDC5079     iniA    409232          409232
Mycobacterium tuberculosis str. Erdman = ATCC 35801     iniA    411274          411274
Mycobacterium tuberculosis UT205        iniA    412048          412048
Mycobacterium tuberculosis CTRI-2       iniA    413184          413184
Mycobacterium tuberculosis H37Ra        iniA    412199          412199
Mycobacterium tuberculosis H37Rv        iniB    409361          409361
Mycobacterium tuberculosis CCDC5079     iniB    407756          407756
Mycobacterium tuberculosis str. Erdman = ATCC 35801     iniB    409723          409723
Mycobacterium tuberculosis UT205        iniB    410572          410572
Mycobacterium tuberculosis CTRI-2       iniB    411708          411708
Mycobacterium tuberculosis H37Ra        iniB    410723          410723
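
If only one strain is of interest (H37Rv is the usual reference), the rows can be filtered after the fact. This is a minimal sketch, assuming the xtract output is tab-separated with the organism name in the first column. The function name filter_strain is hypothetical and not part of EDirect.

```shell
#!/bin/sh
# filter_strain STRAIN: read tab-separated rows on stdin
# (organism, gene, start, end) and print gene, start, end
# for rows whose organism column equals STRAIN exactly.
# Hypothetical helper, for illustration only.
filter_strain() {
    awk -F'\t' -v strain="$1" \
        'BEGIN { OFS = "\t" } $1 == strain { print $2, $3, $4 }'
}

# Example with two rows copied from the sample output above.
printf 'Mycobacterium tuberculosis H37Rv\tembR\t1417346\t1417346\nMycobacterium tuberculosis UT205\tembR\t1420011\t1420011\n' |
    filter_strain "Mycobacterium tuberculosis H37Rv"
```

The exact string comparison keeps H37Rv and H37Ra apart, which a loose pattern match might not.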
written 16 months ago by genomax (63k)

Which package do I need to download, and where do I put this code? I tried it in a Linux shell terminal and it does not work for a list of genes. Thanks.

written 16 months ago by bioinformatics_bel (20)

Download Entrez Direct (EDirect), NCBI's command-line interface to the E-utilities. The command does work; the results are in the post above.
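
As for where the code goes: it can be pasted at a shell prompt or saved in a script file, once the EDirect commands are on PATH. Below is a sketch of a script version, not the exact command from the answer. The function names (unique_genes, build_query, fetch_coords) are hypothetical, and ChrStop is assumed to be the docsum field holding the end coordinate.

```shell
#!/bin/sh
# Sketch: look up gene coordinates for every name in a file.
# Assumes NCBI Entrez Direct (esearch, efetch, xtract) is on PATH.

# Print each non-empty line once, keeping the original order
# (the list in the question repeats inhA, katG and Rv0340).
unique_genes() {
    awk 'NF && !seen[$0]++' "$1"
}

# Build the Entrez query string for one gene name.
build_query() {
    printf '%s[gene] AND Mycobacterium tuberculosis[orgn]' "$1"
}

# Query NCBI for every gene in the file (needs network access).
# esearch gets /dev/null as stdin so that it cannot consume
# the gene list that the while loop is reading.
fetch_coords() {
    unique_genes "$1" | while IFS= read -r gene; do
        esearch -db gene -query "$(build_query "$gene")" < /dev/null |
            efetch -format docsum |
            xtract -pattern DocumentSummary \
                -element ScientificName -element Name \
                -element ChrStart -element ChrStop
    done
}
```

To use it, save the block as a file, add a final line such as `fetch_coords ./gene > coords.tsv`, and run the file with `sh`. The file ./gene holds one gene name per line, as in the answer.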

modified 16 months ago • written 16 months ago by genomax (63k)
Sej Modha (4.1k), Glasgow, UK, wrote 16 months ago:

Maybe you could try the following command. It should give you the coordinates you are after in the annotation attribute:

esearch -db genome -query "Mycobacterium tuberculosis[Organism]"|elink -target gene|efilter -query "embR[gene]"|efetch -format ft_na

Or the following command that should give you coordinates for each genome:

esearch -db gene -query "embR[gene]"|efilter -query "Mycobacterium tuberculosis[Organism]"|efetch -format ft_na
modified 16 months ago • written 16 months ago by Sej Modha (4.1k)

Wrap this in a bash loop. The file ./gene contains the gene names, one per line.

for i in $(cat ./gene); do esearch -db genome -query "Mycobacterium tuberculosis[Organism]" | elink -target gene | efilter -query "${i}[gene]" | efetch -format ft_na; done
written 16 months ago by genomax (63k)

Which package do I need to download, and where do I put this code? I tried it in a Linux shell terminal and it does not work for a list of genes. Thanks.

written 16 months ago by bioinformatics_bel (20)

it does not work

That's not very informative. You could at least tell us what went wrong.
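
A useful first check is whether the shell can find the EDirect commands at all; after that, post the exact error text. A small sketch, in which the function name missing_tools is hypothetical:

```shell
#!/bin/sh
# Print the name of every command in the argument list that is
# not found on PATH; return non-zero if any of them is missing.
missing_tools() {
    _status=0
    for _tool in "$@"; do
        if ! command -v "$_tool" > /dev/null 2>&1; then
            printf '%s\n' "$_tool"
            _status=1
        fi
    done
    return "$_status"
}

# Report on the three commands used in this thread.
if missing_tools esearch efetch xtract; then
    echo "All EDirect commands found"
else
    echo "Install Entrez Direct or add it to PATH" >&2
fi
```

If a command is listed as missing, the "does not work" is an installation or PATH problem rather than a problem with the query.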

written 16 months ago by WouterDeCoster (37k)
The thread is closed. No new answers may be added.
