Question: Conversion of fasta file to taxon id(?)
6 days ago by
sajjalwaqas0 wrote:

I have a FASTA file of nifH genes from nitrogen-fixing bacteria. The file contains IDs as well as sequences. I want to build a table that gives the taxonomy for each ID, for example (id: kingdom, phylum, class, order, family, genus, species). This is my first time doing this, as I am still a bachelor's student, so any help will be appreciated.

ADD COMMENTlink modified 5 days ago • written 6 days ago by sajjalwaqas0

Do you have the taxIDs of these bacteria, or just the sequences of the genes? Do you have the names of the bacteria you are dealing with?

If you have the taxIDs, you can get the information you need by using my answer here: A: converting taxID to taxonomy

ADD REPLYlink written 6 days ago by genomax80k

I have sequences of genes.

ADD REPLYlink written 5 days ago by sajjalwaqas0

Do the sequence headers contain accession numbers for the genes?

How many sequences do you have?

ADD REPLYlink modified 5 days ago • written 5 days ago by Fatima560

There are no accession numbers, and there are more than 100 sequences. I don't really know more than that because the file was forwarded to me.

ADD REPLYlink written 5 days ago by sajjalwaqas0

There might be tools out there that can give you what you need using just the sequences, but none come to mind off the top of my head.

I'm not sure, but I think you can use blastn to find the accession numbers or reference genomes. If you have fewer than 500 sequences, you might be able to use online blastn. Try it with 2 of the sequences and see if you like the output. You can also run blastn on the command line and use the -remote option to search against the nucleotide database (nt), but the -remote option is slow. You can also download the nt database and set it up locally, but it's huge.
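As a rough sketch of the command-line route, the Python below builds a remote blastn call against the nucleotide database (assumed here to be `nt`) and keeps the best-scoring hit per query from the tabular output. It assumes BLAST+ is installed and on your PATH; the file names, the chosen output columns, and the helper names (`build_remote_blastn`, `best_hits`, `run_remote_blastn`) are placeholders of mine, not something from this thread.

```python
import csv
import io
import subprocess

# Columns requested from BLAST tabular output (-outfmt 6).
FIELDS = ["qseqid", "sacc", "pident", "length", "evalue", "bitscore"]


def build_remote_blastn(query_fasta, out_tsv, max_hits=5):
    """Return the blastn argument list for a remote search against nt."""
    return [
        "blastn",
        "-query", query_fasta,
        "-db", "nt",
        "-remote",
        "-outfmt", "6 " + " ".join(FIELDS),
        "-max_target_seqs", str(max_hits),
        "-out", out_tsv,
    ]


def best_hits(tabular_text):
    """Keep the highest-bitscore hit per query from tabular BLAST output."""
    best = {}
    for row in csv.reader(io.StringIO(tabular_text), delimiter="\t"):
        if not row or row[0].startswith("#"):
            continue  # skip blank lines and comment lines
        hit = dict(zip(FIELDS, row))
        current = best.get(hit["qseqid"])
        if current is None or float(hit["bitscore"]) > float(current["bitscore"]):
            best[hit["qseqid"]] = hit
    return best


def run_remote_blastn(query_fasta, out_tsv):
    """Run the search (slow; needs BLAST+ and network access), then parse it."""
    subprocess.run(build_remote_blastn(query_fasta, out_tsv), check=True)
    with open(out_tsv) as handle:
        return best_hits(handle.read())
```

Calling `run_remote_blastn("two_seqs.fasta", "hits.tsv")` on a small test file first is probably wise, since the top hit is only a guess at the source organism and should be checked against percent identity and alignment length.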

You can also use blastx against the non-redundant protein database (nr) to get the protein accession numbers.

If you had protein sequences instead of gene sequences you could use blastp.

Once you have the accession numbers, you can use efetch from Entrez Direct to get the taxonomy.

If you're not familiar with these tools, you can google them or search for them in this forum.
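Here is a minimal sketch of that taxonomy step, assuming you have already turned each accession into an NCBI taxID (for example with esummary on the nuccore database) and that Entrez Direct is installed. It calls `efetch -db taxonomy -id <taxid> -format xml` and pulls the ranks for the table out of the returned XML. The function names are my own, and the handling of the top-level rank is an assumption, since NCBI has labelled it "superkingdom" or "domain" rather than "kingdom" for bacteria.

```python
import subprocess
import xml.etree.ElementTree as ET

# Columns wanted in the final table.
RANKS = ["kingdom", "phylum", "class", "order", "family", "genus", "species"]
# Assumption: the top-level group may be stored under any of these rank names.
TOP_RANKS = ("superkingdom", "domain", "kingdom")


def fetch_taxonomy_xml(taxid):
    """Fetch one taxonomy record as XML via Entrez Direct (needs network)."""
    result = subprocess.run(
        ["efetch", "-db", "taxonomy", "-id", str(taxid), "-format", "xml"],
        check=True, capture_output=True, text=True,
    )
    return result.stdout


def lineage_from_xml(xml_text):
    """Return a dict with one entry per column in RANKS ('' if missing)."""
    taxon = ET.fromstring(xml_text).find("Taxon")
    if taxon is None:
        raise ValueError("no Taxon record found in XML")
    by_rank = {}
    # Ancestors are listed under LineageEx, ordered from the root downwards.
    for node in taxon.findall("LineageEx/Taxon"):
        by_rank.setdefault(node.findtext("Rank"), node.findtext("ScientificName"))
    # The record itself carries the lowest rank (often the species).
    by_rank.setdefault(taxon.findtext("Rank"), taxon.findtext("ScientificName"))
    row = {"kingdom": next((by_rank[r] for r in TOP_RANKS if r in by_rank), "")}
    for rank in RANKS[1:]:
        row[rank] = by_rank.get(rank, "")
    return row
```

Each row from `lineage_from_xml(fetch_taxonomy_xml(taxid))` can then be written next to the sequence ID, for instance with `csv.DictWriter`, to get the id-to-lineage table.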

If you open this link, https://www.ncbi.nlm.nih.gov/nuccore/X51500.1?report=fasta, you will see Run BLAST on the right. If you click on it, you can choose blastn or blastx and compare their output.

You can also access online BLAST from here:

https://blast.ncbi.nlm.nih.gov/Blast.cgi

You can select blastx or blastn and paste in two of your sequences, upload your FASTA file, ...

ADD REPLYlink modified 5 days ago • written 5 days ago by Fatima560

Thank you, I'll try doing as you say.

ADD REPLYlink written 5 days ago by sajjalwaqas0

Please post an example of some of the data you have.

ADD REPLYlink written 5 days ago by Joe16k

If you know that you have nifH genes, it should be easy to identify which bacteria they are from by downloading all nifH genes from NCBI and then searching against that set using blat or blast.
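A sketch of the BLAST variant of this idea, assuming the reference nifH sequences have already been downloaded from NCBI into a single nucleotide FASTA file and that BLAST+ is installed. It builds a local database with makeblastdb and searches the query file against it, reporting the subject title, which for NCBI records usually names the organism. The file names, database name, and function names are placeholders.

```python
import subprocess


def build_nifh_search(reference_fasta, query_fasta,
                      db_name="nifH_db", out_tsv="nifH_hits.tsv"):
    """Return the two BLAST+ commands: build the database, then search it."""
    make_db = [
        "makeblastdb",
        "-in", reference_fasta,
        "-dbtype", "nucl",
        "-out", db_name,
    ]
    search = [
        "blastn",
        "-query", query_fasta,
        "-db", db_name,
        # stitle is the subject's title line from the reference FASTA.
        "-outfmt", "6 qseqid sseqid pident length evalue bitscore stitle",
        "-max_target_seqs", "1",
        "-out", out_tsv,
    ]
    return make_db, search


def run_nifh_search(reference_fasta, query_fasta):
    """Run both commands in order; raises if either one fails."""
    for command in build_nifh_search(reference_fasta, query_fasta):
        subprocess.run(command, check=True)
```

Because the database only contains nifH sequences, this runs far faster than a search against nt, at the cost of missing anything absent from the downloaded set.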

ADD REPLYlink modified 5 days ago • written 5 days ago by genomax80k