Converting a FASTA file to taxon IDs (?)
16 months ago

I have a FASTA file of nitrogen-fixing bacteria (gene nifH). The file contains the IDs as well as the sequences. I want to build a table that gives the taxonomy for each ID, for example (ID: kingdom, phylum, class, order, family, genus, species). This is my first time doing this, as I am still a bachelor's student, so any help will be appreciated.

sequence BLAST taxon id Ncbi Taxonomy

Do you have the taxIDs of these bacteria, or just the sequences of the genes? Do you have the names of the bacteria you are dealing with?

If you have the taxIDs, you can get the information you need by using my answer here: A: converting taxID to taxonomy
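If you do end up with taxIDs, one offline way to turn them into ranks is NCBI's taxonomy dump (the taxdump archive, which contains nodes.dmp and names.dmp). This sketch is not necessarily what the linked answer does. It assumes the standard .dmp layout (fields separated by `\t|\t`, each line ending in `\t|`), follows parent links from a taxID up to the root, and keeps the scientific name at each rank of interest. The function names and the RANKS list are my own choices; newer releases may label the top level "domain" instead of "superkingdom", so both are listed.

```python
import os

# Ranks to report (assumed list); recent NCBI releases may use "domain"
# instead of "superkingdom" for the top level, so both are included.
RANKS = ["superkingdom", "domain", "kingdom", "phylum", "class",
         "order", "family", "genus", "species"]


def _split_dmp(line):
    r"""Split one .dmp record: fields are separated by '\t|\t', lines end with '\t|'."""
    return [f.strip() for f in line.rstrip("\n").rstrip("|").split("\t|\t")]


def load_taxdump(nodes_lines, names_lines):
    """Build parent, rank and scientific-name lookups keyed by taxID (as strings)."""
    parent, rank, name = {}, {}, {}
    for line in nodes_lines:
        f = _split_dmp(line)
        parent[f[0]], rank[f[0]] = f[1], f[2]
    for line in names_lines:
        f = _split_dmp(line)
        if len(f) > 3 and f[3] == "scientific name":
            name[f[0]] = f[1]
    return parent, rank, name


def read_taxdump(directory):
    """Load nodes.dmp and names.dmp from an unpacked taxdump directory."""
    with open(os.path.join(directory, "nodes.dmp"), encoding="utf-8") as nodes, \
         open(os.path.join(directory, "names.dmp"), encoding="utf-8", errors="replace") as names:
        return load_taxdump(nodes, names)


def lineage(taxid, parent, rank, name):
    """Walk from taxid up to the root and return {rank: scientific name} for RANKS."""
    row = {r: "" for r in RANKS}
    node, seen = str(taxid), set()
    while node in parent and node not in seen:
        seen.add(node)
        if rank[node] in row:
            row[rank[node]] = name.get(node, "")
        node = parent[node]  # the root (taxID 1) is its own parent; `seen` stops the loop
    return row
```

Usage would be along the lines of `parent, rank, name = read_taxdump("taxdump")` once, then `lineage(taxid, parent, rank, name)` for each taxID; ranks that are missing from a lineage come back as empty strings.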

I have sequences of genes.

Do the sequence headers contain accession numbers of the genes?

How many sequences do you have?

No accession numbers, and there are more than 100 sequences. I don't know exactly how many, because the file was forwarded to me.
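To answer "how many sequences, and do the headers contain accession numbers?" without scrolling through the file, a short script can count the header lines (lines starting with ">") and check each ID against a rough accession pattern. The regular expression below is my own heuristic, not an official accession specification, so treat its count as a hint only.

```python
import re
import sys

# Rough, assumed pattern for GenBank/RefSeq-style accessions such as X51500.1:
# uppercase prefix (optionally with an underscore, as in RefSeq), at least five
# digits, optional ".version". It will miss some formats and may match other IDs.
ACCESSION_LIKE = re.compile(r"^[A-Z]{1,6}_?[A-Z]*\d{5,}(\.\d+)?$")


def fasta_ids(lines):
    """Yield the first word of each FASTA header (the sequence ID)."""
    for line in lines:
        if line.startswith(">"):
            words = line[1:].split()
            yield words[0] if words else ""


def summarize(lines):
    """Return (number of sequences, number of IDs that look like accessions)."""
    ids = list(fasta_ids(lines))
    return len(ids), sum(1 for i in ids if ACCESSION_LIKE.match(i))


if __name__ == "__main__" and len(sys.argv) > 1:
    with open(sys.argv[1], encoding="utf-8") as handle:
        total, acc_like = summarize(handle)
    print(f"{total} sequences, {acc_like} with accession-like IDs")
```

Run it as `python script.py your_file.fasta` (both names are placeholders). For the count alone, `grep -c '^>' your_file.fasta` gives the same number of header lines.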

There might be tools out there that can give you what you need from the sequences alone, but none come to mind off the top of my head.

I'm not sure, but I think you can use blastn to find the accession numbers or reference genomes. If you have fewer than 500 sequences, you might be able to use online blastn. Try it with 2 of the sequences and see if you like the output. You can also run blastn on the command line with the -remote option to search against the nucleotide database (nt), but the -remote option is slow. You can also download the nt database and search it locally, but it's huge.

You can also use blastx against the non-redundant protein database (nr) and get the protein accession numbers.

If you had protein sequences instead of gene sequences, you could use blastp.
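The BLAST options discussed above differ mainly in the program and database: blastn against nt or blastx against nr for nucleotide queries, and blastp against nr for protein queries. A small helper can build the BLAST+ command line for each case. The output columns are my choice; staxids asks BLAST to report the subject taxID(s), which may save the later accession-to-taxonomy step, although with a local database it is only filled in when NCBI's taxonomy files (taxdb) are available. File names are placeholders, and BLAST+ must be installed to actually run the command.

```python
import shlex
import subprocess

# Tabular output (-outfmt 6) with an explicit column list; staxids = subject taxID(s).
OUTFMT = "6 qseqid sacc pident length evalue bitscore staxids"


def blast_command(query_fasta, out_tsv, seq_type="nucleotide", translate=False,
                  remote=True, max_targets=5):
    """Return a BLAST+ command as an argument list.

    nucleotide query: blastn vs nt, or blastx vs nr when translate=True
    protein query:    blastp vs nr
    """
    if seq_type == "protein":
        program, db = "blastp", "nr"
    elif seq_type == "nucleotide":
        program, db = ("blastx", "nr") if translate else ("blastn", "nt")
    else:
        raise ValueError("seq_type must be 'nucleotide' or 'protein'")
    cmd = [program, "-query", query_fasta, "-db", db, "-outfmt", OUTFMT,
           "-max_target_seqs", str(max_targets), "-out", out_tsv]
    if remote:
        cmd.append("-remote")  # search on NCBI's servers: no local database, but slow
    return cmd


def run_blast(cmd):
    """Print the shell-quoted command, then run it (requires BLAST+ on PATH)."""
    print(shlex.join(cmd))
    subprocess.run(cmd, check=True)
```

Following the advice to try two sequences first, `run_blast(blast_command("two_seqs.fasta", "hits_blastn.tsv"))` and the same call with `translate=True` would give two tables to compare.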

After you get the accession numbers, you can use efetch from Entrez Direct to get the taxonomy.
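Once you have taxIDs (for example by asking BLAST+ to report the staxids output column, or from the GenBank records of the accessions), the full lineage can be pulled from NCBI's taxonomy database with something like `efetch -db taxonomy -id <taxID> -format xml`. In that XML, each <Taxon> record has its own Rank and ScientificName plus a LineageEx list of its ancestors with their ranks. The sketch below assumes that layout and turns it into the table asked for in the question (ID, then kingdom, phylum, ... species); the function names and RANKS list are my own.

```python
import csv
import io
import xml.etree.ElementTree as ET

# Assumed set of ranks to report; "superkingdom" and "domain" both appear
# in NCBI data depending on the release.
RANKS = ["superkingdom", "domain", "kingdom", "phylum", "class",
         "order", "family", "genus", "species"]


def ranks_from_taxonomy_xml(xml_text):
    """Return {taxid: {rank: name}} for each top-level <Taxon> in a <TaxaSet>."""
    table = {}
    for taxon in ET.fromstring(xml_text).findall("Taxon"):
        row = {r: "" for r in RANKS}
        # Ancestors are listed under LineageEx, each with its own Rank.
        for anc in taxon.findall("LineageEx/Taxon"):
            if anc.findtext("Rank") in row:
                row[anc.findtext("Rank")] = anc.findtext("ScientificName", "")
        # The record itself (often the species) is not part of LineageEx.
        if taxon.findtext("Rank") in row:
            row[taxon.findtext("Rank")] = taxon.findtext("ScientificName", "")
        table[taxon.findtext("TaxId")] = row
    return table


def to_tsv(table):
    """Render the lineage table as tab-separated text with a header row."""
    buf = io.StringIO()
    writer = csv.writer(buf, delimiter="\t", lineterminator="\n")
    writer.writerow(["taxid"] + RANKS)
    for taxid, row in table.items():
        writer.writerow([taxid] + [row[r] for r in RANKS])
    return buf.getvalue()
```

If you need to map your own sequence IDs rather than taxIDs, join this table with the BLAST hit table on the taxID column.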

If you're not familiar with these tools, you can google them or search for them in this forum.

If you open this link, https://www.ncbi.nlm.nih.gov/nuccore/X51500.1?report=fasta, you can see "Run BLAST" on the right. If you click on it, you can choose blastn or blastx and compare their outputs.

You can also access online BLAST here:

https://blast.ncbi.nlm.nih.gov/Blast.cgi

You can select blastx or blastn and copy-paste two of your sequences, or upload your FASTA file, ...

Thank you, I'll try doing as you say.

Please post an example of some of the data you have.

If you know that you have nifH genes, it should be easy to identify which bacteria they come from by downloading all nifH genes from NCBI and then searching against that set using BLAT or BLAST.
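With BLAST+, searching against a local set of nifH reference sequences could look roughly like this: build a database from the downloaded references with makeblastdb, search your sequences with blastn, and keep the best-scoring hit per query. The file names are hypothetical placeholders, and what appears in sseqid depends on the FASTA headers of the reference file. BLAT would need a different command and output parser.

```python
import csv
import subprocess

# Hypothetical file names: nifH_ref.fasta = nifH sequences downloaded from NCBI,
# my_nifH.fasta = the forwarded sequences to identify.
MAKEDB = ["makeblastdb", "-in", "nifH_ref.fasta", "-dbtype", "nucl", "-out", "nifH_ref"]
COLUMNS = ["qseqid", "sseqid", "pident", "length", "evalue", "bitscore"]
SEARCH = ["blastn", "-query", "my_nifH.fasta", "-db", "nifH_ref",
          "-outfmt", "6 " + " ".join(COLUMNS), "-out", "hits.tsv"]


def best_hits(lines):
    """Keep the highest-bitscore hit per query from BLAST -outfmt 6 lines."""
    best = {}
    for rec in csv.reader(lines, delimiter="\t"):
        if not rec or rec[0].startswith("#"):
            continue  # skip blank and comment lines
        hit = dict(zip(COLUMNS, rec))
        hit["pident"] = float(hit["pident"])
        hit["bitscore"] = float(hit["bitscore"])
        query = hit["qseqid"]
        if query not in best or hit["bitscore"] > best[query]["bitscore"]:
            best[query] = hit
    return best


def run_local_search():
    """Build the database, run blastn, and return the best hits (needs BLAST+)."""
    subprocess.run(MAKEDB, check=True)
    subprocess.run(SEARCH, check=True)
    with open("hits.tsv", encoding="utf-8") as handle:
        return best_hits(handle)
```

The pident value of each best hit shows how close the match is; a low identity suggests the hit points to a related organism rather than the same species.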