3.9 years ago
Bioinfonext
I have downloaded the microbial genomes using repophlan_get_microbes.py and got four folders:
faa
ffn
fna
frn
Within the fna folder I got files like this:
G001284865.fna.bz2 G002910165.fna.bz2 G009390615.fna.bz2
G001284885.fna.bz2 G002910195.fna.bz2 G009390655.fna.bz2
...
Could you please help me with how I can filter these genome files based on the quality score, as shown in this publication (https://www.nature.com/articles/s41586-020-2095-1)?
And further, how can I make a single nucleotide database to run blastn?
Many thanks for your help and time.
How so? .fna files should be FASTA-format sequence files without any associated quality information/scores.
Since these are regular FASTA files, it should be straightforward to make BLAST databases using makeblastdb. Not sure what you mean by "single nucleotide".
Thanks genomax for the quick help!
In the publication linked above, they mention that "A total of 71,782 microbial genomes were downloaded using RepoPhlan (https://bitbucket.org/nsegata/repophlan) on 14 June 2016, of which 5,503 were viral and 66,279 were bacterial or archaeal. On the basis of prior literature, bacterial and archaeal genomes were filtered for quality scores of 0.8 or better [58], which left 54,471 of them for subsequent analysis, or a total of 59,974 microbial genomes".
But they did not mention how to find the quality score or how they filtered on it.
There is also a script in RepoPhlan, screen.py (https://bitbucket.org/nsegata/repophlan/src/default/), but I am not sure what this script is for or how I should use it.
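Once a per-genome quality score is available, the 0.8 cut-off quoted from the paper could be applied with something like the sketch below. Note the assumptions: the input layout (two tab-separated columns, genome ID and score) is hypothetical and is not the documented output of screen.py, and the function name genomes_passing and the example scores are made up for illustration.

```python
import csv
import io


def genomes_passing(score_table, threshold=0.8):
    """Return genome IDs whose score is >= threshold.

    score_table: an open text file (or any iterable of lines) with two
    tab-separated columns, genome ID and score. This layout is an
    assumption, not the known output format of any RepoPhlan script.
    """
    passing = []
    for row in csv.reader(score_table, delimiter="\t"):
        if len(row) < 2 or row[0].startswith("#"):
            continue  # skip blank, short, or comment lines
        try:
            score = float(row[1])
        except ValueError:
            continue  # skip a header line or a non-numeric score
        if score >= threshold:
            passing.append(row[0])
    return passing


# Made-up scores for illustration; the IDs are file names from the post.
example = io.StringIO(
    "genome\tscore\n"
    "G001284865\t0.93\n"
    "G002910165\t0.41\n"
    "G009390615\t0.80\n"
)
print(genomes_passing(example))
```

The returned IDs could then be used to pick which G*.fna.bz2 files to keep. The real column names and file layout would need to be checked against whatever screen.py actually writes.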
Many thanks bioinfonext
I have downloaded the microbial genomes using the repophlan_get_microbes.py script (https://bitbucket.org/nsegata/repophlan/src/default/). Now I am trying to get the quality scores for the downloaded genomes using screen.py, but I am getting an error. Do I need to download any other tool?
Could you please help me resolve this? I have already downloaded the pfam directory from the above link.
Many thanks nabiyogesh
As you can see from the usage statement, this program does not seem to have a --hmm option. I think you just need to run the command given in the run.sh file.
Thanks @genomax, now it is working well without the --hmm flag.
Hi @genomax,
I got multiple bacterial genomes after quality filtering (around 54,000), and each of these genome FASTA files is in an individual folder. Could you please suggest how I can make a single database for blastn?
Earlier, I used makeblastdb to make a database using the command below:
Genome FASTA files are located in the fna folder:
fna/
Within the fna folder I got files like this:
Many thanks
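One possible approach, sketched here under assumptions, is to stream every genome file into one combined FASTA and then build a single nucleotide database from it. The function names combine_fasta and makeblastdb_command, and the output names used in the note below, are hypothetical. The makeblastdb options used (-in, -dbtype nucl, -out) are standard BLAST+ options, but the command is only constructed here, not run.

```python
import bz2
from pathlib import Path


def combine_fasta(root, combined):
    """Stream every .fna / .fna.bz2 file found under root (including
    per-genome subfolders) into one FASTA file; return the file count."""
    root, combined = Path(root), Path(combined)
    count = 0
    with open(combined, "wb") as out:
        for path in sorted(root.rglob("*.fna*")):
            if path.resolve() == combined.resolve():
                continue  # never read the output file back into itself
            if path.name.endswith(".fna.bz2"):
                opener = bz2.open  # decompress on the fly
            elif path.name.endswith(".fna"):
                opener = open
            else:
                continue
            last = b"\n"
            with opener(path, "rb") as src:
                # Copy in 1 MiB chunks so large genomes are not held in memory.
                for chunk in iter(lambda: src.read(1 << 20), b""):
                    out.write(chunk)
                    last = chunk[-1:]
            if last != b"\n":
                out.write(b"\n")  # keep the next header on its own line
            count += 1
    return count


def makeblastdb_command(combined, db_name):
    """Build (but do not run) a makeblastdb call for a nucleotide database."""
    return ["makeblastdb", "-in", str(combined),
            "-dbtype", "nucl", "-out", str(db_name)]
```

For example, combine_fasta("fna", "all_genomes.fna") followed by subprocess.run(makeblastdb_command("all_genomes.fna", "all_genomes_db"), check=True) would build the database, provided BLAST+ is installed and on the PATH. With around 54,000 genomes the combined file will be large, so check free disk space first.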