Question: How to filter microbial genomes based on quality score
7 weeks ago by Bioinfonext (220), Korea
Bioinfonext wrote:

I downloaded the microbial genomes using repophlan_get_microbes.py and got four folders:

faa  
ffn 
fna  
frn

Within the fna folder I have files like this:

G001284865.fna.bz2  G002910165.fna.bz2  G009390615.fna.bz2
G001284885.fna.bz2  G002910195.fna.bz2  G009390655.fna.bz2
...

Could you please help me with how I can filter these genome files based on the quality score, as shown in this publication (https://www.nature.com/articles/s41586-020-2095-1)?

Also, how can I make a single nucleotide database to run blastn against?

Many thanks for your help and time.

repophlan metagenomics • 127 views
modified 7 weeks ago • written 7 weeks ago by Bioinfonext (220)

Could you please help me with how I can filter these genome files based on the quality score?

How so? .fna should be fasta format sequence files without any associated quality information/scores.

further how I can make single nucleotide database to do blastn?

Since these are regular FASTA files, it should be straightforward to make BLAST databases using makeblastdb. Not sure what you mean by single nucleotide.
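
If "single nucleotide database" means one nucleotide database covering all the downloaded genomes, a minimal Python sketch of that step could look like the following. It decompresses every .fna.bz2 file into one combined FASTA file and builds (but does not run) a makeblastdb command for a nucleotide database. The function names and the output file names are hypothetical, chosen only for illustration; the makeblastdb options used are -in, -dbtype nucl and -out.

```python
import bz2
from pathlib import Path


def combine_fna(fna_dir, combined_path):
    """Decompress every *.fna.bz2 in fna_dir into one FASTA file.

    Returns the number of archives that were combined.
    """
    archives = sorted(Path(fna_dir).glob("*.fna.bz2"))
    with open(combined_path, "wb") as out:
        for archive in archives:
            with bz2.open(archive, "rb") as src:
                data = src.read()
            out.write(data)
            # Guard against a missing final newline, which would glue the
            # last sequence line onto the next genome's header line.
            if data and not data.endswith(b"\n"):
                out.write(b"\n")
    return len(archives)


def makeblastdb_command(combined_path, db_name):
    """Return the makeblastdb argument list for a nucleotide database."""
    return ["makeblastdb", "-in", str(combined_path),
            "-dbtype", "nucl", "-out", db_name]
```

For example, combine_fna("fna", "all_genomes.fna") followed by subprocess.run(makeblastdb_command("all_genomes.fna", "all_genomes"), check=True) would build the database, assuming BLAST+ is installed and on the PATH. Each genome is read fully into memory here, which should be fine for individual microbial genomes but is a simplification.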

modified 7 weeks ago • written 7 weeks ago by genomax (87k)

Thanks, genomax, for the quick help!

In the publication linked above, they mention: "A total of 71,782 microbial genomes were downloaded using RepoPhlan (https://bitbucket.org/nsegata/repophlan) on 14 June 2016, of which 5,503 were viral and 66,279 were bacterial or archaeal. On the basis of prior literature, bacterial and archaeal genomes were filtered for quality scores of 0.8 or better (ref. 58), which left 54,471 of them for subsequent analysis, or a total of 59,974 microbial genomes".

But they did not mention how to obtain the quality score or how they filtered on it.

There is also a script in RepoPhlan, screen.py (https://bitbucket.org/nsegata/repophlan/src/default/), but I am not sure what this script does or how I should use it.

Many thanks bioinfonext

modified 7 weeks ago • written 7 weeks ago by Bioinfonext (220)

I have downloaded the microbial genomes using the repophlan_get_microbes.py script (https://bitbucket.org/nsegata/repophlan/src/default/). Now I am trying to get the quality scores for the downloaded genomes using screen.py, but I am getting an error. Do I need to download any other tool?

Could you please help me resolve this? I have already downloaded the pfam directory from the link above.

$ python screen.py --in_summary repophlan_microbes.txt --out_summary repophlan_microbes_wscores.txt --hmm pfam/102.hmm
usage: screen.py [-h] [--nproc NPROC] --in_summary IN_SUMMARY --out_summary
OUT_SUMMARY
screen.py: error: unrecognized arguments: --hmm pfam/102.hmm

Many thanks nabiyogesh

modified 4 weeks ago • written 4 weeks ago by Bioinfonext (220)

As you can see from the usage statement, this program does not seem to have a --hmm option. I think you just need to run the following (from the run.sh file):

python screen.py --nproc 10 --in_summary out/repophlan_microbes_${t}.txt --out_summary out/repophlan_microbes_${t}_wscores.txt
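
Once the _wscores.txt summary exists, applying the 0.8 cutoff quoted from the paper could be sketched roughly as below. This is only an illustration under stated assumptions: it assumes the scored summary is tab-separated with a header row, that the score lives in a column whose name you pass in (the default "score" is hypothetical, so check the actual header first), and that higher scores are better. It is not taken from RepoPhlan or from the paper's methods.

```python
import csv


def filter_by_score(in_path, out_path, score_column="score", threshold=0.8):
    """Copy rows whose score is >= threshold; return (kept, total).

    Assumes a tab-separated file with a header row. The column name
    "score" is a placeholder, not a confirmed RepoPhlan field name.
    """
    kept = total = 0
    with open(in_path, newline="") as src, open(out_path, "w", newline="") as dst:
        reader = csv.DictReader(src, delimiter="\t")
        fields = reader.fieldnames or []
        if score_column not in fields:
            raise KeyError(f"column {score_column!r} not found in header")
        writer = csv.DictWriter(dst, fieldnames=fields, delimiter="\t",
                                lineterminator="\n", extrasaction="ignore")
        writer.writeheader()
        for row in reader:
            total += 1
            try:
                score = float(row[score_column])
            except (TypeError, ValueError):
                # Missing or non-numeric score: treat as failing the filter.
                continue
            if score >= threshold:
                writer.writerow(row)
                kept += 1
    return kept, total
```

The filtered summary can then be used to pick which .fna.bz2 files go into the BLAST database, for instance by matching the genome identifiers in the kept rows against the file names.
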
written 4 weeks ago by genomax (87k)

Thanks @genomax, it is now working well without the --hmm flag.

written 4 weeks ago by Bioinfonext (220)