Question: Sort genomes for gene presence
gravatar for flemmingaftaasinge
14 months ago by
flemmingaftaasinge0 wrote:

I have a hard time solving a relatively simple problem (I am a novice)

I want to screen a collection of >8000 Prokaryot genomes for at specific gene, in order to make an prescence/absence table for a GWAS study. However, I am not sure how to go about this. I have the sequence of the gene in FASTA format, as well as all the genomes. Could I perhaps use Blast+ and create a local database of the genomes? I have the genomes as fna files directly from GenBank but I also have annotated versions from running Prokka.

Thank you in advance!

sequence gene genome • 310 views
ADD COMMENTlink modified 14 months ago • written 14 months ago by flemmingaftaasinge0

You can use searching gene against genome strategy using blast+ (blastn - searches nucleotide gene sequence against genome database) utility. But searching a gene sequence against predicted genes is a good strategy and would be more precise than searching against genome.

You can use .fasta file generated by prokka, which will contain predicted gene sequence for an individual organism.

NOTE: While searching your gene of interest against predicted genes, you may face a problem of unique identifier. Before making a gene database you have to change gene headers so that at the end you can track the source organism for the hit.

ADD REPLYlink written 14 months ago by Nitin Narwade440
gravatar for flemmingaftaasinge
14 months ago by
flemmingaftaasinge0 wrote:

Ok so I partially solved this. The genomes i want to screen are saved as fna files and the genes i want to screen for are saved in txt files. Then I run this nested loop in Ubuntu that performs a Blastn of the genes against all genomes and creates an xls file for each gebe containing the results for the blastn of each genome. The word size is set to 28bp to mimic megablast for intraspecies screening.

mkdir results_folder
for txt_fil in *txt
echo $txt_fil
for fna_fil in *fna
echo $fna_fil
blastn -task blastn -subject ${fna_fil} -query $txt_fil -word_size 28 -outfmt "6 sseqid pident length gapopen qstart qend sstart send evalue bitscore" -out "$fna_fil".csv
echo Done with $fna_fil
cat *csv >> "$txt_fil".xls
rm *csv
mv *xls results_folder
echo Done with $txt_fil
ADD COMMENTlink modified 14 months ago by genomax83k • written 14 months ago by flemmingaftaasinge0
Please log in to add an answer.


Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.3.0
Traffic: 1030 users visited in the last hour