Question: Sort genomes for gene presence
0
gravatar for flemmingaftaasinge
10 weeks ago by
flemmingaftaasinge0 wrote:

I have a hard time solving a relatively simple problem (I am a novice)

I want to screen a collection of >8000 Prokaryot genomes for at specific gene, in order to make an prescence/absence table for a GWAS study. However, I am not sure how to go about this. I have the sequence of the gene in FASTA format, as well as all the genomes. Could I perhaps use Blast+ and create a local database of the genomes? I have the genomes as fna files directly from GenBank but I also have annotated versions from running Prokka.

Thank you in advance!

sequence gene genome • 147 views
ADD COMMENTlink modified 9 weeks ago • written 10 weeks ago by flemmingaftaasinge0

You can use searching gene against genome strategy using blast+ (blastn - searches nucleotide gene sequence against genome database) utility. But searching a gene sequence against predicted genes is a good strategy and would be more precise than searching against genome.

You can use .fasta file generated by prokka, which will contain predicted gene sequence for an individual organism.

NOTE: While searching your gene of interest against predicted genes, you may face a problem of unique identifier. Before making a gene database you have to change gene headers so that at the end you can track the source organism for the hit.

ADD REPLYlink written 10 weeks ago by Nitin Narwade380
0
gravatar for flemmingaftaasinge
9 weeks ago by
flemmingaftaasinge0 wrote:

Ok so I partially solved this. The genomes i want to screen are saved as fna files and the genes i want to screen for are saved in txt files. Then I run this nested loop in Ubuntu that performs a Blastn of the genes against all genomes and creates an xls file for each gebe containing the results for the blastn of each genome. The word size is set to 28bp to mimic megablast for intraspecies screening.

mkdir results_folder
for txt_fil in *txt
do
echo $txt_fil
for fna_fil in *fna
do
echo $fna_fil
blastn -task blastn -subject ${fna_fil} -query $txt_fil -word_size 28 -outfmt "6 sseqid pident length gapopen qstart qend sstart send evalue bitscore" -out "$fna_fil".csv
echo Done with $fna_fil
done
cat *csv >> "$txt_fil".xls
rm *csv
mv *xls results_folder
echo Done with $txt_fil
done;
ADD COMMENTlink modified 9 weeks ago by genomax67k • written 9 weeks ago by flemmingaftaasinge0
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.3.0
Traffic: 1074 users visited in the last hour