finding bacterial genes in dataset
6 weeks ago


I am new to biology and genome sequencing, and I am trying to figure out how to make use of the knowledge in genome datasets. I get stuck on simple questions. The one I have right now: if I have genomic data for the bacteria in a microbiome, how can I figure out the frequency of occurrence of a particular gene?

For instance, I am looking at this dataset -

How can I look up the presence of a particular gene in this dataset? Can it be done? If yes, can it be done online, or do I have to download the entire dataset (and if so, how do I do it)?

I would appreciate it if someone could help and/or point me to resources where I can learn how to do this.

Thank you so much for your help!

microbiome gene sequencing • 125 views

You could simply take the sequence of the gene you are interested in and search for it in this dataset. You could use an NGS read aligner (e.g. BBMap, bwa mem) with the NCBI data as input, or you could use DIAMOND (LINK) to search the reads against your gene (convert it to a protein sequence first).
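As a rough sketch of the DIAMOND route, the two steps (build a small protein database from the gene, then search the translated reads against it) could be scripted like this. The file names `my_gene.faa`, `reads.fastq`, `gene_db` and `hits.tsv` and the e-value cutoff are placeholders I made up for illustration, and the helper only builds the command lines; it does not run anything.

```python
def diamond_commands(gene_faa, reads_fastq, db_prefix="gene_db", out_tsv="hits.tsv"):
    """Build (but do not run) two DIAMOND command lines as argv lists.

    All file names are hypothetical placeholders, not files from the dataset.
    """
    # Step 1: turn the protein sequence of the gene into a DIAMOND database.
    makedb = ["diamond", "makedb", "--in", gene_faa, "--db", db_prefix]
    # Step 2: translate the DNA reads and search them against that database,
    # writing BLAST-style tabular output (format 6).
    blastx = [
        "diamond", "blastx",
        "--db", db_prefix,
        "--query", reads_fastq,
        "--out", out_tsv,
        "--outfmt", "6",
        "--evalue", "1e-5",  # assumed cutoff, tune for your data
    ]
    return makedb, blastx


makedb_cmd, blastx_cmd = diamond_commands("my_gene.faa", "reads.fastq")
print(" ".join(makedb_cmd))
print(" ".join(blastx_cmd))
```

With DIAMOND installed, each list could be passed to `subprocess.run(cmd, check=True)`, or the printed lines pasted into a shell. You would still need to download the reads first.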

Since you are working with short reads, expect some off-target alignments (simply due to chance sequence similarity), but the results should tell you whether that gene is likely present. This method will not give you information about frequency (as in, say, copies present), though. You may need to do some metagenomic assemblies with the NCBI data to find that out.
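To cut down on chance hits, one option is to filter the search results before counting. Below is a minimal sketch that assumes the search wrote BLAST-style tabular output (format 6, default 12 columns). The thresholds and the example rows are made up for illustration. The result is the number of reads supporting the gene, which is evidence of presence, not a copy number.

```python
import csv
import io

# Default column order of BLAST tabular output (format 6).
COLUMNS = ["qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
           "qstart", "qend", "sstart", "send", "evalue", "bitscore"]


def supporting_reads(handle, min_identity=90.0, min_length=30, max_evalue=1e-5):
    """Return the set of read IDs with at least one hit passing all thresholds.

    The default thresholds are assumptions, not recommended values.
    """
    reads = set()
    for row in csv.reader(handle, delimiter="\t"):
        if not row or row[0].startswith("#"):
            continue  # skip blank lines and comment lines
        rec = dict(zip(COLUMNS, row))
        if (float(rec["pident"]) >= min_identity
                and int(rec["length"]) >= min_length
                and float(rec["evalue"]) <= max_evalue):
            reads.add(rec["qseqid"])  # a read is counted once, however many hits it has
    return reads


# Made-up rows: read1 has two good hits, read2 is a weak chance hit, read3 is good.
example = "\n".join([
    "read1\tgeneX\t98.0\t50\t1\t0\t1\t150\t1\t50\t1e-20\t95.0",
    "read2\tgeneX\t60.0\t20\t8\t0\t1\t60\t5\t24\t0.5\t22.0",
    "read1\tgeneX\t95.0\t45\t2\t0\t4\t138\t3\t47\t1e-15\t80.0",
    "read3\tgeneX\t92.0\t40\t3\t0\t1\t120\t10\t49\t1e-8\t60.0",
])
kept = supporting_reads(io.StringIO(example))
print(sorted(kept))
```

For a real run you would pass an open file instead of the `io.StringIO` example, e.g. `with open("hits.tsv") as fh: supporting_reads(fh)`.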


Thanks, I will check it out.

