Question

finding bacterial genes in dataset


3.2 years ago

john.chandler2011 • 0

Hi,

I am new to biology and genome sequencing, and I am trying to figure out how to use the information in genome datasets. I get stuck on simple questions. The one I have right now: if I have genomic data for the bacteria in a microbiome, how can I figure out the frequency of occurrence of a particular gene?

For instance, I am looking at this dataset - https://www.ncbi.nlm.nih.gov/bioproject/PRJNA482748

How can I look up the presence of a particular gene in this dataset? Can it be done? If yes, can it be done online, or do I have to download the entire dataset (and if so, how would I then do the search)?

I will appreciate it if someone could help and/or point me to the resources where I can learn how to do this stuff.

Thank you so much for your help!

microbiome gene sequencing • 546 views

ADD COMMENT • link 3.2 years ago by john.chandler2011 • 0


You could simply take the sequence of the gene you are interested in and search this dataset's reads against it. You could use an NGS read aligner (e.g. BBMap, bwa mem, etc.) with the NCBI data as input, or you could use DIAMOND to search the reads against your gene (after converting it to a protein sequence).
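As a rough illustration of the "convert it to protein sequence" step, here is a minimal Python sketch that translates a gene's coding sequence with the standard codon table (bacterial genes use the same codon assignments). The function name `translate` and the example sequence are made up for illustration. It assumes the input starts in frame on the coding strand. The resulting protein could then be turned into a DIAMOND database (`diamond makedb`) and the reads searched against it in translated mode (`diamond blastx`).

```python
from itertools import product

# Standard genetic code, codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO_ACIDS = (
    "FFLLSSSSYY**CC*W"
    "LLLLPPPPHHQQRRRR"
    "IIIMTTTTNNKKSSRR"
    "VVVVAAAADDEEGGGG"
)
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(dna):
    """Translate an in-frame coding sequence up to the first stop codon.

    Codons containing ambiguous bases (e.g. N) become 'X'.
    A trailing partial codon is ignored.
    """
    seq = "".join(dna.split()).upper()
    protein = []
    for i in range(0, len(seq) - 2, 3):
        aa = CODON_TABLE.get(seq[i:i + 3], "X")
        if aa == "*":  # stop codon: end of the protein
            break
        protein.append(aa)
    return "".join(protein)


if __name__ == "__main__":
    # Hypothetical toy sequence, not a real gene.
    print(translate("ATGGCCAAATAA"))  # MAK
```

If you already have the protein sequence from a database record, you can skip this step and use that directly.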

Since you are working with short reads, expect some off-target alignments (simply due to chance sequence similarity), but the results should tell you whether that gene is likely present. This method is not going to give you information about frequency (as in, say, copies present), though. You may need to do some metagenomic assemblies with the NCBI data to find that out.
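To make the off-target point concrete, here is a small Python sketch that counts reads from an aligner's SAM output while discarding weak hits. It assumes the reads were aligned to a reference containing your gene. The function names and the thresholds (`min_mapq=20`, `min_aligned=50`) are arbitrary illustrative choices, not recommended values. The count only suggests whether the gene is present; it is not a copy number.

```python
import re

# One CIGAR operation: a length followed by an operation letter.
CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")


def aligned_bases(cigar):
    """Number of read bases aligned to the reference (M, = and X operations)."""
    return sum(int(n) for n, op in CIGAR_OP.findall(cigar) if op in "M=X")


def count_gene_hits(sam_lines, min_mapq=20, min_aligned=50):
    """Count primary alignments that pass simple quality filters."""
    hits = 0
    for line in sam_lines:
        if line.startswith("@"):  # header line
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 11:  # not a complete SAM record
            continue
        flag, mapq, cigar = int(fields[1]), int(fields[4]), fields[5]
        # Skip unmapped (0x4), secondary (0x100) and supplementary (0x800).
        if flag & (0x4 | 0x100 | 0x800):
            continue
        if mapq < min_mapq or aligned_bases(cigar) < min_aligned:
            continue
        hits += 1
    return hits


if __name__ == "__main__":
    # "reads_vs_gene.sam" is a hypothetical file name.
    with open("reads_vs_gene.sam") as handle:
        print(count_gene_hits(handle))
```

A handful of passing reads spread along the gene is more convincing than many reads piling up on one short region, which often points to a conserved domain shared with other genes.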