Estimate number of particular genes in my assembly
3.6 years ago
Gonçalo • 0

Hello everyone. I am new here, so forgive me if I am not doing things in the most correct way. After (I hope) successfully assembling the genome of a particular species, I need to estimate how many cytochrome P450 (CYP) genes are present in my assembly. I am really struggling to come up with a strategy for this task. Would you use BLAST to find regions of similarity with other known sequences? And how do I do that for a particular gene family (CYP in this case)? Any help would be very much appreciated. Thank you very much.

PS: this is related to some MSc coursework.

Gonçalo

Assembly gene genome • 706 views

Is there a related genome available that is annotated? You could start with that and compare. It would also give you some idea of how good your assembly is.

BLAST would be a good way to start to look at individual genes. Have you done gene predictions? Preferably do your comparisons at protein level to have confidence in the results.

If the gene is expected to be multi-copy, your assembly may have collapsed those copies into one if you did not have long-read data, in which case a count taken from the assembly would be an underestimate. Keep that in mind.


Thank you very much for your reply. I haven't done gene predictions yet. Would you suggest using something like MAKER for gene prediction and then using BLAST to find regions of similarity with the reference genome? My task is basically to perform a de novo assembly for the fire ant Solenopsis invicta because, apparently, the official genome assembly is quite fragmented. Then, as part of the same assessment, I am being asked to estimate the number of CYP genes in my assembly.


Running MAKER on your assembly would be fine. Since it is an annotation pipeline, it will produce value-added results for the whole genome. You may also want to run BUSCO to see if your assembly is reasonably complete.

3.6 years ago
NH ▴ 10

Depending on how stringent the requirements are, BLAST may be a reasonable method. You would need to collect a list of known CYP genes for your species from an online database, keep them in a file, and BLAST those against your assembled genome. This will likely produce a very large number of matches, but you can then reduce them by selecting for high identity, alignment length, coverage, etc., either with the various options available in BLAST or by filtering its output afterwards.
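As a rough illustration of the filtering step described above, here is a minimal Python sketch. It assumes the search was run with tabular output that includes the query length (for example `-outfmt "6 std qlen"`, i.e. the 12 standard columns plus `qlen`). The function names, the example hits, and all thresholds are hypothetical placeholders, not recommended values. Overlapping hits on the same contig are merged so that several query CYPs hitting the same region are counted once.

```python
from collections import defaultdict


def parse_hits(lines):
    """Parse BLAST tabular lines: 12 standard columns followed by qlen."""
    hits = []
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        f = line.split("\t")
        hits.append({
            "query": f[0], "subject": f[1],
            "pident": float(f[2]), "length": int(f[3]),
            "qstart": int(f[6]), "qend": int(f[7]),
            "sstart": int(f[8]), "send": int(f[9]),
            "evalue": float(f[10]), "qlen": int(f[12]),
        })
    return hits


def filter_hits(hits, min_pident=40.0, min_qcov=0.5, max_evalue=1e-10):
    """Keep hits passing identity, query-coverage and e-value cutoffs.

    The default cutoffs are arbitrary placeholders (an assumption).
    """
    kept = []
    for h in hits:
        qcov = (h["qend"] - h["qstart"] + 1) / h["qlen"]
        if (h["pident"] >= min_pident and qcov >= min_qcov
                and h["evalue"] <= max_evalue):
            kept.append(h)
    return kept


def count_loci(hits, max_gap=0):
    """Count distinct regions after merging overlapping hits per contig."""
    by_contig = defaultdict(list)
    for h in hits:
        # Hits on the reverse strand have sstart > send, so normalise first.
        start, end = sorted((h["sstart"], h["send"]))
        by_contig[h["subject"]].append((start, end))
    n_loci = 0
    for intervals in by_contig.values():
        intervals.sort()
        current_end = None
        for start, end in intervals:
            if current_end is None or start > current_end + max_gap:
                n_loci += 1          # start of a new, separate region
                current_end = end
            else:
                current_end = max(current_end, end)  # extend current region
    return n_loci


# Made-up example hits, for illustration only.
example = [
    "CYP4A\tcontig1\t85.0\t450\t60\t0\t1\t450\t1000\t2350\t1e-150\t800\t500",
    "CYP4B\tcontig1\t70.0\t400\t100\t0\t1\t400\t1100\t2300\t1e-90\t500\t480",
    "CYP6A\tcontig2\t90.0\t480\t40\t0\t1\t480\t9000\t7561\t0.0\t900\t490",
    "CYP6A\tcontig3\t30.0\t60\t40\t0\t1\t60\t100\t280\t0.5\t30\t490",
]
kept = filter_hits(parse_hits(example))
n_loci = count_loci(kept)
print(f"{len(kept)} hits kept, {n_loci} candidate loci")
```

Note that coverage here is computed per alignment (HSP). When searching a genome, a gene with introns is usually split across several shorter alignments, so a per-alignment coverage cutoff can discard real genes; raising `max_gap` to merge nearby hits, or searching predicted proteins instead, are possible workarounds.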

This is a very simple method and introduces many follow-on questions, but I'm sure if your work is asking for just an estimate, there might also be a question regarding the strengths and limitations of such a method of estimating genes.

Good luck with your MSc!!


Thank you very much for your help. "I'm sure if your work is asking for just an estimate, there might also be a question regarding the strengths and limitations of such a method of estimating genes." That's absolutely the case :)
