7 weeks ago
SushiRoll ▴ 100
Sorry if this question has been addressed before, but I haven't been able to find a solution. I have a lot of assemblies (around 800) and I would like to retrieve the fasta sequence of a specific housekeeping gene which should (in theory) be present in all of them. Is there any tool that can take a fasta assembly as input and retrieve a specific gene while tolerating a certain % of sequence variation, so that the gene is still found even if it has mutations? Alternatively, it could take a gbk or gff3 file as input and use the gene annotation as the retrieval criterion.
Thanks a lot!
Don't know if there is a ready-made tool. You will need to align the gene to your assemblies, and then it is a matter of parsing the results and retrieving the sequence you need using samtools faidx and similar options.
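To make the "parse the results" step concrete, here is a minimal sketch in Python. It assumes the alignment was done with BLAST and written in the default tabular format (`-outfmt 6`, where columns 2, 3, 9, 10 and 12 are the subject contig, percent identity, subject start, subject end and bit score). The helper names (`read_fasta`, `best_hit`, `extract_hit`) and the 80% identity cutoff are made up for illustration, and it only returns the aligned region, so a partial hit gives a partial gene.

```python
from typing import Dict, Iterable, Optional, Tuple

# Hit = (contig name, subject start, subject end, bit score); coordinates are 1-based.
Hit = Tuple[str, int, int, float]

_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def read_fasta(lines: Iterable[str]) -> Dict[str, str]:
    """Read FASTA lines into {contig_name: sequence}; the name is the first word of the header."""
    chunks: Dict[str, list] = {}
    name = None
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            name = line[1:].split()[0]
            chunks[name] = []
        elif name is not None:
            chunks[name].append(line)
    return {k: "".join(v) for k, v in chunks.items()}


def reverse_complement(seq: str) -> str:
    return seq.translate(_COMPLEMENT)[::-1]


def best_hit(blast_lines: Iterable[str], min_identity: float = 80.0) -> Optional[Hit]:
    """Return the highest bit-score hit at or above min_identity, or None if there is none."""
    best: Optional[Hit] = None
    for line in blast_lines:
        if not line.strip() or line.startswith("#"):
            continue
        f = line.rstrip("\n").split("\t")
        pident, sstart, send, bitscore = float(f[2]), int(f[8]), int(f[9]), float(f[11])
        if pident < min_identity:
            continue
        if best is None or bitscore > best[3]:
            best = (f[1], sstart, send, bitscore)
    return best


def extract_hit(contigs: Dict[str, str], hit: Hit) -> str:
    """Cut the aligned region out of the contig; minus-strand hits (sstart > send) are reverse-complemented."""
    contig, sstart, send, _ = hit
    lo, hi = min(sstart, send), max(sstart, send)
    seq = contigs[contig][lo - 1:hi]
    return reverse_complement(seq) if sstart > send else seq
```

You would run this once per assembly (one BLAST result file and one fasta per assembly) and write the extracted sequence to a combined fasta. The coordinates from `best_hit` could equally be passed to `samtools faidx assembly.fasta contig:start-end` instead of slicing in Python.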
Great, I'll give it a shot.
Personally, I have never worked on a similar task and thus unfortunately can't provide you with a polished solution, but what you are trying to do here is find orthologous genes. Using this keyword, you should find tools suitable for this task, e.g. OrthoFinder showed up in a quick search.
Thanks Matthias, that's a great starting point, I'll check what's out there.