9.2 years ago
julia92796
Hi,
I'm trying to pull out protein sequences from BAM files containing genome alignments. What is the best way to do this? Right now, I have a BAM file containing the Neanderthal alignment to the human genome and a FASTA file containing the human reference sequence. I wasn't sure whether the next step would be to pull out the Neanderthal consensus sequence using SAMtools. If so, where do I go from there? If not, what should I do next?
Have you seen this thread: Looking for neanderthal genomes to download
What are you looking to get at the end?
Thanks, this thread is helpful. I have a list of human proteins, as well as the genes coding for these proteins, and I'm trying to find homologs of these proteins in humanoid species. I suspect that the next step in my process is generating the consensus sequence for the Neanderthal genome, but after that, I'm not sure how to proceed.
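For what it's worth, here is a rough sketch of what the step after the consensus might look like: splice the coding exons of a gene out of the consensus and translate them. This is only an illustration under assumptions that are not from the thread. It assumes the consensus is already loaded as a string, that it is still in human reference coordinates (no indels applied, otherwise the annotation coordinates shift), and that exon coordinates are 0-based, half-open CDS intervals taken from a human annotation. The function names `spliced_cds` and `translate` and the toy sequence are hypothetical.

```python
from itertools import product

# Standard genetic code, built from the conventional TCAG codon ordering.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

_COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq):
    """Return the reverse complement of a DNA string (A, C, G, T, N)."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


def spliced_cds(chrom_seq, exons, strand="+"):
    """Join CDS exons from a consensus sequence.

    exons: list of (start, end) tuples, 0-based and half-open, in
    reference coordinates (assumed to match the consensus exactly).
    """
    cds = "".join(chrom_seq[start:end] for start, end in sorted(exons)).upper()
    return reverse_complement(cds) if strand == "-" else cds


def translate(cds):
    """Translate a CDS up to the first stop codon.

    Codons containing N or other ambiguity codes become 'X'.
    """
    protein = []
    for i in range(0, len(cds) - 2, 3):
        aa = CODON_TABLE.get(cds[i:i + 3], "X")
        if aa == "*":
            break
        protein.append(aa)
    return "".join(protein)


# Toy example: two exons separated by an "intron" of Ns.
toy_consensus = "GGATGGCCNNNNTTTTAACC"
toy_exons = [(2, 8), (12, 18)]
print(translate(spliced_cds(toy_consensus, toy_exons)))  # MAF
```

The resulting protein strings could then be compared against the human proteins. Low-coverage regions in ancient DNA tend to show up as `N` in a consensus, which this sketch turns into `X` residues rather than guessing a base.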
Hey @Julia92796, were you able to figure it out? I'm curious how you built the consensus sequences and aligned them to human proteins. I noticed a lot of insertions, and the MSA seems to be failing at domain regions.