From consensus sequence with variants to protein
3.7 years ago
Maxime • 0

Hello Biostars,

I have multiple gene sequences for probands, created with bcftools consensus (https://samtools.github.io/bcftools/howtos/consensus-sequence.html). I would like to analyse them at the protein level, i.e. retrieve each individual's protein sequence including all of their variants. The point is to analyse some specific mutations in the context of the common polymorphisms surrounding them.

So, I have the complete gene sequence at the nucleotide level, and I would like to extract the coding sequence and then translate it.
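If the CDS exon boundaries are already known as offsets inside the consensus sequence, extracting and translating the CDS can be done without any alignment. The sketch below assumes that: the exon coordinates (for example taken from the CCDS record or a GTF and converted to offsets within your consensus) must be 0-based, half-open and in the consensus sequence's own coordinates. Indels applied by bcftools consensus shift downstream positions; its `--chain` option can write a chain file for lifting coordinates over, but that step is not shown here. `splice_cds` and `translate` are hypothetical helpers; the translation uses the standard genetic code (NCBI table 1).

```python
# Minimal sketch: splice CDS exons out of a consensus gene sequence and translate.
# ASSUMPTION: exon coordinates are already in the consensus sequence's own
# coordinates (0-based, half-open); upstream indels would otherwise shift them.

# Standard genetic code (NCBI translation table 1), codons enumerated in TCAG order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}

# Complements ACGTN only; other IUPAC codes pass through unchanged
# (any codon containing them is translated as "X" anyway).
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq: str) -> str:
    return seq.translate(COMPLEMENT)[::-1]


def splice_cds(gene_seq: str, exons, strand: str = "+") -> str:
    """Concatenate the CDS segments of gene_seq (hypothetical helper).

    exons: (start, end) pairs, 0-based half-open, in gene_seq coordinates,
    in any order; they are joined in increasing position order.
    """
    cds = "".join(gene_seq[s:e] for s, e in sorted(exons)).upper()
    # For a minus-strand gene, the joined plus-strand sequence is reverse-complemented.
    return reverse_complement(cds) if strand == "-" else cds


def translate(cds: str) -> str:
    """Translate codon by codon; unknown codons become 'X', stops stay as '*'."""
    usable = len(cds) - len(cds) % 3  # ignore a trailing partial codon
    return "".join(CODON_TABLE.get(cds[i:i + 3], "X") for i in range(0, usable, 3))


if __name__ == "__main__":
    # Toy gene: two coding exons (upper case) separated by an intron (lower case).
    gene = "cc" + "ATGAAA" + "gtaagcaa" + "TTTTGA" + "cc"
    cds = splice_cds(gene, [(2, 8), (16, 22)])
    print(cds, translate(cds))  # ATGAAATTTTGA MKF*
```

Stop codons are kept as `*` rather than truncating the protein, so a premature stop introduced by a proband variant stays visible in the output.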

My approach was to get the CCDS of my gene of interest (for example https://www.ncbi.nlm.nih.gov/CCDS/CcdsBrowse.cgi?REQUEST=CCDS&DATA=CCDS10509 for CREBBP) and then align my full gene sequence onto it to reconstruct the proband's CDS. I used blastn on NCBI's website (megablast option). The alignment looked good, but it raised some issues.

The end of one aligned match usually doesn't coincide with the start of the next one. For example, one match ends at position 3701 and the next one begins at position 3697.

So, downloading the aligned matches and writing a script that simply concatenates them wouldn't work. I could fix the junctions manually, but I have too many genes to do that.
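One way to script the junctions is to walk the matches in CCDS (subject) order and drop, from the start of each match, the CCDS bases that the previous match already covered. This is a minimal sketch, assuming plus/plus hits, 1-based inclusive coordinates as in the BLAST hit table (the `qstart`, `qend`, `sstart`, `send` columns), and no alignment gaps inside the overlapping stretch, so that skipping n subject bases means skipping n query bases. `HSP` and `stitch_hsps` are hypothetical names.

```python
from typing import List, NamedTuple


class HSP(NamedTuple):
    """One BLAST match (hypothetical container): query = proband gene, subject = CCDS.

    Coordinates are 1-based and inclusive, as BLAST reports them; plus/plus only.
    """
    q_start: int
    q_end: int
    s_start: int
    s_end: int


def stitch_hsps(query: str, hsps: List[HSP]) -> str:
    """Rebuild the proband CDS by walking HSPs along the CCDS and trimming overlaps.

    ASSUMPTION: no alignment gaps inside the overlapping stretch, so skipping
    n subject bases at the start of an HSP means skipping n query bases too.
    """
    parts = []
    covered_to = 0  # last CCDS position already written to the output
    for h in sorted(hsps, key=lambda h: h.s_start):
        if h.s_end <= covered_to:
            continue  # match lies entirely inside CCDS already covered
        if h.s_start > covered_to + 1:
            raise ValueError(f"CCDS positions {covered_to + 1}-{h.s_start - 1} not covered")
        overlap = max(0, covered_to - h.s_start + 1)  # CCDS bases seen twice
        q_from = h.q_start + overlap
        parts.append(query[q_from - 1 : h.q_end])  # 1-based inclusive -> Python slice
        covered_to = h.s_end
    return "".join(parts)


if __name__ == "__main__":
    gene = "cc" + "ATGAAA" + "gtaagcaa" + "TTTTGA" + "cc"
    # The second match starts 2 CCDS bases early ("aa" at the intron's end matches).
    hsps = [HSP(3, 8, 1, 6), HSP(15, 22, 5, 12)]
    print(stitch_hsps(gene, hsps))  # ATGAAATTTTGA
```

With the example from the question (one match ending at 3701, the next starting at 3697), the second match would be trimmed by 5 bases. Which copy of the overlapping bases is kept only matters if a proband variant falls inside the overlap; this sketch keeps the earlier match's version. Minus-strand hits and gapped overlaps would need extra handling.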

Is there an alternative, easier solution?

Thank you

alignment BLAST CDS CCDS VCF • 861 views
3.7 years ago
Maxime • 0

I've found this tool, http://genomics.brocku.ca/Prot2gene/index.html, which does what I want, in case anyone comes across the same question.
