Question

PRANK aligner abbreviates gene IDs

4.2 years ago

daw277 • 0

I am using PRANK to align multiple FASTA files, each containing an orthogroup identified by OrthoFinder, for Bayesian analysis. When I run MrBayes, certain alignments fail to load or run. Inspecting those alignments showed that PRANK had truncated the sequence names from the original FASTA, so two of them ended up identical, which causes an error in MrBayes. Is this a result of the way I ran PRANK, and is there a way to make it write the entire gene ID to the resulting NEXUS files?

I am running PRANK with its default settings, like so:

prank -d=<input_fasta> -f=nexus -o=<output_nexus>

The input fasta has these taxa:

>Itaiw_v1_scaffold_34_t22466-RA
MLMCVLIANSSGNVLLERFHGVPGEERLHWRSFLVKLGTDNLKGARDDEPFIASHKSVYV
VYGIIGDIWIFTVGKDEYDELTLVEVLYSITSSIKEVCKKAPNERLFLDNYGKVCLCLDE
ICAQGMLEHTDKGRIRRLIRLRPLVDT*
>Itaiw_v1_scaffold_34_t22466-RB
MLMCVLIANSSGNVLLERFHGVPGEERLHWRSFLVKLGTDNLKGARDDEPFIASHKSVYV
VYGIIGDIWIFTVGKDEYDELTWNVRAHR*

Output looks like this:

'Itaiw_v1_scaffold_34' -----------------------------MLMCVLIANSSGNVLLERFHGVPGEERLHWR (...)
'Itaiw_v1_scaffold_34' -----------------------------MLMCVLIANSSGNVLLERFHGVPGEERLHWR (...)
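In the output above, both names stop after 20 characters. To find which orthogroup files would be affected, a check along these lines could be run on the input FASTA files. This is only a sketch: the function name `truncation_collisions` is made up for illustration, and the cutoff of 20 is an assumption read off this one example, not a documented PRANK setting.

```python
from collections import defaultdict


def truncation_collisions(fasta_text, width=20):
    """Group FASTA IDs that become identical when cut to `width` characters.

    `width=20` is an assumption taken from the example output, not a
    documented PRANK value.
    """
    groups = defaultdict(list)
    for line in fasta_text.splitlines():
        if not line.startswith(">"):
            continue
        parts = line[1:].split()
        if not parts:
            continue  # skip an empty header line
        full_id = parts[0]  # the ID is the first whitespace-delimited token
        groups[full_id[:width]].append(full_id)
    # keep only the truncated names shared by more than one sequence
    return {short: ids for short, ids in groups.items() if len(ids) > 1}


example = """>Itaiw_v1_scaffold_34_t22466-RA
MLMCVLIANSS
>Itaiw_v1_scaffold_34_t22466-RB
MLMCVLIANSS
"""
print(truncation_collisions(example))
```

An empty result would mean that no two IDs in that file collide at the assumed width.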

Has anyone else run into this problem with PRANK? Is there a way to modify my PRANK command or my input files so that I get the full names? I am running this analysis on thousands of orthogroup files, so fixing them one by one is not an option. I realize this question is fairly specific, but I couldn't find any reference to this behavior in the PRANK docs, and a cursory Google search didn't turn up much either, so here I am!
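One way to modify the input files in bulk, independent of how PRANK handles names, would be to replace every header with a short unique placeholder before aligning and to keep a table for mapping the placeholders back afterwards. The sketch below illustrates that idea only. The helper names `shorten_ids` and `restore_ids` and the `seq00001`-style placeholders are hypothetical, and it assumes the placeholders pass through the aligner unchanged.

```python
def shorten_ids(fasta_text, prefix="seq"):
    """Replace each FASTA header with a short, unique placeholder.

    Returns (renamed_fasta_text, mapping), where mapping is
    {placeholder: original_header}. The placeholder scheme is an
    illustrative choice, not something PRANK requires.
    """
    mapping = {}
    out = []
    for line in fasta_text.splitlines():
        if line.startswith(">"):
            # fixed-width counter so no placeholder is a prefix of another
            short = f"{prefix}{len(mapping) + 1:05d}"
            mapping[short] = line[1:].strip()
            out.append(">" + short)
        else:
            out.append(line)
    return "\n".join(out) + "\n", mapping


def restore_ids(alignment_text, mapping):
    """Put the original names back by plain text substitution."""
    for short, original in mapping.items():
        alignment_text = alignment_text.replace(short, original)
    return alignment_text


fasta = """>Itaiw_v1_scaffold_34_t22466-RA
MLMC
>Itaiw_v1_scaffold_34_t22466-RB
MLMW
"""
renamed, mapping = shorten_ids(fasta)
print(renamed)
print(mapping)
```

Restoring the names by text substitution relies on the placeholders never occurring in the sequence data, which holds here because they contain digits. It may be simpler to carry the short IDs through MrBayes and translate them back only in the final trees, since long names containing characters such as `-` may need quoting in NEXUS.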

alignment gene • 720 views
