PRANK aligner abbreviates gene IDs
4.2 years ago
daw277 • 0

I am using PRANK to align multiple FASTA files, each containing an orthogroup identified by OrthoFinder, for downstream Bayesian analysis. When I run MrBayes, certain alignments will not load or run. Inspecting those alignments showed that PRANK had abbreviated the sequence names from the original FASTA, so that two of them became identical, which causes an error in MrBayes. Is this a result of the way I ran PRANK, and is there a way to make it write the entire gene ID to the resulting NEXUS files?

I am running PRANK with its default settings, as follows:

prank -d=<input_fasta> -f=nexus -o=<output_nexus>

The input FASTA contains these sequences:

>Itaiw_v1_scaffold_34_t22466-RA
MLMCVLIANSSGNVLLERFHGVPGEERLHWRSFLVKLGTDNLKGARDDEPFIASHKSVYV
VYGIIGDIWIFTVGKDEYDELTLVEVLYSITSSIKEVCKKAPNERLFLDNYGKVCLCLDE
ICAQGMLEHTDKGRIRRLIRLRPLVDT*
>Itaiw_v1_scaffold_34_t22466-RB
MLMCVLIANSSGNVLLERFHGVPGEERLHWRSFLVKLGTDNLKGARDDEPFIASHKSVYV
VYGIIGDIWIFTVGKDEYDELTWNVRAHR*

Output looks like this:

'Itaiw_v1_scaffold_34' -----------------------------MLMCVLIANSSGNVLLERFHGVPGEERLHWR (...)
'Itaiw_v1_scaffold_34' -----------------------------MLMCVLIANSSGNVLLERFHGVPGEERLHWR (...)
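Both output labels are exactly 20 characters long, so it looks as if the names are being cut at a fixed width. That is only my reading of this one example, not something I found in the PRANK docs. Assuming it holds, a minimal Python sketch like the one below could flag which input files are affected before alignment. The function name `colliding_ids` and the default `width=20` are my own choices for illustration.

```python
from collections import defaultdict


def colliding_ids(fasta_text, width=20):
    """Group FASTA IDs that become identical when cut to `width` characters.

    `width=20` is an assumption based on the single output example,
    not a documented PRANK limit.
    """
    groups = defaultdict(list)
    for line in fasta_text.splitlines():
        if line.startswith(">"):
            parts = line[1:].split()
            if parts:
                # The ID is the first whitespace-delimited token of the header.
                groups[parts[0][:width]].append(parts[0])
    # Keep only truncated names shared by more than one sequence.
    return {short: ids for short, ids in groups.items() if len(ids) > 1}
```

Running this over each orthogroup file and keeping the non-empty results would give the list of alignments likely to fail in MrBayes.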

Has anyone else run into this problem with PRANK? Is there a way to modify my PRANK command or my input files so that I get the full names? I am running this analysis on thousands of orthogroup files, so fixing them one by one is not an option. I realize this question is fairly specific, but I couldn't find any reference to this behavior in the PRANK docs, and a cursory Google search didn't turn up much either, so here I am!
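The only workaround I can think of that does not depend on any PRANK option is to rename the sequences to short unique IDs before alignment and translate them back afterwards. The sketch below is a rough illustration of that idea, not a tested pipeline. The function names `shorten_ids` and `restore_ids` and the `seq000001`-style placeholders are hypothetical. I also have not checked whether MrBayes accepts the restored long names, so keeping the mapping and relabeling only the final trees may be safer.

```python
import re


def shorten_ids(fasta_text, prefix="seq"):
    """Replace each FASTA header with a short unique ID.

    Returns the rewritten FASTA text and a dict {short_id: original_header}.
    """
    mapping = {}
    out = []
    for line in fasta_text.splitlines():
        if line.startswith(">"):
            # Fixed-width counter keeps every placeholder the same length.
            short = f"{prefix}{len(mapping) + 1:06d}"
            mapping[short] = line[1:].strip()
            out.append(">" + short)
        else:
            out.append(line)
    return "\n".join(out) + "\n", mapping


def restore_ids(text, mapping):
    """Substitute the original headers back into aligned output text."""
    if not mapping:
        return text
    pattern = re.compile(r"\b(" + "|".join(map(re.escape, mapping)) + r")\b")
    return pattern.sub(lambda m: mapping[m.group(1)], text)
```

For thousands of files this could be wrapped in a loop that writes the renamed FASTA, runs the same `prank` command as above on it, and stores the mapping next to each alignment.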

alignment gene • 720 views