Hi, I am planning to use GERP++ with a whole-genome alignment. I want to get a multiple-alignment FASTA file from my results from Cactus, a whole-genome aligner (https://github.com/ComparativeGenomicsToolkit/cactus/tree/master).
I seem to have a correct HAL file produced by cactus from multiple genome FASTA files. I then converted it to a MAF file with cactus-hal2maf:
singularity exec cactus.sif cactus-hal2maf ./jobStore ${hal} ${maf} \
    --refGenome REF --noAncestors --chunkSize 1000000 --batchCores 4 \
    --filterGapCausingDupes --refSequence ${chr}
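Before splitting or converting the MAF, it can help to confirm what cactus-hal2maf actually wrote: how many blocks there are, which genomes appear (with --noAncestors no ancestral genomes should show up), and whether any genome contributes more than one row to a block (a duplication). The following is a minimal Python sketch, not part of Cactus. It assumes an uncompressed MAF whose source names follow the `genome.sequence` convention, so a genome name that itself contains a dot would be mis-split. The function name `summarize_maf` is hypothetical.

```python
from collections import Counter
from typing import TextIO, Tuple


def summarize_maf(maf: TextIO) -> Tuple[int, Counter, Counter]:
    """Return (number of blocks, 's' rows per genome,
    number of blocks in which each genome appears more than once)."""
    n_blocks = 0
    rows: Counter = Counter()
    dup_blocks: Counter = Counter()
    current: Counter = Counter()  # rows per genome in the block being read
    for line in maf:
        f = line.split()
        if not f:
            continue
        if f[0] == "a":  # an 'a' line starts a new alignment block
            dup_blocks.update(g for g, n in current.items() if n > 1)
            current = Counter()
            n_blocks += 1
        elif f[0] == "s":  # 's' rows: src start size strand srcSize text
            genome = f[1].split(".", 1)[0]  # assumes 'genome.sequence' names
            rows[genome] += 1
            current[genome] += 1
    dup_blocks.update(g for g, n in current.items() if n > 1)  # last block
    return n_blocks, rows, dup_blocks
```

For example, `summarize_maf(open("cactus_alignment/Chromosome1.maf"))` prints nothing by itself but returns the counts to inspect.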
NOTE: Although cactus has hal2fasta, it only extracts the original sequences without alignment information.
Next, I attempted to split the MAF file by chromosome using msa_view, but I encountered an error. Other programs I've tried have not been successful either. The issue seems to be a mismatch between the reference sequence in my MAF file and the original genome:
msa_view cactus_alignment/Chromosome1.maf -f -G 1 --refseq Chromosome1.fasta > Chromosome1.maf.fasta
I received the following error:
Splitting 1 files by target sequence -- ignoring first argument dummy.bed
cactus_alignment/Chromosome1.maf
ERROR: character 'T' at position 108901 of reference sequence does not match character 'a' given in MAF file.
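One way to narrow down this mismatch is to compare every reference row in the MAF against the FASTA directly. The sketch below is a hypothetical helper, not an msa_view feature. It assumes reference sources are named `<ref>.<sequence>` (as hal2maf writes them) and that `<sequence>` matches the first word of a FASTA header; it uses MAF's 0-based start, where '-' strand coordinates count from the end of the sequence. Case is ignored because lower case in a MAF usually just means soft-masked.

```python
from typing import Dict, List, TextIO, Tuple

# Complement table; characters not listed (e.g. IUPAC codes) pass through unchanged.
COMP = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def read_fasta(handle: TextIO) -> Dict[str, str]:
    """Read a FASTA file into {first word of header: sequence}."""
    seqs: Dict[str, List[str]] = {}
    name = None
    for line in handle:
        line = line.strip()
        if line.startswith(">"):
            name = line[1:].split()[0]
            seqs[name] = []
        elif name is not None and line:
            seqs[name].append(line)
    return {k: "".join(v) for k, v in seqs.items()}


def check_reference_rows(
    maf: TextIO, genome: Dict[str, str], ref: str
) -> List[Tuple[str, int, int]]:
    """List (sequence, 0-based position on the row's strand, alignment column)
    for every reference base in the MAF that disagrees with the FASTA."""
    mismatches = []
    for line in maf:
        f = line.split()
        if len(f) != 7 or f[0] != "s" or not f[1].startswith(ref + "."):
            continue
        seqname = f[1][len(ref) + 1:]
        seq = genome[seqname]  # a KeyError here means the names do not match at all
        pos, strand, text = int(f[2]), f[4], f[6]
        for col, base in enumerate(text):
            if base == "-":
                continue  # gaps consume no reference coordinate
            if pos >= len(seq):
                expected = ""  # MAF runs past the end of the FASTA sequence
            elif strand == "+":
                expected = seq[pos]
            else:
                # '-' strand coordinates count from the end of the sequence
                expected = seq[len(seq) - 1 - pos].translate(COMP)
            if base.upper() != expected.upper():
                mismatches.append((seqname, pos, col))
            pos += 1
    return mismatches
```

Usage would be something like `check_reference_rows(open("cactus_alignment/Chromosome1.maf"), read_fasta(open("Chromosome1.fasta")), "REF")`, with REF replaced by the --refGenome name. If mismatches start from the very first blocks, the FASTA may not be the sequence the MAF was built against (or names/coordinates differ); a few isolated mismatches would suggest a different cause.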
How can I convert my alignment into a multiple-alignment FASTA file?
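One possible route, sketched here in plain Python rather than with an existing tool, is to project the MAF onto reference coordinates: walk each block, drop the columns where the reference has a gap, and write each genome's base at the corresponding reference position. Every output row then has the length of the reference chromosome, which is the kind of reference-anchored multi-FASTA that GERP++ is usually fed. Assumptions, all labeled: the MAF covers a single reference chromosome (as with --refSequence ${chr}); genome names are the part of the source name before the first dot; `ref_len` comes from the reference FASTA; uncovered positions are left as '-'; and only the first row per genome in a block is kept, so duplications are silently ignored. `maf_to_mfa` and `write_mfa` are hypothetical names.

```python
from typing import Dict, List, TextIO

COMP = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def revcomp(s: str) -> str:
    """Reverse complement; gaps ('-') are kept in place after reversal."""
    return s.translate(COMP)[::-1]


def maf_to_mfa(maf: TextIO, ref: str, ref_len: int, species: List[str]) -> Dict[str, str]:
    """Project a single-chromosome MAF onto reference coordinates."""
    names = [ref] + [sp for sp in species if sp != ref]
    rows = {sp: ["-"] * ref_len for sp in names}

    def flush(block: list) -> None:
        refs = [r for r in block if r[0] == ref]
        if not refs:
            return  # block without the reference: nothing to project
        _, start, size, strand, src_size, ref_text = refs[0]
        others: Dict[str, str] = {}
        for genome, *_, text in block:
            if genome != ref and genome in rows and genome not in others:
                others[genome] = text  # keep the first row per genome only
        if strand == "-":
            # Reverse-complement the whole block so the reference reads on '+'.
            start = src_size - start - size
            ref_text = revcomp(ref_text)
            others = {g: revcomp(t) for g, t in others.items()}
        pos = start
        for col, base in enumerate(ref_text):
            if base == "-":
                continue  # insertion relative to the reference: dropped
            rows[ref][pos] = base
            for g, text in others.items():
                rows[g][pos] = text[col]
            pos += 1

    block: list = []
    for line in maf:
        f = line.split()
        if not f:
            continue
        if f[0] == "a":
            flush(block)
            block = []
        elif f[0] == "s" and len(f) == 7:
            genome = f[1].split(".", 1)[0]
            block.append((genome, int(f[2]), int(f[3]), f[4], int(f[5]), f[6]))
    flush(block)
    return {sp: "".join(r) for sp, r in rows.items()}


def write_mfa(rows: Dict[str, str], out: TextIO, width: int = 60) -> None:
    """Write the projected rows as a wrapped multi-FASTA."""
    for name, seq in rows.items():
        out.write(f">{name}\n")
        for i in range(0, len(seq), width):
            out.write(seq[i:i + width] + "\n")
```

Dropping reference-gap columns loses insertions in the other genomes, which is usually acceptable for per-reference-base conservation scores but is a design choice worth checking against what GERP++ expects.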
Thanks in advance for your help!