Dear Friends,
I am using MUMmer to align my assembled query phage against reference genomes (more than one, obtained from BLAST hits). In this case there is one reference genome sequence and one query sequence. I would really appreciate your suggestions on this error from MUMmer:
When I run:
./mapview ref_qry.coords -n 1 -Ir KP010413_1.gff
I get this error:
ERROR in the input files ! The reference seq ID can't be found in GFF files !
The first column in the GFF file should be the ID of the reference seq.
The alignments file should provide the same info in the column before the last one.
Here are some example records for the GFF file:
gnl|FlyBase|X Dmel3 initial-exon 2155 2413 . - . X_CG3038.1
gnl|FlyBase|X Dmel3 last-exon 1182 2077 . - . X_CG3038.1
...
The fields are :
<seq_ID> <source> <exon type> <start> <end> <score> <strand> <frame> <gene_name>
"ref_qry.coords" file looks like this: /
daniel/KP010413.1-genome-sequence.fasta daniel/Largest-contig.fasta
NUCMER
[S1] [E1] | [S2] [E2] | [LEN 1] [LEN 2] | [% IDY] | [LEN R] [LEN Q] | [COV R] [COV Q] | [TAGS]
===============================================================================================================================
1 5324 | 60432 55108 | 5324 5325 | 94.69 | 87510 88315 | 6.08 6.03 | KP010413_1 X_cov_48.544790
7042 7868 | 55142 54316 | 827 827 | 96.74 | 87510 88315 | 0.95 0.94 | KP010413_1 Y_cov_48.544790
And, "KP010413_1.gff" looks like this:
KP010413_1 Genbank region 1 87510 . + . ID=KP010413_1:1..87510;Dbxref=taxon:1567025;collection-date=10-Oct-2013;country=China;gbkey=Src;genome=genomic;isolation-source=sewage;mol_type=genomic DNA;nat-host=Salmonella enterica serovar Pullorum 96116;old-name=Salmonella phage HB-2014
KP010413_1 Genbank gene 1 2292 . + . ID=gene-HB2014_1;Name=HB2014_1;gbkey=Gene;gene_biotype=protein_coding;locus_tag=HB2014_1
KP010413_1 Genbank CDS 1 2292 . + 0 ID=cds-AJT60577.1;Parent=gene-HB2014_1;Dbxref=NCBI_GP:AJT60577.1;Name=AJT60577.1;Note=ORF1;gbkey=CDS;locus_tag=HB2014_1;product=putative rIIA;protein_id=AJT60577.1;transl_table=11
KP010413_1 Genbank gene 2372 3481 . + . ID=gene-HB2014_2;Name=HB2014_2;gbkey=Gene;gene_biotype=protein_coding;locus_tag=HB2014_2
KP010413_1 Genbank CDS 2372 3481 . + 0 ID=cds-AJT60578.1;Parent=gene-HB2014_2;Dbxref=NCBI_GP:AJT60578.1;Name=AJT60578.1;Note=ORF2;gbkey=CDS;locus_tag=HB2014_2;product=putative rIIB;protein_id=AJT60578.1;transl_table=11
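The error message says the first GFF column must match the reference ID in the second-to-last column of the alignments file. A minimal sketch of that comparison in Python follows. It assumes the pipe-delimited show-coords layout pasted above and a whitespace- or tab-delimited GFF; the function names are hypothetical and not part of MUMmer. In the excerpts shown here both files appear to use KP010413_1, so if this check reports nothing missing, the cause is probably something it does not diagnose.

```python
import sys


def coords_ref_ids(lines):
    """Collect reference IDs from show-coords text output.

    Assumption: data rows start with a number, contain '|' separators,
    and end with "<reference ID> <query ID>" as in the excerpt above.
    """
    ids = set()
    for line in lines:
        tokens = line.split()
        # Skip the path line, "NUCMER", the column header and the ruler.
        if len(tokens) < 3 or "|" not in tokens or not tokens[0].isdigit():
            continue
        ids.add(tokens[-2])  # the column before the last one
    return ids


def gff_seq_ids(lines):
    """Collect sequence IDs from the first column of a GFF file."""
    ids = set()
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue  # blank lines and GFF comments/directives
        ids.add(line.split()[0])
    return ids


def missing_ids(coords_lines, gff_lines):
    """Reference IDs used in the alignments but absent from the GFF."""
    return coords_ref_ids(coords_lines) - gff_seq_ids(gff_lines)


if __name__ == "__main__" and len(sys.argv) == 3:
    # Usage: python check_ids.py ref_qry.coords KP010413_1.gff
    with open(sys.argv[1]) as coords, open(sys.argv[2]) as gff:
        missing = missing_ids(coords, gff)
    print("missing from GFF:", sorted(missing) if missing else "none")
```

The script name check_ids.py in the usage comment is only a placeholder.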
Could you please let me know what the error is here? Thank you for your time.
I would also be very thankful if you could let me know how to align more than one reference genome to a query sequence using MUMmer.
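On the multiple-reference question: nucmer accepts a multi-FASTA file as the reference, so one possible approach is to concatenate the reference genomes into a single file and align the query against that. A minimal Python sketch of the concatenation step is below; merge_fasta and the file names are hypothetical, and it assumes each input is a plain (uncompressed) FASTA file with unique sequence IDs.

```python
def merge_fasta(paths, out_path):
    """Concatenate FASTA files into one multi-FASTA file.

    Returns the sequence IDs (first word of each '>' header) in the order
    written, so duplicates can be spotted before running the aligner.
    """
    headers = []
    with open(out_path, "w") as out:
        for path in paths:
            last = "\n"
            with open(path) as handle:
                for line in handle:
                    if line.startswith(">"):
                        words = line[1:].split()
                        headers.append(words[0] if words else "")
                    out.write(line)
                    last = line
            # Make sure the next file's header starts on its own line.
            if not last.endswith("\n"):
                out.write("\n")
    return headers


if __name__ == "__main__":
    import sys

    if len(sys.argv) >= 3:
        # Usage: python merge_fasta.py all_refs.fasta ref1.fasta ref2.fasta ...
        ids = merge_fasta(sys.argv[2:], sys.argv[1])
        if len(ids) != len(set(ids)):
            print("warning: duplicate sequence IDs found")
        print("wrote", len(ids), "sequences to", sys.argv[1])
```

The merged file would then be passed to nucmer as the reference in place of the single-genome FASTA.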
DK
Hello @DanielC! I am dealing with the exact same error. Did you find a solution for this?