Hello all,
I am new to BioPerl and am trying to extract the mRNA transcript for every gene from a reference FASTA file and a GFF file. I used the BioPerl script from a similar question, "Extract Cds Fastas From A Gff Annotation + Reference Sequence" (by user @severin), and it works well for the mouse and rat reference genomes.

I want to do a three-species comparison. Two of the species are the reference mouse (Mus_musculus_GRMCM38.91) and Rattus norvegicus (Rattus_norvegicus.Rnor_5.0.71.dna_sm.toplevel.fa). The third is the black rat (Rattus rattus), whose whole genome we sequenced in our lab. The black rat genome was mapped to the Rattus norvegicus genome (Rattus_norvegicus.Rnor_5.0.71.dna_sm.toplevel.fa), and I want to extract the mRNA transcripts from the black rat assembly in the same way. Because of that mapping, I am using the Rattus norvegicus GFF3 file for the black rat.

However, when I run the BioPerl script on the black rat genome, I do not get the same output that I get for the mouse and rat reference genomes. I checked the black rat genome sequence with samtools: extracting the exon coordinates with samtools gives me the correct sequence from the black rat. I suspect the problem has to do with the index file generated by the BioPerl script, but I am not sure.

I would really appreciate some help in figuring this out. Please let me know which files would be useful to share. Thank you.
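To make the expected output concrete, here is a minimal sketch in plain Python (not the BioPerl script mentioned above) of the extraction I am after: the exons of each transcript are concatenated in coordinate order, and minus-strand transcripts are reverse-complemented. All function names and the toy FASTA/GFF3 data are hypothetical and only illustrate the idea. It also lists transcripts whose sequence ID is absent from the FASTA, on the assumption (not confirmed) that mismatched sequence names between the two files could be one cause of differing output.

```python
from collections import defaultdict

COMPLEMENT = str.maketrans("ACGTacgtNn", "TGCAtgcaNn")


def reverse_complement(seq):
    """Return the reverse complement of a DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def parse_fasta(lines):
    """Return {sequence_id: sequence}; the ID is the header up to the first whitespace."""
    seqs, name, chunks = {}, None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                seqs[name] = "".join(chunks)
            name, chunks = line[1:].split()[0], []
        else:
            chunks.append(line)
    if name is not None:
        seqs[name] = "".join(chunks)
    return seqs


def exons_by_transcript(gff_lines):
    """Group exon features by their Parent attribute (GFF3, 1-based inclusive coordinates)."""
    exons = defaultdict(list)
    for line in gff_lines:
        if line.startswith("#") or not line.strip():
            continue
        cols = line.rstrip("\n").split("\t")
        if len(cols) != 9 or cols[2] != "exon":
            continue
        attrs = dict(kv.split("=", 1) for kv in cols[8].split(";") if "=" in kv)
        for parent in attrs.get("Parent", "").split(","):
            if parent:
                exons[parent].append((cols[0], int(cols[3]), int(cols[4]), cols[6]))
    return exons


def spliced_transcripts(genome, exons):
    """Return ({transcript_id: spliced sequence}, [transcript IDs whose seqid is not in the FASTA])."""
    spliced, missing = {}, []
    for tid, parts in exons.items():
        parts = sorted(parts, key=lambda e: e[1])
        seqid, strand = parts[0][0], parts[0][3]
        if seqid not in genome:
            missing.append(tid)
            continue
        # Convert 1-based inclusive GFF coordinates to Python slices.
        seq = "".join(genome[seqid][start - 1:end] for _, start, end, _ in parts)
        spliced[tid] = reverse_complement(seq) if strand == "-" else seq
    return spliced, missing


# Toy data (hypothetical): one 18 bp "chromosome" and three transcripts.
toy_fasta = [">1 toy chromosome", "AAACCCGGG", "TTTAAACCC"]
toy_gff = [
    "##gff-version 3",
    "\t".join(["1", "toy", "exon", "1", "3", ".", "+", ".", "Parent=transcript:t1"]),
    "\t".join(["1", "toy", "exon", "7", "9", ".", "+", ".", "Parent=transcript:t1"]),
    "\t".join(["1", "toy", "exon", "4", "6", ".", "-", ".", "Parent=transcript:t2"]),
    "\t".join(["1", "toy", "exon", "13", "15", ".", "-", ".", "Parent=transcript:t2"]),
    "\t".join(["chrUn", "toy", "exon", "1", "3", ".", "+", ".", "Parent=transcript:t3"]),
]

genome = parse_fasta(toy_fasta)
transcripts, missing = spliced_transcripts(genome, exons_by_transcript(toy_gff))
for tid, seq in sorted(transcripts.items()):
    print(">" + tid)
    print(seq)
print("seqid not found in FASTA for:", missing)
```

The sketch reads everything into memory and builds no on-disk index, so it is only practical for small test regions, but comparing its output for a few genes against the samtools result might help narrow down where the difference arises.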