Hi all,
I am working on a reference-guided de novo transcriptome assembly using Cufflinks, and I have run into a mismatch between the length of one specific scaffold and the feature coordinates annotated on it. This is blocking my downstream pipeline.
Here is a brief description of my pipeline:
I used the STAR aligner to index the genome and align the trimmed reads to it. I was unable to get the --sjdbGTFtagExonParentTranscript Parent flag to work with my .gff3 file, so I used AGAT to convert the gff3 to gtf before running the alignments. For the alignments, I also used the --alignSoftClipAtReferenceEnds No flag to ensure the output BAM files were compatible with Cufflinks. After alignment, the BAM files were indexed using Bamtools. Cufflinks was then run on the indexed BAM files, using the -g flag to point to the original gtf file. However, when I run cuffmerge, I get the following error:
Error (GFaSeqGet): end coordinate (98460) cannot be larger than sequence length 94244
Error (GFaSeqGet): subsequence cannot be larger than 107857
Error getting subseq for 574542 (3448..134221)!
[FAILED]
Error: could not execute cuffcompare
Looking at the indexed genome, I determined which scaffold has a length of 94244 and which transcripts have features ending at 98460. Both the original, published gff3 file and my de novo transcripts.gtf from Cufflinks show transcript coordinates that extend beyond position 94244 on that scaffold.
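For anyone who wants to reproduce the check I did by hand, here is a minimal Python sketch of it. It assumes a samtools-style .fai index (sequence name in column 1, length in column 2) and a standard 9-column GTF/GFF file. The function names, the scaffold name scaffold_demo, and the demo records are made up for illustration and are not from my real data.

```python
import io


def read_fai_lengths(fai_handle):
    """Return {sequence name: length} from a .fai index (columns 1 and 2)."""
    lengths = {}
    for line in fai_handle:
        fields = line.rstrip("\n").split("\t")
        if len(fields) >= 2:
            lengths[fields[0]] = int(fields[1])
    return lengths


def find_out_of_bounds(gtf_handle, lengths):
    """List features that fall outside their sequence, or whose sequence is unknown.

    Each entry is (seqid, feature type, start, end, sequence length or None).
    """
    problems = []
    for line in gtf_handle:
        if line.startswith("#") or not line.strip():
            continue  # skip comments and blank lines
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 9:
            continue  # not a standard 9-column feature line
        seqid, feature = fields[0], fields[2]
        start, end = int(fields[3]), int(fields[4])  # 1-based, inclusive
        seq_len = lengths.get(seqid)
        if seq_len is None or start < 1 or end > seq_len:
            problems.append((seqid, feature, start, end, seq_len))
    return problems


# Hypothetical demo data: one scaffold of length 94244 and two exons,
# the first of which runs past the end of the scaffold.
demo_fai = io.StringIO("scaffold_demo\t94244\t15\t60\t61\n")
demo_gtf = io.StringIO(
    'scaffold_demo\tCufflinks\texon\t90000\t98460\t.\t+\t.\tgene_id "g1"; transcript_id "t1";\n'
    'scaffold_demo\tCufflinks\texon\t100\t500\t.\t+\t.\tgene_id "g2"; transcript_id "t2";\n'
)

for record in find_out_of_bounds(demo_gtf, read_fai_lengths(demo_fai)):
    print(*record, sep="\t")
```

On real data, the two io.StringIO objects would be replaced with open(...) calls on the genome .fai index and the gtf (or gff3) file. Running it on both the published annotation and the Cufflinks output should show whether the out-of-range features are limited to one scaffold or are more widespread.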
I'm really at a loss as to how to move forward with troubleshooting this problem, and I would be so grateful for any recommendations.
Does anyone have suggestions?
Thanks in advance!