Repeated Coordinate Errors (>171823) in NC_007605.1 (EBV) GTF Conversion (LMP2A/2B Exons)
1 day ago
halo22 ▴ 300

I am trying to build a custom reference package combining the human genome (GRCh38/hg38) and the Epstein-Barr Virus (EBV) genome (NC_007605.1) for scRNA-seq analysis using a common reference builder (e.g., a version of mkref). The EBV annotation GTF was created by converting the official NCBI GFF3 file for NC_007605.1 (length 171823 bp).

The mkref tool consistently fails due to invalid, out-of-bounds coordinates in the EBV portion of the GTF, likely stemming from the linearization of circular features (specifically those spanning the Terminal Repeats, TRs).

Specific Errors Encountered

The tool throws an error because the start or end position of a feature exceeds the contig length of 171823 bp. I have had to fix multiple sets of coordinates by hand, which is not ideal.
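One way to list every offending record before rerunning the builder is to compare each feature's coordinates against the contig length. This is a minimal sketch, assuming a standard 9-column, tab-separated GTF; the function name `find_out_of_bounds` and the sample records are hypothetical, made up only to illustrate the check:

```python
def find_out_of_bounds(gtf_lines, contig_lengths):
    """Yield (line_number, seqname, start, end) for features outside their contig."""
    for line_number, line in enumerate(gtf_lines, start=1):
        # Skip blank lines and comment/header lines.
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 9:
            continue  # not a well-formed GTF record
        seqname, start, end = fields[0], int(fields[3]), int(fields[4])
        length = contig_lengths.get(seqname)
        if length is None:
            continue  # contig length unknown, nothing to compare against
        # GTF coordinates are 1-based and inclusive.
        if start < 1 or end > length or start > end:
            yield line_number, seqname, start, end


EBV_LENGTH = 171823  # NC_007605.1 length quoted in the post

# Hypothetical sample records (attributes are placeholders, not real annotation).
sample = [
    "\t".join(["NC_007605.1", "RefSeq", "exon", "1026", "1196", ".", "+", ".", 'gene_id "example1";']),
    "\t".join(["NC_007605.1", "RefSeq", "exon", "177231", "177679", ".", "+", ".", 'gene_id "example2";']),
    "\t".join(["NC_007605.1", "RefSeq", "exon", "171881", "172095", ".", "+", ".", 'gene_id "example2";']),
]

bad = list(find_out_of_bounds(sample, {"NC_007605.1": EBV_LENGTH}))
for record in bad:
    print(record)
```

To run it on a real file, pass an open file handle instead of `sample`; the same dictionary can also hold the lengths of the human contigs.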

Here are a few problematic coordinates I have found and manually fixed so far, all related to the essential LMP-2A and LMP-2B transcripts:

1. LMP-2A/2B (Set 1): start 177231, end 177679.
2. LMP-2A/2B (Set 2): start 171881, end 172095.
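If these values are circular coordinates that simply ran past the end of the genome (an assumption, not something confirmed in this thread), subtracting the contig length maps them back onto the linear sequence, and a feature that straddles the end can be split into two pieces. A small sketch of that arithmetic, with a hypothetical helper name `to_linear`:

```python
def to_linear(start, end, length):
    """Map a 1-based inclusive interval on a circular contig to linear pieces.

    Assumes coordinates beyond `length` are wrap-around positions, i.e. the
    true position is the reported one minus the contig length.
    """
    if start < 1 or end < start:
        raise ValueError(f"invalid interval: {start}-{end}")
    if end <= length:
        return [(start, end)]  # already within bounds
    if start > length:
        # Entire feature lies past the end: shift it back by one genome length.
        return to_linear(start - length, end - length, length)
    # Feature straddles the end: split at the contig boundary.
    return [(start, length)] + to_linear(1, end - length, length)


EBV_LENGTH = 171823  # NC_007605.1 length quoted in the post

print(to_linear(177231, 177679, EBV_LENGTH))  # Set 1 from the post
print(to_linear(171881, 172095, EBV_LENGTH))  # Set 2 from the post
```

Under that assumption, Set 1 becomes 5408-5856 and Set 2 becomes 58-272. Whether those positions match the intended LMP-2A/2B exons should be checked against the annotation before relying on them.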

Question for the Community

1. Does anyone have a pre-validated GTF file for NC_007605.1 that is known to work with standard linear reference builders (i.e., one where all features spanning the origin have been correctly split or truncated)?

2. Alternatively, if you have successfully created this reference using an NCBI GFF3, which GFF3-to-GTF conversion tool (and which specific parameters) did you use to handle the wrap-around coordinates correctly?

3. Are there any other known out-of-bounds coordinates in this NCBI GFF3/GTF file that I should manually check for (e.g., coordinates in the 172000s or 177000s) before running the process a third time?

My current temporary fix is to substitute all invalid coordinates with 171823, but I'd like a more robust and accurate reference file.

Thank you for any insight or link to a validated file you can provide!

cellranger • 179 views
1 day ago
GenoMax 154k

There is a GTF file available (no conversion needed) here: https://ftp.ncbi.nlm.nih.gov/genomes/all/GCF/002/402/265/GCF_002402265.1_ASM240226v1/

Have you tried it? Just to be careful, get the genome sequence from the same location as well.


Thanks for sharing this. I was able to get mkref to work.
