cellranger arc genome build error and how to fix it
11 weeks ago

Hi all,

I am trying to build a cellranger arc genome for the canine canFam4 genome build. Everything was going quite smoothly until I got this error:

mkref has FAILED
Error building reference package
Invalid gene annotation input: in GTF
records for gene_id RPL10A are not contiguous in the file


So I ran grep to pull out all the lines containing "RPL10A" in the GTF file, and there was indeed a gap for that gene, starting at line 3941 and ending at line 5535. Here is where I am confused: the RPL10A records in the 3900s are on chromosome 11, and the RPL10A records in the 5530s are on chromosome 12. I have never really built a genome beyond following basic tutorials, so I do not know whether this is an error and whether I should remove one of these (how do I decide?), or what it really means. I have never manipulated GTF files before, but I need to understand what is going on because this does not make sense to me. Thanks

canfam4 gtf cellranger
Where did you get this GTF file? Is it a direct download from a reference genome website such as GENCODE?

This seems to be an entry unique to UCSC - both EnsEMBL and NCBI have the RPL10A gene on chr12, not chr11. In fact, all 4 breeds available on EnsEMBL have the gene on chr12. See screenshot below. The NM_ identifier is also made up by UCSC - there is no NM_001252145_2 in NCBI. I prefer to never use UCSC resource files; as far as standardization goes, I rank them EnsEMBL > NCBI > UCSC.

I'd recommend going with EnsEMBL's GTF for the generic C. familiaris or for a more specific breed if you'd prefer that (you can navigate to the parent directory and pick a different folder to match your breed requirement).
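Whichever GTF you end up using, it may be worth scanning it for this kind of problem before running mkref again. Below is a minimal Python sketch, not anything taken from Cell Ranger itself: the function name `scan_gtf` is made up for illustration, and it assumes a standard tab-separated GTF with a `gene_id "..."` attribute in the ninth column. It reports gene_ids whose records are split into more than one block in the file, and gene_ids that appear on more than one chromosome. I am assuming this roughly matches what mkref means by "not contiguous"; the real check may differ.

```python
import re
from collections import defaultdict

# Matches the gene_id attribute in the 9th GTF column, e.g. gene_id "RPL10A";
GENE_ID_RE = re.compile(r'gene_id "([^"]+)"')


def scan_gtf(lines):
    """Scan GTF lines and return (noncontiguous, multi_chrom).

    noncontiguous: gene_id -> line number where the gene first reappears
                   after its block of records had already ended.
    multi_chrom:   gene_id -> sorted list of chromosomes, for genes that
                   have records on more than one chromosome.
    """
    chroms = defaultdict(set)  # gene_id -> chromosomes seen so far
    finished = set()           # gene_ids whose block has already ended
    noncontiguous = {}
    previous = None            # gene_id of the previous record

    for lineno, line in enumerate(lines, start=1):
        # Skip comments, blank lines, and anything without 9 columns.
        if line.startswith("#") or not line.strip():
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 9:
            continue
        match = GENE_ID_RE.search(fields[8])
        if match is None:
            continue

        gene_id = match.group(1)
        chroms[gene_id].add(fields[0])

        if gene_id != previous:
            # A new block starts here; if this gene was seen in an
            # earlier block, its records are not contiguous.
            if gene_id in finished:
                noncontiguous.setdefault(gene_id, lineno)
            if previous is not None:
                finished.add(previous)
            previous = gene_id

    multi_chrom = {g: sorted(c) for g, c in chroms.items() if len(c) > 1}
    return noncontiguous, multi_chrom
```

To use it, open your GTF and pass the file handle, for example `with open(path) as fh: problems = scan_gtf(fh)`, where `path` is wherever your GTF lives. If RPL10A shows up in both returned dictionaries, you are looking at the same situation as in the question: one gene_id attached to records on two chromosomes.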