Hi all,
Any idea why the gene_ids in NCBI's GTF file for the T2T human genome assembly end in "_1"?
NC_060925.1     BestRefSeq      gene    52979   54612   .       -      .       gene_id "LOC101928626_1"; transcript_id ""; db_xref "GeneID:101928626"; description "uncharacterized LOC101928626"; gbkey "Gene"; gene "LOC101928626"; gene_biotype "lncRNA"; 
NC_060925.1     BestRefSeq      transcript      52979   54612   .       -       .      gene_id "LOC101928626_1"; transcript_id "NR_125957.1"; db_xref "GeneID:101928626"; exception "annotated by transcript or proteomic data"; gbkey "ncRNA"; gene "LOC101928626"; inference "similar to RNA sequence (same species):RefSeq:NR_125957.1"; note "The RefSeq transcript has 2 substitutions, 1 non-frameshifting indel compared to this genomic sequence"; product "uncharacterized LOC101928626"; transcript_biotype "lnc_RNA"; 
NC_060925.1     BestRefSeq      exon    54522   54612   .       -       .       gene_id "LOC101928626_1"; transcript_id "NR_125957.1"; db_xref "GeneID:101928626"; exception "annotated by transcript or proteomic data"; gene "LOC101928626"; inference "similar to RNA sequence (same species):RefSeq:NR_125957.1"; note "The RefSeq transcript has 2 substitutions, 1 non-frameshifting indel compared to this genomic sequence"; product "uncharacterized LOC101928626"; transcript_biotype "lnc_RNA"; exon_number "1"; 
NC_060925.1     BestRefSeq      gene    111940  112877  .       -      .       gene_id "OR4F29_1"; transcript_id ""; db_xref "GeneID:729759"; db_xref "HGNC:HGNC:31275"; description "olfactory receptor family 4 subfamily F member 29"; gbkey "Gene"; gene "OR4F29"; gene_biotype "protein_coding"; gene_synonym "OR7-21"; 
NC_060925.1     BestRefSeq   transcript      111940  112877  .       -       .       gene_id "OR4F29_1"; transcript_id "NM_001005221.2"; db_xref "GeneID:729759"; exception "annotated by transcript or proteomic data"; gbkey "mRNA"; gene "OR4F29"; inference "similar to RNA sequence, mRNA (same species):RefSeq:NM_001005221.2"; note "The RefSeq transcript has 9 substitutions, 1 frameshift compared to this genomic sequence"; product "olfactory receptor family 4 subfamily F member 29"; tag "RefSeq Select"; transcript_biotype "mRNA";
This breaks some downstream analyses (GO enrichment, GSEA), since the suffixed IDs don't match the gene symbols those tools expect. Is it safe to just strip the "_1" suffix?
cheers
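For what it's worth, the records above already carry the unsuffixed symbol in the `gene` attribute and the Entrez ID in `db_xref "GeneID:..."`, so one option is to build a lookup table from those fields instead of editing the gene_id strings. Below is a rough Python sketch of that idea, not a vetted tool. The function names `parse_attributes` and `gene_id_to_symbol` are made up for illustration, and it assumes every record of interest has both a `gene_id` and a `gene` attribute, as in the lines pasted here.

```python
import re

# Matches GTF attribute pairs of the form: key "value"
ATTR_RE = re.compile(r'(\w+) "([^"]*)"')


def parse_attributes(line):
    """Return a dict mapping each attribute key to the list of its values.

    Values are kept in lists because keys such as db_xref can repeat.
    """
    attrs = {}
    for key, value in ATTR_RE.findall(line):
        attrs.setdefault(key, []).append(value)
    return attrs


def gene_id_to_symbol(lines):
    """Map each gene_id (e.g. "OR4F29_1") to a (symbol, Entrez GeneID) pair."""
    mapping = {}
    for line in lines:
        # Skip header/comment lines and blank lines.
        if line.startswith("#") or not line.strip():
            continue
        attrs = parse_attributes(line)
        # Assumption: records without both attributes are not of interest.
        if "gene_id" not in attrs or "gene" not in attrs:
            continue
        entrez = None
        for xref in attrs.get("db_xref", []):
            if xref.startswith("GeneID:"):
                entrez = xref.split(":", 1)[1]
        mapping[attrs["gene_id"][0]] = (attrs["gene"][0], entrez)
    return mapping


# One record from the post, abbreviated, with tab-separated columns.
example = [
    'NC_060925.1\tBestRefSeq\tgene\t111940\t112877\t.\t-\t.\t'
    'gene_id "OR4F29_1"; transcript_id ""; db_xref "GeneID:729759"; '
    'db_xref "HGNC:HGNC:31275"; gene "OR4F29"; gene_biotype "protein_coding";',
]
print(gene_id_to_symbol(example))
```

The resulting dictionary can then be used to relabel a count matrix or gene list before running GO enrichment or GSEA. Compared with stripping the suffix by pattern, this avoids having to guess whether the suffix carries any meaning in other records of the file.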
awesome, thanks!