I am trying to build my own reference from a custom GTF using cellranger mkref, but I could not get the file formats right.
The first few lines of the two files I am trying to use are as follows:
The FASTA reads like this:
>SRR8424011.1.1 CB15KANXX170420:1:1101:10000:40548 length=101
GTTCTCTTGTTTTACATTAATAAGAAATATACTGTGACTCCTAGAGCTATGTTCATTCATATTTGTAACTGCTACATGTCTGTTGGATTTTCCTTCATCTA
>SRR8424011.1.2 CB15KANXX170420:1:1101:10000:90445 length=101
GATAAGTACAAAGAATGAATAAGGTGGACAGGAAAGTGAAGAGTGTGGGATGGTTAGGGGCTTTAAGGACTTCCCAGGAAAATAGGATTCTGGGATGGGGT
The GTF reads like this:
chr1 StringTie exon 91827 92344 . + . transcript_id "MSTRG.2.1"; gene_id "ZNF692"; gene_name "ZNF692"; exon_number "1"
chr1 StringTie exon 107447 107478 . + . transcript_id "MSTRG.2.1"; gene_id "ZNF692"; gene_name "ZNF692"; exon_number "2"
The program failed, and I suspect two reasons:
1) The FASTA contains sequencing reads rather than a genome assembly, so it cannot work as a reference. I tried to fix this by using the Ensembl genome instead:
>1 dna_sm:chromosome chromosome:Macaca_fascicularis_5.0:1:1:227556264:1 REF
aaccctaaccctaaccctaaccctaaccctaaccctaaccctaaccctaacactaaccct
aaccctaaccctaaccctaacccgaacccgaacccgaaccctaaccctaacccctaaccc
ctaaccctaaccctaaccctaaccctaacccgaaccctaatccctaaccctaaccctaac....
and deleting the "chr" prefix from chr1, chr2, etc. in the GTF so that the sequence names match.
I still got an error, and I suspect the reason is that my GTF is missing several fields that cellranger (or STAR) requires. If this is true, how can I add this information?
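To make it easier to see what the GTF actually contains, here is a rough Python sketch that tallies the feature types (column 3) and the attribute keys (column 9). The function name `summarize_gtf` and the `<malformed>` label are my own, and the sketch assumes the real file is tab-separated with `key "value";` attributes, as in the excerpt above. It only reports what is present; it does not know what cellranger requires.

```python
from collections import Counter
import re

# Matches attribute pairs such as: gene_id "ZNF692";
ATTRIBUTE_PATTERN = re.compile(r'(\S+) "([^"]*)"')


def summarize_gtf(lines):
    """Count feature types (column 3) and attribute keys (column 9)."""
    feature_types = Counter()
    attribute_keys = Counter()
    for line in lines:
        line = line.rstrip("\n")
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment/header lines
        fields = line.split("\t")
        if len(fields) < 9:
            # Not a 9-column, tab-separated record (e.g. spaces instead of tabs).
            feature_types["<malformed>"] += 1
            continue
        feature_types[fields[2]] += 1
        for key, _value in ATTRIBUTE_PATTERN.findall(fields[8]):
            attribute_keys[key] += 1
    return feature_types, attribute_keys


# The two records from my GTF, written with explicit tabs.
example = [
    'chr1\tStringTie\texon\t91827\t92344\t.\t+\t.\t'
    'transcript_id "MSTRG.2.1"; gene_id "ZNF692"; gene_name "ZNF692"; exon_number "1"',
    'chr1\tStringTie\texon\t107447\t107478\t.\t+\t.\t'
    'transcript_id "MSTRG.2.1"; gene_id "ZNF692"; gene_name "ZNF692"; exon_number "2"',
]
features, keys = summarize_gtf(example)
print(dict(features))
print(dict(keys))
```

On a real file I would pass an open file handle instead of the `example` list. A large `<malformed>` count would suggest the columns are separated by spaces rather than tabs.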
2) Using the Ensembl genome also feels problematic, because its assembly version may not match the one the GTF was built against. Also, what is the point of providing the RNA-seq results for this paper?
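For the naming part of this, a minimal Python sketch of the check I have in mind: strip the "chr" prefix from the GTF sequence names and list any that have no matching FASTA header. The function names are my own, the `chrM` record is made up for illustration, and the sketch assumes a tab-separated GTF and FASTA headers that start with `>`.

```python
def fasta_names(lines):
    """Sequence names from FASTA headers: text after '>' up to the first whitespace."""
    names = set()
    for line in lines:
        if line.startswith(">"):
            parts = line[1:].split()
            if parts:
                names.add(parts[0])
    return names


def gtf_names(lines, strip_prefix="chr"):
    """Sequence names from GTF column 1, with an optional prefix removed."""
    names = set()
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and comments
        name = line.split("\t", 1)[0]
        if strip_prefix and name.startswith(strip_prefix):
            name = name[len(strip_prefix):]
        names.add(name)
    return names


def missing_from_fasta(fasta_lines, gtf_lines, strip_prefix="chr"):
    """GTF sequence names (after prefix removal) that have no FASTA header."""
    return sorted(gtf_names(gtf_lines, strip_prefix) - fasta_names(fasta_lines))


fasta_example = [
    ">1 dna_sm:chromosome chromosome:Macaca_fascicularis_5.0:1:1:227556264:1 REF",
    "aaccctaaccctaaccctaaccctaaccctaaccctaaccctaaccctaacactaaccct",
]
gtf_example = [
    'chr1\tStringTie\texon\t91827\t92344\t.\t+\t.\ttranscript_id "MSTRG.2.1"; gene_id "ZNF692";',
    # Hypothetical record, only here to show a name with no FASTA counterpart.
    'chrM\tStringTie\texon\t1\t100\t.\t+\t.\ttranscript_id "MADE.UP.1"; gene_id "MADE_UP";',
]
print(missing_from_fasta(fasta_example, gtf_example))
```

This only compares names. Matching names would not prove that the GTF coordinates belong to the same assembly version as the FASTA, which is the part I am unsure about.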
The question has become a bit tangled. Simply put, I am hoping to run cellranger mkref with the two files listed above. Any suggestions, thoughts, or ideas are deeply and sincerely appreciated!