Question: matching downloaded FASTA and GTF
jorja0 wrote (12 days ago):

Hello!

I am trying to build my own reference with cellranger mkref, but I could not get the input files into the right format.

The first lines of the two files I am trying to use are as follows:

The FASTA reads like this:

>SRR8424011.1.1 CB15KANXX170420:1:1101:10000:40548 length=101
GTTCTCTTGTTTTACATTAATAAGAAATATACTGTGACTCCTAGAGCTATGTTCATTCATATTTGTAACTGCTACATGTCTGTTGGATTTTCCTTCATCTA
>SRR8424011.1.2 CB15KANXX170420:1:1101:10000:90445 length=101
GATAAGTACAAAGAATGAATAAGGTGGACAGGAAAGTGAAGAGTGTGGGATGGTTAGGGGCTTTAAGGACTTCCCAGGAAAATAGGATTCTGGGATGGGGT
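A quick way to tell whether a FASTA like this is a read file or a genome is to count its records and look at their lengths. The headers above (an SRR run accession and `length=101`) suggest many short sequencing reads, whereas a genome FASTA has a few very long records, one per chromosome or scaffold. This is only a minimal sketch: `summarize_fasta`, `looks_like_reads`, and the 1000 bp cutoff are hypothetical choices, not part of any tool.

```python
from collections import Counter


def summarize_fasta(lines):
    """Count FASTA records and collect their sequence lengths.

    `lines` is any iterable of text lines, such as an open file handle.
    """
    lengths = []
    current = None  # length of the record being read; None before the first header
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if current is not None:
                lengths.append(current)
            current = 0
        elif current is not None:
            current += len(line)  # sequences may be wrapped over several lines
    if current is not None:
        lengths.append(current)
    return {
        "records": len(lengths),
        "max_len": max(lengths, default=0),
        "length_counts": Counter(lengths),
    }


def looks_like_reads(summary, max_read_len=1000):
    """Hypothetical heuristic: reads are short, chromosomes/scaffolds are long."""
    return summary["records"] > 0 and summary["max_len"] <= max_read_len
```

For a real file, pass an open handle, e.g. `summarize_fasta(open("reads.fa"))` (the filename is hypothetical). If every record is around 101 bp, the file is a set of reads, which cannot serve as the `--fasta` genome input for `cellranger mkref`.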

The GTF reads like this:

chr1 StringTie exon 91827 92344 . + . transcript_id "MSTRG.2.1"; gene_id "ZNF692"; gene_name "ZNF692"; exon_number "1"
chr1 StringTie exon 107447 107478 . + . transcript_id "MSTRG.2.1"; gene_id "ZNF692"; gene_name "ZNF692"; exon_number "2"

The program failed, and I suspect two reasons:

1) The FASTA is not a genome FASTA, so it really won't work. I tried to fix this by using the Ensembl genome:

>1 dna_sm:chromosome chromosome:Macaca_fascicularis_5.0:1:1:227556264:1 REF
aaccctaaccctaaccctaaccctaaccctaaccctaaccctaaccctaacactaaccct
aaccctaaccctaaccctaacccgaacccgaacccgaaccctaaccctaacccctaaccc
ctaaccctaaccctaaccctaaccctaacccgaaccctaatccctaaccctaaccctaac....

and deleted the "chr" prefix from chr1, chr2, etc. in the GTF so that the chromosome names match.
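The "chr" renaming described above can be scripted so it is applied consistently to every line. This sketch assumes the GTF is tab-separated, as the GTF format requires (the forum paste shows spaces, which may just be how the post was rendered). `strip_chr_prefix` and its optional `mito_map` argument are hypothetical; whether the target FASTA calls the mitochondrion `MT` has to be checked against its own headers.

```python
def strip_chr_prefix(gtf_lines, mito_map=None):
    """Yield GTF lines with a leading 'chr' removed from the seqname (column 1).

    mito_map optionally renames contigs after stripping, e.g. {"M": "MT"};
    the right target name is an assumption to verify against the FASTA.
    """
    mito_map = mito_map or {}
    for line in gtf_lines:
        # Pass comment and blank lines through untouched.
        if line.startswith("#") or not line.strip():
            yield line
            continue
        fields = line.rstrip("\n").split("\t")
        seqname = fields[0]
        if seqname.startswith("chr"):
            seqname = seqname[3:]
        fields[0] = mito_map.get(seqname, seqname)
        yield "\t".join(fields) + "\n"
```

Usage would look like `open("renamed.gtf", "w").writelines(strip_chr_prefix(open("paper.gtf")))`, with both filenames hypothetical.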

I still got an error, and I suspect the reason is that the GTF is missing some attributes that cellranger (or STAR) requires. If so, how can I add this information?
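To test the "missing attributes" theory, one can check each exon line for the attributes a reference builder usually relies on. This sketch assumes those are `gene_id` and `transcript_id` (confirm the exact requirements in the 10x custom-reference documentation), and the function names are hypothetical. The two sample lines above already carry both, so if the check passes, the failure more likely lies elsewhere, for example in the column separators or contig names. The sketch also flags lines that do not split into nine tab-separated columns.

```python
import re

# Matches key "value" pairs in GTF column 9, e.g. gene_id "ZNF692"
ATTR_RE = re.compile(r'(\w+)\s+"([^"]*)"')


def parse_attributes(attr_field):
    """Parse a GTF attribute string into a dict of key -> value."""
    return dict(ATTR_RE.findall(attr_field))


def find_exons_missing_attrs(gtf_lines, required=("gene_id", "transcript_id")):
    """Return (line_number, problems) for malformed lines or exons lacking attributes."""
    problems = []
    for lineno, line in enumerate(gtf_lines, start=1):
        if line.startswith("#") or not line.strip():
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) != 9:
            problems.append((lineno, ["<not 9 tab-separated columns>"]))
            continue
        if fields[2] != "exon":
            continue
        attrs = parse_attributes(fields[8])
        missing = [key for key in required if key not in attrs]
        if missing:
            problems.append((lineno, missing))
    return problems
```

An empty result means every exon line has the assumed attributes and the expected column layout.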

2) Using the Ensembl genome also feels problematic, because it may not be the same genome version the GTF was built on. Also, what is the point of providing the RNA-seq results with this paper?
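Whether a GTF fits a given genome version can be partly tested: every GTF seqname should exist in the FASTA, and no feature should end past the end of its contig. This sketch assumes contig names are the first word of each FASTA header (for the Ensembl header above, that is `1`); the function names are hypothetical. Passing the check does not prove the versions match, but failures are a strong sign of a mismatch.

```python
def fasta_contig_lengths(fasta_lines):
    """Map each FASTA record name (first word after '>') to its sequence length."""
    lengths = {}
    name = None
    for line in fasta_lines:
        line = line.strip()
        if line.startswith(">"):
            parts = line[1:].split()
            name = parts[0] if parts else ""
            lengths[name] = 0
        elif name is not None:
            lengths[name] += len(line)
    return lengths


def check_gtf_against_fasta(gtf_lines, contig_lengths):
    """Return (seqnames absent from the FASTA, features ending past their contig)."""
    missing = set()
    out_of_range = []
    for lineno, line in enumerate(gtf_lines, start=1):
        if line.startswith("#") or not line.strip():
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 5:
            continue
        seqname, end = fields[0], int(fields[4])  # column 5 is the 1-based end
        if seqname not in contig_lengths:
            missing.add(seqname)
        elif end > contig_lengths[seqname]:
            out_of_range.append((lineno, seqname, end, contig_lengths[seqname]))
    return sorted(missing), out_of_range
```

Running this with the renamed GTF against the Macaca_fascicularis_5.0 FASTA would show directly whether the seqnames line up and whether any coordinates fall off the end of a chromosome.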

The question seems entangled... Simply put, I am hoping to run cellranger mkref with the two files listed above. Any suggestions, thoughts, or ideas are deeply and sincerely appreciated!

Thank you!

Georgia


Have you checked the example that 10x provides to build a custom reference? Don't mix and match the annotations and reference sequence. That is asking for trouble.

Written 12 days ago by genomax.

genomax, thank you for your reply!

Yes, I did read the example multiple times, but as far as I can tell it does not answer my question.

I have a GTF provided by a paper. The GTF contains the chromosome and location of each gene, but the paper does not specify which reference genome version to align to. Instead, it provides a bunch of sequences, and these sequences contain no chromosome or location information that the GTF could be matched to (see the FASTA example in my question).

This makes me feel like it is mission impossible.

Written 11 days ago by jorja0.