matching downloaded FASTA and GTF
3.6 years ago
jorja • 0

Hello!

I am trying to build my own reference from a GTF using cellranger mkref, but I could not get the file format right...

The headers of the two files I am trying to use are as follows:

The FASTA reads like this:

SRR8424011.1.1 CB15KANXX170420:1:1101:10000:40548 length=101 GTTCTCTTGTTTTACATTAATAAGAAATATACTGTGACTCCTAGAGCTATGTTCATTCATATTTGTAACTGCTACATGTCTGTTGGATTTTCCTTCATCTA

SRR8424011.1.2 CB15KANXX170420:1:1101:10000:90445 length=101 GATAAGTACAAAGAATGAATAAGGTGGACAGGAAAGTGAAGAGTGTGGGATGGTTAGGGGCTTTAAGGACTTCCCAGGAAAATAGGATTCTGGGATGGGGT

The GTF reads like this:

chr1 StringTie exon 91827 92344 . + . transcript_id "MSTRG.2.1"; gene_id "ZNF692"; gene_name "ZNF692"; exon_number "1"
chr1 StringTie exon 107447 107478 . + . transcript_id "MSTRG.2.1"; gene_id "ZNF692"; gene_name "ZNF692"; exon_number "2"

The program failed, and I suspect two reasons:

1) The FASTA is not a genome FASTA, so it really won't work. I tried to fix this problem by using the Ensembl genome:

1 dna_sm:chromosome chromosome:Macaca_fascicularis_5.0:1:1:227556264:1 REF aaccctaaccctaaccctaaccctaaccctaaccctaaccctaaccctaacactaaccct aaccctaaccctaaccctaacccgaacccgaacccgaaccctaaccctaacccctaaccc ctaaccctaaccctaaccctaaccctaacccgaaccctaatccctaaccctaaccctaac....

and deleting the "chr" prefix from chr1, chr2, etc. in the GTF to match that naming.
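For reference, the prefix removal described above can be sketched as follows. This is only an illustration under assumptions: the helper name `strip_chr_prefix` is hypothetical, and it assumes the GTF is tab-separated with the sequence name in the first column.

```python
def strip_chr_prefix(gtf_line: str) -> str:
    """Remove a leading 'chr' from the seqname (first) column of one GTF line."""
    if gtf_line.startswith("#") or not gtf_line.strip():
        return gtf_line  # leave comment and blank lines untouched
    seqname, sep, rest = gtf_line.partition("\t")
    if seqname.startswith("chr"):
        seqname = seqname[3:]
    return seqname + sep + rest


# Example record modeled on the GTF shown above (tab-separated).
example = (
    "chr1\tStringTie\texon\t91827\t92344\t.\t+\t.\t"
    'transcript_id "MSTRG.2.1"; gene_id "ZNF692";'
)
print(strip_chr_prefix(example))
```

Renaming only makes the names line up. It does not make the coordinates valid if the GTF was built on a different assembly, and some contigs may need more than a prefix change (for example, Ensembl names the mitochondrial sequence MT).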

I still got an error, and I suspect the reason is that the GTF is missing several fields required by cellranger (or STAR). If this is true, how can I add this information?

2) Using the Ensembl genome also feels problematic, because it may not match the genome version the GTF was built on. Also, what is the point of providing the RNA-seq results for this paper?

The question seems entangled... Simply put, I am hoping to run cellranger mkref with the two files listed above... Any suggestions/thoughts/ideas are deeply and sincerely appreciated!
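One way to test the missing-field suspicion is to count exon records that lack a given attribute. The sketch below is not cellranger's own validation: the `REQUIRED` tuple is an assumption about what the reference builder expects on exon records and should be checked against the 10x documentation, and `missing_attributes` is a hypothetical helper name.

```python
import re
from collections import Counter

# Assumption: attribute keys the reference builder needs on exon records.
REQUIRED = ("gene_id", "transcript_id")

# Matches GTF attribute pairs of the form: key "value"
ATTR_RE = re.compile(r'(\w+) "([^"]*)"')


def missing_attributes(gtf_lines, required=REQUIRED):
    """Count exon records lacking each required attribute key."""
    missing = Counter()
    for line in gtf_lines:
        if line.startswith("#") or not line.strip():
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) != 9:
            # A GTF record has exactly nine tab-separated columns.
            missing["malformed_line"] += 1
            continue
        if fields[2] != "exon":
            continue
        keys = {key for key, _ in ATTR_RE.findall(fields[8])}
        for name in required:
            if name not in keys:
                missing[name] += 1
    return missing


lines = [
    'chr1\tStringTie\texon\t1\t10\t.\t+\t.\ttranscript_id "t1"; gene_id "g1";\n',
    'chr1\tStringTie\texon\t1\t10\t.\t+\t.\ttranscript_id "t1";\n',
    "chr1 StringTie exon 1 10\n",  # space-separated, so counted as malformed
]
print(missing_attributes(lines))
```

A non-zero `malformed_line` count would suggest that the columns are separated by spaces rather than tabs, which is worth ruling out before adding any attributes.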
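A rough compatibility check between a genome FASTA and a GTF can be sketched as below: it lists GTF sequence names that are absent from the FASTA and sequences where an annotated end coordinate runs past the contig length. All function names here are hypothetical, and passing this check does not prove the two files come from the same assembly. It only catches obvious mismatches.

```python
def fasta_lengths(fasta_lines):
    """Return {contig name: sequence length} from FASTA lines."""
    lengths = {}
    name = None
    for line in fasta_lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            name = line[1:].split()[0]  # first word of the header is the name
            lengths[name] = 0
        elif name is not None:
            lengths[name] += len(line)
    return lengths


def gtf_max_ends(gtf_lines):
    """Return {seqname: largest end coordinate} from tab-separated GTF lines."""
    ends = {}
    for line in gtf_lines:
        if line.startswith("#") or not line.strip():
            continue
        fields = line.split("\t")
        if len(fields) < 5:
            continue  # skip lines that are not valid records
        seqname, end = fields[0], int(fields[4])
        ends[seqname] = max(ends.get(seqname, 0), end)
    return ends


def compare(lengths, ends):
    """Return (seqnames missing from the FASTA, seqnames annotated past the contig end)."""
    unknown = sorted(s for s in ends if s not in lengths)
    too_long = sorted(s for s in ends if s in lengths and ends[s] > lengths[s])
    return unknown, too_long


fasta = [">1 dna_sm:chromosome\n", "ACGT\n", "ACGT\n"]
gtf = [
    "1\tStringTie\texon\t1\t8\t.\t+\t.\tx\n",
    "1\tStringTie\texon\t2\t12\t.\t+\t.\tx\n",
    "chr2\tStringTie\texon\t1\t5\t.\t+\t.\tx\n",
]
print(compare(fasta_lengths(fasta), gtf_max_ends(gtf)))
```

For real files, the same functions can be fed open file handles instead of the small lists used here.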

Thank you!

Georgia

software error RNA-Seq • 811 views

Have you checked the example that 10x provides to build a custom reference? Don't mix and match the annotations and reference sequence. That is asking for trouble.


genomax, thank you for your reply!

Yes, I did read the example multiple times, but as far as I can tell, it does not answer my question...

I have a GTF provided by a paper. The GTF contains the chromosome and location for each gene, but the paper does not specify which genome reference sequence version to align to. Instead, it gave a bunch of sequences. These sequences do not contain the chromosome or location information that the GTF could be matched against... (please see my earlier example)

This makes me feel like it is mission impossible.
