Question: How to extract fasta sequences from assembled transcripts generated by Stringtie
3.7 years ago by
seta1.4k wrote:

Hi all,

I used STAR to map reads to the reference genome and StringTie for assembly. As you know, StringTie writes the assembled transcripts in GTF format. Now I want the FASTA sequences of the assembled transcripts. I tried gffread, but all the sequences had the same header! Maybe it is not compatible with StringTie output. Could you please help me convert the StringTie transcripts from GTF to FASTA format?


3.7 years ago

Use gffread; you can find it in the Cufflinks package.

16 months ago by dukecomeback40
3.3 years ago by
zzqr40 wrote:

The stringtie_merged.gtf file has seqname, start, end, and strand information, so you can build a GRanges object in R and use the getSeq function (GenomicRanges and BSgenome packages) to retrieve the sequences.

3.3 years ago by zzqr40
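
For readers who want to see the coordinate-based idea spelled out, here is a minimal sketch in plain Python rather than R. It is not the GRanges/getSeq code from the answer above. It assumes the whole genome fits in memory, that each transcript's exons lie on one sequence and one strand, and that the attribute column contains a quoted transcript_id. The helper names (read_fasta, revcomp, transcripts_from_gtf) and the toy data are made up for illustration.

```python
import re
from collections import defaultdict

def read_fasta(lines):
    """Parse FASTA lines into {name: sequence}; name = first word of the header."""
    seqs, name = {}, None
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            name = line[1:].split()[0]
            seqs[name] = []
        elif name is not None:
            seqs[name].append(line)
    return {k: "".join(v) for k, v in seqs.items()}

_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")

def revcomp(seq):
    """Reverse-complement a DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]

def transcripts_from_gtf(gtf_lines, genome):
    """Join the exons of each transcript_id into one spliced sequence."""
    exons = defaultdict(list)
    for line in gtf_lines:
        if line.startswith("#") or not line.strip():
            continue
        f = line.rstrip("\n").split("\t")
        if len(f) < 9 or f[2] != "exon":
            continue  # only exon rows carry the spliced structure
        m = re.search(r'transcript_id "([^"]+)"', f[8])
        if m is None:
            continue
        exons[m.group(1)].append((f[0], int(f[3]), int(f[4]), f[6]))
    out = {}
    for tid, parts in exons.items():
        parts.sort(key=lambda e: e[1])  # genomic order by start
        chrom, strand = parts[0][0], parts[0][3]
        # GTF coordinates are 1-based and inclusive, hence start - 1.
        seq = "".join(genome[chrom][s - 1:e] for _, s, e, _ in parts)
        out[tid] = revcomp(seq) if strand == "-" else seq
    return out

# Toy data (hypothetical), just to show the call pattern.
genome = read_fasta([">chr1 toy contig", "AAACCC", "GGGTTT"])
gtf = [
    'chr1\tStringTie\ttranscript\t1\t9\t1000\t+\t.\tgene_id "g1"; transcript_id "t1";',
    'chr1\tStringTie\texon\t1\t3\t1000\t+\t.\tgene_id "g1"; transcript_id "t1";',
    'chr1\tStringTie\texon\t7\t9\t1000\t+\t.\tgene_id "g1"; transcript_id "t1";',
    'chr1\tStringTie\texon\t1\t2\t1000\t-\t.\tgene_id "g2"; transcript_id "t2";',
    'chr1\tStringTie\texon\t4\t6\t1000\t-\t.\tgene_id "g2"; transcript_id "t2";',
]
for tid, seq in transcripts_from_gtf(gtf, genome).items():
    print(f">{tid}\n{seq}")
```

Using the transcript_id as the FASTA header gives every record a distinct name, which is the property the original question was missing.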
3.3 years ago by
lakhujanivijay5.3k wrote:

You can also use bedtools getfasta to fetch sequences from GTF or BED files.


Here is the perfect solution

3.3 years ago by lakhujanivijay5.3k
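
One thing to keep in mind with interval-based extraction: GTF coordinates are 1-based and inclusive, while BED coordinates are 0-based and half-open, and an interval tool returns one sequence per interval, so exon rows come out as separate records unless they are joined afterwards. If you want to prepare a BED file from the StringTie GTF yourself, a minimal Python sketch of the conversion could look like the following. The function name gtf_exons_to_bed is hypothetical, and the sketch assumes a quoted transcript_id in the attribute column.

```python
import re

def gtf_exons_to_bed(gtf_lines):
    """Yield one BED6 line per GTF exon row, named by transcript_id."""
    for line in gtf_lines:
        if line.startswith("#") or not line.strip():
            continue
        f = line.rstrip("\n").split("\t")
        if len(f) < 9 or f[2] != "exon":
            continue
        m = re.search(r'transcript_id "([^"]+)"', f[8])
        name = m.group(1) if m else "."
        # BED start is 0-based, so subtract 1; the end stays the same.
        yield "\t".join([f[0], str(int(f[3]) - 1), f[4], name, ".", f[6]])

# Hypothetical example row.
example = ['chr1\tStringTie\texon\t101\t200\t1000\t+\t.\tgene_id "G1"; transcript_id "G1.1";']
for bed_line in gtf_exons_to_bed(example):
    print(bed_line)
```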

I used this, but I ran into the following error:

"Error (GFaSeqGet): subsequence cannot be larger than 465 Error getting subseq for gene1 (465..1503)!"

Did you have any issues using gffread?


2.3 years ago by spriyar10
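
The message reads as if a feature asks for positions beyond the end of a 465-base sequence. One possible cause, and this is only a guess, is that the GTF and the FASTA file do not describe the same sequences. A quick way to check is to compare every GTF row against the sequence lengths. The sketch below is an illustration, not the script referred to in this thread. The function name out_of_range_features is hypothetical, and seq_lengths is assumed to be a dict of sequence name to length, which could be built from the first two columns of a .fai index.

```python
def out_of_range_features(gtf_lines, seq_lengths):
    """Return (line_no, seqname, start, end, reason) for rows that do not fit."""
    problems = []
    for n, line in enumerate(gtf_lines, 1):
        if line.startswith("#") or not line.strip():
            continue
        f = line.rstrip("\n").split("\t")
        if len(f) < 9:
            continue
        chrom, start, end = f[0], int(f[3]), int(f[4])
        length = seq_lengths.get(chrom)
        if length is None:
            problems.append((n, chrom, start, end, "sequence not in FASTA"))
        elif start < 1 or start > end or end > length:
            problems.append((n, chrom, start, end, f"outside 1..{length}"))
    return problems

# Hypothetical rows modelled on the numbers in the error message above.
lengths = {"gene1": 465}
rows = [
    'gene1\tStringTie\texon\t465\t1503\t1000\t+\t.\tgene_id "g"; transcript_id "g.1";',
    'gene1\tStringTie\texon\t10\t200\t1000\t+\t.\tgene_id "g"; transcript_id "g.2";',
]
for problem in out_of_range_features(rows, lengths):
    print(problem)
```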

There is a Python script that fixes this error; see A: gffread error when extracting transcript sequences from gtf, coordinates exceed

4 months ago by Cristina Zamora0
16 months ago by
Juke344.9k wrote:

I use a script from AGAT with these options: --cdna --gff input.gtf --fasta genome.fa -o output.fa

16 months ago by Juke344.9k
