6.6 years ago
amoltej ▴ 90

Hello there,

I am new to this field. I recently ran a differential expression analysis with DESeq and ended up with a list of genes that are differentially expressed between tissues. Because the list contains a large number of transcripts, I would like to extract all of the transcript sequences in FASTA format using the GTF (or GFF3) file and the genome scaffold file. This is not a model organism, and I made the GTF file using the Scipio program.

Amol

6.6 years ago
David Fredman

The gffread utility in the Cufflinks package will extract transcript fasta given a gtf/gff and reference (genome) fasta file. For all the options:

gffread -h

To get only the DE transcripts, either subset the GFF/GTF or, perhaps more straightforwardly, subset the FASTA file (see "Extracting Multiple Fasta Sequences At A Time From A File Containing Many Sequences" for several ways of doing that).
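As a rough sketch of the second route (subsetting the FASTA file), the awk-based helper below keeps only the records whose ID appears in a list. The function name `subset_fasta` and the file names in the usage note are made up for illustration, and it assumes the ID is the first word after `>` in each header and that the ID file has one transcript ID per line.

```shell
# Hypothetical helper: print only the FASTA records whose ID is listed.
# $1 = file with one transcript ID per line
# $2 = FASTA file (ID assumed to be the first word after ">")
subset_fasta() {
    awk 'NR == FNR { keep[$1] = 1; next }            # first file: remember wanted IDs
         /^>/ { id = substr($1, 2)                   # header line: strip the leading ">"
                printing = (id in keep) }            # switch printing on or off
         printing' "$1" "$2"                         # print header and sequence lines while on
}
```

It could then be called as `subset_fasta de_ids.txt your_transcripts.fasta > de_transcripts.fasta`, where `de_ids.txt` would hold the transcript IDs from the differential expression results.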

Thank you so much for the quick reply. I tried that but could not get anything out of it. I don't know if I am doing something wrong. Could you please provide the actual command?

Thank you

gffread your_transcripts.gff -g genomic_reference.fasta -w your_transcripts.fasta

Make sure that the chromosome/scaffold IDs are identical in the GFF and the genomic reference (capitalization, underscores, etc.).
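One way to check this, sketched under the assumption that the sequence ID is the first word after `>` in each FASTA header and the first column of each annotation line: the helper below (the name `missing_seqids` is made up) lists every ID used in the GFF/GTF that has no matching sequence in the genome file.

```shell
# Hypothetical helper: list annotation sequence IDs missing from the genome FASTA.
# $1 = GFF/GTF file, $2 = genome FASTA
missing_seqids() {
    awk 'NR == FNR { if (/^>/) seen[substr($1, 2)] = 1   # FASTA: record each header ID
                     next }
         /^#/ || NF == 0 { next }                         # skip comments and blank lines
         !($1 in seen) && !reported[$1]++ { print $1 }    # report each unknown ID once
        ' "$2" "$1"
}
```

Running `missing_seqids your_transcripts.gff genomic_reference.fasta` should print nothing if the names agree. Any ID it prints is spelled differently in the two files, or is absent from the genome.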

I was doing the same, but it doesn't work!

That's odd. Assuming the chromosome names are correct, the only reason I can think of is a GFF format that gffread does not understand.

Either try to validate your GFF or try a different tool. Perhaps bedtools will be more forgiving.
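For a quick first pass before reaching for a real validator, something like the following could flag the most common structural problems. This is only a rough sanity check (the function name `check_gff_lines` is made up), not a full GFF/GTF validator: it looks at the column count, numeric coordinates, start/end order and the strand field, and nothing else.

```shell
# Hypothetical helper: flag structurally suspicious lines in a GFF/GTF file.
# $1 = GFF/GTF file; prints "line N: reason" for each problem found
check_gff_lines() {
    awk -F '\t' '
        /^#/ || /^[[:space:]]*$/ { next }    # skip comments and blank lines
        NF != 9 { print "line " NR ": expected 9 tab-separated columns, found " NF; next }
        $4 !~ /^[0-9]+$/ || $5 !~ /^[0-9]+$/ { print "line " NR ": non-numeric start/end"; next }
        ($4 + 0) > ($5 + 0) { print "line " NR ": start greater than end"; next }
        $7 !~ /^[-+.?]$/ { print "line " NR ": unexpected strand " $7 }
    ' "$1"
}
```

If `check_gff_lines your_transcripts.gff` prints nothing, the file at least has the expected overall shape, and the attribute column would be the next place to look.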