Question: Isoform transcript sequence reconstruction
gravatar for FreeMindAlex
4.2 years ago by
United Kingdom
FreeMindAlex90 wrote:

Hi All,

I have discovered new transcripts via TUXEDO RNAseq pipeline. After running cuffmerge I have got a GTF file with genomic coordinates (Ensembl build 37) of individual exons of discovered isoforms. I want to verify their existence by PCR approach. Designing isoform - specific primer sets means I need their sequences. Can anybody advise on how to obtain full transcript sequences? Note that I am not worried about getting these sequences directly from RNAseq reads (not sure if that's possible or not; if possible an explanation of how will be appreciated though) but rather want to reconstruct them using GTF annotation and reference genome sequence in high throughput manner. The high throughput aspect is important.


rna-seq sequence • 1.2k views
ADD COMMENTlink modified 4.2 years ago by Devon Ryan90k • written 4.2 years ago by FreeMindAlex90
gravatar for Devon Ryan
4.2 years ago by
Devon Ryan90k
Freiburg, Germany
Devon Ryan90k wrote:

For making a reference sequence from your dataset, search the forum for "BAM consensus sequence". Common tools for this are bcftools consensus and FastaAlternateReferenceMaker from GATK. However, I'll warn you that going this route is likely to be a total nightmare for you, unless you stick only to SNPs (i.e., just forget about including indels or any sort of complex variant.

For making the actual fasta files, tophat actually comes with a convenient program call gtf_to_fasta. It accepts a GTF file and a reference genome and will make a fasta file containing transcript sequences from it. This is typically used by tophat internally when people give it a reference annotation, but there's no reason you can't use it too. I can't say I've ever looked at the output file, so I can't say how conveniently it names the various transcripts.

If you prefer R, you can use GenomicFeatures with the extractTranscriptSeq() command.

ADD COMMENTlink written 4.2 years ago by Devon Ryan90k

Thanks Devon... that's useful... For R I'm a newbie so I'll try the gtf_to_fasta feature of TopHat instead. Fingers crossed, it's going to work. Thanks!

ADD REPLYlink written 4.2 years ago by FreeMindAlex90
Please log in to add an answer.


Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.3.0
Traffic: 1747 users visited in the last hour