Question: Relative transcript expression
Govardhan Anande (Australia) wrote 16 months ago:

Hi,

I have been trying to understand the relative expression of two transcripts from one gene.

Let's say I have a gene with 6 exons that produces two transcripts: isoform 1 with all 6 exons, and isoform 2 with exons 1, 2, 3, 4 and 6.

I have BAM files from STAR and I don't want to redo the alignment, so I would really appreciate it if anyone could suggest a tool that will quantify these two isoforms.

Thanks in advance.

modified 16 months ago by WouterDeCoster • written 16 months ago by Govardhan Anande

Try MISO as well.

modified 16 months ago • written 16 months ago by cpad0112

I have MISO results, but as you know, MISO only considers the alternative exon along with its upstream and downstream exons, not the entire transcript.

written 16 months ago by Govardhan Anande
lakhujanivijay (India) wrote 16 months ago:

Hi Govardhan

Basically, there are 2 steps:

  1. Identifying the transcripts.
  2. Estimating the "relative" abundance of those transcripts in your sample.

Since you say you already have the isoforms in hand, I believe you are done with step 1.

So, if you have the BAM files and the GTF for the corresponding reference genome in hand, you can run StringTie to estimate the abundances (step 2).

If you are not yet done with step 1, you will have to run StringTie twice, as described below:

  • first with the BAM files and the reference GTF, to perform a "reference-guided" transcriptome assembly.
  • then, taking the consensus set of transcripts from all samples as the reference, to estimate their abundance.

By abundance, I mean the FPKM or TPM values (or your favourite metric) that StringTie will generate for you.

NOTE: StringTie is part of the new Tuxedo protocol.

written 16 months ago by lakhujanivijay
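
As a small illustration of what "relative" abundance means for the two isoforms in the question: each isoform's share can be taken as its TPM divided by the summed TPM of all isoforms of that gene. The sketch below assumes a two-column, tab-separated table of transcript ID and TPM. The table layout, the function name `isoform_fractions`, and the TPM values are hypothetical and are not StringTie output.

```shell
# Print each isoform's fraction of the gene's total TPM.
# Input (assumed layout): transcript_id <TAB> TPM, one isoform per line.
isoform_fractions() {
  # The file is read twice: the first pass sums TPM, the second prints the shares.
  awk -F'\t' '
    NR == FNR { total += $2; next }
    { printf "%s\t%.3f\n", $1, (total > 0 ? $2 / total : 0) }
  ' "$1" "$1"
}

# Demo with made-up TPM values for the two isoforms.
demo=$(mktemp)
printf 'isoform1\t30\nisoform2\t10\n' > "$demo"
isoform_fractions "$demo"
rm -f "$demo"
```

With the made-up values above, isoform1 gets 0.750 and isoform2 gets 0.250. Ratios of FPKM values within one gene and one sample would give the same shares, since both metrics apply the same per-sample scaling to every transcript.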

Hi Vijay,

Thank you.

Yes, I have identified the transcripts and generated a GTF file of the two transcripts. Now I am trying to get the relative abundance, but I am getting "Error: could not any valid reference transcripts in Demo.gtf (invalid GTF/GFF file?)".

My GTF looks like this:

chrX protein_coding exon XXX507 XXX637 . + . gene_id "geneX"; transcript_id "isoX"; gene_name "geneX";
chrX protein_coding CDS XXX507 XXX637 . + . gene_id "geneX"; transcript_id "isoX"; gene_name "geneX";
chrX protein_coding exon XXX612 XXX724 . + . gene_id "geneX"; transcript_id "isoX"; gene_name "geneX";
chrX protein_coding CDS XXX612 XXX724 . + . gene_id "geneX"; transcript_id "isoX"; gene_name "geneX";

modified 16 months ago • written 16 months ago by Govardhan Anande
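
One frequent cause of an "invalid GTF/GFF file" complaint, offered here as a guess rather than a diagnosis of Demo.gtf, is a hand-made GTF whose nine columns are separated by spaces instead of tabs, or whose start/end columns are not plain integers. A minimal check along those lines could look like this; the function name `check_gtf` and the demo coordinates are hypothetical.

```shell
# Report lines that do not have 9 tab-separated fields or integer coordinates.
# Returns non-zero if any problem was found.
check_gtf() {
  awk -F'\t' '
    /^#/ || NF == 0 { next }                       # skip comments and blank lines
    NF != 9 {
      printf "line %d: expected 9 tab-separated fields, found %d\n", NR, NF
      bad = 1; next
    }
    $4 !~ /^[0-9]+$/ || $5 !~ /^[0-9]+$/ {
      printf "line %d: start/end are not integers\n", NR
      bad = 1
    }
    END { exit bad + 0 }
  ' "$1"
}

# Demo: one well-formed line (tabs) and one space-separated line.
demo=$(mktemp)
printf 'chrX\tprotein_coding\texon\t507\t637\t.\t+\t.\tgene_id "geneX"; transcript_id "isoX";\n' > "$demo"
printf 'chrX protein_coding exon 612 724 . + . gene_id "geneX"; transcript_id "isoX";\n' >> "$demo"
check_gtf "$demo" || echo "GTF needs fixing"
rm -f "$demo"
```

Note that masked coordinates such as XXX507 would also be flagged by the integer test, so the check is only meaningful on the real file.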

Share the exact commands you used for:

  • mapping
  • this step (abundance estimation)

modified 16 months ago • written 16 months ago by lakhujanivijay

Alignment:

STAR --runMode alignReads --outSAMtype BAM SortedByCoordinate --runThreadN 10 --genomeDir $FastaIndex --readFilesIn $R1 $R2

For abundance, I just started with a basic command:

~/stringtie-1.3.4d.Linux_x86_64/stringtie Aligned.sortedByCoord.out.bam -G Demo.gtf

written 16 months ago by Govardhan Anande

What is the output of this?

stringtie -G reference.gtf -o out.gtf sample.sorted.bam

reference.gtf = the GTF file for the reference genome you are using

out.gtf = the output GTF that StringTie will generate for you

sample.sorted.bam = the coordinate-sorted BAM file

This is the assembly step. out.gtf will contain the assembled transcripts.

Once you are done with this, the next step is abundance estimation, which I'll share later.

modified 16 months ago • written 16 months ago by lakhujanivijay
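
The abundance step was not posted in the thread. As a hedged sketch of what it might look like: StringTie's `-e` option limits estimation to the transcripts supplied with `-G`, so no novel transcripts are assembled. The wrapper name `quantify_known_isoforms` and the output file name `quant.gtf` are hypothetical; the flags `-e`, `-G` and `-o` are StringTie options.

```shell
# Quantify only the transcripts listed in a GTF (no novel assembly).
#   $1 = coordinate-sorted BAM, $2 = GTF of the transcripts of interest,
#   $3 = output GTF to write
quantify_known_isoforms() {
  # -e: estimate abundance only for the transcripts given with -G
  # -o: path of the output GTF
  stringtie "$1" -e -G "$2" -o "$3"
}

# Example call, left commented out because it needs StringTie and real files:
# quantify_known_isoforms Aligned.sortedByCoord.out.bam Demo.gtf quant.gtf
```

The StringTie manual also notes that spliced alignments need the XS strand tag, which STAR writes for unstranded libraries only when run with `--outSAMstrandField intronMotif`. Whether that matters for these BAM files depends on the library type, which the thread does not state.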

Why do I need to use a reference GTF when I can use the GTF of my two transcripts?

Is it something StringTie requires? The output of the above command is a GTF, i.e.

chrM StringTie transcript 1 16571 1000 . . gene_id "STRG.1"; transcript_id "STRG.1.1"; cov "20872.708984";
chrM StringTie exon 1 16571 1000 . . gene_id "STRG.1"; transcript_id "STRG.1.1"; exon_number "1"; cov "20872.708984";

modified 16 months ago • written 16 months ago by Govardhan Anande
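
To pull per-transcript values out of a StringTie GTF like the one quoted above, the attribute column can be split on semicolons. This is a minimal sketch that assumes a tab-delimited file and reads only the `transcript_id` and `cov` attributes that appear in the quoted output; the function name `transcript_cov` is hypothetical.

```shell
# Print "transcript_id <TAB> cov" for every transcript line of a GTF.
transcript_cov() {
  awk -F'\t' '$3 == "transcript" {
    id = ""; cov = ""
    n = split($9, attrs, ";")              # attributes are ";"-separated
    for (i = 1; i <= n; i++) {
      a = attrs[i]
      gsub(/^ +| +$/, "", a)               # trim surrounding spaces
      if (a ~ /^transcript_id /) { id = a; sub(/^transcript_id /, "", id); gsub(/"/, "", id) }
      if (a ~ /^cov /)           { cov = a; sub(/^cov /, "", cov); gsub(/"/, "", cov) }
    }
    print id "\t" cov
  }' "$1"
}

# Demo on the transcript and exon lines quoted in the thread (re-typed with tabs).
demo=$(mktemp)
printf 'chrM\tStringTie\ttranscript\t1\t16571\t1000\t.\t.\tgene_id "STRG.1"; transcript_id "STRG.1.1"; cov "20872.708984";\n' > "$demo"
printf 'chrM\tStringTie\texon\t1\t16571\t1000\t.\t.\tgene_id "STRG.1"; transcript_id "STRG.1.1"; exon_number "1"; cov "20872.708984";\n' >> "$demo"
transcript_cov "$demo"
rm -f "$demo"
```

Exon lines are skipped, so each transcript is reported once.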
WouterDeCoster (Belgium) wrote 16 months ago:

For quantification of transcripts you could also look at fast alignment-free approaches such as Salmon.

modified 16 months ago • written 16 months ago by WouterDeCoster

It's also worth adding here that Salmon output needs a tool called wasabi to convert it into an HDF5 (.h5) structure, ready for differential isoform modelling in sleuth.

written 16 months ago by andrew.j.skelton73
lakhujanivijay (India) wrote 16 months ago:

"Why do I need to use reference GTF when I can use gtf of two transcripts??"

A reference is required when you are performing a "reference-guided assembly": the genomic features are taken from the reference GTF file. Are you trying to do a de novo assembly?

"Is it something StringTie requires??"

It's optional; StringTie can also perform de novo assembly.

written 16 months ago by lakhujanivijay

I am working on human samples, so I just need the expression of each transcript from one gene.

Thanks, Govardhan

written 16 months ago by Govardhan Anande

Did you try RSEM?

written 16 months ago by lakhujanivijay

Again, the problem with RSEM is the alignment. I have BAM files from STAR and they are not compatible with RSEM, and the same goes for Cufflinks. Honestly, I can't afford to realign, so I am trying to find a way to use what I have at the moment. Anyway, thank you for your help and time.

written 16 months ago by Govardhan Anande

If time is the concern, you can try HISAT2 for alignment, but the call is yours! You're welcome. I'll be glad if you share what finally helped.

modified 16 months ago • written 16 months ago by lakhujanivijay

Govardhan, STAR is compatible with Cufflinks. Please paste an error snippet if you get one, so that I can help with it.

written 16 months ago by Jeffin Rockey

Jeffin, you are right: Cufflinks accepts the STAR BAM files, but the results are different. I tried feeding it one sample's BAM from TopHat and one from STAR. Anyway, I got splicing information from various tools and I am now planning to use those values.

I posted this question here because I wish to compare entire-transcript expression rather than alternative-exon usage.

written 16 months ago by Govardhan Anande