Question: Finding isoform sequences for Differentially spliced genes
0
gravatar for kandoigaurav
4.6 years ago by
kandoigaurav120
United States
kandoigaurav120 wrote:

I've performed RNA-Seq analysis for A. thaliana using the TopHat/Cufflinks pipeline for 2 conditions. How can I extract the sequences of all isoforms of the genes which are differentially alternatively spliced b/w the two conditions? (Significant genes as output by cuffdiff in the splicing.diff file)

While browsing through the splicing.diff output file, I found that the coordinates for genes mentioned in the file are nowhere to be found in the transcript.gtf file produced by Cufflinks.

Sample splicing.diff file:

TSS1 XLOC_000001 AT1G01010 Chr1:3630-5899 WT20 WT37 NOTEST 0 0 0 0 1 1 yes
TSS10 XLOC_000007 AT1G01180 Chr1:75582-76758 WT20 WT37 NOTEST 0 0 0 0 1 1 yes

 

Sample transcript.gtf file:

Chr1 Cufflinks transcript 11903 12897 1000 - . gene_id "WT_37.1"; transcript_id "WT_37.1.1"; FPKM "0.6185344748"; frac "1.000000"; conf_lo "0.341917"; conf_hi "0.895152"; cov "2.561616";
Chr1 Cufflinks exon 11903 12897 1000 - . gene_id "WT_37.1"; transcript_id "WT_37.1.1"; exon_number "1"; FPKM "0.6185344748"; frac "1.000000"; conf_lo "0.341917"; conf_hi "0.895152"; cov "2.561616";

 

As you can see, the gene_id in the transcript.gtf file is also not same as that in the splicing.diff file, which further complicates extraction. The coordinates in splicing.diff files are usually not present in the transcript.gtf file. Either of the start or end coordinates is few always ~100nt apart while the other end is much further away. 


I've tried gffreads, but it only extracts sequences for the transcript.gtf file for both conditions and not differentially spliced b/w the two conditions. The GeneID's in gtf and splicing.diff files are different, so it gets more confusing.

ADD COMMENTlink modified 4.6 years ago by geek_y10k • written 4.6 years ago by kandoigaurav120

convert the splicing.diff to a bed format and use bedtools getfasta ?

ADD REPLYlink written 4.6 years ago by geek_y10k
1

Thanks for the suggestion. I can do that, but it'll give me the entire gene sequence for all differentially spliced genes in splicing.diff and not all the isoforms of these genes per se. Any way to get the isoforms from both conditions for these differentially spliced genes?

ADD REPLYlink written 4.6 years ago by kandoigaurav120
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.3.0
Traffic: 798 users visited in the last hour