Question: mapping RNAseq reads to a genbank isoforme that is not in Reference Annotation
0
gravatar for fta.mirzadeh
3.7 years ago by
fta.mirzadeh0 wrote:

Hi, I'm trying to quantify different isoforms of a gene using RNAseq data. I have some bam files and I've tried assembling transcripts by cufflinks -G option using reference annotation (downloaded from UCSC). but one of the isoformes is not annotated in reference annotation that I have (maybe because its new!) and I can quantify it. however I can see this isoform in human mRNA from genbank track in UCSC. so is there any other reference annotation that I can use that may contain this new isoform? or all available reference are the same?

i have to mention that I've tried de novo assembling using -g option and the simple cufflinks command but it couldnt find the whole isoform, instead it predicted the extra exon(presented in this isoform) solely as a distinct transcript. so what is the possible remedy to this problem.

ADD COMMENTlink written 3.7 years ago by fta.mirzadeh0
2

How about adding your new isoform annotation to the UCSC reference annotation?

I guess your reference may be a GTF or GFF3 file so you can add in command line.

ADD REPLYlink written 3.7 years ago by cfarmeri150

yes my reference is a gtf file..but I'm quite new to this area..how can I add an isoform to my reference annotation? could you please suggest me a paper or discussion on how to do that.
 

ADD REPLYlink written 3.7 years ago by fta.mirzadeh0
1

First, you creat another gtf file(isoform.gtf) of your new isoform.
The contents are your isoform.GTF format is described at

https://genome.ucsc.edu/FAQ/FAQformat.html#format4
http://www.gencodegenes.org/gencodeformat.html

The line MUST be tab-separated, and chromosome name ( column-1 ) MUST be "1" not "chr1".
If you prepare above new gtf file, 

- - - - - - -
cat isoform.gtf UCSC_annotation.gtf > new.gtf
- - - - - - -

in your command line.(Linux or OS X)

ADD REPLYlink written 3.7 years ago by cfarmeri150

thank you very much, i added 3 overlapping isoforms into my reference annotations file. I ran cufflinks to quantify them. it seems that only one of the added isoforms have been taken to analysis by cufflinks..because for 2 isoforms RPKM is 0.0000 . could it be because they share exons and cufflinks somehow ignore some reads that mapped to those exon for 2 out of 3 similar isoforms?

ADD REPLYlink written 3.7 years ago by fta.mirzadeh0

That i possible, but I can not say the exact thing about Cufflinks isoform assembly.

So I think you should confirm it by reading Cufflinks original paper.I'm sorry that I couldn't be of any help to you.

ADD REPLYlink written 3.7 years ago by cfarmeri150
1

hi,

Just playing devil's advocate here. If you load your RNA-seq BAM in IGV and go to your gene and then make a Sashimi plot (click the left hand side panel which shows the name of the BAM loaded and from the drop down menu choose 'Sashimi plot').

You should see something like this -

Basically the exons should have connecting bridges/loops which means reads split-aligned and hence supporting the transcript structure/ isoform. Do you see your novel/ extra exon with a loop connecting it to the preceding exon?

If not, then it might be the reason why Cufflinks fails to generate the desired isoform but instead generates a single exon transcript.

I might be wrong but its worth checking.

ADD REPLYlink written 3.7 years ago by Amitm1.6k

thanks for the tip..i checked and there is no loop. in other words there is no junction read that covers these two exons. I did not mapped the reads my self. I started my work on bam files. is there any possibility that these junction reads are discarded in previous processing step? by the way this new exon is 10 kb up stream of the conventional first exon..could this large distance affect cufflinks ability to generate the novel isoform?

ADD REPLYlink written 3.7 years ago by fta.mirzadeh0
1

There is a parameter regulating the max. intron length in splice-aware aligners. But like in TopHat by deafult its 500kb and in STAR also to nearly same tune. So 10kb is definitely in safe zone unless deafult parameters were altered. The other thing to check is whether the novel exon overlaps a repeat region. You can import RepeatMasker track in IGV or you could visualize your gene on the UCSC Genome Browser and turn on the necessary track. If it is so then an imbalanced coverage (more or lesser reads aligining on the novel exon of repeat nature) maybe leading to not creating the isoform.

It could also be that your novel region is transcribed but not as part of mRNA isoforms. Also take help of public RNA-seq data (BodyMap, Encode) to check if there are known cases lurking somewhere. Really hand-waiving here

ADD REPLYlink written 3.7 years ago by Amitm1.6k

well Amitm this new exon transcribes from a LTR element, and as you said the read coverage on this repeat elements is really higher than other exons. I think I have to try to find a way for refining reads alignments into this repeat element first.

ADD REPLYlink modified 3.7 years ago • written 3.7 years ago by fta.mirzadeh0

Have you checked Ensembl annotation?

ADD REPLYlink written 3.7 years ago by andrew.j.skelton735.8k

yeah! but the isoform is not included. it seems that Ensembl is quite like Refseq.

 

ADD REPLYlink written 3.7 years ago by fta.mirzadeh0
1

ok, have you checked the origin of the Genbank track, is it from an experiment, or does it have a release number?

ADD REPLYlink written 3.7 years ago by andrew.j.skelton735.8k

thanks Andrew, well it dose have an id number but it comes from an experiment because in information page some researcher are listed as authors however there is no link to a publication.

ADD REPLYlink written 3.7 years ago by fta.mirzadeh0

Then likely it's an isoform that hasn't been proven, or predicted yet outside of that experiment. If it's not predicted computationally, or experimentally by Havana, then I'd consider it to be novel to a degree. I guess if you could confirm the transcript through qPCR and it's predicted in the Genbank track, then you'd have a good amount of evidence to suggest it's a real transcript variant. 

ADD REPLYlink written 3.7 years ago by andrew.j.skelton735.8k
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.3.0
Traffic: 1645 users visited in the last hour