mapping RNAseq reads to a genbank isoforme that is not in Reference Annotation
0
0
Entering edit mode
8.2 years ago

Hi, I'm trying to quantify different isoforms of a gene using RNAseq data. I have some bam files and I've tried assembling transcripts by cufflinks -G option using reference annotation (downloaded from UCSC). but one of the isoformes is not annotated in reference annotation that I have (maybe because its new!) and I can quantify it. however I can see this isoform in human mRNA from genbank track in UCSC. so is there any other reference annotation that I can use that may contain this new isoform? or all available reference are the same?

i have to mention that I've tried de novo assembling using -g option and the simple cufflinks command but it couldnt find the whole isoform, instead it predicted the extra exon(presented in this isoform) solely as a distinct transcript. so what is the possible remedy to this problem.

RNA-Seq sequence cufflinks new isoform in rnaseq • 3.2k views
ADD COMMENT
2
Entering edit mode

How about adding your new isoform annotation to the UCSC reference annotation?

I guess your reference may be a GTF or GFF3 file so you can add in command line.

ADD REPLY
0
Entering edit mode

Yes my reference is a gtf file but I'm quite new to this area. How can I add an isoform to my reference annotation? Could you please suggest me a paper or discussion on how to do that.

ADD REPLY
1
Entering edit mode

First, you create another gtf file(isoform.gtf) of your new isoform. The contents are your isoform.GTF format is described at

The line MUST be tab-separated, and chromosome name ( column-1 ) MUST be "1" not "chr1". If you prepare above new gtf file,

cat isoform.gtf UCSC_annotation.gtf > new.gtf

in your command line.(Linux or OS X)

ADD REPLY
0
Entering edit mode

thank you very much, i added 3 overlapping isoforms into my reference annotations file. I ran cufflinks to quantify them. it seems that only one of the added isoforms have been taken to analysis by cufflinks..because for 2 isoforms RPKM is 0.0000 . could it be because they share exons and cufflinks somehow ignore some reads that mapped to those exon for 2 out of 3 similar isoforms?

ADD REPLY
0
Entering edit mode

That is possible, but I cannot say the exact thing about Cufflinks isoform assembly.

So I think you should confirm it by reading Cufflinks original paper. I'm sorry that I couldn't be of any help to you.

ADD REPLY
1
Entering edit mode

Hi,

Just playing devil's advocate here. If you load your RNA-seq BAM in IGV and go to your gene and then make a Sashimi plot (click the left hand side panel which shows the name of the BAM loaded and from the drop down menu choose 'Sashimi plot').

You should see something like this -

image: sashimi plot screenshot

Basically the exons should have connecting bridges/loops which means reads split-aligned and hence supporting the transcript structure/ isoform. Do you see your novel/ extra exon with a loop connecting it to the preceding exon?

If not, then it might be the reason why Cufflinks fails to generate the desired isoform but instead generates a single exon transcript.

I might be wrong but its worth checking.

ADD REPLY
0
Entering edit mode

Thanks for the tip. I checked and there is no loop. In other words, there is no junction read that covers these two exons. I did not map the reads myself. I started my work on bam files. Is there any possibility that these junction reads are discarded in previous processing step? By the way this new exon is 10 kb up stream of the conventional first exon. Could this large distance affect cufflinks ability to generate the novel isoform?

ADD REPLY
1
Entering edit mode

There is a parameter regulating the max. intron length in splice-aware aligners. But like in TopHat by deafult its 500kb and in STAR also to nearly same tune. So 10kb is definitely in safe zone unless deafult parameters were altered. The other thing to check is whether the novel exon overlaps a repeat region. You can import RepeatMasker track in IGV or you could visualize your gene on the UCSC Genome Browser and turn on the necessary track. If it is so then an imbalanced coverage (more or lesser reads aligining on the novel exon of repeat nature) maybe leading to not creating the isoform.

It could also be that your novel region is transcribed but not as part of mRNA isoforms. Also take help of public RNA-seq data (BodyMap, Encode) to check if there are known cases lurking somewhere. Really hand-waiving here

ADD REPLY
0
Entering edit mode

well Amitm this new exon transcribes from a LTR element, and as you said the read coverage on this repeat elements is really higher than other exons. I think I have to try to find a way for refining reads alignments into this repeat element first.

ADD REPLY
0
Entering edit mode

Have you checked Ensembl annotation?

ADD REPLY
0
Entering edit mode

Yeah! But the isoform is not included. It seems that Ensembl is quite like Refseq.

ADD REPLY
1
Entering edit mode

ok, have you checked the origin of the Genbank track, is it from an experiment, or does it have a release number?

ADD REPLY
0
Entering edit mode

thanks Andrew, well it dose have an id number but it comes from an experiment because in information page some researcher are listed as authors however there is no link to a publication.

ADD REPLY
0
Entering edit mode

Then likely it's an isoform that hasn't been proven, or predicted yet outside of that experiment. If it's not predicted computationally, or experimentally by Havana, then I'd consider it to be novel to a degree. I guess if you could confirm the transcript through qPCR and it's predicted in the Genbank track, then you'd have a good amount of evidence to suggest it's a real transcript variant.

ADD REPLY

Login before adding your answer.

Traffic: 3107 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6