Question: Genes in my GTF are missing from merged.gtf after cuffmerge
aleka (United Kingdom) wrote, 4.0 years ago:

I am using the Cufflinks pipeline to analyse RNA-seq data. Some genes are present in my GTF annotation, but after I run cuffmerge they are missing from merged.gtf.

Is that normal, and why does it happen?

Tags: rna-seq, alignment, next-gen
modified 4.0 years ago by Devon Ryan • written 4.0 years ago by aleka
Answer: aleka wrote:

I found the cause. In merged.gtf, some genes were listed under their gene ID and others under their gene name, so the genes I was looking for were there, just under a different identifier.

However, I noticed that different genes sometimes share the same merged_id in merged.gtf. Is that normal?

written 4.0 years ago by aleka
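Because the same gene can appear in merged.gtf under either its gene ID or its gene name, a lookup that checks both attributes avoids false "missing gene" results. A minimal Python sketch, assuming a standard tab-separated GTF with `key "value";` attributes in column 9; `find_gene` and `parse_attributes` are hypothetical helper names, not part of Cufflinks:

```python
import re

# Matches key "value" pairs in the GTF attribute column (column 9).
ATTR_RE = re.compile(r'([^\s;]+)\s+"([^"]*)"')


def parse_attributes(field):
    """Return a dict of attribute name -> value from a GTF attribute string."""
    return dict(ATTR_RE.findall(field))


def find_gene(gtf_lines, query, keys=("gene_id", "gene_name")):
    """Return the set of gene_id values whose gene_id or gene_name equals query.

    gtf_lines can be any iterable of lines, e.g. an open file handle.
    """
    hits = set()
    for line in gtf_lines:
        if line.startswith("#") or not line.strip():
            continue  # skip comments and blank lines
        cols = line.rstrip("\n").split("\t")
        if len(cols) < 9:
            continue  # not a complete GTF record
        attrs = parse_attributes(cols[8])
        if any(attrs.get(k) == query for k in keys):
            hits.add(attrs.get("gene_id"))
    return hits
```

For example, `with open("merged.gtf") as fh: print(find_gene(fh, "YOUR_GENE"))`, where `YOUR_GENE` is a placeholder for the identifier you are searching for. Other attribute names can be passed through `keys` if your annotation uses them.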

Yes, merging of genes seems to happen fairly often. Nearby genes can sometimes get merged together (possibly correctly, though!).

written 4.0 years ago by Devon Ryan
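To list which annotated genes ended up under one merged gene ID, the records can be grouped by `gene_id` and the distinct `gene_name` values collected per group. A sketch, assuming each record's `gene_name` attribute reflects the reference gene its transcript was matched to (records without a `gene_name` are skipped); `genes_per_locus` and `merged_loci` are hypothetical helpers:

```python
import re
from collections import defaultdict

# Matches key "value" pairs in the GTF attribute column (column 9).
ATTR_RE = re.compile(r'([^\s;]+)\s+"([^"]*)"')


def genes_per_locus(gtf_lines):
    """Map each gene_id to the set of gene_name values seen on its records."""
    names = defaultdict(set)
    for line in gtf_lines:
        if line.startswith("#") or not line.strip():
            continue
        cols = line.rstrip("\n").split("\t")
        if len(cols) < 9:
            continue
        attrs = dict(ATTR_RE.findall(cols[8]))
        if "gene_id" in attrs and "gene_name" in attrs:
            names[attrs["gene_id"]].add(attrs["gene_name"])
    return names


def merged_loci(gtf_lines):
    """Return {gene_id: sorted gene names} for loci carrying more than one gene_name."""
    return {gid: sorted(n) for gid, n in genes_per_locus(gtf_lines).items() if len(n) > 1}
```

Running `merged_loci` on an open merged.gtf handle lists only the merged IDs that absorbed several named genes, which is the situation described above.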

Is there a specific reason they get merged? Is it just because they are close to each other?

What I have is three different genes that share the same assembled merged gene ID but have different transcripts.

I was expecting that, since cuffmerge merges the gene IDs, it would also merge the transcripts, and that this would be the reason it merges the genes together. But in my case the transcripts get different assembled merged transcript IDs.

Any idea why?

written 4.0 years ago by aleka

The assembled transcripts from the different genes most likely overlap a bit, which results in the genes being merged. I tend to see this more often when the assembled 3' end is extended compared to the annotation and the genes are tail to tail (with an unstranded library, though a head-to-tail gene configuration can show the same thing).

modified 4.0 years ago • written 4.0 years ago by Devon Ryan
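The overlap explanation can be illustrated with a toy interval-clustering model: transcripts whose spans overlap, directly or through a chain of overlaps, end up in one locus. This is a simplified illustration, not cuffmerge's actual algorithm, and the coordinates and names are invented. With `stranded=False` (mimicking an unstranded library) the strand is ignored, so an assembled 3' end that runs past the annotation can fuse two tail-to-tail genes:

```python
from typing import List, Optional, Tuple

# (chrom, start, end, strand, label) with 1-based inclusive GTF-style coordinates.
Transcript = Tuple[str, int, int, str, str]


def cluster_by_overlap(transcripts: List[Transcript], stranded: bool) -> List[List[str]]:
    """Group transcripts whose spans overlap, directly or via a chain, into loci.

    Toy model only. With stranded=False the strand is ignored, as it would
    effectively be for reads from an unstranded library.
    """
    def group_key(t: Transcript) -> Tuple[str, str]:
        return (t[0], t[3] if stranded else ".")

    ordered = sorted(transcripts, key=lambda t: (group_key(t), t[1], t[2]))
    clusters: List[List[str]] = []
    current_key: Optional[Tuple[str, str]] = None
    current_end = -1
    for t in ordered:
        if group_key(t) == current_key and t[1] <= current_end:
            # Overlaps the running cluster: add it and extend the cluster end.
            clusters[-1].append(t[4])
            current_end = max(current_end, t[2])
        else:
            # New chromosome/strand group, or a gap: start a new cluster.
            clusters.append([t[4]])
            current_key, current_end = group_key(t), t[2]
    return clusters


if __name__ == "__main__":
    # Invented tail-to-tail example: geneA's assembled 3' end extended to 2600.
    txs = [("chr1", 1000, 2600, "+", "geneA_tx"), ("chr1", 2500, 3500, "-", "geneB_tx")]
    print(cluster_by_overlap(txs, stranded=False))  # [['geneA_tx', 'geneB_tx']]
    print(cluster_by_overlap(txs, stranded=True))   # [['geneA_tx'], ['geneB_tx']]
```

Had `geneA_tx` stopped at an annotated end of 2000, it would not reach `geneB_tx` at 2500 and the two would stay separate even with `stranded=False`.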

Hi, thanks for your reply, very useful. I checked the isoforms file, and the three transcripts that share the same assembled gene ID have exactly the same position on the chromosome but different lengths, which I suppose means they overlap in some way. Is there any way to see whether the 3' end is extended beyond what is annotated?

written 4.0 years ago by aleka

You can usually see that visually in IGV, by comparing the assembled transcripts with the annotation.

written 4.0 years ago by Devon Ryan
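Besides inspecting IGV, the 3' extension can be quantified once you have the assembled transcript span and the annotated gene span. Which coordinate is the 3' end depends on strand: the end coordinate on `+`, the start coordinate on `-`. A sketch with a hypothetical helper, using 1-based GTF coordinates and invented example values:

```python
def three_prime_extension(assembled_start: int, assembled_end: int,
                          annotated_start: int, annotated_end: int,
                          strand: str) -> int:
    """Bases by which the assembled 3' end lies beyond the annotated 3' end.

    1-based inclusive coordinates. Positive: the assembled 3' end is extended;
    negative: it stops short of the annotation; 0: same 3' end.
    """
    if strand == "+":
        # On the plus strand the 3' end is the larger (end) coordinate.
        return assembled_end - annotated_end
    if strand == "-":
        # On the minus strand the 3' end is the smaller (start) coordinate.
        return annotated_start - assembled_start
    raise ValueError(f"strand must be '+' or '-', got {strand!r}")


if __name__ == "__main__":
    # Invented coordinates for illustration.
    print(three_prime_extension(1000, 2600, 1000, 2000, "+"))  # 600
    print(three_prime_extension(2300, 3500, 2500, 3500, "-"))  # 200
```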

Hi, do you know whether the genes are merged because the CDS coordinates of the genes overlap, or because the reads overlap, which makes the assembled transcripts overlap? In my case the gene coordinates overlap and some reads are also shared between the genes, so I am not sure what the merge is based on (coordinates or reads).

written 4.0 years ago by aleka

The CDS doesn't need to overlap for the genes to be merged.

written 4.0 years ago by Devon Ryan

So it is mainly based on reads overlapping between the genes, is that right?

written 4.0 years ago by aleka

Correct.

written 4.0 years ago by Devon Ryan

Great, thanks.

written 4.0 years ago by aleka

Hi, is there any way to find the coordinates of the transcripts that overlap? In merged.gtf there are only the coordinates of the genes. I couldn't find them anywhere in my files, but maybe I am missing something.

written 4.0 years ago by aleka

No clue; you might need to write something yourself to do that.

written 4.0 years ago by Devon Ryan
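One way to do it, assuming merged.gtf consists of exon records carrying `gene_id` and `transcript_id` attributes (as cuffmerge's output normally does): take each transcript's span as its smallest exon start and largest exon end, then compare spans pairwise within each merged gene. A sketch with hypothetical function names; it reports span overlaps, which can include an exon of one transcript falling inside an intron of another, so exon-level overlap would need a per-exon comparison:

```python
import re
from collections import defaultdict
from itertools import combinations

# Matches key "value" pairs in the GTF attribute column (column 9).
ATTR_RE = re.compile(r'([^\s;]+)\s+"([^"]*)"')


def transcript_spans(gtf_lines):
    """Collapse exon records into one (chrom, start, end, strand, gene_id) per transcript_id."""
    spans = {}
    for line in gtf_lines:
        if line.startswith("#") or not line.strip():
            continue
        cols = line.rstrip("\n").split("\t")
        if len(cols) < 9 or cols[2] != "exon":
            continue  # only exon records define transcript extent
        attrs = dict(ATTR_RE.findall(cols[8]))
        tid = attrs.get("transcript_id")
        if tid is None:
            continue
        start, end = int(cols[3]), int(cols[4])
        if tid in spans:
            chrom, s, e, strand, gid = spans[tid]
            spans[tid] = (chrom, min(s, start), max(e, end), strand, gid)
        else:
            spans[tid] = (cols[0], start, end, cols[6], attrs.get("gene_id"))
    return spans


def overlapping_pairs(spans):
    """Yield (tid1, tid2, overlap_start, overlap_end) for overlapping transcripts sharing a gene_id."""
    by_gene = defaultdict(list)
    for tid, (chrom, s, e, _strand, gid) in spans.items():
        by_gene[gid].append((tid, chrom, s, e))
    for gid in sorted(by_gene, key=str):
        for (t1, c1, s1, e1), (t2, c2, s2, e2) in combinations(sorted(by_gene[gid]), 2):
            lo, hi = max(s1, s2), min(e1, e2)
            if c1 == c2 and lo <= hi:
                yield t1, t2, lo, hi
```

Usage: `with open("merged.gtf") as fh: for pair in overlapping_pairs(transcript_spans(fh)): print(*pair, sep="\t")`.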