Question: cuffmerge does not give merged.gtf
4.5 years ago by
nbhardwaj130
United States
nbhardwaj130 wrote:

Hi,

I am running cuffmerge to merge a few gtfs that I have created using cufflinks. This is the command I use:

 

cuffmerge  -o merged -s genome.fa list.txt

 

list.txt contains the relative paths of the GTF files that I want to merge.

 

Here is the stdout (no error)

[Fri Oct  3 16:15:22 2014] Beginning transcriptome assembly merge

-------------------------------------------

 

[Fri Oct  3 16:15:22 2014] Preparing output location merged/

Warning: no reference GTF provided!

[Fri Oct  3 16:15:22 2014] Converting GTF files to SAM

[16:15:22] Loading reference annotation.

....

[16:15:24] Loading reference annotation.

[Fri Oct  3 16:15:24 2014] Assembling transcripts

 

 

But there are only these 2 directories in merged:

logs  tmp

 

There is no merged.gtf 

Has anybody seen and solved this error? There is nothing alarming in run.log file either. This is how log file ends:

cufflinks -o merged/ -F 0.05 -q --overhang-tolerance 200 --library-type=transfrags -A 0.0 --min-frags-per-transfrag 0 --no-5-extend -p 1 merged/tmp/mergeSam_fileoCJV2Y

 

Any help will be appreciated. 
Thanks.

 

ADD COMMENTlink modified 2.6 years ago by Lindsay4420 • written 4.5 years ago by nbhardwaj130
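
When cuffmerge exits without an error and without a merged.gtf, a first step is to confirm what the output directory actually contains and to look at the end of the run log. The following is a minimal sketch, not a confirmed fix: `report_merge` is a hypothetical helper, `demo_merged` is a placeholder directory created only so the sketch runs on its own, and the `logs/run.log` location is an assumption based on the directory listing and log file mentioned in the question.

```shell
#!/bin/sh
# report_merge OUTDIR: say whether OUTDIR holds a non-empty merged.gtf and,
# if it does not, print the last lines of the run log.
# (Hypothetical helper; logs/run.log is an assumed location.)
report_merge() {
    outdir=$1
    if [ -s "$outdir/merged.gtf" ]; then
        echo "found $outdir/merged.gtf"
        return 0
    fi
    echo "no merged.gtf in $outdir"
    if [ -f "$outdir/logs/run.log" ]; then
        echo "last lines of $outdir/logs/run.log:"
        tail -n 5 "$outdir/logs/run.log"
    fi
    return 1
}

# Demo: mimic the state described in the question (only logs/ and tmp/).
# The log line is a placeholder, not real cuffmerge output.
mkdir -p demo_merged/logs demo_merged/tmp
echo "placeholder for the last logged command" > demo_merged/logs/run.log

report_merge demo_merged || echo "inspect the last logged command"
```

If the log ends with a command, as it does in the question, rerunning that command by hand may show messages that cuffmerge did not pass on. The `--keep-tmp` option used in another answer in this thread may be needed so that the intermediate file named in that command still exists.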

Hi, please see my follow-up question below.

ADD REPLYlink modified 4.5 years ago • written 4.5 years ago by nbhardwaj130

I am facing the same issue. Please help

ADD REPLYlink written 2.7 years ago by naveenlalosharma0

Even if it appears to be the same issue it always helps to post the command line you are using and the exact error you are encountering. There is always something slightly different in each of these cases.

ADD REPLYlink written 2.7 years ago by genomax63k

I am having the same problem: cuffmerge isn't producing the merged GTF. I don't know what is going wrong. I am pretty sure I entered everything correctly. What did you do to fix this?

ADD REPLYlink modified 2.6 years ago • written 2.6 years ago by Lindsay4420

Are you getting an error?

ADD REPLYlink written 2.6 years ago by genomax63k

No, I am not getting an error. It runs for hours and then there is no GTF file, just the run log and an empty temp file.

I have another question as well. When do I need to include the reference annotation and the reference sequence fasta? There is that option in Cufflinks too and I do not know when I should input that information.

Thanks.

ADD REPLYlink written 2.6 years ago by Lindsay4420

See this for the answer to your reference question. Do you have a lot of samples? Are the GTF files large?

ADD REPLYlink written 2.6 years ago by genomax63k

Thank you so much.

I have ten files in total but I have been running two just to test it out. One is about 15,000 KB and the other is 6,000 KB.

I am running this command:

cuffmerge -g genes.gtf -s genome.fa

file 1 path

file 2 path

I was reading something and it said there had to be one file per line but I don't exactly know how to format it that way. I tried it a bunch of times, just written a little differently, and it didn't work at all.

How could I make an assemblies.txt file in an editor that I can add to my script? Maybe it will work that way. But based on how things have been going, I am not too sure.

What could be going wrong?

Thanks again for the help.

ADD REPLYlink modified 2.6 years ago • written 2.6 years ago by Lindsay4420

You make the assemblies.txt file in a text editor (use any you like, just save the file in text format).

So assemblies.txt file will have these contents.

/path_to/file1
/path_to/file2
/path_to/file3

Command would be

cuffmerge -g genes.gtf -s genome.fa assemblies.txt
ADD REPLYlink written 2.6 years ago by genomax63k
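
The list file can also be generated from the shell instead of being typed in an editor. This is a minimal sketch under stated assumptions: the `demo_runs/sample1` style directories are hypothetical, each is assumed to hold a `transcripts.gtf`, and empty placeholder files are created only so the sketch runs on its own.

```shell
#!/bin/sh
# Hypothetical layout: one Cufflinks output directory per sample.
# Empty placeholder files stand in for real transcripts.gtf files.
mkdir -p demo_runs/sample1 demo_runs/sample2 demo_runs/sample3
for d in demo_runs/sample1 demo_runs/sample2 demo_runs/sample3; do
    : > "$d/transcripts.gtf"
done

# Write one absolute path per line, with nothing else on the line.
: > assemblies.txt
for f in demo_runs/*/transcripts.gtf; do
    printf '%s/%s\n' "$PWD" "$f" >> assemblies.txt
done

cat assemblies.txt

# The list file is then passed as the last argument:
#   cuffmerge -g genes.gtf -s genome.fa assemblies.txt
```

Writing absolute paths means the list still resolves if cuffmerge is started from a different working directory.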

Thank you. I just ran it and it worked! The GTF file is in my folder.

What do these outputs mean though? There is quite a long list of these:

SAM error on line 104: found spliced alignment without XS attribute

Warning: couldn't find fasta record for 'chr1_GL456221_random'!
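
The second warning names a sequence that could not be found in the FASTA file. One way to see which names are affected is to compare the sequence names in a GTF file with the FASTA headers. This is a minimal sketch using small placeholder files (`demo.gtf`, `demo.fa`); it assumes the warning comes from sequence names that appear in the GTF input but not in `genome.fa`, which this thread does not confirm.

```shell
#!/bin/sh
# Placeholder inputs standing in for a GTF file and genome.fa.
printf 'chr1\tCufflinks\texon\t1\t50\t.\t+\t.\tgene_id "g1"; transcript_id "t1";\n' > demo.gtf
printf 'chr1_GL456221_random\tCufflinks\texon\t1\t50\t.\t+\t.\tgene_id "g2"; transcript_id "t2";\n' >> demo.gtf
printf '>chr1 demo sequence\nACGTACGT\n' > demo.fa

# Sequence names used in the GTF (column 1), skipping comment lines.
grep -v '^#' demo.gtf | cut -f1 | sort -u > gtf_names.txt

# Sequence names in the FASTA: header text up to the first whitespace.
grep '^>' demo.fa | sed 's/^>//' | awk '{print $1}' | sort -u > fasta_names.txt

# Names present in the GTF but absent from the FASTA.
comm -23 gtf_names.txt fasta_names.txt > missing_names.txt
cat missing_names.txt
```

With the placeholder files above, the only name reported is `chr1_GL456221_random`, because `demo.fa` contains a record for `chr1` only.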

I just have a few more questions:

What is the difference between including the reference annotation and sequence in both cufflinks and cuffmerge as opposed to just including that information in cuffmerge? I am looking to find novel genes in my study, so will preexisting reference information be a hindrance?

How does an annotation file made by cufflinks differ from one included in a reference genome download?

Also, I plan to input this merged gtf file into Cuffdiff. Do I make one large file that includes the two different experimental groups I am comparing?

I am having trouble running Cuffdiff as well. I have been trying for a long time now. When I enter the command, this comes up:

You are using Cufflinks v2.2.1, which is the most recent release.

And then it produces a bunch of empty output files.

I don't know if I am formatting the command correctly or what. I have two groups of files (each group has 5 files in it). How do I format that command? I tried another text file but it said it doesn't recognize that file type.

Thank you for all of the help. I am very new at all of this.

ADD REPLYlink modified 2.6 years ago • written 2.6 years ago by Lindsay4420
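
On the Cuffdiff command format asked about above: cuffdiff takes the annotation first and then the alignment files directly on the command line, with replicates of one condition joined by commas and the conditions separated by spaces, rather than a text file listing them. The sketch below only builds and prints such a command. All file names (`a1.bam`, `b1.bam`, and so on), the labels `groupA` and `groupB`, and the output directory `diff_out` are hypothetical.

```shell
#!/bin/sh
# Hypothetical alignment files: five replicates per group.
group_a="a1.bam a2.bam a3.bam a4.bam a5.bam"
group_b="b1.bam b2.bam b3.bam b4.bam b5.bam"

# Replicates of one condition are joined with commas;
# the two conditions are separated by a space.
reps_a=$(printf '%s' "$group_a" | tr ' ' ',')
reps_b=$(printf '%s' "$group_b" | tr ' ' ',')

# -o sets the output directory, -L the condition labels.
cmd="cuffdiff -o diff_out -L groupA,groupB merged.gtf $reps_a $reps_b"
echo "$cmd"

# To run it once cuffdiff is installed and the inputs exist:
#   $cmd
```

The single merged.gtf produced by cuffmerge serves as the annotation for both groups; the groups are kept apart by how their alignment files are listed.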
4.5 years ago by
Manvendra Singh2.0k
Berlin, Germany
Manvendra Singh2.0k wrote:

You probably need to check your genome.fa file and list.txt.

Use the same genome.fa that you used to build the bowtie index, i.e. the one you provided during mapping.

list.txt should contain the GTF files from your cufflinks runs, one per line, each with its full path.

If both are set up this way, cuffmerge should run smoothly.

I ran cuffmerge like this:

 

/usr/local/bin/cuffmerge -p 4 -o output -g gtf_files/Human_gencodeV14_anno.gtf -s ../../hg19.fa --keep-tmp gtf_assembly/gtf_assembly.txt

 

 

 

 

ADD COMMENTlink written 4.5 years ago by Manvendra Singh2.0k
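
The check of list.txt suggested above can be scripted. The following is a minimal sketch: `check_list` is a hypothetical helper, and the `demo_` files are placeholders created only so the sketch runs on its own. It reports every listed path that does not exist or is empty.

```shell
#!/bin/sh
# check_list FILE: report every listed path that is missing or empty.
# Returns non-zero if any entry fails. (Hypothetical helper.)
check_list() {
    status=0
    while IFS= read -r path || [ -n "$path" ]; do
        # Skip blank lines.
        [ -z "$path" ] && continue
        if [ ! -s "$path" ]; then
            echo "missing or empty: $path"
            status=1
        fi
    done < "$1"
    return "$status"
}

# Demo with placeholder files: one real entry, one that does not exist.
printf 'chr1\tCufflinks\ttranscript\t1\t100\t.\t+\t.\tgene_id "g1"; transcript_id "t1";\n' > demo_1.gtf
printf '%s\n' "$PWD/demo_1.gtf" "$PWD/demo_missing.gtf" > demo_list.txt

check_list demo_list.txt || echo "fix the entries above before running cuffmerge"
```

A list that passes this check can still fail for other reasons, so this only rules out path problems.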

Hi, thanks for the reply. I checked my list.txt file and did not find anything wrong with it. Here is what it looks like:

 
/full/path/to/1.transcripts.gtf
/full/path/to/2.transcripts.gtf
/full/path/to/3.transcripts.gtf
/full/path/to/4.transcripts.gtf
 
I also checked the genome.fa file and it is the same as I used for building transcripts. But still no merged.gtf! I only see logs and tmp dir in the output dir.
Here is the command:
 
cuffmerge -o merged -s genome.fa list.txt
 
So, is there anything else that I can do/fix?
Do I absolutely need to provide a reference GTF file? How does it affect the results? I am merging transcripts that are non-coding and therefore are not present in the reference GTF file. 
Please help.
Best.
 
ADD REPLYlink written 4.5 years ago by nbhardwaj130

Things look fine.

Does your system have enough memory to run this?

Maybe you need to use multiple cores by giving the -p parameter.

 

ADD REPLYlink written 4.5 years ago by Manvendra Singh2.0k

Yeah, the GTF files that I am merging are very small, so memory should not be an issue. I tried with -p and still got no merged.gtf.

Would supplying a reference gtf help? But then how would it affect the results if the transcripts that I am merging are non-coding and not present in the reference gtf?
Thanks
ADD REPLYlink written 4.5 years ago by nbhardwaj130

Providing a GTF file helps in assigning the nearest reference IDs to the assembled transcripts, which can make downstream analyses easier. I provide the GTF file for tophat as well (along with cufflinks and cuffmerge). It can help avoid assigning multiple XLOC IDs to the same gene, which can improve quantitation. It doesn't matter if your gene of interest is not in the GTF. A transcript will still be assembled, provided tophat wasn't made to align to the GTF file ONLY.

ADD REPLYlink written 2.6 years ago by Satyajeet Khare1.3k

Okay, thanks. I just have a few follow-up questions.

What is the difference between including the reference information in both cufflinks and cuffmerge as opposed to just including that information in cuffmerge? I am looking to find novel genes in my study, so will preexisting reference information be a hindrance?

Will using the gtf in the alignment (I use hisat2) help improve lower alignment rates which occurred when just the gene fasta file was used as a reference?

ADD REPLYlink modified 2.6 years ago • written 2.6 years ago by Lindsay4420

The pre-existing GTF file should not be a hindrance. It just won't be able to assign any known gene information to some genes and transcripts. But those genes will still have XLOC IDs, TCONS IDs and location information, which you will be able to use. Inclusion of the GTF should not increase the alignment rate, to my knowledge.

ADD REPLYlink written 2.6 years ago by Satyajeet Khare1.3k

Here you can find some information on including a reference in cufflinks: https://genomebiology.biomedcentral.com/articles/10.1186/gb-2011-12-3-r22

ADD REPLYlink written 2.6 years ago by anjasta470