Cuffmerge error when merging transcript.gtf files
3.4 years ago
sarai • 0

Hi everyone!

I'm new to RNA-Seq analysis and I need your help. I'm trying to merge the transcript.gtf files I got from Cufflinks, but when I run the following command I get an error.

Can anyone suggest how to solve this?

Thanks so much!

cuffmerge -g Mus_musculus.GRCm38.101.gtf.gz -s Mus_musculus.GRCm38.dna.primary_assembly.fa.gz assemblies.txt

[Sat Dec  5 20:19:09 2020] Beginning transcriptome assembly merge
-------------------------------------------

[Sat Dec  5 20:19:09 2020] Preparing output location ./merged_asm/
Traceback (most recent call last):
  File "/home/msmenfig/tools/cufflinks-2.2.1.Linux_x86_64/cuffmerge", line 580, in <module>
    sys.exit(main())
  File "/home/msmenfig/tools/cufflinks-2.2.1.Linux_x86_64/cuffmerge", line 546, in main
    chrom_info = get_gtf_chrom_info(gtf, chrom_info)
  File "/home/msmenfig/tools/cufflinks-2.2.1.Linux_x86_64/cuffmerge", line 476, in get_gtf_chrom_info
    left = int(cols[3])
ValueError: invalid literal for int() with base 10: '9\x11\x9ei;\xb6}\xf6\xce\x08\xbe\x90\x8d;\xf8\x95\x8bS\xb5\xa6\xb3q\xb7\xf6\xcc\xea\x1fO\xcf\x9fO_9\xa9\xaa2?\x18X\xe4{\xa5\x97\x1d\xa3rRR\x9a>\x93H\xce\x82\xe3\x1b;\x12\xcf\xee)\x99\xb7`C\x9f+\x95[7\x18\x94\x8f\xe1?0\xc3.f\xa6\x90\xa5W\xf8\x15\x8a\xcb]~E\x90\x0c\xcd\x07\x0fs\xbcJ\x9dl\xc8\xd1\x82\x8c\xd9JNd\xac\x0c\x05[\xb6k\xab\x1c"\xe3\xb2\x1803`\x07M\x86$2\xb6i\x866rW\xf0WF\x9d\x05\xa7\xc4\xd8\x8bnP\x8e\xe1\xc7\x1a|\x08%V\xf74\xe9'
RNA-Seq • 1.2k views

Please use the formatting bar (especially the code option) to present your post better. You can use backticks for inline code (`text` becomes text), or select a chunk of text and use the highlighted button to format it as a code block. I've done it for you this time.


It looks like your input files contain unexpected non-text characters. What is the output to:

file Mus_musculus.GRCm38.101.gtf.gz
file Mus_musculus.GRCm38.dna.primary_assembly.fa.gz

Also, ensure that cuffmerge can indeed work with gzipped input files.
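
The traceback above is consistent with compressed bytes being parsed as if they were GTF text, so that column 4 (the start coordinate) is not an integer. As a rough sketch of how to check this, the snippet below builds a tiny made-up GTF (`demo.gtf`, not the Ensembl file) and uses two hypothetical helpers, `is_gzip` and `count_bad_starts`, to compare the raw and decompressed views. It illustrates the idea only and does not reproduce what cuffmerge does internally.

```shell
# Hypothetical helper: succeed if the file starts with the gzip magic bytes 1f 8b.
is_gzip() {
    [ "$(od -An -tx1 -N2 "$1" | tr -d ' \n')" = "1f8b" ]
}

# Hypothetical helper: count non-comment lines on stdin whose 4th tab-separated
# column (the GTF start coordinate) is not a plain integer.
count_bad_starts() {
    LC_ALL=C awk -F'\t' '!/^#/ && $4 !~ /^[0-9]+$/ { bad++ } END { print bad + 0 }'
}

workdir=$(mktemp -d)
# A one-line, made-up GTF record standing in for the real annotation.
printf '1\tdemo\texon\t100\t200\t.\t+\t.\tgene_id "g1"; transcript_id "t1";\n' > "$workdir/demo.gtf"
gzip -c "$workdir/demo.gtf" > "$workdir/demo.gtf.gz"

# Confirm the file really is gzip data and that the archive is intact.
is_gzip "$workdir/demo.gtf.gz" && echo "gzip magic bytes found"
gzip -t "$workdir/demo.gtf.gz" && echo "archive is intact"

# Compare the decompressed text with the raw compressed bytes.
plain_bad=$(gzip -dc "$workdir/demo.gtf.gz" | count_bad_starts)
raw_bad=$(count_bad_starts < "$workdir/demo.gtf.gz")
echo "decompressed: $plain_bad bad start(s); raw gzip bytes: $raw_bad bad line(s)"
```

The decompressed stream should report no bad coordinates, while the raw bytes should fail the integer check, which is the same kind of failure as the `int(cols[3])` error in the traceback.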


What do you mean by "the output to" the files?


Those are commands, not files.

file Mus_musculus.GRCm38.101.gtf.gz 
Mus_musculus.GRCm38.101.gtf.gz: gzip compressed data, from Unix
file Mus_musculus.GRCm38.dna.primary_assembly.fa.gz
Mus_musculus.GRCm38.dna.primary_assembly.fa.gz: gzip compressed data

It looks like there might be a problem with the GTF file. Does cuffmerge accept a gzipped GTF file? Try running

cuffmerge -g <(zcat Mus_musculus.GRCm38.101.gtf.gz) -s Mus_musculus.GRCm38.dna.primary_assembly.fa.gz assemblies.txt

If that errors out too, try

cuffmerge -g <(zcat Mus_musculus.GRCm38.101.gtf.gz) -s <(zcat Mus_musculus.GRCm38.dna.primary_assembly.fa.gz) assemblies.txt
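
For anyone unfamiliar with the `<(...)` syntax used above: it is bash process substitution, and the program receives a path to a pipe rather than to a file on disk. A small sketch, assuming bash and a hypothetical helper `describe_path`, shows what the receiving program actually sees:

```shell
#!/usr/bin/env bash
# Process substitution is a bash feature (it is not available in plain POSIX sh).

# Hypothetical helper: report what kind of object a path points to.
describe_path() {
    if [ -f "$1" ]; then
        echo "regular file"
    elif [ -p "$1" ]; then
        echo "pipe"
    else
        echo "other"
    fi
}

tmpfile=$(mktemp)
printf 'chr1\n' > "$tmpfile"

# bash replaces <(...) with a path (typically under /dev/fd/) before the command runs.
echo "path seen by the program:" <(cat "$tmpfile")

kind_of_file=$(describe_path "$tmpfile")
kind_of_subst=$(describe_path <(cat "$tmpfile"))
echo "real file: $kind_of_file; process substitution: $kind_of_subst"
rm -f "$tmpfile"
```

A pipe can be read once from start to end, which is fine for a tool that streams its input, but it cannot be reopened or searched, and nothing can be stored next to it on disk.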

I got this with the second command line:

cuffmerge -g <(zcat Mus_musculus.GRCm38.101.gtf.gz) -s <(zcat Mus_musculus.GRCm38.dna.primary_assembly.fa.gz) assemblies.txt

[Sat Dec  5 22:52:30 2020] Beginning transcriptome assembly merge

-------------------------------------------

[Sat Dec  5 22:52:30 2020] Preparing output location ./merged_asm/
[Sat Dec  5 22:52:43 2020] Converting GTF files to SAM
[22:52:43] Loading reference annotation.
[22:52:46] Loading reference annotation.
[22:52:49] Loading reference annotation.
[22:52:51] Loading reference annotation.
[22:52:54] Loading reference annotation.
[22:52:57] Loading reference annotation.
[22:53:00] Loading reference annotation.
[22:53:04] Loading reference annotation.
[22:53:06] Loading reference annotation.
[Sat Dec  5 22:53:11 2020] Quantitating transcripts
Warning: Could not connect to update server to verify current version. Please check at the Cufflinks website (http://cufflinks.cbcb.umd.edu).
Command line:
cufflinks -o ./merged_asm/ -F 0.05 -g /dev/fd/63 -q --overhang-tolerance 200 --library-type=transfrags -A 0.0 --min-frags-per-transfrag 0 --no-5-extend -p 1 ./merged_asm/tmp/mergeSam_fileBPezAf 
[bam_header_read] EOF marker is absent. The input is probably truncated.
[bam_header_read] invalid BAM binary header (this is not a BAM file).
File ./merged_asm/tmp/mergeSam_fileBPezAf doesn't appear to be a valid BAM file, trying SAM...
[22:53:11] Loading reference annotation.
[22:53:11] Inspecting reads and determining fragment length distribution.
Processed 43499 loci.                       
> Map Properties:
>   Normalized Map Mass: 1151266.00
>   Raw Map Mass: 1151266.00
>   Fragment Length Distribution: Truncated Gaussian (default)
>                 Default Mean: 200
>              Default Std Dev: 80
[22:53:13] Assembling transcripts and estimating abundances.
Processed 43499 loci.                       
[Sat Dec  5 22:59:05 2020] Comparing against reference file /dev/fd/63
Warning: Could not connect to update server to verify current version. Please check at the Cufflinks website (http://cufflinks.cbcb.umd.edu).
Error: fasta file /dev/fd/62 not found!
    [FAILED]
Error: could not execute cuffcompare

And I get the directory merged_asm with these files:

genes.fpkm_tracking  isoforms.fpkm_tracking  logs  skipped.gtf  tmp  transcripts.gtf

I think it failed because the FASTA stream from the process substitution was released prematurely. Try re-running it, but this time gunzip the FASTA file first.

gunzip Mus_musculus.GRCm38.dna.primary_assembly.fa.gz
cuffmerge -g <(zcat Mus_musculus.GRCm38.101.gtf.gz) -s Mus_musculus.GRCm38.dna.primary_assembly.fa assemblies.txt
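
Note that a plain `gunzip file.gz` replaces the compressed file with the decompressed one. If you would rather keep the `.gz` as well, one option is to decompress to a new file. A minimal sketch with a made-up two-line FASTA (`demo.fa`, standing in for the real genome):

```shell
workdir=$(mktemp -d)
# Made-up FASTA standing in for the real genome file.
printf '>chr_demo\nACGTACGT\n' > "$workdir/demo.fa"
gzip "$workdir/demo.fa"    # after this, only demo.fa.gz is left

# Decompress to a new file and keep the compressed original.
gzip -dc "$workdir/demo.fa.gz" > "$workdir/demo.fa"

# Both files now exist; demo.fa is an ordinary file that can be passed by name.
ls "$workdir"
first_header=$(head -n 1 "$workdir/demo.fa")
echo "first header: $first_header"
```

Either way, the point is that the FASTA ends up as a regular file on disk that can be given to `-s` by name.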

I got the following. Should I assume this time it worked since I got the merged.gtf file?

cuffmerge -g <(zcat Mus_musculus.GRCm38.101.gtf.gz) -s Mus_musculus.GRCm38.dna.primary_assembly.fa assemblies.txt

[Sat Dec  5 23:36:21 2020] Beginning transcriptome assembly merge
-------------------------------------------

[Sat Dec  5 23:36:21 2020] Preparing output location ./merged_asm/
[Sat Dec  5 23:36:36 2020] Converting GTF files to SAM
[23:36:36] Loading reference annotation.
[23:36:39] Loading reference annotation.
[23:36:41] Loading reference annotation.
[23:36:44] Loading reference annotation.
[23:36:47] Loading reference annotation.
[23:36:50] Loading reference annotation.
[23:36:53] Loading reference annotation.
[23:36:57] Loading reference annotation.
[23:36:59] Loading reference annotation.
[Sat Dec  5 23:37:03 2020] Quantitating transcripts
Warning: Could not connect to update server to verify current version. Please check at the Cufflinks website (http://cufflinks.cbcb.umd.edu).
Command line:
cufflinks -o ./merged_asm/ -F 0.05 -g /dev/fd/63 -q --overhang-tolerance 200 --library-type=transfrags -A 0.0 --min-frags-per-transfrag 0 --no-5-extend -p 1 ./merged_asm/tmp/mergeSam_file43EPNZ 
[bam_header_read] EOF marker is absent. The input is probably truncated.
[bam_header_read] invalid BAM binary header (this is not a BAM file).
File ./merged_asm/tmp/mergeSam_file43EPNZ doesn't appear to be a valid BAM file, trying SAM...
[23:37:03] Loading reference annotation.
[23:37:03] Inspecting reads and determining fragment length distribution.
Processed 43499 loci.                       
> Map Properties:
>   Normalized Map Mass: 1151266.00
>   Raw Map Mass: 1151266.00
>   Fragment Length Distribution: Truncated Gaussian (default)
>                 Default Mean: 200
>              Default Std Dev: 80
[23:37:05] Assembling transcripts and estimating abundances.
Processed 43499 loci.                       
[Sat Dec  5 23:43:57 2020] Comparing against reference file /dev/fd/63
Warning: Could not connect to update server to verify current version. Please check at the Cufflinks website (http://cufflinks.cbcb.umd.edu).
No fasta index found for Mus_musculus.GRCm38.dna.primary_assembly.fa. Rebuilding, please wait..
Fasta index rebuilt.
[Sat Dec  5 23:44:47 2020] Comparing against reference file /dev/fd/63
Warning: Could not connect to update server to verify current version. Please check at the Cufflinks website (http://cufflinks.cbcb.umd.edu).
[msmenfig@node157 fastq_files]$ cd merged_asm
[msmenfig@node157 merged_asm]$ ls
logs  merged.gtf

Yes, I think so. More importantly, you did not get any errors. I think cuffmerge failed the last time because it cannot look for an index file when the path passed to it is a /dev/fd/ file descriptor rather than an actual filename.
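
If you want a bit more reassurance than the absence of errors, a quick sanity check on the output is to confirm that the file is non-empty and to count the distinct transcript IDs. The sketch below runs on a made-up three-line file; the records and IDs are invented for illustration, so point `merged` at your own merged_asm/merged.gtf instead.

```shell
workdir=$(mktemp -d)
merged="$workdir/merged.gtf"   # stand-in for merged_asm/merged.gtf
# Made-up records: two exons of one transcript and one exon of another.
{
    printf '1\tCufflinks\texon\t100\t200\t.\t+\t.\tgene_id "XLOC_000001"; transcript_id "TCONS_00000001";\n'
    printf '1\tCufflinks\texon\t300\t400\t.\t+\t.\tgene_id "XLOC_000001"; transcript_id "TCONS_00000001";\n'
    printf '2\tCufflinks\texon\t500\t600\t.\t-\t.\tgene_id "XLOC_000002"; transcript_id "TCONS_00000002";\n'
} > "$merged"

# 1) The file exists and is not empty.
[ -s "$merged" ] && echo "merged.gtf is non-empty"

# 2) Count distinct transcript_id values in the attribute column.
n_transcripts=$(grep -o 'transcript_id "[^"]*"' "$merged" | sort -u | wc -l | tr -d ' ')
echo "distinct transcripts: $n_transcripts"
```

Checking `$?` immediately after the cuffmerge command is another quick signal, since a non-zero exit status would indicate that the run did not finish cleanly.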
