Question: I get an error when I run cuffmerge
11655760010 wrote:

I used TopHat and Cufflinks to analyse clean RNA-seq reads and obtained transcriptome expression profiles for my samples. The annotation.gtf and genome.fa that I used with these programs worked well. The working commands are below:

Tophat:

/home/share/bin/tophat -p 8 -G /home/jianglin/ljiang/XP/goat_ref/goat_refnew.gtf \
    -o /home/jianglin/ljiang/XP/results/'d10803_L5_I371_thout1/' \
    /home/jianglin/ljiang/XP/goat_ref/goat /home/jianglin/ljiang/XP/data/'d10803_L5_I371.R1.clean.fastq' \
    /home/jianglin/ljiang/XP/data/'d10803_L5_I371.R2.clean.fastq'

Cufflinks:

/home/share/bin/cufflinks -p 8 -G /home/jianglin/ljiang/XP/goat_ref/goat_refnew.gtf \
    -o /home/jianglin/ljiang/XP/results/'d10803_L5_I371_clout' \
    /home/jianglin/ljiang/XP/results/'d10803_L5_I371_thout1/accepted_hits.bam'

assemblies.txt:

/home/jianglin/ljiang/XP/results/d4502_L6_I367_clout/transcripts.gtf
/home/jianglin/ljiang/XP/results/d4501_L4_I366_clout/transcripts.gtf
/home/jianglin/ljiang/XP/results/d4503_L6_I368_clout/transcripts.gtf
/home/jianglin/ljiang/XP/results/d10801_L4_I369_clout/transcripts.gtf
/home/jianglin/ljiang/XP/results/d10802_L5_I370_clout/transcripts.gtf
/home/jianglin/ljiang/XP/results/d10803_L5_I371_clout/transcripts.gtf

Cuffmerge:

/home/share/software/cufflinks-2.2.1.Linux_x86_64/cuffmerge -o /home/jianglin/ljiang/XP/results/merged_asm \
    -g /home/jianglin/ljiang/XP/goat_ref/goat_refnew.gtf \
    -s /home/jianglin/ljiang/XP/goat_ref/goat.fa \
    -p 8 \
    /home/jianglin/ljiang/XP/results/assemblies.txt

But when I ran cuffmerge to create a single merged transcriptome annotation, the terminal showed this error:

 [Sun Jan 28 17:15:19 2018] Beginning transcriptome assembly merge
-------------------------------------------

[Sun Jan 28 17:15:19 2018] Preparing output location /home/jianglin/ljiang/XP/results/merged_asm/
[Sun Jan 28 17:15:29 2018] Converting GTF files to SAM
[17:15:29] Loading reference annotation.
[17:15:33] Loading reference annotation.
[17:15:37] Loading reference annotation.
[17:15:41] Loading reference annotation.
[17:15:45] Loading reference annotation.
[17:15:49] Loading reference annotation.
[Sun Jan 28 17:16:01 2018] Quantitating transcripts
Warning: Could not connect to update server to verify current version. Please check at the Cufflinks website (http://cufflinks.cbcb.umd.edu).
Command line:
cufflinks -o /home/jianglin/ljiang/XP/results/merged_asm/ -F 0.05 -g /home/jianglin/ljiang/XP/goat_ref/goat_refnew.gtf -q --overhang-tolerance 200 --library-type=transfrags -A 0.0 --min-frags-per-transfrag 0 --no-5-extend -p 8 /home/jianglin/ljiang/XP/results/merged_asm/tmp/mergeSam_filezEJpqP 
[bam_header_read] EOF marker is absent. The input is probably truncated.
[bam_header_read] invalid BAM binary header (this is not a BAM file).
File /home/jianglin/ljiang/XP/results/merged_asm/tmp/mergeSam_filezEJpqP doesn't appear to be a valid BAM file, trying SAM...
[17:16:21] Loading reference annotation.
[17:16:23] Inspecting reads and determining fragment length distribution.
Processed 22612 loci.                       
> Map Properties:
>       Normalized Map Mass: 218274.00
>       Raw Map Mass: 218274.00
>       Fragment Length Distribution: Truncated Gaussian (default)
>                     Default Mean: 200
>                  Default Std Dev: 80
[17:16:36] Assembling transcripts and estimating abundances.
Processed 22612 loci.                       
[Sun Jan 28 17:21:09 2018] Comparing against reference file /home/jianglin/ljiang/XP/goat_ref/goat_refnew.gtf
Warning: Could not connect to update server to verify current version. Please check at the Cufflinks website (http://cufflinks.cbcb.umd.edu).
No fasta index found for /home/jianglin/ljiang/XP/goat_ref/goat.fa. Rebuilding, please wait..
Error: sequence lines in a FASTA record must have the same length!
        [FAILED]

Does that mean I need to index genome.fa again? I tried that, but it did not fix the problem. I would appreciate it if someone could help me solve this. Thank you!

rna-seq • 911 views
ADD COMMENTlink modified 14 months ago • written 14 months ago by 11655760010

You should know that the old 'Tuxedo' pipeline of TopHat(2) and Cufflinks is no longer the advisable toolset for RNA-seq analysis. The software is deprecated or in low maintenance and should be replaced by HISAT2, StringTie and Ballgown. See this paper: Transcript-level expression analysis of RNA-seq experiments with HISAT, StringTie and Ballgown. (If you can't get access to that publication, let me know and I'll -cough- help you.) There are also other alternatives, including alignment with STAR or bbmap, or pseudo-alignment using salmon.

ADD REPLYlink modified 14 months ago • written 14 months ago by WouterDeCoster38k

Tip: When posting code, use the code sample button to make it easier to read.

"EOF marker is absent" means that your BAM file has been truncated. Did Tophat produce a BAM or a SAM file? How did you convert from sam to bam?

ADD REPLYlink written 14 months ago by tiago2112871.1k

Please check whether the FASTA file is formatted properly, with no extra blank lines between separate sequences or at the end.

ADD REPLYlink written 14 months ago by arup1.1k

@arup you are right. Some people say that every line in a FASTA file should have the same length, with no unnecessary blank lines or line breaks; otherwise the program reports the error: sequence lines in a FASTA record must have the same length! But how can I check whether the line lengths in my FASTA file are consistent, and how do I correct the errors? Thank you!

ADD REPLYlink written 14 months ago by 11655760010
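
One way to find the offending lines is a short awk scan. The sketch below assumes the usual FASTA-index rule: within a record every sequence line has the same length, except that the last line may be shorter. `fasta_check` is a hypothetical helper name, not part of Cufflinks or samtools. It prints the record name, line number and length of each line that breaks the rule, and returns a non-zero status if it finds any.

```shell
# fasta_check FILE
# Lists sequence lines whose length is inconsistent within their record.
fasta_check() {
    awk '
        # Print one offending line: record name, line number, length, expected width.
        function report(nr, len) {
            printf "%s\tline %d\tlength %d\texpected %d\n", name, nr, len, width
            bad++
        }
        # The last line of a record may be shorter than the others, but not longer.
        function check_last() {
            if (have && plen > width) report(pnr, plen)
        }
        /^>/ { check_last(); name = substr($0, 2); have = 0; next }
        {
            len = length($0)
            if (!have) { width = len; have = 1 }        # first line sets the width
            else if (plen != width) report(pnr, plen)   # previous line was not the last one
            plen = len; pnr = NR
        }
        END { check_last(); exit (bad > 0) }
    ' "$1"
}
```

Usage would be something like `fasta_check goat.fa`. No output means every record passed under the assumption above. Blank lines in the middle of a record are reported with length 0.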

You can try the sed solution posted in this post to clean-up your fasta file: A: Useful Bash Commands To Handle Fasta Files

ADD REPLYlink written 14 months ago by genomax65k

You can try the FASTX-Toolkit to give every sequence line the same width.

fasta_formatter -i input.fa -w 60 -o output.fa
ADD REPLYlink written 14 months ago by arup1.1k
arup1.1k wrote:

Most probably the FASTA file you are using is not formatted properly, or the GTF and FASTA come from different versions, resulting in this error:

Error: sequence lines in a FASTA record must have the same length!

Ref: http://seqanswers.com/forums/archive/index.php/t-14419.html

To remove blank lines, use

sed '/^$/d' input.fa > output.fa

To give all sequence lines a uniform width, use the FASTX-Toolkit

fasta_formatter -i input.fa -w 60 -o output.fa
ADD COMMENTlink modified 14 months ago • written 14 months ago by arup1.1k
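
If the FASTX-Toolkit is not installed, plain awk can also rewrap the file. This is only a sketch: `fasta_rewrap` is a hypothetical helper name, and the default width of 60 is an arbitrary choice, not something Cufflinks requires. It removes blank lines and stray whitespace, joins each record's sequence, and writes it back out at a fixed width, so only the last line of a record can be shorter.

```shell
# fasta_rewrap FILE [WIDTH]
# Writes FILE to stdout with every sequence line rewrapped to WIDTH (default 60).
fasta_rewrap() {
    awk -v w="${2:-60}" '
        # Header line: flush what is left of the previous record, then print the header.
        /^>/ { if (buf != "") print buf; buf = ""; print; next }
        {
            gsub(/[ \t\r]/, "")                  # drop spaces, tabs and carriage returns
            buf = buf $0
            n = length(buf)
            # Print as many full-width lines as the buffer holds.
            for (i = 1; i + w - 1 <= n; i += w) print substr(buf, i, w)
            buf = substr(buf, i)                 # keep the remainder for the next line
        }
        END { if (buf != "") print buf }
    ' "$1"
}
```

For example, `fasta_rewrap goat.fa 60 > goat.fixed.fa` writes a new file and leaves the original untouched. Point cuffmerge's `-s` option at the new file so that the FASTA index is rebuilt from the cleaned sequence.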
11655760010 wrote:

@arup sorry, I can't reply to you directly, so I am replying in a new answer. Do you mean that genome.fa and annotation.gtf do not match each other? But then why do TopHat and Cufflinks run without problems using the same GTF and FASTA files?

ADD COMMENTlink written 14 months ago by 11655760010

You can. See C: How do I ask a question on Biostars?

Now you can move this to where it belongs using the following steps:

ADD REPLYlink modified 14 months ago • written 14 months ago by RamRS21k

Whatever browser people are using in China seems to have this odd behavior (not being able to use ADD COMMENT/ADD REPLY on BioStars). This could be due to users keeping scripting completely off in browsers or else who knows ...

ADD REPLYlink modified 14 months ago • written 14 months ago by genomax65k

That's odd. China's Internet policies are strange.

ADD REPLYlink written 14 months ago by RamRS21k

It's not their browser. A: i meet an error when i run the cuffmerge (Or someone mod-moved it to that spot)

ADD REPLYlink modified 14 months ago by WouterDeCoster38k • written 14 months ago by RamRS21k

I moved it since it was posted as a new answer. That is the only option (not optimal, as we have discussed many times in the past).

ADD REPLYlink written 14 months ago by genomax65k

It is possible that your reference file is wrapped at n characters for some sequences, whereas others are one long string of [ACTG].

ADD REPLYlink written 14 months ago by genomax65k
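
A quick per-sequence summary can show whether that is the case. The sketch below (`fasta_wrap_summary` is a hypothetical helper name) prints four tab-separated columns for each record: sequence name, total length, number of sequence lines, and the width of the first sequence line. A record with a single line is unwrapped, and differing widths between records show mixed wrapping.

```shell
# fasta_wrap_summary FILE
# Prints: name <TAB> total length <TAB> number of sequence lines <TAB> first line width
fasta_wrap_summary() {
    awk '
        # Print the summary row for the record that just ended.
        function emit() {
            if (seen) printf "%s\t%d\t%d\t%d\n", name, total, nlines, width
        }
        /^>/ { emit(); seen = 1; name = substr($1, 2); total = nlines = width = 0; next }
        seen {
            if (nlines == 0) width = length($0)  # width of the first sequence line
            total += length($0)
            nlines++                             # blank lines are counted as lines too
        }
        END { emit() }
    ' "$1"
}
```

Running `fasta_wrap_summary goat.fa | cut -f 4 | sort -n | uniq -c` would then count how many sequences use each wrap width.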