Hi friends,
I used CLC Genomics Workbench to trim my reads, then passed its output (the trimmed FASTA file) to Trinity for de novo assembly, but I ran into the error below. Could you please tell me what is wrong and how to fix it? Is it a compatibility problem? Here is part of the input FASTA file, followed by the Trinity output:
>D69F08P1:337:C4GGBACXX:1:1101:1193:2153_1:N:0:
TTTAAGTTCTTTACAGTAAGAAACAACATTGCATTTTTCACATCCTCAAGGTCATGTGAG
TGGCTGAATCATTCGTGGCTACTT
>D69F08P1:337:C4GGBACXX:1:1101:1193:2153_2:N:0:
CCACGAATGATTCAGCCACTCACATGACCTTGAGGATGTGAAAAATGCAATGTTGTTTCT
TACTGTAAAGAACTTAAAAGCCGT
>D69F08P1:337:C4GGBACXX:1:1101:1123:2158_1:N:0:
TACCCTGGAAACCGCTGTCATCATGCCAAAACGAGTTAGCGTCCAACTCAGGCAGCACAA
GTGTAGCATTCATGATTCGTGCAG
>D69F08P1:337:C4GGBACXX:1:1101:1123:2158_2:N:0:
I ran Trinity with the following command:
./Trinity --seqType fa --JM 180G --single 8_Trimmed.fa --run_as_paired --normalize_reads --SS_lib_type FR --min_contig_length 400 --CPU 6 --full_cleanup
Trinity output, including the error:
Paired mode requires bowtie. Found bowtie at: /usr/bin/bowtie
and bowtie-build at /usr/bin/bowtie-build
-since butterfly will eventually be run, lets test for proper execution of java
Found samtools at: /usr/bin/samtools
#######################################
Running Java Tests
Wednesday, June 24, 2015: 13:28:35 CMD: java -Xmx64m -jar /home/jafarinezhad/software/trinityrnaseq_r20140717/util/support_scripts/ExitTester.jar 0
CMD finished (1 seconds)
Wednesday, June 24, 2015: 13:28:36 CMD: java -Xmx64m -jar /home/jafarinezhad/software/trinityrnaseq_r20140717/util/support_scripts/ExitTester.jar 1
-we properly captured the java failure status, as needed. Looking good.
Java tests succeeded.
###################################
---------------------------------------------------------------
------------ In silico Read Normalization ---------------------
-- (Removing Excess Reads Beyond 50 Coverage --
-- /home/jafarinezhad/software/trinityrnaseq_r20140717/insilico_read_normalization --
---------------------------------------------------------------
Wednesday, June 24, 2015: 13:28:36 CMD: /home/jafarinezhad/software/trinityrnaseq_r20140717/util/insilico_read_normalization.pl --seqType fa --JM 180G --max_cov 50 --CPU 6 --output /home/jafarinezhad/software/trinityrnaseq_r20140717/insilico_read_normalization --SS_lib_type FR --single /home/jafarinezhad/software/trinityrnaseq_r20140717/8_Trimmed.fa
CMD: ln -s /home/jafarinezhad/software/trinityrnaseq_r20140717/8_Trimmed.fa single.fa
CMD finished (0 seconds)
CMD: touch single.fa.ok
-------------------------------------------
----------- Jellyfish --------------------
-- (building a k-mer catalog from reads) --
-------------------------------------------
CMD finished (0 seconds)
CMD: /home/jafarinezhad/software/trinityrnaseq_r20140717/util/..//Inchworm/bin/fastaToKmerCoverageStats --reads single.fa --kmers jellyfish.K25.min2.kmers.fa --kmer_size 25 --num_threads 6 > single.fa.K25.stats
-reading Kmer occurences...
done parsing 0 Kmers, 0 added, taking 0 seconds.
STATS_GENERATION_TIME: 3461 seconds.
CMD finished (3461 seconds)
CMD: touch single.fa.K25.stats.ok
-sorting each stats file by read name.
CMD finished (0 seconds)
CMD: sort -k5,5 -T . -S 180G single.fa.K25.stats > single.fa.K25.stats.sort
CMD finished (610 seconds)
CMD: touch single.fa.K25.stats.sort.ok
CMD finished (2 seconds)
CMD: /home/jafarinezhad/software/trinityrnaseq_r20140717/util/..//util/support_scripts//nbkc_normalize.pl single.fa.K25.stats.sort 50 200 > single.fa.K25.stats.sort.C50.pctSD200.accs
326553678 / 326553678 = 100.00% reads selected during normalization.
0 / 326553678 = 0.00% reads discarded as likely aberrant based on coverage profiles.
0 / 326553678 = 0.00% reads missing kmer coverage (N chars included?).
CMD finished (1064 seconds)
CMD: touch single.fa.K25.stats.sort.C50.pctSD200.accs.ok
CMD finished (0 seconds)
Thread 2 terminated abnormally: Error, not all specified records have been retrieved (missing 326553678) from /home/jafarinezhad/software/trinityrnaseq_r20140717/8_Trimmed.fa at /home/jafarinezhad/software/trinityrnaseq_r20140717/util/insilico_read_normalization.pl line 521.
Error encountered with thread.
Error, at least one thread died at /home/jafarinezhad/software/trinityrnaseq_r20140717/util/insilico_read_normalization.pl line 419.
Error, cmd: /home/jafarinezhad/software/trinityrnaseq_r20140717/util/insilico_read_normalization.pl --seqType fa --JM 180G --max_cov 50 --CPU 6 --output /home/jafarinezhad/software/trinityrnaseq_r20140717/insilico_read_normalization --SS_lib_type FR --single /home/jafarinezhad/software/trinityrnaseq_r20140717/8_Trimmed.fa died with ret 6400 at ./Trinity line 1990.
Thanks for taking a look at my problem and helping me resolve it.
I am guessing CLC trimmed away whole reads and left only the headers.
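One way to test this guess is to count the FASTA records that have a header but no sequence. Below is a minimal Python sketch; the function name `fasta_stats` is made up for illustration and is not part of CLC or Trinity. It also counts records whose sequence is wrapped over several lines, as in the sample above.

```python
def fasta_stats(lines):
    """Return (records, empty_records, wrapped_records) for an iterable of FASTA lines."""
    records = empty = wrapped = 0
    seq_lines = None  # sequence lines seen for the current record; None before the first header
    for line in lines:
        line = line.strip()
        if not line:
            continue  # ignore blank lines
        if line.startswith(">"):
            if seq_lines is not None:
                # close the previous record before starting a new one
                empty += seq_lines == 0
                wrapped += seq_lines > 1
            records += 1
            seq_lines = 0
        elif seq_lines is not None:
            seq_lines += 1
    if seq_lines is not None:
        # close the last record in the file
        empty += seq_lines == 0
        wrapped += seq_lines > 1
    return records, empty, wrapped
```

To run it on the real input, pass an open file handle, for example `with open("8_Trimmed.fa") as fh: print(fasta_stats(fh))`. A large second number would support the "headers only" guess; a zero would rule it out.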
I don't think so. I successfully ran a de novo assembly on the same reads using CLC, and now I plan to compare the two assemblies and maybe combine their results. Any ideas?
CLC may be immune to its own tricks, and Trinity may be more finicky about its input files.
You mean the input format is the issue? If so, how can I compare the two assemblies? Is there any way to convert the file into a format that Trinity accepts?
I was not sure the input format is the problem; it was just a guess. I just added an answer.
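On the question of converting the file, here is a rough Python sketch of one possible cleanup. It is not a confirmed fix. It rests on two assumptions: that this Trinity release is happier with one sequence line per record, and that it expects paired read names to end in `/1` and `/2` rather than `_1:N:0:`. The `MATE_TAG` pattern and the name `normalize_fasta` are hypothetical, written only to match the headers shown in the question.

```python
import re

# Hypothetical pattern for the header endings seen in the post, e.g. "_1:N:0:".
MATE_TAG = re.compile(r"_([12]):[YN]:\d+:\S*$")

def normalize_fasta(lines):
    """Yield (header, sequence) pairs with wrapped sequence lines joined
    and the mate tag rewritten to /1 or /2. Records without sequence are skipped."""
    header, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # ignore blank lines
        if line.startswith(">"):
            if header is not None and chunks:
                yield header, "".join(chunks)
            header, chunks = MATE_TAG.sub(r"/\1", line), []
        elif header is not None:
            chunks.append(line)
    if header is not None and chunks:
        yield header, "".join(chunks)
```

Each yielded pair can be written out as `header + "\n" + sequence + "\n"` to a new file, which is then given to Trinity in place of the original. Note that skipping an empty record leaves its mate without a partner, so if any are skipped the pairing should be checked before using `--run_as_paired`.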