Question: How To Find The Total Number Of Reads Used By Trinity Assembler To Perform Assembly
5.8 years ago, bambus0725 (Germany) wrote:

Hello,

I am working on metatranscriptomics data. I want to assemble the data (in FASTA format); for that I have chosen the de novo transcriptome assembler "Trinity", followed by mapping the reads with "Bowtie2".

I used the following parameters with Trinity:

perl Trinity.pl --seqType fa --JM 10G --single sample.fasta  --SS_lib_type F --output trinity_output.sam --CPU 10 --min_contig_length 20 --inchworm_cpu 10 --max_reads_per_graph 100000  --bflyCPU 10 --min_per_id_same_path 97

and with Bowtie2:

bowtie2  -f -U  cluster.fasta -L 20 -a -t --un unpaired.fasta  --met-file metrics.txt  -p 10 --reorder  >output.sam -x reference -N 1 -L 20

I am not sure where the problem lies, whether in running Trinity (with the parameters chosen) or Bowtie2, because after the assembly and mapping steps only 30% of the reads could be aligned.

It would be a great help if anyone could point out the reason for this and help me out with the solution.

Thank you in advance!!

trinity rna-seq bioinformatics • 3.5k views
modified 5.8 years ago by Devon Ryan • written 5.8 years ago by bambus0725

Aside from that bowtie2 command not appearing to be valid, you're also assembling with one set and aligning with another. If these are both random subsets of the original batch of sequences then this would seem to be fine, but otherwise might be the source of the problem.

written 5.8 years ago by Devon Ryan
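
One way to check whether the file used for assembly and the file used for alignment really hold the same reads is to count the sequences in each and compare their IDs. The following is only a minimal Python sketch: the helper names `fasta_ids` and `compare_read_sets` are made up for illustration, and it assumes that read IDs are unique within a file and that any clustering step left the IDs unchanged.

```python
import sys


def fasta_ids(path):
    """Return the set of sequence IDs (first word after '>') in a FASTA file.

    Assumption: IDs are unique within the file, so the size of the set
    equals the number of reads.
    """
    ids = set()
    with open(path) as handle:
        for line in handle:
            if line.startswith(">"):
                fields = line[1:].split()
                ids.add(fields[0] if fields else "")
    return ids


def compare_read_sets(assembled_path, aligned_path):
    """Count reads in both files and how many IDs they have in common."""
    assembled = fasta_ids(assembled_path)
    aligned = fasta_ids(aligned_path)
    return {
        "assembled": len(assembled),
        "aligned": len(aligned),
        "shared": len(assembled & aligned),
        "aligned_only": len(aligned - assembled),
    }


if __name__ == "__main__":
    # Example call: python compare_reads.py sample.fasta cluster.fasta
    print(compare_read_sets(sys.argv[1], sys.argv[2]))
```

A large "aligned_only" count would mean that many of the reads being mapped were never given to the assembler, which by itself could lower the mapping rate.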

Thank you for the reply, dpryan.

Yes, I used the appropriate file to align against: the output file generated by Trinity served as the reference (after indexing) in Bowtie2, and I aligned the original set of sequences (i.e. cluster.fasta) to it. That is the file as it was before clustering with CD-Hit_EST; the clustered output was named "sample.fasta" and used for assembling.

I hope I have made this clear. Could you suggest what should be changed in the Bowtie2 command?

written 5.8 years ago by bambus0725

Yeah, that makes sense then. Perhaps you didn't subsample enough of the reads with CD-Hit_Est (I've never used it). Regardless, a more normally formatted (and likely to work) bowtie2 command would be:

bowtie2 -f -N 1 -L 20 -a -t --un unpaired.fa --met-file metrics.txt -p 10 --reorder -x reference -U cluster.fa -S output.sam
written 5.8 years ago by Devon Ryan

Okay. So do you mean that the order of the arguments is important? Still, I got the same result.

written 5.8 years ago by bambus0725

Your original command wouldn't have actually worked at all, so I can only assume that it's not what you actually typed. Aside from that, bowtie2 is only mapping that few reads because only that percentage is mappable (at least to individual contigs). My guess would be that the assembly just isn't that good. Perhaps try a different assembler, or just try assembling the unmapped reads. You might also try mapping with bwa mem to see whether some of the remaining reads map across contigs.

written 5.8 years ago by Devon Ryan
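
To see how many reads actually aligned, the SAM output can be tallied directly. Because the command uses `-a`, a read may appear on several lines, so each read name should be counted only once. This is a rough Python sketch, not a definitive tool: `mapping_rate` is a hypothetical helper name, and it assumes single-end reads and a SAM file that still contains the unaligned records (flag bit 0x4 marks an unmapped read).

```python
import sys


def mapping_rate(sam_path):
    """Return (mapped_reads, total_reads), counting each read name once."""
    seen = set()
    mapped = set()
    with open(sam_path) as handle:
        for line in handle:
            if line.startswith("@"):
                continue  # skip SAM header lines
            fields = line.rstrip("\n").split("\t")
            if len(fields) < 2:
                continue  # skip blank or malformed lines
            name, flag = fields[0], int(fields[1])
            seen.add(name)
            if not flag & 0x4:  # bit 0x4 set means the read is unmapped
                mapped.add(name)
    return len(mapped), len(seen)


if __name__ == "__main__":
    # Example call: python mapping_rate.py output.sam
    n_mapped, n_total = mapping_rate(sys.argv[1])
    pct = 100.0 * n_mapped / n_total if n_total else 0.0
    print(f"{n_mapped} of {n_total} reads aligned ({pct:.1f}%)")
```

The total reported here should equal the number of reads in the FASTA file passed with `-U`; if it does not, the SAM file is probably incomplete or was produced from a different input.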