I just started with bioinformatics, so I hope you can help me understand some basic things. I have 100 bp paired-end RNA-Seq reads from a lepidopteran insect for which no genome is available. In total there were two treatments and one control, so for the assembly I merged all of them into two files (R1 and R2). The goal is to make an assembly and then continue with mapping, counting and statistics.
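The merge itself was a plain concatenation per mate, keeping the file order identical for R1 and R2 so the pairing stays intact (file names below are placeholders for my actual samples):

```bash
# concatenate all three libraries per mate, same file order in both commands
cat treat1_R1.fastq treat2_R1.fastq control_R1.fastq > all_R1.fastq
cat treat1_R2.fastq treat2_R2.fastq control_R2.fastq > all_R2.fastq
```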
For the assembly I used the following tools: Trinity, Velvet/Oases, Bridger and the CLC assembler. From the last one I just got a ready-made assembly, against which I compared the other assemblers. The parameters were as follows (a rough command sketch follows the list):
- Trinity: executed with default parameters (k=25).
- Velvet/Oases: ran velveth/velvetg over k=21,51,2 (every odd k from 21 to 51), ran oases on each assembly and compared N50 and total number of contigs. Based on that, I chose the k range 25 to 39 and merged those assemblies with the Oases merge step (velveth/velvetg with merge k=35 on the oases transcripts, then oases -merge).
- Bridger: default parameters, run separately with k=25 and k=27.
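In case the exact commands matter, this is roughly what I ran (file and directory names are placeholders, memory/CPU settings are simplified, and flag names may differ slightly between tool versions):

```bash
# Trinity, default k=25 (--max_memory in Trinity >= 2.0; older releases used --JM)
Trinity --seqType fq --left all_R1.fastq --right all_R2.fastq \
        --max_memory 50G --CPU 8 --output trinity_out

# Velvet/Oases: interleave the pairs, then one assembly per odd k in the 21..51 range
shuffleSequences_fastq.pl all_R1.fastq all_R2.fastq all_interleaved.fastq
velveth velv 21,51,2 -fastq -shortPaired all_interleaved.fastq
for d in velv_*; do
    velvetg "$d" -read_trkg yes   # read tracking is required by oases
    oases "$d"
done

# merge the chosen k=25..39 assemblies (merge k=35), per the Oases merge recipe
velveth velv_merged 35 -long velv_{25,27,29,31,33,35,37,39}/transcripts.fa
velvetg velv_merged -read_trkg yes -conserveLong yes
oases velv_merged -merge yes

# Bridger, once per k (output goes to Bridger's default output directory;
# check Bridger.pl --help for the exact kmer flag on your version)
Bridger.pl --seqType fq --kmer_length 25 \
           --left all_R1.fastq --right all_R2.fastq --CPU 8
```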
Afterwards I compared the assemblies with QUAST and got the following table (the QUAST command is sketched after it):
| | Oases | Trinity | Bridger 25 | Bridger 27 | CLC |
| --- | --- | --- | --- | --- | --- |
| # contigs (>= 0 bp) | 36576 | 40429 | 37011 | 36162 | 35589 |
| # N's per 100 kbp | 0 | 0 | 0.05 | 0.21 | 212.61 |
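The comparison itself was a single QUAST run over all five assemblies, something like (file names are placeholders):

```bash
# one QUAST run over all five assemblies; -l assigns the column labels
quast.py velv_merged/transcripts.fa trinity_out/Trinity.fasta \
         bridger_k25.fasta bridger_k27.fasta clc_assembly.fasta \
         -l "Oases, Trinity, Bridger 25, Bridger 27, CLC" \
         -o quast_comparison
```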
So I could not reach CLC's N50 using default parameters, but at least I can see how they reach it: the number of N's per 100 kbp in the CLC assembly is extremely high. So I was wondering: is it really worth increasing the total length and N50 by putting that many N's into the contigs?
And the next question: for some reason I thought it would be a good idea to increase the amount of input sequence by merging R1 with R1 and R2 with R2, basically doubling the amount of data the assembler has to deal with (see the snippet below). So far I have only tested Bridger (k=25), but the results are surprising: N50 increased to 1720, and the total length and the largest contig became almost the same as in the CLC results. What I think happened is that duplicating the reads doubles every k-mer's count, so during graph construction low-quality k-mers that would normally fall below the coverage cutoff had a better chance to survive, thus extending the contigs.
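To be explicit, the "doubling" was literally concatenating each file with itself before rerunning the assembler (same hypothetical file names as above):

```bash
# every read now appears twice, so every k-mer count doubles and k-mers
# sitting just below the abundance cutoff suddenly clear it
cat all_R1.fastq all_R1.fastq > double_R1.fastq
cat all_R2.fastq all_R2.fastq > double_R2.fastq
Bridger.pl --seqType fq --kmer_length 25 \
           --left double_R1.fastq --right double_R2.fastq --CPU 8
```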
Thanks in advance for any kind of response!