Question: Difficulty understanding SOAPdenovo2 parameters
3.3 years ago, eischzj12 wrote:

Hello,

I'm currently trying to construct contigs using DNA libraries and am stuck on a particular parameter within the SOAPdenovo2 configuration file. I'm having a hard time understanding what to set "rd_len_cutoff" to. I've read the instructions from the website and they weren't very thorough, at least within my spectrum of understanding. Specifically, the instructions say that rd_len_cutoff tells the assembler what length to cut the reads from the current library to.

How do I determine the ideal length? I apologize if there is already a post that explains this somewhere on here, but I couldn't find one that was more thorough than the source below (I took my time to look before posting).

I also had the same problem for the "rank" parameter.

Thanks for your time!

http://soap.genomics.org.cn/soapdenovo.html

soapdenovo2 assembly • 1.1k views

Can you post some information about the data you have? Especially the read length (after trimming and adapter removal).

rd_len_cutoff: the assembler will cut the reads from the current library to this length. In other words, it is the position after which the reads are cut; SOAPdenovo trims off all the bases after that point.
So if your read length is 200 and you set rd_len_cutoff=150, only the first 150 bp of each read are kept.
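
To make the truncation concrete, here is a minimal Python sketch of what the cutoff does to a single read. The function name `apply_rd_len_cutoff` and the made-up read are purely illustrative; this mimics the described behaviour and is not SOAPdenovo's own code.

```python
def apply_rd_len_cutoff(read: str, rd_len_cutoff: int) -> str:
    """Keep only the first rd_len_cutoff bases; shorter reads are left unchanged."""
    return read[:rd_len_cutoff]


read = "ACGT" * 50                         # a made-up 200 bp read
trimmed = apply_rd_len_cutoff(read, 150)   # cutoff from the example above
print(len(read), len(trimmed))             # 200 150
```

A read that is already shorter than the cutoff passes through untouched, which is why setting the cutoff to the longest read length trims nothing.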

written 3.3 years ago by Medhat

Do you mean after trimming/removing adaptors from something like Trimmomatic? I did that and was able to generate a FastQC report; would I be able to get the information you're asking about from that? Or when you say length do you mean genome size estimate?

And I understand that soapdenovo trims all bases after a particular position, but I'm not sure that I understand the utility in it. So from your example, you have an example read length of 200, why would you set a rd_len_cutoff to 150? Apologies once again for the confusion, I'm an undergrad with limited knowledge about this topic.

written 3.3 years ago by eischzj12

My question about length was about the library preparation details; the 200 bp figure was just an example. In short: after trimming, set this parameter to the length of the longest read you have.
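
As a rough illustration of where this value ends up, the sketch below renders one `[LIB]` block of a SOAPdenovo2 config file from Python. The keys (`max_rd_len`, `avg_ins`, `reverse_seq`, `asm_flags`, `rd_len_cutoff`, `rank`, `q1`, `q2`) come from the SOAPdenovo2 documentation; the helper `lib_section`, the numeric values, and the file names are placeholders assumed for the example and need to be replaced with your own.

```python
def lib_section(avg_ins, rd_len_cutoff, rank, q1, q2):
    """Render one [LIB] block of a SOAPdenovo2 config file as text."""
    lines = [
        "[LIB]",
        f"avg_ins={avg_ins}",              # average insert size of this library
        "reverse_seq=0",                   # assumed: forward-reverse paired-end reads
        "asm_flags=3",                     # assumed: use reads for contigs and scaffolds
        f"rd_len_cutoff={rd_len_cutoff}",  # keep only this many bases per read
        f"rank={rank}",                    # order of use during scaffolding
        f"q1={q1}",
        f"q2={q2}",
    ]
    return "\n".join(lines)


longest_read = 150  # placeholder: longest read length after trimming
print(f"max_rd_len={longest_read}")
print(lib_section(200, longest_read, 1, "reads_1.trimmed.fq", "reads_2.trimmed.fq"))
```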

written 3.3 years ago by Medhat
3.3 years ago, Rohit wrote:

As Medhat already mentioned, rd_len_cutoff is the length to which the reads are trimmed. It is usually chosen based on data quality; the value I use is the length of the longest read, so that no reads are trimmed further.
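
One way to find the longest read length after trimming is to count sequence lengths directly from the FASTQ file. The sketch below assumes standard four-line FASTQ records (optionally gzipped); the function name `read_length_counts` and the file name in the usage comment are hypothetical. FastQC's Basic Statistics module also reports a "Sequence length" range that should show the same maximum.

```python
import gzip
from collections import Counter


def read_length_counts(fastq_path):
    """Count how many reads have each length in a 4-line-per-record FASTQ file."""
    opener = gzip.open if str(fastq_path).endswith(".gz") else open
    counts = Counter()
    with opener(fastq_path, "rt") as handle:
        for line_number, line in enumerate(handle):
            if line_number % 4 == 1:       # second line of each record is the sequence
                counts[len(line.strip())] += 1
    return counts


# Usage (hypothetical file name):
# counts = read_length_counts("reads_1.trimmed.fq.gz")
# print(max(counts))   # longest read length, a candidate for rd_len_cutoff
```

The full length distribution is returned rather than just the maximum, so you can also see how many reads would actually be shortened by a smaller cutoff.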

The rank parameter denotes the order in which the read libraries are used. For example, a rank-1 library is used for scaffolding first, followed by rank 2, and so on. Multiple libraries can share the same rank in order to be used at the same time.
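
The ordering described above can be sketched in Python by grouping libraries into scaffolding rounds. The library names, ranks, and insert sizes are invented, and giving larger insert sizes higher ranks is an assumption of this example rather than something stated in this thread.

```python
from itertools import groupby

# Hypothetical libraries: (name, rank, average insert size in bp)
libraries = [
    ("mate_pair_5kb", 3, 5000),
    ("paired_end_200", 1, 200),
    ("paired_end_500", 2, 500),
    ("mate_pair_2kb", 3, 2000),
]


def scaffolding_rounds(libs):
    """Group libraries into rounds: lowest rank first, equal ranks together."""
    ordered = sorted(libs, key=lambda lib: lib[1])   # stable sort by rank
    return [
        (rank, [name for name, _, _ in group])
        for rank, group in groupby(ordered, key=lambda lib: lib[1])
    ]


for rank, names in scaffolding_rounds(libraries):
    print(f"rank {rank}: {', '.join(names)}")
```

Here the two rank-3 libraries land in the same round, while the rank-1 and rank-2 libraries each get a round of their own.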


How do I determine the length of the longest-read? Is that something I can find in the FastQC report?

Also for rank, if multiple libraries can have the same rank in order to be used at the same time, then why bother considering one before the other?

written 3.3 years ago by eischzj12