Minimum length of reads
6.7 years ago
deepti1rao

I have used a trimmer to trim my reads based on their quality. My raw reads were 126-150 bp. I did not set any minimum length to filter out short reads, so my processed file contains reads as short as 0-4 bp. I need to do a reference-based assembly using these paired-end reads, and I am afraid of losing paired information by filtering out the shorter reads. What should my minimum length be? I have 40x coverage with the raw reads.

Tags: ngs, reads, processing, trimming, quality

Run FastQC on your files and see where the quality drops off along the read length. I'm assuming you have 2x150 bp reads, in which case quality will typically start to drop off around 125 bp. You will lose some pairs when filtering on both quality and length. Ideally, keep only the pairs that pass both QC steps, and then align them to your reference.
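
The per-position mean quality, which is the idea behind FastQC's per-base quality view, can also be computed directly. Below is a minimal Python sketch, assuming Phred+33 quality strings; `mean_quality_by_position` and `first_position_below` are hypothetical helper names, and this only approximates the idea, not FastQC's implementation.

```python
# Minimal sketch, assuming Phred+33 encoded qualities (standard for modern
# Illumina FASTQ). Hypothetical helpers; not FastQC's implementation.


def mean_quality_by_position(quality_strings, offset=33):
    """Return the mean Phred score at each read position (0-based index).

    Reads of different lengths are handled: each position is averaged only
    over the reads long enough to cover it.
    """
    totals = []
    counts = []
    for qual in quality_strings:
        for i, ch in enumerate(qual):
            if i == len(totals):  # first read reaching this position
                totals.append(0)
                counts.append(0)
            totals[i] += ord(ch) - offset
            counts[i] += 1
    return [t / c for t, c in zip(totals, counts)]


def first_position_below(means, threshold):
    """Return the 1-based position where mean quality first drops below
    threshold, or None if it never does."""
    for i, q in enumerate(means):
        if q < threshold:
            return i + 1
    return None


if __name__ == "__main__":
    # Toy quality strings: 'I' = Q40, '5' = Q20, '#' = Q2 under Phred+33.
    quals = ["IIIII55##", "IIII555#", "IIIIII5##"]
    means = mean_quality_by_position(quals)
    print([round(m, 1) for m in means])
    print("mean quality first drops below Q30 at position",
          first_position_below(means, 30))
```

On real data the quality strings would come from every fourth line of the FASTQ file; the position where the mean falls below your cutoff gives a rough idea of how much trimming to expect.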

Thanks for your reply! I used fastq_quality_trimmer from the FASTX-Toolkit. I trimmed just one file, the one with my forward reads, using a quality threshold of 30. And yes, my reads are 150 bp Illumina reads. What do you think is a good length to set as a filter?

It sounds like you may have trimmed too severely. What quality threshold did you use, with which program, and what are you planning to use the data for?

Thanks for your reply, Brian! I used fastq_quality_trimmer from the FASTX-Toolkit, with a quality threshold of 30. I want to use these reads for a reference-based genome assembly.

Q30 is too high for almost all purposes. I would suggest trying a much lower threshold, such as Q20 or Q15, and seeing how it affects the assembly. I also do not recommend fastX: as you have found, it is quite slow, and it cannot handle paired reads.
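
To see why a Q30 cutoff removes much more sequence than Q20, here is a minimal sketch of simple 3'-end quality trimming in Python, assuming Phred+33 qualities. `trim_3prime` is a hypothetical helper and a simplified illustration; it is not the exact algorithm used by fastq_quality_trimmer or bbduk.sh.

```python
# Minimal sketch of simple 3'-end quality trimming, assuming Phred+33
# qualities. Hypothetical helper: bases are removed from the 3' end while
# their quality is below the threshold. Not the exact algorithm of
# fastq_quality_trimmer or bbduk.sh.


def trim_3prime(seq, qual, threshold, offset=33):
    """Trim low-quality bases from the 3' end; return (seq, qual)."""
    end = len(qual)
    while end > 0 and ord(qual[end - 1]) - offset < threshold:
        end -= 1
    return seq[:end], qual[:end]


if __name__ == "__main__":
    seq = "ACGTACGTACGT"
    qual = "IIII??555+++"  # Q40 x4, Q30 x2, Q20 x3, Q10 x3 (Phred+33)
    for t in (30, 20):
        trimmed_seq, _ = trim_3prime(seq, qual, t)
        print(f"Q{t}: kept {len(trimmed_seq)} of {len(seq)} bases")
```

On this toy read, the Q30 cutoff keeps 6 of 12 bases while Q20 keeps 9, which is how an aggressive threshold produces the very short reads described in the question.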

ADD REPLY
0
Entering edit mode
6.7 years ago
GenoMax

"I am afraid of losing paired information by filtering out the shorter reads."

Having very short reads may make the assembly difficult or unusable. You could use reformat.sh from the BBMap suite to filter your trimmed paired-end reads while keeping them in sync (change ml= as needed):

    reformat.sh in1=read1.fq in2=read2.fq out1=filt1.fq out2=filt2.fq ml=30
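
The idea of filtering pairs while keeping R1 and R2 in sync can be sketched in Python as follows, assuming both files hold plain 4-line FASTQ records with mates in the same order. `read_fastq` and `filter_pairs` are hypothetical helpers, and the rule used here (drop the pair if either mate is shorter than the minimum) is an assumption for illustration; check the reformat.sh documentation for how it actually treats pairs.

```python
# Minimal sketch: length-filter paired-end reads while keeping R1/R2 in sync.
# Assumptions (not taken from reformat.sh): plain 4-line FASTQ records, mates
# in the same order in both files, and a pair is dropped if EITHER mate is
# shorter than min_len. read_fastq and filter_pairs are hypothetical helpers.
from itertools import islice, zip_longest


def read_fastq(lines):
    """Yield (header, seq, plus, qual) tuples from an iterable of lines."""
    it = (line.rstrip("\n") for line in lines)
    while True:
        record = list(islice(it, 4))
        if not record:
            return
        if len(record) < 4:
            raise ValueError("truncated FASTQ record")
        yield tuple(record)


def filter_pairs(records1, records2, min_len):
    """Yield (rec1, rec2) only when both mates are at least min_len bp."""
    for rec1, rec2 in zip_longest(records1, records2):
        if rec1 is None or rec2 is None:
            raise ValueError("R1 and R2 contain different numbers of reads")
        if len(rec1[1]) >= min_len and len(rec2[1]) >= min_len:
            yield rec1, rec2


if __name__ == "__main__":
    # Toy data: the second pair has a 3 bp R1 mate after trimming.
    r1 = ["@p1/1", "ACGTACGTAC", "+", "IIIIIIIIII",
          "@p2/1", "ACG", "+", "III"]
    r2 = ["@p1/2", "TTGCATTGCA", "+", "IIIIIIIIII",
          "@p2/2", "TTGCATTGCA", "+", "IIIIIIIIII"]
    for rec1, rec2 in filter_pairs(read_fastq(r1), read_fastq(r2), min_len=5):
        print(rec1[0], rec2[0])  # only the p1 pair survives
```

Keeping or dropping both mates together is what keeps the two output files in the same order, so a downstream aligner still sees matched pairs. With real files, `read_fastq` can be given open file handles for read1.fq and read2.fq directly.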

Thanks for your suggestion! From your post, I gather that reformat.sh would retain only the reads that come in pairs, with a minimum length of 30 (if ml=30). Is this right?

Does BBMap have a quality-trimming tool as well?

Yes to the first question, and yes to the second: bbduk.sh is the scan/trim program in the BBMap suite. If you are doing a reference-based assembly, then you can afford to trim at a lower Q threshold; Q20 or better may be fine.
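
The Phred scale behind these thresholds relates a quality score Q to the estimated probability P that a base call is wrong. Working the standard definition through for the two thresholds discussed in this thread:

```latex
\begin{aligned}
Q &= -10\log_{10} P \quad\Longleftrightarrow\quad P = 10^{-Q/10} \\
Q = 20 &:\; P = 10^{-2} = 0.01 \quad (\text{about 1 error per 100 bases}) \\
Q = 30 &:\; P = 10^{-3} = 0.001 \quad (\text{about 1 error per 1000 bases})
\end{aligned}
```

Aligners generally tolerate occasional mismatches, which is one reason a Q20 cutoff is often considered acceptable for reference-based work, while a Q30 cutoff discards many more bases.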

How does bbduk.sh compare in speed with the fastX quality trimmer? I found fastX to be pretty slow. Perhaps I'll try Q20 and Q30 separately and see how good my depth and breadth of coverage are.
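
Comparing the Q20 and Q30 runs comes down to two numbers per alignment: mean depth and breadth (the fraction of reference positions covered). A minimal Python sketch, assuming per-position depths for a single BAM in the three-column format written by `samtools depth -a` (chrom, position, depth; `-a` also reports zero-depth positions). `coverage_stats` is a hypothetical helper.

```python
# Minimal sketch, assuming 3-column lines (chrom, 1-based position, depth)
# as written by `samtools depth -a` for ONE BAM file, so zero-depth positions
# are included. coverage_stats is a hypothetical helper. Breadth here is the
# fraction of positions with depth >= min_depth.


def coverage_stats(depth_lines, min_depth=1):
    """Return (mean_depth, breadth) from samtools-depth-style lines."""
    total_depth = 0
    positions = 0
    covered = 0
    for line in depth_lines:
        line = line.strip()
        if not line:
            continue
        _chrom, _pos, depth = line.split("\t")
        depth = int(depth)
        total_depth += depth
        positions += 1
        if depth >= min_depth:
            covered += 1
    if positions == 0:
        return 0.0, 0.0
    return total_depth / positions, covered / positions


if __name__ == "__main__":
    lines = ["chr1\t1\t0", "chr1\t2\t3", "chr1\t3\t5", "chr1\t4\t0"]
    print(coverage_stats(lines))  # (2.0, 0.5)
```

Running this on the depth output of the Q20 alignment and of the Q30 alignment gives a like-for-like comparison; raising `min_depth` shows how much of the reference is covered well enough to call bases confidently.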

bbduk.sh is multithreaded and can use more than one core if you have them available, so it will be significantly faster than fastX.
