Question: Paired-end reads merge tool: Multiple @ lines in merged output of FLASH tool
Ankit wrote (7 months ago):

Hi everyone,

I have a query regarding merging paired-end read files. I am using FLASH for merging data. I ran flash as follows:

./flash sample_rep1_R1.fastq sample_rep1_R2.fastq -m 5 -t 5 -o sample_merge 2>&1 | tee flash.log

In sample_merge.extendedFrags.fastq I noticed some lines that contain multiple @ characters along with quality scores. For example:

> > <B/B-:@@D@D@:-:@-D@-:@D@@D-DDD@D@:@---D--::D:DDD BBBBBFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFDDDDDDDDDDDDD#::DDDDDDDD@@DDDDDDDDDDDDDDDDDDDDDDDDF#FFFFFFFFFFFFFFFFF<<
> BBBBBF/BFFFBFFFFFDDDD:DDDDDDDDDDDDDDDDD@D:D@DDDDDDD:@D::@@DD@D-D@DDDDDFDB@FFFDF@:::
> BBBBBFFFFFDDD-D@DDDD@-D@-:D:@DDDDD-@DDDDDDD@:D@D-@D-D-@-D-5D@D@FFFFFFFFFF<<<-:7:
> BB@@@DF<FFF<DDDDDDDDDDFDDDDFFFFFDFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF<DDDDDDFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFBB
> BBBBBFFFFBFFFFB/FFFFFFFF<FFFFFFFFFFFFFFFFFFFFFB<FBFFFBFFBFBBFFFFBD@DDDDDDDD@D#::
> BB@@@DDDDDDDDDD@DDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDFFFFF#FFFFFFFF#F#FFFFFFDDF@DD@D@DDDDDDDDDDDDDDDDDDDDDDDDDDDD@DDDDDDDDDDDDFBB

The unmerged files, sample.notCombined_1.fastq and sample.notCombined_2.fastq, do not have these lines.

I am wondering whether these multi-@ lines in extendedFrags.fastq are normal or are related to the parameters I have chosen.

My reads are 2 x 125 bp.

It would be very helpful if someone could guide me.

Thanks

Ankit

gb wrote (7 months ago):

I am not fully sure I understand you, but that "@" character encodes a quality score. Its ASCII code is 64 (https://www.drive5.com/usearch/manual/quality_score.html), which corresponds to Q31 in the Phred+33 encoding that your reads appear to use. If you look up the merged read in sample_rep1_R1.fastq and sample_rep1_R2.fastq, you will see those "@" characters in the line after the line starting with a "+".
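To make the character-to-score relationship concrete, here is a minimal sketch that decodes a few quality characters. It assumes Phred+33 encoding (an assumption based on characters such as "#" and "/" appearing in the quoted lines); the function name `phred33` is only illustrative.

```python
def phred33(char: str) -> int:
    """Return the Phred quality for one FASTQ quality character.

    Assumes Phred+33 encoding: quality = ASCII code - 33.
    """
    return ord(char) - 33


# Print character, ASCII code, and decoded quality for a few symbols
# that occur in the quality lines quoted in the question.
for ch in "#/:@BF":
    print(ch, ord(ch), phred33(ch))
```

Under this assumption "@" decodes to Q31, which sits between ":" (Q25) and "F" (Q37).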

A possible reason why you see them less often in the non-merged files is that FLASH does not merge a read pair when the mismatch density in the overlap exceeds --max-mismatch-density, in other words when there are too many mismatches. Mismatches can arise because the sequencer called a base wrongly, and those wrong bases mostly have a lower quality score. The "@" character stands for a relatively high score.

It is worth reading up on how to interpret FASTQ files. Where the two reads disagree in the overlap, FLASH chooses the base with the higher quality score.
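The two ideas above (reject a pair when the overlap has too many mismatches, otherwise keep the higher-quality base at each position) can be illustrated with a toy sketch. This is not FLASH's actual code: the function `merge_overlap`, its arguments, and the threshold value are hypothetical, and it assumes the two overlap segments are already aligned and in the same orientation.

```python
def merge_overlap(seq1, qual1, seq2, qual2, max_mismatch_density=0.25):
    """Toy overlap merge; returns (consensus or None, mismatch density).

    seq1/seq2 are the overlapping bases of the two reads and qual1/qual2
    their quality strings. Quality characters are compared directly,
    which preserves the ordering of the underlying Phred scores.
    """
    if not (len(seq1) == len(qual1) == len(seq2) == len(qual2)):
        raise ValueError("sequences and qualities must have equal length")
    if not seq1:
        raise ValueError("overlap must not be empty")

    mismatches = sum(a != b for a, b in zip(seq1, seq2))
    density = mismatches / len(seq1)
    if density > max_mismatch_density:
        return None, density  # too many mismatches: do not merge

    # At each position keep the base with the higher quality character.
    consensus = "".join(
        a if qa >= qb else b
        for a, qa, b, qb in zip(seq1, qual1, seq2, qual2)
    )
    return consensus, density


# One mismatch at position 5; the second read has the better quality there.
print(merge_overlap("ACGTACGT", "FFFF#FFF", "ACGTTCGT", "FFFFFFFF"))
# Every position disagrees, so the pair is rejected.
print(merge_overlap("AAAA", "FFFF", "TTTT", "FFFF"))
```

Because low-quality bases are the ones most likely to cause rejected pairs, merged reads would tend to carry relatively more high-quality characters such as "@".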


Hi, thanks for the reply. Yes, you are right. I checked the read counts in both *.extendedFrags.fastq and *.notCombined_1/_2.fastq, and they add up to the read count of the original FASTQ files. My mistake was counting reads by searching for "@": the totals did not match, so I thought there was a problem. Now I have counted the "@header" lines instead, and it looks fine.
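The counting pitfall described above can be reproduced with a small sketch. It assumes the common four-line FASTQ layout (header, sequence, "+", quality) with no wrapped lines; the function names and the in-memory example records are made up for illustration.

```python
import io


def count_fastq_records(handle) -> int:
    """Count records assuming exactly four lines per FASTQ record."""
    n_lines = sum(1 for _ in handle)
    return n_lines // 4


def count_at_lines(handle) -> int:
    """Count lines that start with '@' (header lines AND some quality lines)."""
    return sum(1 for line in handle if line.startswith("@"))


# Two records; the quality line of the first one happens to start with '@'.
example = (
    "@read1\nACGT\n+\n@@DF\n"
    "@read2\nTTGA\n+\nBBFF\n"
)

print(count_fastq_records(io.StringIO(example)))  # 2
print(count_at_lines(io.StringIO(example)))       # 3
```

Dividing the line count by four is robust here because it does not depend on which characters the quality strings begin with.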

Can you also suggest appropriate values for -m and -M for 125 bp reads? I am using -m 5 and the default -M (65).

Thanks for the quick help.

Reply written 7 months ago by Ankit