Minimal length of sequence for reference alignment.
8.0 years ago
mieszko91 ▴ 30

Hi! When I use Bowtie 2 for reference alignment, the console displays warnings about the length of my sequences:

Warning: skipping mate #2 of read 'SRR988073.8033840 HWI-ST833:124:6:2202:2830:97445 length=101' because it was < 2 characters long
Warning: skipping mate #2 of read 'SRR988073.8078016 HWI-ST833:124:6:2202:19987:137144 length=101' because length (1) <= # seed mismatches (0)

I trimmed my data to improve the quality of my sequences, but the FASTQ file now contains many short sequences. Is there a minimum sequence length for reference alignment? Should I remove the short sequences? And how long should the sequences be?

alignment next-gen sequencing • 3.2k views
8.0 years ago
GZ1995 ▴ 410

Short reads tend to map to multiple regions of the genome, so it is recommended to remove them before alignment. I usually use 35 bp as the minimum post-trimming length for bowtie2.
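A minimal sketch of such a length filter for paired-end data, assuming plain 4-line FASTQ records and two mate files that list reads in the same order. The names `read_fastq`, `filter_pairs` and `MIN_LEN` are made up for this illustration and are not part of bowtie2 or any trimming tool. A pair is dropped whenever either mate is too short, so the two output files stay in sync.

```python
from itertools import islice

MIN_LEN = 35  # threshold suggested above; adjust for your data


def read_fastq(handle):
    """Yield (header, sequence, plus, quality) tuples from a 4-line FASTQ stream."""
    while True:
        record = [line.rstrip("\n") for line in islice(handle, 4)]
        if not record:
            return
        if len(record) < 4:
            raise ValueError("truncated FASTQ record")
        yield tuple(record)


def filter_pairs(r1_in, r2_in, r1_out, r2_out, min_len=MIN_LEN):
    """Write only pairs in which both mates are at least min_len bases long.

    Assumes both inputs hold the same reads in the same order.
    Returns the number of (kept, dropped) pairs.
    """
    kept = dropped = 0
    for rec1, rec2 in zip(read_fastq(r1_in), read_fastq(r2_in)):
        if len(rec1[1]) >= min_len and len(rec2[1]) >= min_len:
            r1_out.write("\n".join(rec1) + "\n")
            r2_out.write("\n".join(rec2) + "\n")
            kept += 1
        else:
            dropped += 1
    return kept, dropped
```

To use it, open the two trimmed mate files for reading and two new files for writing, then pass the four handles to `filter_pairs`. The returned counts show how many pairs the threshold removed.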

8.0 years ago

Well, a length of < 2 characters is certainly far too short. Avoid overly aggressive trimming; aligners can usually handle low-quality bases and suboptimal parts of a read.
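One way to judge whether the trimming was too aggressive is to look at the read lengths it left behind. The following is a rough sketch, assuming a plain 4-line FASTQ file; `length_histogram` and `fraction_shorter_than` are hypothetical helper names chosen for this example.

```python
from collections import Counter


def length_histogram(handle):
    """Count how many reads of each length a 4-line FASTQ stream contains."""
    counts = Counter()
    for i, line in enumerate(handle):
        if i % 4 == 1:  # the second line of every record is the sequence
            counts[len(line.rstrip("\r\n"))] += 1
    return counts


def fraction_shorter_than(counts, min_len):
    """Return the fraction of reads shorter than min_len (0.0 for empty input)."""
    total = sum(counts.values())
    if total == 0:
        return 0.0
    short = sum(n for length, n in counts.items() if length < min_len)
    return short / total
```

If a large share of the reads ends up well below the original 101 bases, the trimming settings are probably worth relaxing.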


This ^. There is really no need to trim based on Q scores (which may be the case here) for re-sequencing data.
