Question: Does the PacBio basecaller filter out reads with low complexity?
2.4 years ago by bgbrink wrote:

I have a data set where the genome is known to contain about 10% telomeric repeats. However, when I BLAST a query of four tandem copies of the repeat (4 x TTAGGG) against my reads, fewer than 1% of the reads show a hit. This makes me wonder whether low-complexity reads are removed by the basecalling pipeline and never end up in the subreads.fastq.

Here is my BLAST command, in case I did something wrong on my side. I also tried less stringent values for reward/penalty and gap costs (5/-4 and 10/6, respectively), but the result stays the same.

blastn -db "all_reads.fasta" -query "telo_sequence.fasta" -word_size 6 -dust no -soft_masking false -outfmt 6
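
As a cross-check that does not depend on BLAST settings, the same question (what fraction of reads carry 4 x TTAGGG?) can be sketched as a direct string search. This is a minimal sketch, not part of the original post: the function names `read_fasta` and `telomeric_fraction` are hypothetical, and it assumes the reads are in plain FASTA. It only finds exact matches on either strand, so on error-prone raw reads it will likely undercount compared with an alignment-based search.

```python
import re

# Four tandem copies of the telomeric repeat, on either strand.
# CCCTAA is the reverse complement of TTAGGG.
TELO_FWD = "TTAGGG" * 4
TELO_REV = "CCCTAA" * 4
TELO_RE = re.compile(f"{TELO_FWD}|{TELO_REV}")


def read_fasta(handle):
    """Yield (name, sequence) pairs from an open FASTA file handle."""
    name, chunks = None, []
    for line in handle:
        line = line.strip()
        if line.startswith(">"):
            if name is not None:
                yield name, "".join(chunks)
            name, chunks = line[1:], []
        elif line:
            chunks.append(line)
    if name is not None:
        yield name, "".join(chunks)


def telomeric_fraction(handle):
    """Return (reads_with_exact_match, total_reads) for a FASTA handle."""
    total = hits = 0
    for _name, seq in read_fasta(handle):
        total += 1
        # Upper-case first so soft-masked (lower-case) bases still match.
        if TELO_RE.search(seq.upper()):
            hits += 1
    return hits, total
```

To use it, open the read file (for example all_reads.fasta from the command above) and pass the handle to `telomeric_fraction`; dividing the first returned value by the second gives the fraction of reads with at least one exact match.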
sequencing • 452 views