Question: Does the PacBio basecaller filter out reads with low complexity?
2.4 years ago by
bgbrink60 wrote:

I have a data set from a genome that is known to contain about 10% telomeric repeats. However, when I BLAST a query of 4 x TTAGGG against my reads, fewer than 1% of them show a hit. This makes me wonder whether reads with low complexity are removed by the basecalling pipeline and never end up in the subreads.fastq.

Here is my blast command, in case I did something wrong on my side. I also tried less stringent values for reward/penalty and gap costs (5/-4, 10/6), but the result remains the same.

blastn -db "all_reads.fasta" -query "telo_sequence.fasta" -word_size 6 -dust no -soft_masking false -outfmt 6
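
As a cross-check that does not depend on BLAST, the reads could also be scanned directly for the repeat on either strand. The following is only a minimal sketch: the function names are hypothetical, it assumes the reads are in plain FASTA, and it uses exact matching, which will likely undercount on raw PacBio subreads because of their error rate.

```python
def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence (A/C/G/T only)."""
    comp = str.maketrans("ACGTacgt", "TGCAtgca")
    return seq.translate(comp)[::-1]


def read_fasta(lines):
    """Yield (name, sequence) pairs from an iterable of FASTA lines.

    Handles sequences that are wrapped over several lines.
    """
    name, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                yield name, "".join(chunks)
            name, chunks = line[1:], []
        else:
            chunks.append(line)
    if name is not None:
        yield name, "".join(chunks)


def count_reads_with_repeat(lines, motif="TTAGGG", copies=4):
    """Count reads containing `copies` exact tandem copies of `motif`.

    Both the forward repeat and its reverse complement are checked.
    Returns (reads_with_repeat, total_reads).
    """
    target = (motif * copies).upper()
    target_rc = reverse_complement(target)
    hits = total = 0
    for _, seq in read_fasta(lines):
        seq = seq.upper()
        total += 1
        if target in seq or target_rc in seq:
            hits += 1
    return hits, total


# Tiny in-memory example (made-up reads, for illustration only).
example = [
    ">read1", "ACGT" + "TTAGGG" * 4 + "ACGT",
    ">read2", "CCCTAA" * 4,
    ">read3", "ACGTACGTACGT",
]
hits, total = count_reads_with_repeat(example)
print(f"{hits} of {total} reads contain the repeat")
```

For the real data, an open file handle on all_reads.fasta can be passed in place of `example`. Lowering `copies` makes the check more tolerant of sequencing errors, at the cost of more chance matches.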
