Question

How to blastn a very short query sequence? Identifying telomeres in assembly

0

Entering edit mode

4.2 years ago

mrj ▴ 170

hello, I have a fungal genome assembled into 222 contigs (contig.fasta). I would like to detect characteristic telemeric repeats (TTAGGG/CCCTAA) in the contigs. My approach was to create an indexed databse using the fasta file containing 222 contigs and performing blastn using a very short quesry sequence (tel1.fasta) (6 nucleotides only).

>query sequence
TTAGGG

following is the command I used. It does not return matches surprisingly even though I know that there are prenty of matches. Can you explain why it is happening and how to prevent it?

blastn -query tel1.fasta -db contig.fasta -task "blastn-short" -outfmt 7 -max_target_seqs 10 -evalue 0.5 -perc_identity 90

Is there any other way to detect telomeres in a given sequence assembly?

Thanks

blastn blast shortquery telomere • 1.6k views

ADD COMMENT • link updated 4.2 years ago by mb314 ▴ 20 • written 4.2 years ago by mrj ▴ 170

1

Entering edit mode

I don't think you can use the ". Try just blastn-short.

That works only for sequences 10 bp and longer according to this post: A: Blast Settings For Short Sequences

You may want to tryfuzznuc from EMBOSS for real short sequences like that.

ADD REPLY • link 4.2 years ago by GenoMax 141k

0

Entering edit mode

The minimal seed length is 7 for blast actually. Still too much for this, unless you blast on two or more tandem repeats. For instance:

>query sequence
TTAGGGTTAGGGTTAGGG

ADD REPLY • link 4.2 years ago by Carlo Yague 8.7k

0

Entering edit mode

That's a good one. Yes, i noticed that there were tandem repeats.

ADD REPLY • link 4.2 years ago by mrj ▴ 170

0

Entering edit mode

Thanks genomax. I need to try fuzznuc as you suggested.

ADD REPLY • link 4.2 years ago by mrj ▴ 170

0

Entering edit mode

Do you actually need alignment for this, or are the matches (reasonably) well conserved? Fuzzy matching, as already mentioned, or a regex approach might be sufficient?

ADD REPLY • link 4.2 years ago by Joe 21k

0

Entering edit mode

Thanks for the suggestion. This sequence is a characteristic repeat. regex worked!!!!!

ADD REPLY • link 4.2 years ago by mrj ▴ 170

score 1 · Answer 1 · 2020-02-27

1

Entering edit mode

4.2 years ago

Carlo Yague 8.7k

If you are looking for exact repeats of at least n TTAGGG, you could just perhaps just use grep.

ADD COMMENT • link 4.2 years ago by Carlo Yague 8.7k

0

Entering edit mode

I used grep and it worked like a charm. Thanks a lot

ADD REPLY • link 4.2 years ago by mrj ▴ 170

0

Entering edit mode

Hi mrj, if you have a moment, please consider upvoting and/or marking this answer as accepted. This helps others down the road, who may ask a similar question and find this answer useful!

ADD REPLY • link 4.2 years ago by Alex Reynolds 35k

score 1 · Answer 2 · 2020-02-27

Just spitballin':

Could you build a quasi-PWM MEME motif table file from your telomeric repeat motif, and then use FIMO to scan your contig FASTA for hits of various quality (q-value/p-value)? FIMO is kind enough to look for reverse-strand hits, as well. For instance, here's an example of how to search human assembly for TF motifs, which could be extrapolated to your situation with some alterations to inputs: https://bioinformatics.stackexchange.com/a/2491/776

The sequence is short enough, and your contigs likely short enough, that another option might be to programmatically build a set of off-target sequences (one mismatch, two mismatches, etc., along with their reverse complements) and search for those as regex patterns via a command-line script (using grep or re library in Python).

If you get same-strand hits back that are spaced every N bases (N being the length of the motif, e.g. N=6) then you may have located a telomeric repeat.

score 1 · Answer 3 · 2020-02-27

1

Entering edit mode

4.2 years ago

mb314 ▴ 20

I think the program HOMER might be able to search for your motif from a fasta file. Check out this link!

http://homer.ucsd.edu/homer/motif/genomeWideMotifScan.html

ADD COMMENT • link 4.2 years ago by mb314 ▴ 20