Question

Is there a way to map long fasta reads to short reference sequences?

0


3.4 years ago

Apex92 ▴ 280

I am having difficulty mapping FASTA files with longer reads (~150 bp) against short custom reference sequences (~21 bp). I would like to get a proper bowtie-like report after the alignment.

Basically, I want to know which reads in which FASTA files contain one of those 21 bp sequences (ref reads). I would like to try this with 0, 1, and 2 mismatches. Could anybody help me with this?

I tried the --local option in bowtie 2, but it did not work.

my command line: bowtie2 -a --local -x AB_seq -f 1_S1_L001_R1_001.fasta

Same question with details: https://www.biostars.org/p/474903/#474905

alignment rna-seq bowtie • 1.3k views

updated 3.4 years ago by karl.stamm 4.1k • written 3.4 years ago by Apex92 ▴ 280

1


A good suggestion has already been provided in the other question, so why are you repeating the same question? Did you try BBDuk? If yes, why didn't it fulfill your needs?

Also, some technical and biological background could provide clues for better ways at solving your problem.

3.4 years ago by h.mon 35k

score 1 · Answer 1 · 2020-11-23

Those tools (bowtie) think the reference is a genome, and there's no way a 150bp read came from a 21bp genome. So finding no matches is the correct result.

You could pad the reference with a series of NNN on both sides, to allow the aligner to find a place the 150bp read matches.
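As a rough illustration of the padding idea, here is a minimal Python sketch. The helper names `read_fasta` and `pad_records` and the `flank` parameter are made up for this example, and whether the aligner then reports hits depends on how it scores N positions, which is not verified here.

```python
def read_fasta(lines):
    """Yield (header, sequence) pairs from an iterable of FASTA lines."""
    header, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new record starts: emit the previous one, if any.
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        else:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def pad_records(records, flank=150):
    """Add `flank` N characters to both sides of every sequence.

    flank=150 is an assumption chosen to match the ~150 bp read length.
    """
    for header, seq in records:
        yield header, "N" * flank + seq + "N" * flank


# Toy 21 bp reference; a small flank keeps the example readable.
example = [">ref1", "ACGTACGTACGTACGTACGTA"]
padded = list(pad_records(read_fasta(example), flank=5))
for header, seq in padded:
    print(f">{header}\n{seq}")
```

The padded records would be written to a new FASTA file and indexed in place of the original 21 bp references.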

You could chop each read into smaller parts (under 20 bases) and let each part find its own alignment.
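Chopping could look something like the sketch below. The function name `chop_read`, the 18-base window, and the `_pos` suffix in the fragment IDs are all assumptions made for illustration; the suffix keeps the original read and offset recoverable from the alignment report.

```python
def chop_read(read_id, seq, size=18, step=1):
    """Yield (fragment_id, fragment) for each window of `size` bases.

    The fragment id encodes the 0-based start offset in the original read.
    Reads shorter than `size` yield nothing.
    """
    for start in range(0, len(seq) - size + 1, step):
        yield f"{read_id}_pos{start}", seq[start:start + size]


# Toy 22-base read, stepping 2 bases at a time.
fragments = list(chop_read("read1", "ACGTACGTACGTACGTACGTAC", size=18, step=2))
for frag_id, frag in fragments:
    print(f">{frag_id}\n{frag}")
```

A step of 1 produces every possible window, at the cost of many more fragments per read; a larger step is faster but can miss a match that straddles two windows.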

You could trim the reads and align only the first 18 bases of each.

I have some experience with reads longer than the DNA source; the ends of such reads are generally technical artifacts or Illumina adapter sequence, which you can trim.

Or, finally, if you don't want to do any of those things, you need a different tool: bowtie, TopHat, and bwa are designed to place reads onto the genome they came from and are not suited to your task.
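To show what a "different tool" would have to do here, below is a brute-force Python sketch that slides each short reference along a read and counts mismatches. It is not any particular program's method; `hamming_hits` and `revcomp` are hypothetical helper names, and the sketch ignores indels and will be slow on large files.

```python
def hamming_hits(read, ref, max_mismatches=2):
    """Return (offset, mismatches) for every window of `read` that
    matches `ref` with at most `max_mismatches` substitutions."""
    hits = []
    k = len(ref)
    for start in range(len(read) - k + 1):
        window = read[start:start + k]
        mismatches = sum(a != b for a, b in zip(window, ref))
        if mismatches <= max_mismatches:
            hits.append((start, mismatches))
    return hits


def revcomp(seq):
    """Reverse complement, for also checking the opposite strand."""
    return seq.translate(str.maketrans("ACGTacgt", "TGCAtgca"))[::-1]


# Toy data: a 21 bp reference embedded in a read with one substitution.
ref = "ACGTTGCAAGGCTTAACCGGT"
mutated = ref[:10] + "T" + ref[11:]
read = "TTTTT" + mutated + "GGGGG"

for allowed in (0, 1, 2):
    print(allowed, hamming_hits(read, ref, allowed))
```

Running the same search with `revcomp(ref)` would cover reads that carry the reference on the opposite strand.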