Question

How to align multiple short reads (NGS data) to a short (~200 bp) reference?

4.1 years ago

itamar ▴ 10

Hi everyone,

As part of plant genotyping research and CRISPR-Cas9 mutation screening, I need to sequence and align many reads in order to capture low-frequency variants. Therefore, I'm using NGS (rather than Sanger). I'm trying to align reads from paired-end FASTQ files to a short amplicon reference of about 200 bp.

I tried using the Bowtie2 aligner. First, I built the index with the default settings: bowtie2-build ref.fa ref

Then I ran the alignment: bowtie2 -x ref -1 R1.fastq -2 R2.fastq -X 200 --fr -S output.sam, which executed with no issues, but not a single read was aligned.

Can anyone please explain what's going on or, even better, how to perform this kind of analysis?

Thanks in advance, Itamar

genomics NGS CRISPR genotyping • 1.7k views

updated 4.1 years ago by Ram 45k • written 4.1 years ago by itamar ▴ 10

score 1 · Answer 1 · 2021-05-28

4.1 years ago

shelkmike ★ 1.6k

Try bowtie2 with the "--local" option. In the default (end-to-end) mode, bowtie2 requires every base of a read to take part in the alignment. Consequently, reads that extend past the edges of the reference will not be aligned in the default mode.
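A minimal sketch of that suggestion, assuming the index and file names from the question. The run helper is hypothetical and only prints each command unless RUN=1 is set, so the call can be inspected first. The -X 200 value is also carried over from the question and may need raising if the fragments are longer than 200 bp.

```shell
# Index and file names are taken from the question; adjust them to your data.
INDEX="ref"
R1="R1.fastq"
R2="R2.fastq"
OUT_SAM="output.sam"
MAX_FRAGMENT=200   # -X value from the question; may need raising for longer fragments

# Hypothetical helper: print the command, and execute it only when RUN=1 is set.
run() {
  echo "+ $*"
  if [ "${RUN:-0}" = "1" ]; then
    "$@"
  fi
}

# Same alignment as in the question, with --local added so that read ends
# that do not match the reference can be soft-clipped instead of failing.
run bowtie2 --local -x "$INDEX" -1 "$R1" -2 "$R2" -X "$MAX_FRAGMENT" --fr -S "$OUT_SAM"
```

Saved as a script, it can be executed for real with RUN=1 in the environment (for example RUN=1 sh align_local.sh, where the script name is just an example).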

Thanks shelkmike and Carlo! I used a larger genomic segment (1 kbp) as the reference and added the --local argument, and that did the trick. I also changed the max average read length parameter afterwards to better adjust the alignment.

Reply • 4.1 years ago by itamar ▴ 10

score 0 · Answer 2 · 2021-05-27

It is possible, and even likely depending on the library preparation, that many fragments (read pairs) span more than 200 nt and therefore cannot map concordantly to the 200 nt reference. A more common strategy is to map to the full genome and then extract the genomic region of interest (the one targeted for mutagenesis). This strategy is also more robust against spurious mapping.
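The map-then-extract strategy could look roughly like the sketch below. Every name in it is a placeholder: the genome index, the intermediate file names and especially the region coordinates are assumptions for illustration. samtools is used here as one common way to pull out a region; the answer does not name a specific tool. The run helper is hypothetical and only prints each command unless RUN=1 is set.

```shell
# All names below are hypothetical placeholders.
GENOME_INDEX="genome"        # index built beforehand with bowtie2-build on the full genome
R1="R1.fastq"
R2="R2.fastq"
REGION="chr1:10000-10200"    # coordinates of the targeted locus (placeholder)

# Hypothetical helper: print the command, and execute it only when RUN=1 is set.
run() {
  echo "+ $*"
  if [ "${RUN:-0}" = "1" ]; then
    "$@"
  fi
}

# 1. Align the read pairs against the whole genome.
run bowtie2 -x "$GENOME_INDEX" -1 "$R1" -2 "$R2" -S genome.sam

# 2. Sort by coordinate and index the result, which region queries require.
run samtools sort -o genome.sorted.bam genome.sam
run samtools index genome.sorted.bam

# 3. Keep only the alignments that overlap the region of interest.
run samtools view -b -o target.bam genome.sorted.bam "$REGION"
```

Reads that belong elsewhere in the genome then align to their true origin instead of being forced onto the amplicon, which is the robustness argument made above.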

Before going forward, however, I would double-check the quality of the reads (with FastQC, for instance) and the reference (running bowtie2-inspect on the index would be a good start).
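Both checks can be sketched as follows, assuming the file and index names from the question. The output directory name is an assumption, and the run helper is hypothetical: it only prints each command unless RUN=1 is set.

```shell
# File and index names follow the question; the output directory is an assumption.
R1="R1.fastq"
R2="R2.fastq"
INDEX="ref"
QC_DIR="fastqc_out"

# Hypothetical helper: print the command, and execute it only when RUN=1 is set.
run() {
  echo "+ $*"
  if [ "${RUN:-0}" = "1" ]; then
    "$@"
  fi
}

# Read quality: FastQC writes one report per input file into QC_DIR,
# which has to exist before FastQC is started.
run mkdir -p "$QC_DIR"
run fastqc -o "$QC_DIR" "$R1" "$R2"

# Reference sanity check: print a summary of the index, including the
# names and lengths of the sequences it contains.
run bowtie2-inspect -s "$INDEX"
```

If the summary does not list a single sequence of roughly the expected amplicon length, the index was probably built from the wrong FASTA file.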