Question

Aligner for 50bp paired end short reads

0

6.1 years ago

bipin ▴ 30

I am aligning 50 bp paired-end ATAC-seq data.

To align it to the reference genome I have tried bowtie with a larger maximum insert size (-X 1000), but the results are not optimal.

Is there any other aligner which is expected to perform better for this input dataset?

ChIP-Seq Assembly alignment • 2.6k views

ADD COMMENT • link updated 6.1 years ago by colindaven 6.3k • written 6.1 years ago by bipin ▴ 30

1

Any short-read aligner will do; the insert size range is defined by your data. You have to estimate it, for example by mapping a subset of 10,000 or 100,000 read pairs to your reference and plotting the TLEN field of the output BAM file. That gives you the estimated insert size distribution, which you then pass to bowtie with -I and -X.
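The estimation step described above could be sketched roughly as follows. This is only an illustration, not a tested pipeline: it assumes the subset alignment has already been converted to SAM text (for example with `samtools view`), and the function names `insert_sizes` and `summarize` are made up for this example.

```python
import statistics


def insert_sizes(sam_lines):
    """Collect absolute TLEN values from properly paired, first-in-pair records."""
    sizes = []
    for line in sam_lines:
        if line.startswith("@"):
            continue  # skip SAM header lines
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 11:
            continue  # not a complete alignment record
        flag = int(fields[1])
        tlen = int(fields[8])  # TLEN is the 9th SAM column
        # 0x2 = properly paired, 0x40 = first in pair.
        # Counting only the first mate avoids counting each fragment twice.
        if flag & 0x2 and flag & 0x40 and tlen != 0:
            sizes.append(abs(tlen))
    return sizes


def summarize(sizes):
    """Return the count, median and rough 1st/99th percentiles of the sizes."""
    ordered = sorted(sizes)
    n = len(ordered)

    def pick(q):
        # Simple nearest-rank style lookup, good enough for a first look.
        return ordered[min(n - 1, int(q * n))]

    return {
        "n": n,
        "median": statistics.median(ordered),
        "p01": pick(0.01),
        "p99": pick(0.99),
    }


if __name__ == "__main__":
    import sys

    # Hypothetical usage: samtools view subset.bam | python insert_sizes.py
    print(summarize(insert_sizes(sys.stdin)))
```

The low and high percentiles are one possible starting point for -I and -X; a histogram of the same values would show the distribution more clearly.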

With 50 nt reads, you have to set the --score-min, --mp, --rdg and --rfg parameters carefully, because the reads are very short and you might lose many of them to too many mismatches.

Also, are you accepting 0 or 1 mismatch in the seed?

Are the reads from the same species as the reference?

ADD REPLY • link 6.1 years ago by Matteo Schiavinato ★ 3.6k

0

Thanks for your reply.

Apologies, I forgot to specify that I am using bowtie and not bowtie2, since I find it is recommended when the reads are < 50 bp.

The options I am giving to bowtie are -k 2 -m 2 --best --strata.

The maximum insert size in my case is 600, so I was giving a slightly higher number for -X, i.e. 1000.

The reads are from the same species as the reference, i.e. the mouse genome.

ADD REPLY • link 6.1 years ago by bipin ▴ 30

2

That is not how you should be doing it. Even if the people who made the library told you that the library size was 600 bp, that may not actually be the case. In reality, fragments tend to be smaller than you estimate them to be.

You can get actual insert sizes by using this method: C: Target fragment size versus final insert size

Edit: While you are there, you could also try bbmap.sh, the mapper from the BBMap suite, for your mapping needs.

ADD REPLY • link 6.1 years ago by GenoMax 141k

1

So you should try to give a -I and -X interval that spans a range around 600. If the results are still not satisfying, try changing the scoring function and the gap and mismatch penalties to allow more gaps / mismatches.

You'll find how to do it in the manuals of bowtie / bowtie2 / hisat2 / tophat2 (they work the same way).
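As a rough sketch of how those settings fit together, the helper below assembles a bowtie2 command line with an -I/-X interval and the scoring options mentioned earlier in the thread. Several things here are assumptions: the helper name `bowtie2_command`, the index and file names, and the 400 to 800 interval are placeholders; the default values are what I believe bowtie2 uses in end-to-end mode, so check them against the manual for your version. The scoring options are bowtie2 options; as far as I know, bowtie (version 1) does not accept them.

```python
import shlex


def bowtie2_command(index, reads1, reads2, out_sam, min_insert, max_insert,
                    score_min="L,-0.6,-0.6", mp="6,2", rdg="5,3", rfg="5,3"):
    """Build (but do not run) a bowtie2 paired-end command as an argument list."""
    return [
        "bowtie2",
        "-x", index,              # basename of the bowtie2 index
        "-1", reads1,             # mate 1 FASTQ
        "-2", reads2,             # mate 2 FASTQ
        "-I", str(min_insert),    # minimum fragment length
        "-X", str(max_insert),    # maximum fragment length
        "--score-min", score_min, # minimum acceptable alignment score function
        "--mp", mp,               # max,min mismatch penalties
        "--rdg", rdg,             # read gap open,extend penalties
        "--rfg", rfg,             # reference gap open,extend penalties
        "-S", out_sam,            # SAM output file
    ]


if __name__ == "__main__":
    # Placeholder names and an illustrative interval around 600; replace with your own.
    cmd = bowtie2_command("mm10", "sample_R1.fastq.gz", "sample_R2.fastq.gz",
                          "sample.sam", min_insert=400, max_insert=800)
    print(shlex.join(cmd))
```

Loosening `--score-min` (a more negative slope or intercept) or lowering the penalties lets more mismatches and gaps through, at the cost of more spurious alignments, so it is worth changing one setting at a time and comparing alignment rates.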

ADD REPLY • link 6.1 years ago by Matteo Schiavinato ★ 3.6k

score 2 · Answer 1 · 2018-03-01

2

6.1 years ago

colindaven 6.3k

I would use bwa aln, not bwa mem, and certainly not bowtie1 for this. Bowtie1 will not align reads with indels (i.e. it cannot align split reads), which is going to cause major problems.

Other good short-read aligners, in my view, include subread, but I am not certain whether it is good for very short reads.

ADD COMMENT • link 6.1 years ago by colindaven 6.3k