Question

Aligning reads with non-genome-encoded sequence

0

6.3 years ago

biplab ▴ 110

I have paired-end RNA-seq data in which the reads contain a variable length of non-genome-encoded sequence. What would be the best way to align them to the reference genome? Thank you so much.

alignment rna-seq • 929 views

updated 6.3 years ago by Devon Ryan 104k • written 6.3 years ago by biplab ▴ 110

0

"non-genome encoded sequence"

Can you clarify what that means? Is this an artificial insert that was introduced into the genome?

6.3 years ago by GenoMax 141k

0

The sequencing reads come from the 3' end of mRNAs, so the reads in the R2 file contain the polyA tail. R1, however, does not have any non-genome-encoded sequence and aligns properly to the genome. Here, the non-genome-encoded sequence is the polyT stretches in the R2 reads.

6.3 years ago by biplab ▴ 110

1

Those should be easy to trim using bbduk.sh with the literal=AAAAA option. bbmap.sh will also be able to soft-clip those tails, as long as there is enough sequence present to allow a reasonable match. Both programs are part of the BBMap suite.

6.3 years ago by GenoMax 141k

0

Thank you so much. I will look into BBMap.

6.3 years ago by biplab ▴ 110

score 1 · Answer 1 · 2018-01-08

1

6.3 years ago

Devon Ryan 104k

Either trim off the polyA tail on read 2 or use STAR (or another local aligner) and allow a lot of soft-clipping. I suspect that trimming read 2 (then aligning the remaining pairs and orphans) will produce the best results.
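To illustrate the "remaining pairs and orphans" step: after trimming, some read 2 sequences may be too short to align, so their read 1 mates become orphans that can only be aligned single-end. The following is a minimal Python sketch of that bookkeeping. The function name `split_pairs_and_orphans` and the `min_len=20` cutoff are hypothetical choices for illustration, not settings from any particular tool, and it assumes read 1 is left untrimmed.

```python
def split_pairs_and_orphans(records, min_len=20):
    """Partition trimmed read pairs into intact pairs and orphaned R1 reads.

    records: iterable of (name, r1_seq, r2_trimmed_seq) tuples.
    min_len: hypothetical minimum usable length for a trimmed R2 read.
    """
    paired, orphans = [], []
    for name, r1_seq, r2_seq in records:
        if len(r2_seq) >= min_len:
            # R2 is still long enough: keep the pair for paired-end alignment.
            paired.append((name, r1_seq, r2_seq))
        else:
            # R2 was trimmed away (or nearly so): align R1 on its own.
            orphans.append((name, r1_seq))
    return paired, orphans


# Toy example: the second pair loses most of R2 to trimming.
records = [
    ("read1", "ACGT" * 10, "TTGCA" * 6),
    ("read2", "ACGT" * 10, "TTGCA"),
]
paired, orphans = split_pairs_and_orphans(records)
print(len(paired), "pair(s),", len(orphans), "orphan(s)")
```

The pairs would then go to the aligner in paired-end mode and the orphans in single-end mode.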

6.3 years ago by Devon Ryan 104k

0

Thank you so much. I will try both soft-clipping and trimming to see which one works best. There might be some non-A bases in the polyA tail because of sequencing errors. What percentage of errors should I allow when trimming the polyA tail?

6.3 years ago by biplab ▴ 110

0

With bbduk.sh you can remove all sequence to the right of the poly-A tail once it is located. So it does not matter what sequence may be there.
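As a rough illustration of why the error rate inside the tail does not matter with this approach, here is a minimal Python sketch of right-trimming at the first located run of A's. It is not how bbduk.sh is implemented; the function name `trim_right_of_poly` and its defaults (`base="A"`, `k=5`, mirroring the AAAAA literal mentioned earlier in the thread) are assumptions made for the example.

```python
def trim_right_of_poly(seq, qual, base="A", k=5):
    """Cut a read at the first run of k copies of `base`.

    The run itself and everything to its right are removed, so any
    non-A bases further along the tail are discarded without being scored.
    Returns the (possibly unchanged) sequence and quality strings.
    """
    idx = seq.find(base * k)
    if idx == -1:
        # No run of the required length: leave the read untouched.
        return seq, qual
    return seq[:idx], qual[:idx]


# Toy read: 12 genomic bases, then a polyA tail interrupted by a stray G.
read = "GATTACAGGCTT" + "AAAAAAAA" + "G" + "AAAAA" + "CGT"
qual = "I" * len(read)
trimmed_seq, trimmed_qual = trim_right_of_poly(read, qual)
print(trimmed_seq)  # GATTACAGGCTT
```

If the non-genomic stretch is a polyT rather than a polyA, the same idea applies with `base="T"`, though the side to trim depends on where the stretch sits in the read.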

6.3 years ago by GenoMax 141k