Aligning reads with non-genome-encoded sequence
6.3 years ago
biplab ▴ 110

I have paired-end RNA-seq data in which the reads contain a variable length of non-genome-encoded sequence. What is the best way to align them to the reference genome? Thank you so much.

alignment rna-seq • 929 views

non-genome encoded sequence

Can you clarify what that means? Is this an artificial insert that was introduced into the genome?


The sequencing reads come from the 3' end of mRNA, so the R2 file contains the polyA tail. R1, however, does not contain any non-genome-encoded sequence and aligns properly to the genome. Here the non-genome-encoded sequence is the polyT stretches in the R2 reads.


Those should be easy to trim using bbduk.sh with the literal=AAAAA option. bbmap.sh will also be able to soft-clip those tails as long as there is enough sequence present to allow a reasonable match. Both programs are part of the BBMap suite.


Thank you so much. I will look into BBMap.

6.3 years ago

Either trim off the polyA tail on read 2, or use STAR (or another local aligner) and allow a lot of soft-clipping. I suspect that trimming read 2 (then aligning the remaining pairs and orphans) will produce the best results.
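
To make the "remaining pairs and orphans" step concrete, here is a minimal Python sketch of how trimmed reads could be split before alignment. It is only an illustration: the function name partition_pairs and the 20-base minimum length are assumptions, not something taken from STAR or BBMap, and it assumes read 1 is left untrimmed.

```python
from typing import List, Tuple

# (read 1 sequence, trimmed read 2 sequence); read 1 is assumed untrimmed
Pair = Tuple[str, str]


def partition_pairs(pairs: List[Pair], min_len: int = 20) -> Tuple[List[Pair], List[str]]:
    """Split trimmed pairs into intact pairs and orphaned read 1 sequences.

    min_len is a hypothetical cutoff: a read 2 shorter than this after
    poly-A trimming is dropped, leaving its mate as an orphan.
    """
    kept_pairs: List[Pair] = []
    orphans: List[str] = []
    for r1, r2 in pairs:
        if len(r2) >= min_len:
            kept_pairs.append((r1, r2))   # both mates still usable
        else:
            orphans.append(r1)            # read 2 too short; align read 1 alone
    return kept_pairs, orphans
```

The kept pairs would then be aligned in paired-end mode and the orphans in single-end mode.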


Thank you so much. I will try both soft-clipping and trimming to see which one works best. There might be some non-A bases in the polyA tail because of sequencing errors. What percentage of errors should I allow when trimming the polyA tail?


With bbduk.sh you can remove all sequence to the right of the poly-A tail once it is located. So it does not matter what sequence may be there.
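
As a rough illustration of why errors further into the tail stop mattering, here is a minimal Python sketch of the idea. It is not how bbduk.sh is implemented; the function names and the 5-base seed are assumptions made for the example. Once a short run of A's is found, everything from that point to the right is dropped, so a stray non-A base later in the tail is removed along with the rest.

```python
from typing import Tuple


def polya_cut_position(seq: str, seed: str = "AAAAA") -> int:
    """Return the index of the first seed match, or len(seq) if there is none."""
    pos = seq.upper().find(seed)
    return len(seq) if pos == -1 else pos


def trim_right_of_polya(seq: str, qual: str, seed: str = "AAAAA") -> Tuple[str, str]:
    """Cut the read (and its quality string) at the first poly-A seed match."""
    cut = polya_cut_position(seq, seed)
    return seq[:cut], qual[:cut]


# The tail below contains a G (a simulated sequencing error); it is removed
# together with everything else to the right of the first AAAAA.
read = "ACGTACGTAAAAAAGAAAAAA"
qual = "I" * len(read)
trimmed_seq, trimmed_qual = trim_right_of_polya(read, qual)
print(trimmed_seq)  # ACGTACGT
```

A short seed can also match a genuine A-rich stretch inside a transcript, so the seed length is a trade-off. For poly-T at the start of a read, the mirrored operation (cutting everything to the left of the match) would apply.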
