I have paired-end RNA-seq data containing variable lengths of non-genome-encoded sequence. What would be the best way to align the reads to the reference genome? Thank you so much.
Either trim off the polyA tail on read 2, or use STAR (or another local aligner) and allow a lot of soft-clipping. I suspect that trimming read 2 (then aligning the remaining pairs and orphans) will produce the best results.
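To make the "trim read 2, then align pairs and orphans" idea concrete, here is a minimal Python sketch. It is only an illustration, not how BBDuk or any other trimmer works internally. The function names, the `min_run=5` and `min_len=20` thresholds, and the assumption that the tail shows up as a run of T at the start of R2 are all hypothetical choices; adjust the base and the end to match your library. It only removes exact homopolymer runs and tolerates no mismatches.

```python
def trim_leading_run(seq, qual, base="T", min_run=5):
    """Strip a leading run of `base` from a read if the run is long enough.

    Only an exact, uninterrupted run is removed (no mismatches allowed).
    The quality string is trimmed by the same amount to stay in sync.
    """
    run = len(seq) - len(seq.lstrip(base))
    if run >= min_run:
        return seq[run:], qual[run:]
    return seq, qual


def classify_pair(r2_seq, r2_qual, min_len=20):
    """Trim R2 and decide how the pair should be aligned.

    Returns ("paired", seq, qual) if enough of R2 survives trimming,
    otherwise ("orphan_r1", seq, qual), meaning R1 should be aligned
    on its own as a single-end read.
    """
    r2_seq, r2_qual = trim_leading_run(r2_seq, r2_qual)
    status = "paired" if len(r2_seq) >= min_len else "orphan_r1"
    return status, r2_seq, r2_qual


if __name__ == "__main__":
    # Hypothetical R2 read: 12 T's followed by 24 bases of transcript sequence.
    read = "T" * 12 + "ACGT" * 6
    print(classify_pair(read, "I" * len(read)))
```

In practice a dedicated trimmer will handle sequencing errors inside the tail and keep the two FASTQ files in sync for you; the sketch is only meant to show why some pairs end up as orphans after trimming.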
Can you clarify what that means? Is this an artificial insert that was introduced into the genome?
The sequencing reads come from the 3' end of mRNA, so the R2 file contains the polyA tail. R1, however, has no non-genome-encoded sequence and aligns properly to the genome. The non-genome-encoded sequence here is the polyT stretches in the R2 reads.
Those should be easy to trim using bbduk.sh with the literal=AAAAA option. bbmap.sh will also be able to soft-clip those tails as long as there is enough sequence present to allow a reasonable match. Both programs are part of the BBMap suite.
Thank you so much. I will look into BBMap.