Question

How to trim Illumina reads for poly-A tails and the mini-exon

8.1 years ago

Buffo ★ 2.4k

I want to assemble Illumina reads from an RNA-seq project. I have a reference genome, so I plan to assemble the transcripts against the reference using Cufflinks. However, the reads come from a parasite with a complicated transcript maturation process that includes the insertion of a 35-nucleotide sequence (the mini-exon) and poly-A tailing before the mRNA is translated. I therefore want to trim the 35 mini-exon nucleotides and the poly-A tail before assembly. How can I do that? I tried trimming with a simple grep, but I have 34 million reads in which the mini-exon may carry insertions or deletions, and grep does not work in those cases. Does anybody know of a Perl or Python script for this? Thanks :-)

RNA-Seq Assembly • 2.5k views

ADD COMMENT • link updated 4.1 years ago by Biostar 20 • written 8.1 years ago by Buffo ★ 2.4k

Are the 35 nucleotides the same or are they somehow individually derived from every mRNA?

ADD REPLY • link 8.1 years ago by GenoMax 141k

Thanks for the answer. They are the same 35 nucleotides in every mature mRNA; the sequence is called the mini-exon, and it is not present in the genome.

ADD REPLY • link 8.1 years ago by Buffo ★ 2.4k

That's why I want to remove it from the reads, in order to improve the assembly.

ADD REPLY • link 8.1 years ago by Buffo ★ 2.4k

score 0 · Answer 1 · 2016-04-04

8.1 years ago

GenoMax 141k

You could use BBDuk from the BBMap tools. Add the 35 nucleotides as a separate FASTA entry to the "adapters.fa" file in the "resources" directory. If you expect runs of A's to show up, you could add an entry for those as well. When BBDuk encounters a sequence from the adapter file in a read, it trims the read to the right (ktrim=r).
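
As a rough sketch of the suggestion above, the Python below writes the two FASTA entries to a file and assembles a BBDuk command line without running it. The file names, the helper names (`write_adapter_fasta`, `bbduk_command`), and the values `k=21`, `mink=11`, `hdist=1` are illustrative assumptions, not settings from this thread, and the mini-exon sequence has to be supplied by you. Check the flags against the BBDuk help for your installed version before running anything.

```python
from pathlib import Path


def write_adapter_fasta(path, mini_exon_seq, poly_a_len=22):
    """Write a FASTA file with one mini-exon entry and one poly-A entry.

    mini_exon_seq must be supplied by the caller; nothing here knows
    the real sequence.
    """
    records = [("Seq_min_exon", mini_exon_seq), ("Poly_A", "A" * poly_a_len)]
    text = "".join(f">{name}\n{seq}\n" for name, seq in records)
    Path(path).write_text(text)
    return text


def bbduk_command(reads_in, reads_out, ref_fasta,
                  ktrim="r", k=21, mink=11, hdist=1):
    """Build (but do not run) a bbduk.sh argument list.

    The k, mink and hdist defaults are assumptions for illustration only.
    """
    return [
        "bbduk.sh",
        f"in={reads_in}",
        f"out={reads_out}",
        f"ref={ref_fasta}",
        f"ktrim={ktrim}",   # trim direction once a reference k-mer matches
        f"k={k}",           # k-mer length used for matching
        f"mink={mink}",     # shorter k-mers allowed at read ends
        f"hdist={hdist}",   # Hamming distance (substitutions) tolerated
    ]


if __name__ == "__main__":
    # Hypothetical file names; only prints the command.
    cmd = bbduk_command("reads.fastq", "trimmed.fastq", "mini_exon_polyA.fa")
    print(" ".join(cmd))
```

Whether `ktrim=r` or `ktrim=l` is the right choice depends on which end of the read the sequence sits at, so the direction is left as a parameter here.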

ADD COMMENT • link 8.1 years ago by GenoMax 141k

Thank you for the answer and the recommendation, @genomax2. I actually did that with a grep command, redirecting the output, and I found some hits (about 1,000 reads with these adapters), searching in both the 5'-3' and 3'-5' orientations. But I have a little more than 34 million reads and, as we expected, the mini-exon is not exempt from insertions or SNPs, so those commands only trim adapters that have no insertions or deletions. I'm pretty sure this can be solved with a Python or R script :-( but I will keep trying.

ADD REPLY • link 8.1 years ago by Buffo ★ 2.4k

That is the reason you want to use BBDuk, with entries like the following appended to the "adapters.fa" file.

>Seq_min_exon
Sequence_here
>Poly_A
AAAAAAAAAAAAAAAAAAAAAA

You can allow for errors with the hdist= option (this is the Hamming distance). BBDuk automatically searches for the sequence in both strand orientations and also finds partial matches.
You could always remove the poly-A tails/adapters first and then deal with the mini-exon down the road.
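
To illustrate what a Hamming-distance tolerance means, and why it is not the same as tolerating insertions or deletions, here is a minimal Python sketch. It is a conceptual illustration only and not how BBDuk works internally (BBDuk matches k-mers). The function names and the short motif are made up for the example; the motif is not the real mini-exon.

```python
def hamming(a, b):
    """Number of positions at which two equal-length strings differ."""
    if len(a) != len(b):
        raise ValueError("Hamming distance needs equal-length strings")
    return sum(x != y for x, y in zip(a, b))


def find_with_mismatches(read, motif, max_mismatches=1):
    """Return the first 0-based position where motif occurs in read with at
    most max_mismatches substitutions, or -1 if there is no such position."""
    m = len(motif)
    for i in range(len(read) - m + 1):
        if hamming(read[i:i + m], motif) <= max_mismatches:
            return i
    return -1


if __name__ == "__main__":
    motif = "ACGTTGCA"               # made-up stand-in for an adapter
    exact = "GG" + "ACGTTGCA" + "TT"
    one_snp = "GG" + "ACGATGCA" + "TT"   # one substitution (T -> A)
    one_del = "GG" + "ACGTGCA" + "TT"    # one deleted base

    print(find_with_mismatches(exact, motif))    # 2
    print(find_with_mismatches(one_snp, motif))  # 2
    print(find_with_mismatches(one_del, motif))  # -1
```

An exact grep would find only the first read. A tolerance of one substitution also recovers the second. The deletion shifts every downstream base, so a substitution-only tolerance misses the third read, which is where k-mer and partial matching, or an edit-distance approach, comes in.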

ADD REPLY • link 8.1 years ago by GenoMax 141k

I hadn't read carefully enough, my fault. Thank you!!

ADD REPLY • link 8.1 years ago by Buffo ★ 2.4k