Question

Illumina Rna-Seq Ambiguous Base Calls Identification

12.4 years ago

Agatha ▴ 350

Hello,

I am a newbie in sequencing data analysis, and I am trying to run a complete analysis on raw RNA-seq data in order to detect miRNA variability and possibly identify novel miRNAs. My data is in Illumina tag count format, and before proceeding to the genome alignment step I should remove erroneous reads from the raw data. However, I do not know how to identify the reads containing erroneous base calls using the count values. I have read that error-containing reads generated by Illumina contain more adenines than any other base. Is it enough to remove these reads, or should anything else be considered?

The adaptors have been removed, and the sequences are in the following format:

TTTTTTTTTTTTGTTTTTATGCTTTAGTCTTCTTTG  34

How can one identify the ambiguous base calls in order to remove those sequences?

Thank you!

next-gen sequencing rna illumina mirna • 3.7k views

updated 12.4 years ago by Rm 8.3k • written 12.4 years ago by Agatha ▴ 350

score 1 · Answer 1 · 2011-11-30

My two cents: miRDeep2 is one of the good methods for miRNA analysis, both for known and novel miRNA detection.

By the way, I don't think you can address erroneous base calls with only the "Illumina tag count format" in hand. Try looking into the corresponding FASTQ files and apply quality filters there; you can use the FASTX-Toolkit to handle that.
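To make the quality-filtering idea concrete, here is a minimal Python sketch of the kind of filter one would apply to the FASTQ files. It is not the FASTX-Toolkit implementation, just an illustration under stated assumptions: the thresholds (`min_q=20`, `min_fraction=0.8`) are arbitrary examples, and the quality offset is assumed to be Phred+33. Older Illumina pipelines wrote Phred+64, so check your files and pass `offset=64` if needed. The function names are hypothetical.

```python
import io

def parse_fastq(handle):
    """Yield (header, sequence, quality) from a 4-line-per-record FASTQ stream."""
    while True:
        header = handle.readline().rstrip("\n")
        if not header:
            return
        seq = handle.readline().rstrip("\n")
        handle.readline()  # skip the '+' separator line
        qual = handle.readline().rstrip("\n")
        yield header, seq, qual

def passes_quality(qual, min_q=20, min_fraction=0.8, offset=33):
    """True if at least min_fraction of the bases have Phred quality >= min_q.

    offset=33 is an assumption; use offset=64 for old Illumina encodings.
    """
    if not qual:
        return False
    good = sum(1 for ch in qual if ord(ch) - offset >= min_q)
    return good / len(qual) >= min_fraction

def filter_reads(handle, **kwargs):
    """Keep reads that contain no 'N' base call and pass the quality check."""
    return [
        (header, seq, qual)
        for header, seq, qual in parse_fastq(handle)
        if "N" not in seq.upper() and passes_quality(qual, **kwargs)
    ]

# Toy records, made up for illustration only.
example = io.StringIO(
    "@read1\nACGTACGT\n+\nIIIIIIII\n"   # all bases Q40: kept
    "@read2\nACGTNCGT\n+\nIIII#III\n"   # contains an N: dropped
    "@read3\nACGTACGT\n+\n########\n"   # all bases Q2: dropped
)
kept = filter_reads(example)
print([header for header, _, _ in kept])
```

The point is that ambiguous calls (N) and low-confidence calls are visible only while the per-base qualities are still attached to the reads, which is why the filtering belongs at the FASTQ stage and not on the collapsed counts.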

score 0 · Answer 2 · 2011-11-30

12.4 years ago

Sean Davis 26k

The normal process is to align to the genome/transcriptome (after removing adapters). Then, errors (as well as polymorphisms) are noted as differences between the aligned read and the reference. I suggest that you find a software package or tutorial designed for dealing with miRNA data and follow those instructions, as a first step.
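As a rough illustration of "differences between the aligned read and the reference", the Python sketch below lists mismatches for a read placed at a known position. It assumes an ungapped alignment and a 0-based start coordinate, and the sequences are made up; a real aligner also handles indels and reports this information itself. The function name `mismatches` is hypothetical.

```python
def mismatches(read, reference, ref_start):
    """List (reference_position, ref_base, read_base) for each mismatch.

    Assumes an ungapped alignment of `read` starting at 0-based
    position `ref_start` on `reference`.
    """
    window = reference[ref_start:ref_start + len(read)]
    if len(window) < len(read):
        raise ValueError("read extends past the end of the reference")
    return [
        (ref_start + i, ref_base, read_base)
        for i, (ref_base, read_base) in enumerate(zip(window, read))
        if ref_base != read_base
    ]

# Toy sequences, invented for illustration.
reference = "ACGTACGTACGT"
read = "GTACCT"  # differs from reference[2:8] == "GTACGT" at one base
print(mismatches(read, reference, 2))
```

A single mismatch like this cannot by itself tell a sequencing error from a real polymorphism; that distinction usually comes from how many independent reads support the same difference.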

@Sean Davis: The adapters have been removed. The rows look something like AAAAAGAGAAAAAAATTGTTTTTCGTGTGTTGTTTT 1. So don't I need to remove the reads containing ambiguous bases before aligning them? Sorry, I have got a bit confused by all these tutorials, and I don't really understand this format. Does the data go from qseq to FASTQ to tag_count? How should it be approached?

12.4 years ago by Agatha ▴ 350
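On the format question: a tag count file is generally just the set of distinct read sequences with the number of times each one was seen. A minimal Python sketch of that collapsing step is below. It assumes the output layout is "sequence, tab, count" as in the rows quoted in this thread, and it treats any character other than A, C, G, T as an ambiguous call (for example `N`, or the `.` used for no-calls in qseq files). The function name and the toy reads are hypothetical.

```python
from collections import Counter

def collapse_to_tag_counts(sequences):
    """Collapse read sequences into (Counter of tag counts, number skipped).

    Reads containing anything other than A, C, G, T are skipped and
    counted separately instead of being collapsed.
    """
    counts = Counter()
    skipped = 0
    for seq in sequences:
        seq = seq.strip().upper()
        if not seq:
            continue
        if set(seq) - set("ACGT"):  # ambiguous call such as 'N' or '.'
            skipped += 1
            continue
        counts[seq] += 1
    return counts, skipped

# Toy reads, invented for illustration.
reads = ["ACGTAC", "ACGTAC", "TTGCAA", "ACNTAC", "ttgcaa", "ACGTAC"]
counts, skipped = collapse_to_tag_counts(reads)
for seq, n in counts.most_common():
    print(f"{seq}\t{n}")
print(f"skipped {skipped} read(s) with ambiguous bases")
```

Once reads are collapsed this way the per-base quality scores are gone, so the only "ambiguous" calls still detectable in a tag count file are explicit non-ACGT characters.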