Illumina RNA-Seq Ambiguous Base Calls Identification
12.4 years ago
Agatha ▴ 350

Hello,

I am a newbie in sequencing data analysis, and I am trying to run a complete analysis on raw RNA-seq data in order to detect miRNA variability and possibly identify novel miRNAs. My data is in Illumina tag count format, and before aligning to the genome I should remove erroneous reads from the raw data. However, I do not know how to identify reads containing erroneous base calls using only the count values. I have read that error-containing reads generated by Illumina contain more adenines than any other base. Is it enough to remove these reads, or should anything else be considered?

The adaptors have been removed, and the sequences are in the following format:

TTTTTTTTTTTTGTTTTTATGCTTTAGTCTTCTTTG  34

How can one identify the ambiguous base calls in order to remove those sequences?

Thank you!

next-gen sequencing rna illumina mirna • 3.7k views
12.4 years ago
Rm 8.3k

My two cents: miRDeep2 is one of the good tools for miRNA analysis, for both known and novel miRNA detection.

BTW, I don't think you can assess erroneous base calls with only the "Illumina tag count format" in hand. Try looking at the corresponding FASTQ files and applying quality filters there; you can use the FASTX-Toolkit to handle that.
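
To make the idea of a FASTQ quality filter concrete, here is a minimal Python sketch of a per-read filter. It is an illustration only, not what the FASTX-Toolkit does internally. The function names and the thresholds (`min_q=20`, `min_fraction=0.9`) are assumptions chosen for the example, and it assumes Phred+33 quality encoding; older Illumina pipelines wrote Phred+64, in which case the offset would need to change.

```python
def phred33_scores(quality_line):
    """Decode a FASTQ quality string, ASSUMING Phred+33 encoding."""
    return [ord(ch) - 33 for ch in quality_line]


def passes_filter(seq, qual, min_q=20, min_fraction=0.9):
    """Keep a read if it has no 'N' calls and enough bases reach min_q.

    min_q and min_fraction are illustrative defaults, not recommendations.
    """
    if "N" in seq.upper():
        return False
    scores = phred33_scores(qual)
    if not scores:
        return False
    good = sum(1 for q in scores if q >= min_q)
    return good / len(scores) >= min_fraction


def filter_fastq(lines, min_q=20, min_fraction=0.9):
    """Yield (header, seq, qual) for 4-line FASTQ records that pass the filter."""
    records = [line.rstrip("\n") for line in lines]
    for i in range(0, len(records) - 3, 4):
        header, seq, _plus, qual = records[i:i + 4]
        if passes_filter(seq, qual, min_q, min_fraction):
            yield header, seq, qual


example = [
    "@r1", "ACGT", "+", "IIII",   # high quality, kept
    "@r2", "ACNT", "+", "IIII",   # contains an ambiguous N, dropped
    "@r3", "ACGT", "+", "!!!!",   # quality 0 at every base, dropped
]
kept = list(filter_fastq(example))
```

The point is that the filter needs the quality line of each read, which a tag count file no longer carries.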

12.4 years ago

The normal process is to align to the genome/transcriptome (after removing adapters). Then, errors (as well as polymorphisms) are noted as differences between the aligned read and the reference. I suggest that you find a software package or tutorial designed for dealing with miRNA data and follow those instructions, as a first step.
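
As a toy illustration of "differences between the aligned read and the reference", the Python sketch below compares a read with the reference at a given start position. It assumes an ungapped alignment on the forward strand, and the function name `mismatches` is hypothetical; real aligners also handle gaps, strand, and base qualities.

```python
def mismatches(read, reference, start):
    """List (position_in_read, ref_base, read_base) for each difference.

    ASSUMES an ungapped, forward-strand alignment starting at `start`
    (0-based) in the reference.
    """
    window = reference[start:start + len(read)]
    if len(window) < len(read):
        raise ValueError("read extends past the end of the reference")
    return [
        (i, ref_base, read_base)
        for i, (ref_base, read_base) in enumerate(zip(window, read))
        if ref_base != read_base
    ]


# The read differs from the reference at its third base (index 2).
diffs = mismatches("ACGT", "TTACCTGG", 2)
```

A difference found this way may be a sequencing error or a real polymorphism, which is why it is judged after alignment rather than before.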


@Sean Devis - The adapters have been removed; the rows look something like AAAAAGAGAAAAAAATTGTTTTTCGTGTGTTGTTTT 1. So don't I need to remove the reads containing ambiguous bases before aligning them? Sorry, but I have gotten a bit confused by all these tutorials, and I don't really understand this format. Does the data go from qseq to FASTQ to tag_count? How should this be approached?
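
For orientation, a tag count file can be thought of as the read sequences collapsed into unique sequences with their number of occurrences. The Python sketch below shows that collapsing step and a check for ambiguous calls in a tag count row. It assumes rows are a sequence and an integer count separated by whitespace, as in the examples in this thread, and that ambiguous calls appear as `N` (or `.`, as in qseq files). All function names are hypothetical.

```python
from collections import Counter


def collapse_to_tag_counts(sequences):
    """Collapse read sequences into {sequence: number of occurrences}."""
    return Counter(seq.strip().upper() for seq in sequences if seq.strip())


def parse_tag_count_line(line):
    """Split one 'SEQUENCE  COUNT' row into (sequence, count).

    ASSUMES whitespace-separated columns, as in the rows quoted above.
    """
    seq, count = line.split()
    return seq.upper(), int(count)


def has_ambiguous_base(seq):
    """True if the sequence contains anything other than A, C, G, or T."""
    return any(base not in "ACGT" for base in seq.upper())


counts = collapse_to_tag_counts(["ACGT", "acgt", "ACNT"])
seq, n = parse_tag_count_line("TTTTTTTTTTTTGTTTTTATGCTTTAGTCTTCTTTG  34")
clean = {s: c for s, c in counts.items() if not has_ambiguous_base(s)}
```

Rows with an `N` can be dropped this way, but per-base quality scores are lost in the collapse, so quality-based filtering has to happen on the FASTQ (or qseq) files.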
