Question

AB SOLiD small RNA-seq data handling

0

4.6 years ago

Rodrigo Streit ▴ 10

Hi everyone,

I need to work with SOLiD small RNA-seq libraries from SRA, and I have some questions about quality control and the dos and don'ts of handling this data.

1- What is the best approach for quality checking my reads? I'm used to FastQC for quality checking, but I'm not sure if it can handle colourspace reads (or handle them properly).

2- How should I filter/trim my reads? I may need to trim adaptors, but I don't want to quality-trim my reads, as they represent entire RNAs (sRNAs) and trimming them could lead to artifacts. I would rather filter out reads with low-quality bases. Which is a good tool for trimming adaptors and quality-filtering the libraries, and what range of quality values is considered acceptable (we usually apply a Q30 filter for Illumina libraries)?

3 - As far as I'm concerned, SOLiD sequences a small portion of the 5' adapter, besides the possible sequencing of the 3' adapter. Should I take this into account during adaptor trimming or is it usually already removed?

4- This set of data has both data from SOLiD and Illumina (different libraries, of course), but for the sake of standardization we would like to use STAR for aligning all libraries. I don't know if it can handle colourspace reads, so I would like to know whether it is OK to convert my colourspace reads to fastq, and at which point (before quality checking, before filtering, or just before alignment) I should do this. If it's not advisable to do so, which tools are best for aligning SOLiD reads?

I know I've asked a lot of questions, but I have never worked with this kind of data. If someone could at least point me in the right direction, or to some papers that could help with any of these questions, it would be great.

Thanks!

RNA-Seq Quality control • 1.8k views

ADD COMMENT • link updated 3.1 years ago by predeus ★ 1.9k • written 4.6 years ago by Rodrigo Streit ▴ 10

1

3 - As far as I'm concerned, SOLiD sequences a small portion of the 5' adapter

Where did you get this information from? Can you provide a source?

Regarding your questions, I think the best quality metrics for SOLiD reads are those based on mapping. FastQC supports SOLiD, but (in my experience) be prepared to see some very bad metrics. As SOLiD reads are most useful once mapped, I wouldn't bother quality trimming. I never gave much thought to adapter trimming for SOLiD reads, but I think the best result would be obtained after mapping.

This set of data has both data from SOLiD and Illumina (different libraries, of course), but for the sake of standardization we would like to use STAR for aligning all libraries.

The effect of sequencing technology will be greater than any effect different mapping software would introduce, so I think this is a pointless standardization. Subread is a fast aligner that supports both basespace and colorspace reads, but you have to build one index for each technology anyway. As JC said, do not convert from colorspace to basespace. The main (advertised) advantage of SOLiD sequencing is that di-nucleotide encoding makes it easier to tell sequencing errors apart from mutations when mapping to a reference genome. Or, to put it conversely: if you translate colorspace to basespace and there is a sequencing error in colorspace, all bases downstream of the error will also be wrongly converted.
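To make the error-propagation point concrete, here is a minimal Python sketch. It assumes the standard SOLiD colour code (A=0, C=1, G=2, T=3, colour = XOR of the two adjacent base codes) and reads that start with the known primer base and contain only the colours 0-3; the function name and the toy reads are made up for illustration.

```python
# Standard SOLiD dinucleotide colour code: with A=0, C=1, G=2, T=3,
# the colour of two adjacent bases is the XOR of their codes.
BASES = "ACGT"

def decode_colorspace(read: str) -> str:
    """Naively translate a colour-space read (primer base + colours) to bases.

    Assumes the read contains only colours 0-3 (no '.' missing calls).
    """
    prev = BASES.index(read[0])      # known last base of the primer/adapter
    decoded = []
    for colour in read[1:]:
        prev ^= int(colour)          # next base = previous base XOR colour
        decoded.append(BASES[prev])
    return "".join(decoded)

true_read = "T0120313"   # hypothetical error-free read
bad_read = "T0130313"    # same read with a single miscalled colour (2 -> 3)

print(decode_colorspace(true_read))  # TGAATGC
print(decode_colorspace(bad_read))   # TGCCGTA
```

A single wrong colour leaves the first two bases intact and changes every base after it, which is why the conversion is better left to a colorspace-aware aligner.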

ADD REPLY • link 4.6 years ago by h.mon 35k

1

Where did you get this information from? Can you provide a source?

There's an article on NGS technologies that says SOLiD sequencing has 5 ligation rounds, and that after the first one, the primer added to start each new round anneals at position -1 relative to the previous one, so at least 4 bases from the 5' adapter should be sequenced. I wasn't sure if this sequence would be found in the read.

So, the whole idea is to deal with quality after mapping my libraries to my reference? This is very different from what I'm used to; the explanation was very helpful. Thank you so much!

ADD REPLY • link 4.6 years ago by Rodrigo Streit ▴ 10

score 4 · Answer 1 · 2019-08-29

4

4.6 years ago

JC 13k

Be prepared for many problems: working with colour-space reads is not straightforward (another reason this is dead tech). I have not analysed this type of sequencing data for many years, but here are my suggestions:

1- FastQC supports SOLiD reads. Don't be surprised by quality variations across the reads; if I remember correctly, there was a drop every 5 calls because of the technology.
2- cutadapt supports SOLiD, and you can set the parameters as needed.
3- It should already be removed, but to make sure you need to align your reads.
4- STAR doesn't support colour space; you can use Bowtie (v1) or SHRiMP. It is not so easy to translate colour space to nucleotides because you need to know the exact order of the di-nucleotide combinations used in the ligation steps, so it's better to align to the colour-space genome.
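As a rough illustration of what a colour-space reference looks like, the Python sketch below encodes a nucleotide sequence into colours, so that reads and reference are compared in the same space. It assumes the standard SOLiD colour code (A=0, C=1, G=2, T=3, colour = XOR of adjacent base codes) and sequences without N; the function name and sequences are hypothetical, and real aligners do this internally when building a colour-space index.

```python
BASES = "ACGT"

def encode_colorspace(seq: str) -> str:
    """Encode a nucleotide sequence as SOLiD colours (one per adjacent base pair).

    Assumes the sequence contains only A, C, G and T.
    """
    codes = [BASES.index(base) for base in seq.upper()]
    return "".join(str(a ^ b) for a, b in zip(codes, codes[1:]))

reference = "TTGAATGC"   # toy reference sequence
variant = "TTGCATGC"     # same sequence with one real base change (A -> C)

print(encode_colorspace(reference))  # 0120313
print(encode_colorspace(variant))    # 0131313
```

A real base change alters two adjacent colours, whereas a sequencing error typically alters only one, which is the information that is lost if the reads are translated to nucleotides before alignment.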

ADD COMMENT • link 4.6 years ago by JC 13k

0

Thank you very much!

ADD REPLY • link 4.6 years ago by Rodrigo Streit ▴ 10

0

So apparently the newest bowtie (version 1.3) does not support colorspace? Absolutely mad.

ADD REPLY • link 3.1 years ago by predeus ★ 1.9k