Question

Analysing SOLiD colorspace reads for enriched peaks


9.6 years ago

James Ashmore ★ 3.4k

Hello, beginner in HT sequencing here!

I am analysing this GEO dataset, which was produced on an AB SOLiD platform. I downloaded the samples in .sra format and the reference genome in .fasta format. My project involves aligning the reads against the reference genome and calling enriched peaks. However, I have run into a few problems and confused myself along the way. Could someone with experience of AB SOLiD colorspace reads explain the correct way to analyse such data?

I first tried converting the .sra files to .csfasta and .qual files using abi-dump, then aligning them with bowtie against a colorspace index. This seemed to work fine until I tried to convert from .sam to .bam, at which point an error about sequence lengths was thrown. I also tried using fastq-dump to convert from .sra to .fastq, but I have read that this isn't correct because converting from colorspace to basespace is error-prone. I've also followed the advice on this website, which seemed promising; however, when I assessed the final fastq output with FastQC it gave dreadful results, so I assumed there was a muck-up in the conversion.
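Why colorspace-to-basespace conversion is risky: each color describes the transition between two adjacent bases, so decoding is cumulative and a single wrong color corrupts every base after it. A minimal sketch, assuming csfasta-style reads (a primer base followed by digits 0-3) and the standard two-base encoding in which, with A=0, C=1, G=2, T=3, each color is the XOR of the two bases it covers. The function name is hypothetical and this is not what abi-dump or fastq-dump does internally.

```python
# Minimal sketch of SOLiD two-base ("colorspace") decoding.
# Assumption: reads look like csfasta records, i.e. a primer base followed by
# color digits 0-3. With bases coded A=0, C=1, G=2, T=3, each color equals
# the XOR of the codes of the two adjacent bases it covers.
BASES = "ACGT"


def decode_colorspace(read: str) -> str:
    """Translate a read such as 'T0123' into base space, dropping the primer base."""
    prev = BASES.index(read[0])  # code of the primer base (often 'T')
    decoded = []
    for color in read[1:]:
        if color not in "0123":
            # '.' marks a missing color call; every later base would be unknown
            raise ValueError(f"cannot decode color {color!r}")
        prev ^= int(color)  # next base = previous base XOR color
        decoded.append(BASES[prev])
    return "".join(decoded)


print(decode_colorspace("T320010"))  # AGGGTT
print(decode_colorspace("T321010"))  # AGTTGG: one changed color alters every later base
```

This error propagation is why colorspace-aware aligners compare colors against a color-encoded reference rather than decoding the reads first.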

Ultimately, I have confused myself and would really appreciate any advice.

Thanks,
James

Tags: SOLiD



Hey James, have a look at this post; it may help you: Error In Converting Sam To Bam By Samtools
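The linked thread isn't reproduced here, so the exact cause of the error isn't known. One common reason samtools rejects a SAM file with a length complaint is a record whose QUAL string differs in length from SEQ (the SAM specification requires QUAL to be either `*` or the same length as SEQ), which is a plausible place for colorspace output to go wrong. A minimal sketch for finding such records; the function name and file name are hypothetical, and it only reports problems rather than fixing them.

```python
import sys


def inconsistent_records(sam_lines):
    """Yield (line_number, qname, len_seq, len_qual) for SAM alignment lines whose
    SEQ (column 10) and QUAL (column 11) lengths differ; '@' header lines are skipped."""
    for lineno, line in enumerate(sam_lines, start=1):
        if line.startswith("@") or not line.strip():
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 11:
            yield lineno, fields[0], None, None  # truncated record: also worth a look
            continue
        qname, seq, qual = fields[0], fields[9], fields[10]
        # The SAM spec allows QUAL to be '*'; otherwise it must match SEQ in length.
        if seq != "*" and qual != "*" and len(seq) != len(qual):
            yield lineno, qname, len(seq), len(qual)


if __name__ == "__main__" and len(sys.argv) > 1:
    # usage: python check_sam.py aligned.sam   (file names are placeholders)
    with open(sys.argv[1]) as handle:
        for lineno, qname, n_seq, n_qual in inconsistent_records(handle):
            print(f"line {lineno}: {qname} SEQ={n_seq} QUAL={n_qual}")
```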

Written 9.6 years ago by Ashutosh Pandey 12k

Ram · Accepted Answer · 2014-09-22


9.6 years ago

Istvan Albert 100k

Here is a post that can help: Transforming And Manipulating Color Space Reads

In a nutshell, there are multiple ways to convert color space reads, and tools often don't do a good job of explaining what they expect as input.
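To illustrate, here is the same read in three textual forms you may run into: the csfasta layout with its primer base, the bare color digits, and "double encoding", where some tools expect the digits rewritten as letters (0→A, 1→C, 2→G, 3→T, '.'→N). A minimal sketch; the helper name is hypothetical, and which form a particular tool wants has to be checked in its own documentation.

```python
# Hypothetical helper: the same SOLiD read written in three common textual forms.
# Which form a given tool expects has to be checked in that tool's documentation.
DOUBLE_ENCODING = {"0": "A", "1": "C", "2": "G", "3": "T", ".": "N"}


def representations(cs_read: str) -> dict:
    """Return several encodings of one csfasta read (primer base + color digits)."""
    colors = cs_read[1:]  # drop the primer base
    return {
        "csfasta": cs_read,  # primer base followed by digits
        "colors_only": colors,  # primer base stripped
        # digits rewritten as letters: looks like DNA but is NOT a base sequence
        "double_encoded": "".join(DOUBLE_ENCODING[c] for c in colors),
    }


for name, text in representations("T3200.10").items():
    print(f"{name:15} {text}")
```

The double-encoded string is the trap: it looks like an ordinary DNA sequence, so a base-space tool will accept it silently and interpret it wrongly.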

Your best bet is to align with a color-space-aware tool like SHRiMP.



So I should convert my .sra files into .csfasta files (using abi-dump) and then align them against the .fasta genome with SHRiMP? If so, what am I missing out on by not using the data in the .qual file? I would like to assess read quality as well.
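If you do want a rough look at read quality, the .qual file can be read alongside the .csfasta. A minimal sketch, assuming that both files use `>name` headers with optional `#` comment lines, that the .qual file holds one whitespace-separated integer per color (none for the primer base), and that negative values such as -1 mark missing calls. Those layout details are assumptions to verify against your own files, and the function names are hypothetical.

```python
from statistics import mean


def read_records(lines, sep):
    """Yield (name, body) pairs from .csfasta or .qual text; '#' lines are comments."""
    name, body = None, []
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        if line.startswith(">"):
            if name is not None:
                yield name, sep.join(body)
            name, body = line[1:], []
        else:
            body.append(line)
    if name is not None:
        yield name, sep.join(body)


def mean_color_quality(csfasta_lines, qual_lines):
    """Map read name -> mean quality over called colors.

    Assumptions: one quality value per color (none for the primer base), and
    negative values such as -1 mark missing calls and are ignored."""
    quals = {
        name: [int(v) for v in body.split()]
        for name, body in read_records(qual_lines, sep=" ")
    }
    report = {}
    for name, read in read_records(csfasta_lines, sep=""):
        values = quals.get(name)
        if values is None or len(values) != len(read) - 1:
            raise ValueError(f"quality record missing or wrong length for {name}")
        called = [v for v in values if v >= 0]
        report[name] = mean(called) if called else float("nan")
    return report


# Made-up two-read example; real input would be the open .csfasta and .qual files.
example_cs = ["# made-up example\n", ">r1\n", "T0123\n", ">r2\n", "T3.10\n"]
example_qv = [">r1\n", "20 30 10 20\n", ">r2\n", "25 -1 5 30\n"]
print(mean_color_quality(example_cs, example_qv))
```

Loading every quality record into a dictionary keeps the sketch short but will use a lot of memory on a full run; streaming both files in step is the obvious refinement if the records are in the same order.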

Written 9.6 years ago by James Ashmore ★ 3.4k


Turn your data into csfasta and align with SHRiMP. You might want to try bowtie1 as well.

I wouldn't worry about that too much. Qualities are important when calling SNPs, but they are not going to change a ChIP-seq analysis much. You are never going to see that data anyhow; mappers don't make much use of qualities during alignment.

Written 9.6 years ago by Istvan Albert 100k