Question

How Do I Map, Align, And Plot My Solid Results?

5

14.3 years ago

Jason ▴ 920

Hi, I recently performed an RNA immunoprecipitation followed by SOLiD sequencing (50 bp fragmented reads). I haven't received my first SOLiD sequencing results yet, but I was told I should have them soon. I've tried researching how to map, align, and plot my results, but I don't yet have a concrete workflow for the analysis. I have very little programming experience and would prefer to use Galaxy. There are labs on my campus that could map my color space data for me, but I would like to do things myself. Is there a way in Galaxy (or another program) to convert my color space data to sequence, then map those reads to the yeast transcriptome and analyze them? Even if you can't answer my question directly, I'd appreciate any tips from anyone who has already worked with RNA-seq data.

Thanks in advance

galaxy rna solid • 9.3k views

ADD COMMENT • link updated 6 months ago by Ram 43k • written 14.3 years ago by Jason ▴ 920

Ram · Answer 1 · 2010-03-06

First of all, you should not convert to base sequence and then map - you should do the mapping directly on the color-space reads. Each color encodes a transition between two adjacent bases, so a single miscalled color corrupts every base after it once the read is decoded, whereas a color-space-aware mapper can treat it as a single mismatch. The short-read mapper will (typically) report the genome matches for you in base sequence format. There are several short read mappers / aligners that handle color space alignment: Bowtie, BFAST, BWA, SHRiMP, PerM and many others including ABI's own mapreads and Bioscope. You can get the mapping output in SAM format, a handy format which contains a lot of information about the alignments and which you can manipulate in Galaxy (via the NGS: SAM tools menu) to get the "pileup" of reads in certain regions and so on.
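To illustrate why decoding before mapping is risky, here is a minimal Python sketch of the SOLiD two-base color code (not taken from any of the mappers named above; the function names `encode_colorspace` and `decode_colorspace` and the example read are made up for illustration). It assumes the usual encoding in which a color is the XOR of the 2-bit codes of two adjacent bases (A=0, C=1, G=2, T=3) and a read starts with a known primer base.

```python
BASES = "ACGT"  # 2-bit codes: A=0, C=1, G=2, T=3


def encode_colorspace(primer, seq):
    """Encode a base sequence as a color-space read such as 'T3131'."""
    prev = BASES.index(primer)
    colors = []
    for base in seq:
        cur = BASES.index(base)
        colors.append(str(prev ^ cur))  # color = XOR of adjacent base codes
        prev = cur
    return primer + "".join(colors)


def decode_colorspace(read):
    """Decode a color-space read back to bases (the primer base is dropped)."""
    prev = BASES.index(read[0])
    bases = []
    for color in read[1:]:
        prev ^= int(color)  # each base depends on ALL previous colors
        bases.append(BASES[prev])
    return "".join(bases)


def count_mismatches(a, b):
    """Number of positions at which two equal-length strings differ."""
    return sum(x != y for x, y in zip(a, b))


if __name__ == "__main__":
    true_seq = "ACGTACGT"                        # hypothetical fragment
    read = encode_colorspace("T", true_seq)
    # Simulate one sequencing error: flip the third color call.
    bad_read = read[:3] + "0" + read[4:]
    print(read, "->", decode_colorspace(read))
    print(bad_read, "->", decode_colorspace(bad_read))
    print("base mismatches after naive decoding:",
          count_mismatches(true_seq, decode_colorspace(bad_read)))
```

In this toy case a single wrong color changes every base from the error onward, which is the reason to leave the decoding to a color-space-aware aligner.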

Edit: I just noticed that Galaxy now features Bowtie mapping for color space.
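For readers who later want to look at the SAM output outside Galaxy, the "pileup" idea mentioned in this answer can be approximated as a per-position read depth. The following is only a rough Python sketch, not what Galaxy or SAMtools actually run: the function name `depth_from_sam` and the example records are made up, and it relies only on the standard SAM columns (FLAG, RNAME, POS, CIGAR) and ignores base qualities, secondary alignments and duplicates.

```python
import re
from collections import Counter

# One CIGAR element: a length followed by an operation code.
CIGAR_RE = re.compile(r"(\d+)([MIDNSHP=X])")


def depth_from_sam(lines):
    """Return a Counter mapping (reference name, 1-based position) -> read depth."""
    depth = Counter()
    for line in lines:
        if line.startswith("@"):          # skip header lines
            continue
        fields = line.rstrip("\n").split("\t")
        flag, rname = int(fields[1]), fields[2]
        pos, cigar = int(fields[3]), fields[5]
        if flag & 0x4 or cigar == "*":    # unmapped read or no alignment
            continue
        ref_pos = pos
        for length, op in CIGAR_RE.findall(cigar):
            length = int(length)
            if op in "M=X":               # aligned bases: count them
                for p in range(ref_pos, ref_pos + length):
                    depth[(rname, p)] += 1
                ref_pos += length
            elif op in "DN":              # deletion / skipped region: advance only
                ref_pos += length
            # I, S, H and P do not consume reference positions
    return depth


if __name__ == "__main__":
    # Hypothetical records; the reference name "chrI" is illustrative.
    sam = [
        "@SQ\tSN:chrI\tLN:230218",
        "r1\t0\tchrI\t100\t255\t5M\t*\t0\t0\tACGTA\tIIIII",
        "r2\t0\tchrI\t102\t255\t2M1D2M\t*\t0\t0\tGTCG\tIIII",
        "r3\t4\t*\t0\t0\t*\t*\t0\t0\tACGT\tIIII",
    ]
    for (ref, position), n in sorted(depth_from_sam(sam).items()):
        print(ref, position, n)
```

For real data sets, the SAM tools in Galaxy are the more practical route; the sketch is only meant to show what the columns of a SAM record mean.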

Ram · Answer 2 · 2010-01-22

4

14.2 years ago

Istvan Albert 100k

Personally, I would advise that if you know someone who can perform even part of the task, you have them do it and ask them to explain and show you how they did it.

The task at hand is complex. The solution always depends immensely on the particulars of the problem; moreover, you will face a myriad of frustrating limitations, errors and problems.

Learning directly from someone who has done it, and establishing a personal rapport with them, will allow you to ease into this problem domain. In fact, when you have finished mapping your RNA you are still likely to be far from done, yet you might already have expended a lot of energy and excitement.

ADD COMMENT • link updated 6 months ago by Ram 43k • written 14.2 years ago by Istvan Albert 100k

0

Thanks, I think I'll try getting some help from the bioinformaticists here. In addition, I recently came across some other possibilities. Have you or anyone here tried using CLC Genomics Workbench 3 (http://www.clcbio.com/index.php?id=1240) or the SeqWeb GCG Wisconsin Sequence Analysis Package (http://www.hmc.psu.edu/core/computer/seqweb.htm)? I know SeqWeb is described in vague terms, but CLC GW3 provides a means to do everything I need, in theory.

ADD REPLY • link updated 6 months ago by Ram 43k • written 14.2 years ago by Jason ▴ 920

0

You should ask questions separately, not in the comments - those can get lost.

ADD REPLY • link 14.2 years ago by Istvan Albert 100k

Ram · Answer 3 · 2010-02-19

2

14.2 years ago

Allen Yu ▴ 200

You can try BWA as well: http://maq.sourceforge.net/bwa-man.shtml

ADD COMMENT • link updated 6 months ago by Ram 43k • written 14.2 years ago by Allen Yu ▴ 200

0

Use the direct URL, as BWA has been split from the MAQ project.

ADD REPLY • link updated 4.6 years ago by Ram 43k • written 14.1 years ago by Jonathan Manning ▴ 630

Ram · Answer 4 · 2010-09-25

RNA-Seq Data Analysis Tools

rQuant.web – is a web service that provides convenient access to tools for the quantitative analysis of RNA-Seq data. It allows users to determine the abundances of multiple transcripts per gene locus from RNA-Seq measurements. rQuant.web is available free of charge to all users as a tool in a Galaxy installation.

Scripture – is a method for transcriptome reconstruction that relies solely on RNA-Seq reads and an assembled genome to build a transcriptome ab initio.

Cufflinks – assembles transcripts, estimates their abundances, and tests for differential expression and regulation in RNA-Seq samples. It accepts aligned RNA-Seq reads and assembles the alignments into a parsimonious set of transcripts. Cufflinks then estimates the relative abundances of these transcripts based on how many reads support each one.

SpliceMap – is a de novo splice junction discovery tool. It offers high sensitivity and support for arbitrarily long RNA-seq read lengths.

TopHat – is a fast splice junction mapper for RNA-Seq reads. It aligns RNA-Seq reads to mammalian-sized genomes using the ultra high-throughput short read aligner Bowtie, and then analyzes the mapping results to identify splice junctions between exons.

PALMapper – a combination of the spliced alignment method QPALMA with the short read alignment tool GenomeMapper. The resulting method, called PALMapper, efficiently computes both spliced and unspliced alignments at high accuracy while taking advantage of base quality information and splice site predictions.

RNA-MATE – A recursive mapping strategy for high-throughput RNA-sequencing data.

ERANGE – Mapping and Quantifying Mammalian Transcriptomes by RNA-Seq

SeqMap – A Tool For Mapping Millions Of Short Sequences To The Genome.

Bioconductor – Bioconductor is an open source and open development software project for the analysis and comprehension of genomic data.

BWA – BWA is a fast, lightweight tool that aligns relatively short sequences (queries) to a sequence database (target), such as the human reference genome.

CisGenome – An integrated tool for tiling array, ChIP-seq, genome and cis-regulatory element analysis.

GenePattern – is a powerful genomic analysis platform that provides access to more than 100 tools for gene expression analysis, proteomics, SNP analysis and common data processing tasks. A web-based interface provides easy access to these tools and allows the creation of multi-step analysis pipelines that enable reproducible in silico research.

Galaxy – Mapping pipeline for Illumina, 454, and SOLiD sequencing data.

MAQ – stands for Mapping and Assembly with Quality. It builds an assembly by mapping short reads to reference sequences.

UCSC Genome Browser – This site contains the reference sequence and working draft assemblies for a large collection of genomes. It also provides portals to the ENCODE and Neandertal projects.