Question: RNA-seq downloaded dataset - how to proceed with analysis
sim.j.baum wrote (2.9 years ago):

Hi,

I downloaded an interesting dataset from which I need to extract some information. The data look like this:

readID Seq 0-misHit 1-misHit 2-misHit chr start end strand

HWUSI-EAS230-R:2:99:1151:1802#0/1 GAGCTCATTGGTGGCGTGGTGGCCTTGACCTTCCGG 1 0 0 chr10 70914936 70914971 -

HWUSI-EAS230-R:2:44:642:495#0/1 TTGGCTGCCTTCTGGGGTGAACTTTCTGCTATTTCC 0 0 1 chr7 47298110 47298145 -

...

The GEO record (https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE22260) states:

"Alignment: Sequence reads were obtained and mapped to the hg18 (March, 2006) using the Illumina Genome Analyzer Pipeline. All reads mapping with two or fewer mismatches were retained"

My aim is to detect a splice variant of a certain gene. I tried to process the data further with Cufflinks, but I get an error saying the format is not correct (the file is neither SAM nor BAM).

I would really appreciate any hints and suggestions on which tool to use, and on whether I can work with this data format or have to convert it.

charbo24 (Michigan State University) wrote (2.9 years ago):

I think what you have is a file in Eland format: http://ccg.vital-it.ch/chipseq/elandformat.php

If so, this thread might help you: How To Convert Eland File To Bam?

However, I'm not sure that the current version of Samtools supports the conversion.
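
If no ready-made converter works, the rows shown in the question carry enough information to write minimal SAM records by hand. The following is only a sketch under several assumptions that should be checked against the GEO documentation: that start and end are 1-based inclusive coordinates (end - start + 1 does equal the 36 bp read length in both example rows), that the Seq column holds the read as sequenced (so it is reverse-complemented for "-" strand hits), and that the three misHit columns count hits with 0, 1 and 2 mismatches. The function names are made up for illustration. Because each row gives a single start and end, every alignment is written as one ungapped match, so reads spanning a splice junction would not be represented.

```python
# Sketch only: column meanings are inferred from the header line in the question.
COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(seq):
    """Return the reverse complement of a DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def record_to_sam(line):
    """Convert one whitespace-separated alignment row to a SAM record."""
    fields = line.split()
    if len(fields) != 9:
        raise ValueError(f"expected 9 columns, got {len(fields)}: {line!r}")
    read_id, seq, hits0, hits1, hits2, chrom, start, end, strand = fields
    start, end = int(start), int(end)
    if end - start + 1 != len(seq):
        raise ValueError(f"coordinates do not match read length: {line!r}")
    if strand not in ("+", "-"):
        raise ValueError(f"unexpected strand {strand!r}")

    flag = 0
    if strand == "-":
        flag = 16  # SAM flag 0x10: read aligned to the reverse strand
        # Assumption: the file stores the read as sequenced, while SAM
        # expects the sequence on the forward reference strand.
        seq = reverse_complement(seq)

    sam = [
        read_id, str(flag), chrom, str(start),
        "255",              # MAPQ 255 means "not available"
        f"{len(seq)}M",     # ungapped alignment over the whole read
        "*", "0", "0",      # no mate information
        seq,
        "*",                # base qualities are not in the file
    ]
    # Assumption: the lowest mismatch class with a nonzero hit count
    # describes the reported position.
    for n_mismatches, count in enumerate((hits0, hits1, hits2)):
        if int(count) > 0:
            sam.append(f"NM:i:{n_mismatches}")
            break
    return "\t".join(sam)


def rows_to_sam(lines):
    """Yield a minimal SAM header line followed by one record per data row."""
    yield "@HD\tVN:1.6\tSO:unsorted"
    for line in lines:
        line = line.strip()
        if not line or line.startswith("readID"):
            continue  # skip blank lines and the column header
        yield record_to_sam(line)


example_rows = [
    "readID Seq 0-misHit 1-misHit 2-misHit chr start end strand",
    "HWUSI-EAS230-R:2:99:1151:1802#0/1 GAGCTCATTGGTGGCGTGGTGGCCTTGACCTTCCGG 1 0 0 chr10 70914936 70914971 -",
    "HWUSI-EAS230-R:2:44:642:495#0/1 TTGGCTGCCTTCTGGGGTGAACTTTCTGCTATTTCC 0 0 1 chr7 47298110 47298145 -",
]
for sam_line in rows_to_sam(example_rows):
    print(sam_line)
```

The sketch writes no @SQ header lines, so the hg18 chromosome names and lengths would still have to be supplied before the output can be converted to BAM and sorted.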
