Question

RNA-seq downloaded dataset - how to proceed with analysis

7.5 years ago

sim.j.baum ▴ 140

Hi,

I downloaded an interesting dataset from which I need to extract some information. The data looks like this:

readID Seq 0-misHit 1-misHit 2-misHit chr start end strand

HWUSI-EAS230-R:2:99:1151:1802#0/1 GAGCTCATTGGTGGCGTGGTGGCCTTGACCTTCCGG 1 0 0 chr10 70914936 70914971 -

HWUSI-EAS230-R:2:44:642:495#0/1 TTGGCTGCCTTCTGGGGTGAACTTTCTGCTATTTCC 0 0 1 chr7 47298110 47298145 -

...

The GEO record (https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE22260) states:

"Alignment: Sequence reads were obtained and mapped to the hg18 (March, 2006) using the Illumina Genome Analyzer Pipeline. All reads mapping with two or fewer mismatches were retained."

My aim is to detect a splice variant of a certain gene. I tried to process the data further with Cufflinks, but I get an error saying the format is not correct (it is neither SAM nor BAM).

I would really appreciate any hints or suggestions on which tool to use, and on whether I can work with this data format directly or have to convert it first.

score 2 · Accepted Answer · 2016-10-14

7.5 years ago

charbo24 ▴ 40

I think what you have is ELAND format: http://ccg.vital-it.ch/chipseq/elandformat.php

If so, this thread might help you: "How To Convert Eland File To Bam?"

However, I'm not sure that the current version of Samtools supports the conversion.
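If Samtools cannot do the conversion, a small script may be enough for the nine-column layout shown in the question. The following is only a minimal sketch, not a tested converter for every ELAND variant. It makes several assumptions that should be checked against the real file: columns are whitespace-separated in the order shown in the question, `start` is a 1-based leftmost position (the `end - start + 1 = 36` arithmetic matches the 36 bp reads), and the sequence on `-` strand hits is stored as read, so it is reverse-complemented for SAM. The function names and the `seq_is_as_read` switch are made up for this illustration.

```python
# Hypothetical helper names; the column layout is taken from the question.
_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def eland_like_to_sam(line, seq_is_as_read=True):
    """Turn one data line (9 whitespace-separated columns) into a SAM record.

    Assumption: start is 1-based. Assumption: with seq_is_as_read=True the
    stored sequence is the read as sequenced, so minus-strand hits are
    reverse-complemented to match the SAM convention.
    """
    fields = line.split()
    if len(fields) != 9:
        raise ValueError("expected 9 columns, got %d" % len(fields))
    read_id, seq, _hits0, _hits1, _hits2, chrom, start, _end, strand = fields
    if strand not in ("+", "-"):
        raise ValueError("unexpected strand: %r" % strand)

    flag = 16 if strand == "-" else 0  # 16 = read maps to the reverse strand
    if strand == "-" and seq_is_as_read:
        seq = reverse_complement(seq)

    return "\t".join([
        read_id,                # QNAME
        str(flag),              # FLAG
        chrom,                  # RNAME
        start,                  # POS (assumed 1-based leftmost position)
        "255",                  # MAPQ: 255 means "not available"
        "%dM" % len(seq),       # CIGAR: ungapped match over the whole read
        "*", "0", "0",          # RNEXT, PNEXT, TLEN (single-end reads)
        seq,                    # SEQ
        "*",                    # QUAL: not present in the input
    ])


def convert(in_handle, out_handle):
    """Write SAM records for every data line, skipping blanks and the header."""
    for line in in_handle:
        line = line.strip()
        if not line or line.startswith("readID"):
            continue
        out_handle.write(eland_like_to_sam(line) + "\n")
```

Usage would be along the lines of `convert(open("reads.txt"), open("reads.sam", "w"))`, where both file names are placeholders. The output has no header, so `@SQ` lines for the hg18 chromosomes would probably still have to be added, and the records sorted, before downstream tools accept the file.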
