RNA-seq downloaded dataset - how to proceed with analysis
7.5 years ago
sim.j.baum ▴ 140

Hi,

I downloaded an interesting dataset that I need to extract some information from. The data look like this:

readID Seq 0-misHit 1-misHit 2-misHit chr start end strand

HWUSI-EAS230-R:2:99:1151:1802#0/1 GAGCTCATTGGTGGCGTGGTGGCCTTGACCTTCCGG 1 0 0 chr10 70914936 70914971 -

HWUSI-EAS230-R:2:44:642:495#0/1 TTGGCTGCCTTCTGGGGTGAACTTTCTGCTATTTCC 0 0 1 chr7 47298110 47298145 -

...

The GEO entry for the dataset (https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE22260) states:

"Alignment: Sequence reads were obtained and mapped to the hg18 (March, 2006) using the Illumina Genome Analyzer Pipeline. All reads mapping with two or fewer mismatches were retained"

My aim is to detect a splice variant of a particular gene. I tried to process the data further with Cufflinks, but I get an error saying that the format is not correct (the file is neither SAM nor BAM).

I would really appreciate some hints and suggestions on which tool is best to use, and on whether I can work with this data format or have to convert it.

RNA-Seq • 1.5k views
7.5 years ago
charbo24 ▴ 40

I think what you have is a file in ELAND format: http://ccg.vital-it.ch/chipseq/elandformat.php

If so, this thread might help you: How To Convert Eland File To Bam?

However, I'm not sure that the current version of Samtools supports the conversion.
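
If no ready-made converter works, the nine columns shown in the question carry enough information to write minimal SAM records by hand. Below is a rough Python sketch of that idea, not a tested converter. It rests on several assumptions that are not confirmed by the GEO entry: that `start` is a 1-based leftmost coordinate, that every read aligns without gaps (so the CIGAR is just the read length followed by `M`), and that the `Seq` column holds the read as sequenced, so minus-strand reads are reverse-complemented for SAM. The function names `eland_line_to_sam` and `convert` are made up for illustration.

```python
# Sketch: turn the 9-column ELAND-like export into minimal SAM records.
# Assumptions (not confirmed by the source): 1-based start, ungapped
# alignments, and Seq given as sequenced (so "-" reads are reverse-complemented).

_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def revcomp(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def eland_line_to_sam(line):
    """Convert one data line into a tab-separated SAM record (no header)."""
    fields = line.split()
    if len(fields) != 9:
        raise ValueError(f"expected 9 columns, got {len(fields)}")
    read_id, seq, _hits0, _hits1, _hits2, chrom, start, _end, strand = fields
    flag = 16 if strand == "-" else 0  # 16 = read mapped to the reverse strand
    if strand == "-":
        seq = revcomp(seq)
    cigar = f"{len(seq)}M"  # ungapped alignment over the full read length
    # MAPQ 255 means "mapping quality not available"; "*" marks missing fields.
    return "\t".join(
        [read_id, str(flag), chrom, start, "255", cigar, "*", "0", "0", seq, "*"]
    )


def convert(lines):
    """Yield SAM records, skipping blank lines and the column header line."""
    for line in lines:
        line = line.strip()
        if not line or line.startswith("readID"):
            continue
        yield eland_line_to_sam(line)
```

The output has no SAM header, so the hg18 chromosome names and lengths would still have to be supplied (as `@SQ` lines) before the file can be sorted or converted to BAM. The mismatch-count columns are simply dropped here.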
