Question: Paired-end data loaded into GEO as single-end runs ... How to extract the data
4.6 years ago by
ChIP500
Netherlands
ChIP500 wrote:

Hi,

The sample at http://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSM835232 on GEO has its SRA files submitted in a somewhat unusual way.

The mate pairs were loaded into SRA as single-end runs, resulting in two files per sample.

My problem is: how can I get proper FASTQ files from these two SRA files?

I tried

fastq-dump -A SRR364680.sra

fastq-dump -A SRR384964.sra

and then ran bowtie, but it doesn't work. Has anybody dealt with data like this? If so, how can I get unaligned FASTQ files that can be used for alignment?

Here is the head of the two FASTQ files:

File 1

@SRR364680.sra.1 SFGF-GA2-1_63:2:112:1559:999 length=80
NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN
+SRR364680.sra.1 SFGF-GA2-1_63:2:112:1559:999 length=80
################################################################################
@SRR364680.sra.2 SFGF-GA2-1_63:2:112:9048:999 length=80
NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN
+SRR364680.sra.2 SFGF-GA2-1_63:2:112:9048:999 length=80
################################################################################
@SRR364680.sra.3 SFGF-GA2-1_63:2:112:10809:999 length=80
NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN

File 2

@SRR384964.sra.1 SFGF-GA2-1_63:2:14:1899:1000 length=80
NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN
+SRR384964.sra.1 SFGF-GA2-1_63:2:14:1899:1000 length=80
################################################################################
@SRR384964.sra.2 SFGF-GA2-1_63:2:14:11711:999 length=80
NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN
+SRR384964.sra.2 SFGF-GA2-1_63:2:14:11711:999 length=80
################################################################################
@SRR384964.sra.3 SFGF-GA2-1_63:2:14:13989:1000 length=80
NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN
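
The excerpts above cover only the first few records of each file, all of them N. Whether that holds for the rest of the file can be checked by counting. A minimal sketch, assuming plain four-line FASTQ records; `count_all_n_reads` is a hypothetical helper name, not part of the SRA toolkit:

```shell
# Print "<all-N reads> <total reads>" for a four-line-per-record FASTQ file.
# Assumption: no wrapped sequence lines, so every 4k+2-th line is a sequence.
count_all_n_reads() {
    awk 'NR % 4 == 2 { total++; if ($0 ~ /^N+$/) alln++ }
         END { printf "%d %d\n", alln, total }' "$1"
}
```

Called as `count_all_n_reads file1.fastq` (a placeholder file name), it prints two numbers. If the first is close to the second, the dump itself is suspect; if it is small, only the leading reads are uncalled.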

 

Thank you.

rna-seq samtools picard • 1.9k views
modified 4.1 years ago by skittely0 • written 4.6 years ago by ChIP500

Can you post the first 5 read names from the two FASTQ files? What's the error from bowtie?

modified 4.6 years ago • written 4.6 years ago by geek_y9.8k

Please check the updated question.

 

written 4.6 years ago by ChIP500

I am wondering: if it's paired-end data, the read names should carry read 1 / read 2 information (such as #1, #2 or /1, /2) to distinguish the mates, but I don't see that here. The read pairs (R1 and R2) should also be in the same order in both files for alignment.
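
One way to test the ordering is to compare the instrument read names (the second whitespace-separated field of each header line) across the two files, record by record. A minimal sketch, assuming plain four-line FASTQ records and headers shaped like those in the question; `read_names` and `names_in_same_order` are hypothetical helper names, not part of any toolkit:

```shell
# Print the instrument read name (second field of the header line)
# for every record of a four-line-per-record FASTQ file.
read_names() {
    awk 'NR % 4 == 1 { print $2 }' "$1"
}

# Exit 0 only if both files list the same read names in the same order.
# paste joins line i of file 1 with line i of file 2 (tab-separated);
# a shorter file yields empty fields, which counts as a mismatch.
names_in_same_order() {
    paste "$1" "$2" | awk -F'\t' '
        BEGIN { bad = 0 }
        NR % 4 == 1 {
            split($1, a, " "); split($2, b, " ")
            if (a[2] != b[2]) { bad = 1; exit }
        }
        END { exit bad }'
}
```

For example, `read_names file1.fastq | head -n 5` lists the first five names, and `names_in_same_order file1.fastq file2.fastq && echo "names pair up"` reports whether the files line up (both file names are placeholders). For the heads posted in the question, the first records are `2:112:1559:999` and `2:14:1899:1000`, so this check would fail on the very first record.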

written 4.6 years ago by geek_y9.8k
4.6 years ago by
Madelaine Gogol5.1k
Kansas City
Madelaine Gogol5.1k wrote:

I wonder if this is what you're dealing with? See "How To Convert Sra-Lite Paired-End Submission To Fastq?"

written 4.6 years ago by Madelaine Gogol5.1k

I don't think it is, since the submitter at GEO loaded the mate pairs to SRA as single-end runs, resulting in two files per sample. The split utility of fastq-dump is not helping in this case.

 

written 4.6 years ago by ChIP500
4.1 years ago by
skittely0
United States
skittely0 wrote:

Based on the original GEO page for the .sra reads, it seems the reads are single-end rather than paired (see http://www.ncbi.nlm.nih.gov/sra?term=SRX105932). If you go to the Library section and click on "more...", you can see the layout of the library. My guess is that the two separate files are some sort of replicate.

written 4.1 years ago by skittely0