Question: SRA to BAM
marina-orlova (Russian Federation) wrote 3.5 years ago:

Hi everyone

Can you please help me extract a SAM file from an SRA file?

I took the dataset from here: http://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSM1208162

I downloaded the SRA file.

Then I ran sra-dump on it (the dataset page says the reads are already aligned).

But the result was a very small file (~50 Mb). When I tried to convert it to a sorted BAM file I got:

[bam_header_read] EOF marker is absent. The input is probably truncated.

[sam_header_line_parse] expected '@XY', got [@HD VN:1.3]

Hint: The header tags must be tab-separated.
[samopen] no @SQ lines in the header.

I looked at the summary information for this SRA file here:

http://trace.ncbi.nlm.nih.gov/Traces/sra/?run=SRR951914

and didn't see any information about alignment.

I then tried to look at the alignment information with this command:

vdb-dump ./SRR951915.sra | grep "ALIGNMENT_COUNT"

and got an error:

vdb-dump.2.1.7 int: data bad version while constructing page map within virtual database module - VCursorCellData( col:PANEL at row #1074 ) failed

 

The fastq-dump command also led to an error: data bad version while constructing page map within virtual database module - failed SRR951914.sra

Any ideas?

 

Evgeniia Golovina (New Zealand) wrote 3.5 years ago:

Hi, yesterday I got good output from fastq-dump.

Your command line is wrong. To run fastq-dump correctly you need to know whether your reads are single or paired. In your case the reads are single, so the command line is:

./fastq-dump --split-spot SRR951914.sra 

For paired reads:

./fastq-dump --split-files file.sra
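
The choice between the two commands above can be sketched as a small shell helper. It only prints the command instead of running it, so it works without the SRA Toolkit installed. The function name build_dump_cmd and the SINGLE/PAIRED labels are made up for this illustration; the flag mapping simply follows the suggestion in this answer.

```shell
# build_dump_cmd LAYOUT FILE: print the fastq-dump command suggested above.
# Hypothetical helper; it does not run fastq-dump itself.
build_dump_cmd() {
    layout=$1
    sra_file=$2
    case "$layout" in
        SINGLE) flag="--split-spot" ;;
        PAIRED) flag="--split-files" ;;
        *) echo "unknown layout: $layout" >&2; return 1 ;;
    esac
    printf 'fastq-dump %s %s\n' "$flag" "$sra_file"
}

build_dump_cmd SINGLE SRR951914.sra
build_dump_cmd PAIRED file.sra
```

Once the printed command looks right, it can be copied and run by hand.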


Thank you for your answer, it helped

(reply by marina-orlova, 3.5 years ago)

I read the description of the --split-spot parameter, but I still don't quite understand what it means. Could you explain --split-spot more clearly?

Read Splitting                     Sequence data may be used in raw form or split into individual reads
  --split-spot                     Split spots into individual reads

 

(reply by Shicheng Guo, 2.6 years ago)

What does "spot" mean?

As I understand it, a spot contains biological information (the reads themselves) and technical information such as adapters, barcodes for multiplexing, etc. More about this here --> What Is A "Spot" In Sra Format

From the SRA Handbook:

"The spot descriptor captures information that would allow the user of the SRA to interpret the sequencing data and differentiate between technical and application extents in the read. Reads that are mate pairs are concatenated into a single monolithic “spot” sequence." (http://www.ncbi.nlm.nih.gov/books/NBK54984/)

About fastq-dump options

Let's take an example: the SRR385952.sra sample (http://www.ncbi.nlm.nih.gov/sra/?term=SRR385952). You can see that the sample should contain forward and reverse sequences, each of length 101. These sequences are joined in the SRA file and need to be split. You can do this with:

1) the --split-spot option: ./fastq-dump --split-spot SRR385952.sra. This gives you a single file with the reverse read of each pair below the forward read for that pair.

2) the --split-files option: ./fastq-dump --split-files SRR385952.sra. This outputs two fastq files: one for the forward reads and another for the reverse reads.

You can find more info in this blog post --> https://nsaunders.wordpress.com/2011/12/22/sequencing-for-relics-from-the-sanger-era-part-1-getting-the-raw-data/

Hope this helps.

PS. There is another option, --split-3, which gives you a pair of fastq files in which corresponding records form a pair of reads; reads without a mate go to a third file.
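
To make the splitting concrete, here is a toy sketch in plain shell. It does not read real SRA files or call fastq-dump; it treats a spot as one string holding the forward and reverse reads back to back, which is a simplification of the description above. The function name split_spot, the 8-base spot and the read length of 4 are hypothetical (SRR385952 has reads of length 101).

```shell
# split_spot SPOT READ_LEN: print the forward read, then the reverse read.
# Toy model only: assumes the spot is exactly two reads of equal length.
split_spot() {
    spot=$1
    len=$2
    # First READ_LEN characters: the forward read
    printf '%s\n' "$spot" | cut -c1-"$len"
    # Next READ_LEN characters: the reverse read
    printf '%s\n' "$spot" | cut -c"$((len + 1))"-"$((2 * len))"
}

# Hypothetical spot: forward ACGT followed by reverse TTGA
split_spot ACGTTTGA 4
# prints ACGT, then TTGA on the next line
```

Writing both lines to one file corresponds to the --split-spot layout, while sending the first line to one file and the second to another corresponds to --split-files.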

(reply by Evgeniia Golovina, 2.6 years ago)
Evgeniia Golovina (New Zealand) wrote 3.5 years ago:

Hi, Marina

You mean sam-dump, right?


It seems to me that your file is not a SAM file. It's just raw reads. Look here for your SRA file --> http://trace.ncbi.nlm.nih.gov/Traces/sra/?run=SRR951914 (the "Reads" tab)

Let me try to look at this dataset.


Hi Evgeniia

Yes, you are right, these are raw reads. But fastq-dump also didn't work:

fastq-dump SRR951914.sra > 951914.fastq
2015-04-08T08:50:36 fastq-dump.2.1.7 err: data bad version while constructing page map within virtual database module - failed SRR951914.sra

Maybe the SRA archive is broken.

I took it from here: ftp://ftp-trace.ncbi.nlm.nih.gov/sra/sra-instant/reads/ByStudy/sra/SRP%2FSRP028%2FSRP028808/SRR951914/

 

 

(reply by marina-orlova, 3.5 years ago)