Question

.sra to bam conversion results in a "< myfile > does not have BAM or CRAM format" error

1

6.1 years ago

jordi.planells ▴ 480

Hi all! I am having problems and I hope I can get some help from you.

I will explain my situation: I'm trying to run a PCA to see how different several BAM files are. I'm using the following pipeline:

1. Getting the accession files. I am using the R library "SRAdb", so I am getting 4 files in .sra format.
2. I use SRA-tools in order to convert the .sra file into .bam format with the following code:

   sam-dump -r --min-mapq 25 $file | samtools view -bS > $file.bam

3. Sort: samtools sort $file -o $file_sorted
4. Index: samtools index $file_sorted $file_sorted.bai
5. Compute a matrix to generate the PCA plot:

   multiBamSummary bins -b $files.bam -o my/out/path --smartLabels -bs 10000 -p 2

At this point I'm getting the following error:

The file < myfile > does not have BAM or CRAM format.

I haven't been able to trace the error, as none of the earlier steps reported any problem. Any suggestions? (Ideally I would like to skip the alignment step; I want to keep the files as close to the original as possible.)

sra-tools --version 2.9.1_1
samtools --version 1.9
deeptools --version 3.3.0

Thanks in advance!

SRA-toolkit samtools deeptools • 3.9k views

ADD COMMENT • link updated 6.1 years ago by GenoMax 152k • written 6.1 years ago by jordi.planells ▴ 480

1

Can you post example accession numbers so we can see what data you are looking at?

ADD REPLY • link 6.1 years ago by GenoMax 152k

0

The accession number I am looking at is SRP060510, which consists of 4 samples: SRR2089860, SRR2089861, SRR2089862, SRR2089863

ADD REPLY • link 6.1 years ago by jordi.planells ▴ 480

0

Take a look at the BAM files you've generated - probably there's something wrong with the format. Are these aligned files you are downloading from SRA?

You can also try samtools quickcheck on the BAM files you've generated.
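A minimal sketch of that check, assuming the BAM header can be piped in as text: samtools quickcheck tests that the file is intact, and counting the @SQ lines in the header shows whether the file lists any reference sequences at all. The helper name count_sq_lines is made up for illustration.

```shell
# Count reference sequence (@SQ) lines in a SAM header read from stdin.
# A count of 0 means the header lists no targets.
count_sq_lines() {
    # grep -c prints 0 but exits non-zero when nothing matches;
    # "|| true" keeps that case from looking like a failure.
    grep -c '^@SQ' || true
}
```

For example, run samtools quickcheck -v *.bam first, then samtools view -H SRR2089860.bam | count_sq_lines. A count of 0 suggests the file holds only unaligned reads.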

ADD REPLY • link 6.1 years ago by predeus ★ 2.1k

0

I have checked. I get the following message: SRR2089860.bam had no targets in header (for all 4 of them)

ADD REPLY • link 6.1 years ago by jordi.planells ▴ 480

0

Any errors? Are you sure the SAM file you are dumping is even aligned? Seconding predeus, check quickcheck

ADD REPLY • link 6.1 years ago by ATpoint 88k

0

I have performed other operations:

vdb-validate -> everything seems to be fine

fastq-dump -> resulting in the following error "Error: reads file does not look like a FASTQ file"

ADD REPLY • link 6.1 years ago by jordi.planells ▴ 480

0

6.1 years ago

ATpoint 88k

Hi jordi.planells, if you ask for help, please always post full command lines so that others can reproduce the problem. There are plenty of pitfalls with these commands that cannot be reproduced if you only say which tool you used.

Essentially, to download sra or fastq files, you can simply follow "Fast download of FASTQ files from the European Nucleotide Archive (ENA)" and then proceed with alignment. The tutorial covers both fastq download from the ENA and sra download from NCBI.

ADD COMMENT • link 6.1 years ago by ATpoint 88k

0

I posted more commands today because I ran further checks this morning (following the suggestions from other users), as I was not able to get the .bam. Thank you for the tutorial, I will give it a shot and try to get the fastq files as explained there.

ADD REPLY • link 6.1 years ago by jordi.planells ▴ 480

score 3 · Accepted Answer · 2019-05-30

Let us use one of the example accession numbers above (SRR2089860). These are single-end reads.

Your options are:

Use fastq-dump to dump the reads out in fastq format (remove -X 5 for full set)

$ fastq-dump -X 5 SRR2089860
Read 5 spots for SRR2089860
Written 5 spots for SRR2089860

Use sam-dump to create fastq format files

$ sam-dump --fastq SRR2089860 | head -16
@HWI-D00473:169:HFK7WADXX:1:1101:1202:2011/1 unaligned
NGAGTCTATACTCGTTACATTCGCGTAACTCATTGTTAATCGCGAAGTTGA
+
#1=DDDDFGHHGHJJJIJIIJGIIJJIJHICGIIIJJJIJGIJEHJIGIIG
@HWI-D00473:169:HFK7WADXX:1:1101:1195:2074/1 unaligned
CTCGAACTCCTCGTAGTGGCGATTGTCGGTGCTGCCCACCAGGTCCACTGT
+
CCCFFFFFHGHHHJIJIJJJJHIIGGGHIECEHFHGIEFIGGJGHJIIGIG
@HWI-D00473:169:HFK7WADXX:1:1101:1230:2087/1 unaligned
TGCCGGGAATTGTACAGTGCTCAGCTTTATAGGACATTTCCAAACAGTTAT
+
BBBFFFF8FHHHHJJJIJJJJIJGJJIJFGJIFGIIIJJJIGIEIIIIJGG
@HWI-D00473:169:HFK7WADXX:1:1101:1222:2168/1 unaligned
CCGAGACTTGCCTGCTCACCAGCGAAGAGGGCGAGGAGCGTTTGACGGCCG
+
@@CDDADDHFHHHIIIIIHGGE<GEGIEHIGIIDHGHGGIHHHEFFFCCCB

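Use sam-dump to write SAM format files. These data appear to be unaligned: every record below has flag 4 (unmapped) and no reference name, which is consistent with the "no targets in header" message from samtools quickcheck. The --min-mapq option should therefore not affect anything (you can check).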

$ sam-dump -r SRR2089860 | head -4
HWI-D00473:169:HFK7WADXX:1:1101:1202:2011       4       *       0       0       *       *       0       0       NGAGTCTATACTCGTTACATTCGCGTAACTCATTGTTAATCGCGAAGTTGA     #1=DDDDFGHHGHJJJIJIIJGIIJJIJHICGIIIJJJIJGIJEHJIGIIG
HWI-D00473:169:HFK7WADXX:1:1101:1195:2074       4       *       0       0       *       *       0       0       CTCGAACTCCTCGTAGTGGCGATTGTCGGTGCTGCCCACCAGGTCCACTGT     CCCFFFFFHGHHHJIJIJJJJHIIGGGHIECEHFHGIEFIGGJGHJIIGIG
HWI-D00473:169:HFK7WADXX:1:1101:1230:2087       4       *       0       0       *       *       0       0       TGCCGGGAATTGTACAGTGCTCAGCTTTATAGGACATTTCCAAACAGTTAT     BBBFFFF8FHHHHJJJIJJJJIJGJJIJFGJIFGIIIJJJIGIEIIIIJGG
HWI-D00473:169:HFK7WADXX:1:1101:1222:2168       4       *       0       0       *       *       0       0       CCGAGACTTGCCTGCTCACCAGCGAAGAGGGCGAGGAGCGTTTGACGGCCG     @@CDDADDHFHHHIIIIIHGGE<GEGIEHIGIIDHGHGGIHHHEFFFCCCB
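
Either dump can be sanity-checked before moving on. The following is only a sketch: check_fastq and count_unmapped are hypothetical helper names, and both just read text from stdin, so they assume nothing about sam-dump beyond the output shown above.

```shell
# Check that stdin is 4-line FASTQ: '@' header, sequence, '+' separator,
# and a quality string as long as the sequence. Lines are judged by
# position, so a quality string that starts with '@' is not mistaken
# for a header.
check_fastq() {
    awk '
        NR % 4 == 1 && !/^@/                { bad = 1 }
        NR % 4 == 2                         { seqlen = length($0) }
        NR % 4 == 3 && !/^[+]/              { bad = 1 }
        NR % 4 == 0 && length($0) != seqlen { bad = 1 }
        END {
            if (bad || NR % 4 != 0) { print "malformed FASTQ"; exit 1 }
            printf "%d records OK\n", NR / 4
        }
    '
}

# Report how many SAM records on stdin have flag bit 0x4 (unmapped) set.
count_unmapped() {
    awk -F'\t' '
        /^@/ { next }                                  # skip header lines
        { total++; if (int($2 / 4) % 2 == 1) unmapped++ }
        END { printf "%d of %d reads unmapped\n", unmapped, total }
    '
}
```

Possible usage: sam-dump --fastq SRR2089860 | head -4000 | check_fastq, and sam-dump -r SRR2089860 | head -1000 | count_unmapped. If every read is reported as unmapped, the run needs to be aligned before deepTools can use it.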

To do a PCA you will need to align the fastq data to a reference and count the aligned reads to get an expression estimate. You could also use something like salmon to map the reads against a transcriptome and get counts.
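
As a rough sketch of the dump, align, sort and index sequence for one run: the aligner (bowtie2 here), the index prefix and the function name align_sort_index are all assumptions for illustration, so substitute whatever suits your data. Note also that the shell reads $file_sorted as a variable named file_sorted, not as $file followed by _sorted; braces, as in ${acc}_sorted below, avoid that. By default the function only prints the commands; set RUN=1 to execute them.

```shell
# Print (or, with RUN=1, execute) a dump -> align -> sort -> index sequence.
# bowtie2 and the index prefix are assumptions, not part of the original thread.
align_sort_index() {
    acc=$1                     # run accession, e.g. SRR2089860
    index=$2                   # hypothetical bowtie2 index prefix
    fastq="${acc}.fastq"       # file written by fastq-dump for single-end reads
    bam="${acc}_sorted.bam"    # braces stop the shell from looking up $acc_sorted

    # Store the three commands as positional parameters so they can be looped over.
    set -- \
        "fastq-dump ${acc}" \
        "bowtie2 -x ${index} -U ${fastq} | samtools sort -o ${bam} -" \
        "samtools index ${bam}"

    for cmd in "$@"; do
        if [ "${RUN:-0}" = "1" ]; then
            sh -c "$cmd" || return 1
        else
            echo "$cmd"
        fi
    done
}
```

Calling align_sort_index SRR2089860 my_index prints the three commands for inspection. The resulting sorted, indexed BAM files are the kind of input multiBamSummary expects.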