.sra to BAM conversion results in a "< myfile > does not have BAM or CRAM format" error
2.5 years ago

Hi all! I am having problems and I hope I can get some help from you.

I will explain my situation: I'm trying to perform a PCA analysis to see how different several BAM files are. I'm using the following pipeline:

  1. Getting the accession files. I am using the R library "SRAdb", so I am getting 4 files in .sra format.
  2. I use SRA-tools in order to convert the .sra file into .bam format with the following code:

    sam-dump -r --min-mapq 25 $file | samtools view -bS > $file.bam

  3. Sort:

    samtools sort $file -o $file_sorted

  4. Index:

    samtools index $file_sorted $file_sorted.bai

  5. Compute a matrix to generate the PCA plot

    multiBamSummary bins -b $files.bam -o my/out/path --smartLabels -bs 10000 -p 2

At this point I'm getting the following error:

The file < myfile > does not have BAM or CRAM format.

I haven't been able to trace the error, as none of the earlier steps reported any problem. Any suggestions? (Ideally I would like to skip the alignment step; I want to keep the files as close to the original as possible.)

  • sra-tools --version 2.9.1_1
  • samtools --version 1.9
  • deeptools --version 3.3.0

Thanks in advance!

SRA-toolkit samtools deeptools • 1.2k views

Can you post example accession numbers so we can see what data you are looking at?


The accession number I am looking at is SRP060510, which consists of 4 samples: SRR2089860, SRR2089861, SRR2089862, SRR2089863


Take a look at the BAM files you've generated - probably there's something wrong with the format. Are these aligned files you are downloading from SRA?

You can also try samtools quickcheck on the BAM files you've generated.


I have checked. I get the following message: SRR2089860.bam had no targets in header (for all 4 of them)
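
For context, "no targets in header" means the BAM header contains no @SQ (reference sequence) lines, which is what you get when unaligned reads are converted to BAM. A minimal sketch for checking this yourself follows; the helper name `count_sq_lines` is made up for illustration, and the commented usage assumes samtools is installed and the BAM file exists.

```shell
#!/bin/sh
# Count reference sequences (@SQ lines) in a SAM header read from stdin.
# A count of 0 corresponds to "no targets in header".
count_sq_lines() {
    awk '/^@SQ/ { n++ } END { print n + 0 }'
}

# Hypothetical usage with a real file (assumes samtools is on PATH):
#   samtools view -H SRR2089860.bam | count_sq_lines

# Small self-contained demo: a header with only an @HD line has no targets.
printf '@HD\tVN:1.6\tSO:unsorted\n' | count_sq_lines
```

A BAM like this is still a valid file, but tools that bin reads along a genome have no chromosomes to work with.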


Any errors? Are you sure the SAM file you are dumping is even aligned? Seconding predeus, check quickcheck


I have performed other operations:

vdb-validate -> everything seems to be fine

fastq-dump -> resulting in the following error "Error: reads file does not look like a FASTQ file"

2.5 years ago
GenoMax 109k

Let us use one of the example accession numbers above (SRR2089860). These are single-end reads.

Your options are:

Use fastq-dump to dump the reads out in fastq format (remove -X 5 for full set)

$ fastq-dump -X 5 SRR2089860
Read 5 spots for SRR2089860
Written 5 spots for SRR2089860

Use sam-dump to create fastq format files

$ sam-dump --fastq SRR2089860 | head -16
@HWI-D00473:169:HFK7WADXX:1:1101:1202:2011/1 unaligned
NGAGTCTATACTCGTTACATTCGCGTAACTCATTGTTAATCGCGAAGTTGA
+
#1=DDDDFGHHGHJJJIJIIJGIIJJIJHICGIIIJJJIJGIJEHJIGIIG
@HWI-D00473:169:HFK7WADXX:1:1101:1195:2074/1 unaligned
CTCGAACTCCTCGTAGTGGCGATTGTCGGTGCTGCCCACCAGGTCCACTGT
+
CCCFFFFFHGHHHJIJIJJJJHIIGGGHIECEHFHGIEFIGGJGHJIIGIG
@HWI-D00473:169:HFK7WADXX:1:1101:1230:2087/1 unaligned
TGCCGGGAATTGTACAGTGCTCAGCTTTATAGGACATTTCCAAACAGTTAT
+
BBBFFFF8FHHHHJJJIJJJJIJGJJIJFGJIFGIIIJJJIGIEIIIIJGG
@HWI-D00473:169:HFK7WADXX:1:1101:1222:2168/1 unaligned
CCGAGACTTGCCTGCTCACCAGCGAAGAGGGCGAGGAGCGTTTGACGGCCG
+
@@CDDADDHFHHHIIIIIHGGE<GEGIEHIGIIDHGHGGIHHHEFFFCCCB
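
If you want to sanity-check a dumped FASTQ stream before going further, here is a rough sketch. It assumes plain 4-line records (no wrapped sequence lines), and the function name `check_fastq` is just an illustration. Note that it checks lines by position rather than grepping for `@`, because quality strings can also start with `@` (as in the fourth record above).

```shell
#!/bin/sh
# Validate 4-line FASTQ records on stdin and print the record count.
# Exits non-zero if a header line lacks '@', a separator line lacks '+',
# or the total line count is not a multiple of four.
check_fastq() {
    awk '
        NR % 4 == 1 && !/^@/  { bad = 1 }
        NR % 4 == 3 && !/^\+/ { bad = 1 }
        END {
            if (NR % 4 != 0) bad = 1
            if (bad) { print "malformed FASTQ" > "/dev/stderr"; exit 1 }
            print NR / 4
        }'
}

# Hypothetical usage (assumes sra-tools is installed; -Z writes to stdout):
#   fastq-dump -X 5 -Z SRR2089860 | check_fastq
```

This only checks the record layout, not whether sequence and quality lengths match.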

Use sam-dump to write SAM format files. This data appears to be unaligned (so --min-mapq should not affect anything, you can check).

$ sam-dump -r SRR2089860 | head -4
HWI-D00473:169:HFK7WADXX:1:1101:1202:2011       4       *       0       0       *       *       0       0       NGAGTCTATACTCGTTACATTCGCGTAACTCATTGTTAATCGCGAAGTTGA     #1=DDDDFGHHGHJJJIJIIJGIIJJIJHICGIIIJJJIJGIJEHJIGIIG
HWI-D00473:169:HFK7WADXX:1:1101:1195:2074       4       *       0       0       *       *       0       0       CTCGAACTCCTCGTAGTGGCGATTGTCGGTGCTGCCCACCAGGTCCACTGT     CCCFFFFFHGHHHJIJIJJJJHIIGGGHIECEHFHGIEFIGGJGHJIIGIG
HWI-D00473:169:HFK7WADXX:1:1101:1230:2087       4       *       0       0       *       *       0       0       TGCCGGGAATTGTACAGTGCTCAGCTTTATAGGACATTTCCAAACAGTTAT     BBBFFFF8FHHHHJJJIJJJJIJGJJIJFGJIFGIIIJJJIGIEIIIIJGG
HWI-D00473:169:HFK7WADXX:1:1101:1222:2168       4       *       0       0       *       *       0       0       CCGAGACTTGCCTGCTCACCAGCGAAGAGGGCGAGGAGCGTTTGACGGCCG     @@CDDADDHFHHHIIIIIHGGE<GEGIEHIGIIDHGHGGIHHHEFFFCCCB
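
The FLAG column (second field) is 4 for every record above, which is the "read unmapped" bit. As a small sketch of how one might tally this over more records, assuming tab-separated SAM on stdin (the helper name `count_unmapped` is hypothetical):

```shell
#!/bin/sh
# Report how many SAM records on stdin have the "unmapped" bit (0x4) set.
# Header lines (starting with '@') are skipped. The bit test uses integer
# arithmetic so it works in any POSIX awk, not only gawk.
count_unmapped() {
    awk -F '\t' '
        /^@/ { next }
        { total++; if (int($2 / 4) % 2 == 1) unmapped++ }
        END { printf "%d of %d records unmapped\n", unmapped, total }'
}

# Hypothetical usage (assumes sra-tools is installed):
#   sam-dump SRR2089860 | head -1000 | count_unmapped
```

If you already have a BAM file, `samtools view -c -f 4 file.bam` gives the unmapped count directly.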

To do a PCA analysis you will need to align the fastq data to a reference and then count the aligned reads to get an expression estimate. You could also use something like salmon to map the reads to a transcriptome and get counts directly.
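
A rough sketch of what the salmon route could look like for the four runs is below. It only prints the commands instead of running them, so you can review them first. Assumptions that are not from this thread: the data is single-end RNA-seq, a salmon index has already been built at `salmon_index` (placeholder path), and the output directory names are made up for illustration.

```shell
#!/bin/sh
# Print (not run) a download + quantification plan for one SRA run.
# $1 = run accession, $2 = path to an existing salmon index (placeholder).
plan_run() {
    acc=$1
    index_dir=$2
    # Dump single-end reads to a gzipped FASTQ file named <acc>.fastq.gz.
    echo "fastq-dump --gzip ${acc}"
    # Quantify against the transcriptome; -l A lets salmon infer library type.
    echo "salmon quant -i ${index_dir} -l A -r ${acc}.fastq.gz -o quant_${acc}"
}

for acc in SRR2089860 SRR2089861 SRR2089862 SRR2089863; do
    plan_run "$acc" salmon_index
done
```

The per-sample quantification outputs can then be combined into a count matrix for the PCA in the tool of your choice.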


Thanks for your help! This worked smoothly.

2.5 years ago
ATpoint 55k

Hi jordi.planells, if you ask for help, please always post full command lines so that others can reproduce the problem. There are plenty of pitfalls with these commands that cannot be reproduced from the tool name alone.

Essentially, to download sra files or fastq files, you can simply follow "Fast download of FASTQ files from the European Nucleotide Archive (ENA)" and then proceed with alignment. The tutorial covers both fastq download from the ENA and sra download from NCBI.


I posted more commands today because I performed further checks this morning (following the suggestions from other users), as I was not able to get the .bam files. Thank you for the tutorial, I will give it a shot and try to get the fastq files as explained there.
