Question: .sra to bam conversion results in a "< myfile > does not have BAM or CRAM format" error
5 months ago, jordi.planells wrote:

Hi all! I am having problems and I hope I can get some help from you.

I will explain my situation: I'm trying to perform a PCA analysis to see how different several bam files are. I'm using the following pipeline:

  1. Getting the accession files. I am using the R library "SRAdb", so I am getting 4 files in .sra format.
  2. I use SRA-tools in order to convert the .sra file into .bam format with the following code:

    sam-dump -r --min-mapq 25 $file | samtools view -bS > $file.bam

  3. Sort:

    samtools sort $file -o $file_sorted

  4. Index:

    samtools index $file_sorted $file_sorted.bai

  5. Compute a matrix to generate the PCA plot

    multiBamSummary bins -b $files.bam -o my/out/path --smartLabels -bs 10000 -p 2

At this point I'm getting the following error:

The file < myfile > does not have BAM or CRAM format.

I haven't been able to trace the error, as none of the earlier steps reported any problem. Any suggestions? (Ideally I would like to skip the alignment step; I want to keep the file as close to the original as possible.)

  • sra-tools --version 2.9.1_1
  • samtools --version 1.9
  • deeptools --version 3.3.0

Thanks in advance!


Can you post example accession numbers so we can see what data you are looking at?

written 5 months ago by genomax

The accession number I am looking at is SRP060510, which consists of 4 samples: SRR2089860, SRR2089861, SRR2089862, SRR2089863

written 5 months ago by jordi.planells

Take a look at the BAM files you've generated - probably there's something wrong with the format. Are these aligned files you are downloading from SRA?

You can also try samtools quickcheck on the BAM files you've generated.

written 5 months ago by predeus

I have checked. I get the following message: SRR2089860.bam had no targets in header (for all 4 of them)

written 5 months ago by jordi.planells
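
The "no targets in header" message reported above suggests that the BAM header lists no reference sequences (@SQ lines), which is what one would expect from unaligned data. A minimal sketch for checking this directly, assuming the header is supplied as SAM text on standard input (for example from `samtools view -H`); the function name `count_sq_lines` is hypothetical:

```shell
# Hypothetical helper: count @SQ (reference sequence) lines in a SAM header
# read from standard input. A count of 0 means the file has no targets.
count_sq_lines() {
    awk -F'\t' '$1 == "@SQ" { n++ } END { print n + 0 }'
}

# Example usage (assumes samtools is installed and the BAM file exists):
#   samtools view -H SRR2089860.bam | count_sq_lines
```

A result of 0 would be consistent with the quickcheck message; tools that compute coverage per genomic bin have nothing to work with in that case.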

Any errors? Are you sure the data you are dumping to SAM is even aligned? Seconding predeus: run samtools quickcheck.

written 5 months ago by ATpoint

I have performed other operations:

vdb-validate -> everything seems to be fine

fastq-dump -> resulting in the following error "Error: reads file does not look like a FASTQ file"

written 5 months ago by jordi.planells
5 months ago, genomax wrote:

Let us use one of the example accession numbers above (SRR2089860). These are single-end reads.

Your options are:

Use fastq-dump to dump the reads out in fastq format (remove -X 5 for full set)

$ fastq-dump -X 5 SRR2089860
Read 5 spots for SRR2089860
Written 5 spots for SRR2089860

Use sam-dump to create fastq format files

$ sam-dump --fastq SRR2089860 | head -16
@HWI-D00473:169:HFK7WADXX:1:1101:1202:2011/1 unaligned
NGAGTCTATACTCGTTACATTCGCGTAACTCATTGTTAATCGCGAAGTTGA
+
#1=DDDDFGHHGHJJJIJIIJGIIJJIJHICGIIIJJJIJGIJEHJIGIIG
@HWI-D00473:169:HFK7WADXX:1:1101:1195:2074/1 unaligned
CTCGAACTCCTCGTAGTGGCGATTGTCGGTGCTGCCCACCAGGTCCACTGT
+
CCCFFFFFHGHHHJIJIJJJJHIIGGGHIECEHFHGIEFIGGJGHJIIGIG
@HWI-D00473:169:HFK7WADXX:1:1101:1230:2087/1 unaligned
TGCCGGGAATTGTACAGTGCTCAGCTTTATAGGACATTTCCAAACAGTTAT
+
BBBFFFF8FHHHHJJJIJJJJIJGJJIJFGJIFGIIIJJJIGIEIIIIJGG
@HWI-D00473:169:HFK7WADXX:1:1101:1222:2168/1 unaligned
CCGAGACTTGCCTGCTCACCAGCGAAGAGGGCGAGGAGCGTTTGACGGCCG
+
@@CDDADDHFHHHIIIIIHGGE<GEGIEHIGIIDHGHGGIHHHEFFFCCCB

Use sam-dump to write SAM format files. This data appears to be unaligned (so --min-mapq should not affect anything, you can check).

$ sam-dump -r SRR2089860 | head -4
HWI-D00473:169:HFK7WADXX:1:1101:1202:2011       4       *       0       0       *       *       0       0       NGAGTCTATACTCGTTACATTCGCGTAACTCATTGTTAATCGCGAAGTTGA     #1=DDDDFGHHGHJJJIJIIJGIIJJIJHICGIIIJJJIJGIJEHJIGIIG
HWI-D00473:169:HFK7WADXX:1:1101:1195:2074       4       *       0       0       *       *       0       0       CTCGAACTCCTCGTAGTGGCGATTGTCGGTGCTGCCCACCAGGTCCACTGT     CCCFFFFFHGHHHJIJIJJJJHIIGGGHIECEHFHGIEFIGGJGHJIIGIG
HWI-D00473:169:HFK7WADXX:1:1101:1230:2087       4       *       0       0       *       *       0       0       TGCCGGGAATTGTACAGTGCTCAGCTTTATAGGACATTTCCAAACAGTTAT     BBBFFFF8FHHHHJJJIJJJJIJGJJIJFGJIFGIIIJJJIGIEIIIIJGG
HWI-D00473:169:HFK7WADXX:1:1101:1222:2168       4       *       0       0       *       *       0       0       CCGAGACTTGCCTGCTCACCAGCGAAGAGGGCGAGGAGCGTTTGACGGCCG     @@CDDADDHFHHHIIIIIHGGE<GEGIEHIGIIDHGHGGIHHHEFFFCCCB
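
In the records above, the second column (FLAG) is 4 and the reference name is `*`, which marks each read as unmapped. A rough sketch for checking this across a whole file, assuming tab-delimited SAM text on standard input; the function name `count_unmapped` is hypothetical:

```shell
# Hypothetical helper: read SAM text on standard input and print
# "<unmapped> <total>" for the alignment records.
# A record counts as unmapped when bit 0x4 of the FLAG field (column 2) is set.
count_unmapped() {
    awk -F'\t' '
        /^@/ { next }                        # skip header lines
        { total++ }
        int($2 / 4) % 2 == 1 { unmapped++ }  # test bit 0x4 with plain arithmetic
        END { printf "%d %d\n", unmapped, total }
    '
}

# Example usage (assumes sra-tools is installed):
#   sam-dump -r SRR2089860 | count_unmapped
```

If the two numbers are equal, every read is unmapped and the file cannot be used by tools that expect aligned BAM input.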

To do a PCA analysis you will need to align the fastq data to a reference and then count the aligned reads to get an expression estimate. You could also use something like salmon to align to the transcriptome to get counts.
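
The salmon route could look roughly like the sketch below. It only prints the commands for one accession rather than running them. The function name `plan_quant`, the index path `salmon_index` (a prebuilt salmon transcriptome index) and the `quant/` output directory are assumptions for illustration, not part of the original answer:

```shell
# Hypothetical helper: print (do not run) the commands for one accession.
# $1 = SRA run accession, $2 = path to a prebuilt salmon index (assumed name).
plan_quant() {
    acc=$1
    index=${2:-salmon_index}
    # Dump the single-end reads to <accession>.fastq.gz
    printf 'fastq-dump --gzip %s\n' "$acc"
    # Quantify against the transcriptome; -l A lets salmon infer the library type
    printf 'salmon quant -i %s -l A -r %s.fastq.gz -o quant/%s\n' "$index" "$acc" "$acc"
}

# Example usage for the four runs in SRP060510:
#   for acc in SRR2089860 SRR2089861 SRR2089862 SRR2089863; do plan_quant "$acc"; done
```

Printing the commands first makes it easy to review them before committing to the downloads; the per-sample estimates written by salmon would then feed the PCA.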


Thanks for your help! This worked smoothly.

written 5 months ago by jordi.planells
5 months ago, ATpoint wrote:

Hi jordi.planells, if you ask for help, please always post full command lines so that others can reproduce the problem. These commands have plenty of pitfalls, and nobody can spot them if you only say which tool you used.

Essentially, to download sra files or fastq files, you can simply follow the tutorial "Fast download of FASTQ files from the European Nucleotide Archive (ENA)" and then proceed with alignment. The tutorial covers both fastq download from the ENA and sra download from NCBI.


I posted more commands today because I ran further checks this morning, following the suggestions from other users, as I was still not able to get the .bam files. Thank you for the tutorial; I will give it a shot and try to get the fastq files as explained there.

written 5 months ago by jordi.planells