Issue With CRAM -> BAM -> FASTQ Conversion
2
0
Entering edit mode
22 months ago
denis.k ▴ 20

Please help! I am trying to obtain fastq files from the GDSC, all we have in the lab is CRAM files. Unfortunately, the reference genome seems to not exist when pulled from an online source. I have attempted to use the Samtools code which allows me to put in my own reference but I get an error message

code used:

 samtools view -b  -T /GRCh37.p13.genome.fa -o /EGAR00001252191_13305_1.bam /EGAR00001252191_13305_1.cram

error message:

[W::find_file_url] Failed to open reference "https://www.ebi.ac.uk/ena/cram/md5/1b22b98cdeb4a9304cb5d48026a85128": Protocol not supported
[E::cram_get_ref] Failed to populate reference for id 0
[E::cram_decode_slice] Unable to fetch reference #0 10507..19965
[E::cram_next_slice] Failure to decode slice
[main_samview] truncated file.

Samtools v1.9, OS: Ubuntu 18.04.2 LTS

Any ideas? Thank you in advance!

samtools fastq cram bam view • 2.0k views
ADD COMMENT
0
Entering edit mode

I assume this data is coming from European Genome Phenome Archive? You should email datasharing at sanger.ac.uk to get assistance with reference.

ADD REPLY
0
Entering edit mode

Hello,

how have you installed samtools? Is htslib installed as well?

Related: https://github.com/samtools/samtools/issues/857

fin swimmer

ADD REPLY
4
Entering edit mode
22 months ago
jkbonfield ▴ 590

Has your copy of samtools (htslib actually) been compiled with curl support? If you just did "make" the answer will probably be no. You need ./configure;make. It's dumb and an annoyance of mine that we have two different build routes that by default choose two different combinations of features. Argh. This is almost certainly what the "protocol not supported" means as without curl it won't be able to handle https; only http.

If you do indeed have a curl capable samtools, then is it maybe failing due to a firewall rule? You can use htslib's "hts_file -vvvvv -c in.cram > /dev/null" to read a CRAM file with maximum verbosity enabled, which may shed some light on the manner. (Sadly samtools itself doesn't have any way to enter htslib debug mode, unless you attach a debugger to it.)

ADD COMMENT
1
Entering edit mode

Thank you! It was a lib curl problem! Turns out that I just needed the --enable lib curl option. Such a silly problem. Thank you for all the help

ADD REPLY
0
Entering edit mode

Hi,

In our institute, the tools are all installed on a cluster by the systems admin, so I don't know whether or not ./configure was run. Could you please tell me at which step did you use --enable lib curl? I'm guessing you used it with samtools view -b -T /GRCh37.p13.genome.fa -o /EGAR00001252191_13305_1.bam /EGAR00001252191_13305_1.cram, but in samtools-1.9 (the version on our cluster), there is no --enable lib curl option under samtools view.

ADD REPLY
1
Entering edit mode
7 weeks ago
bw. ▴ 200

For anyone else that runs into this issue, the following worked for me:

wget https://storage.googleapis.com/gcp-public-data--broad-references/hg38/v0/Homo_sapiens_assembly38.fasta
wget https://storage.googleapis.com/gcp-public-data--broad-references/hg38/v0/Homo_sapiens_assembly38.fasta.fai 
wget https://storage.googleapis.com/gcp-public-data--broad-references/hg38/v0/Homo_sapiens_assembly38.ref_cache.tar.gz

tar xzf Homo_sapiens_assembly38.ref_cache.tar.gz
export REF_PATH="$(pwd)/ref/cache/%2s/%2s/%s:http://www.ebi.ac.uk/ena/cram/md5/%s"
export REF_CACHE="$(pwd)/ref/cache/%2s/%2s/%s"

...

{command that was generating the '[W::find_file_url] Failed to open reference..' error}
ADD COMMENT

Login before adding your answer.

Traffic: 2848 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6