Issue With CRAM -> BAM -> FASTQ Conversion
2
0
Entering edit mode
2.9 years ago
denis.k ▴ 20

Please help! I am trying to obtain fastq files from the GDSC, all we have in the lab is CRAM files. Unfortunately, the reference genome seems to not exist when pulled from an online source. I have attempted to use the Samtools code which allows me to put in my own reference but I get an error message

code used:

 samtools view -b  -T /GRCh37.p13.genome.fa -o /EGAR00001252191_13305_1.bam /EGAR00001252191_13305_1.cram


error message:

[W::find_file_url] Failed to open reference "https://www.ebi.ac.uk/ena/cram/md5/1b22b98cdeb4a9304cb5d48026a85128": Protocol not supported
[E::cram_get_ref] Failed to populate reference for id 0
[E::cram_decode_slice] Unable to fetch reference #0 10507..19965
[E::cram_next_slice] Failure to decode slice
[main_samview] truncated file.


Samtools v1.9, OS: Ubuntu 18.04.2 LTS

Any ideas? Thank you in advance!

samtools fastq cram bam view • 3.4k views
0
Entering edit mode

I assume this data is coming from European Genome Phenome Archive? You should email datasharing at sanger.ac.uk to get assistance with reference.

0
Entering edit mode

Hello,

how have you installed samtools? Is htslib installed as well?

fin swimmer

4
Entering edit mode
2.8 years ago
jkbonfield ▴ 760

Has your copy of samtools (htslib actually) been compiled with curl support? If you just did "make" the answer will probably be no. You need ./configure;make. It's dumb and an annoyance of mine that we have two different build routes that by default choose two different combinations of features. Argh. This is almost certainly what the "protocol not supported" means as without curl it won't be able to handle https; only http.

If you do indeed have a curl capable samtools, then is it maybe failing due to a firewall rule? You can use htslib's "hts_file -vvvvv -c in.cram > /dev/null" to read a CRAM file with maximum verbosity enabled, which may shed some light on the manner. (Sadly samtools itself doesn't have any way to enter htslib debug mode, unless you attach a debugger to it.)

1
Entering edit mode

Thank you! It was a lib curl problem! Turns out that I just needed the --enable lib curl option. Such a silly problem. Thank you for all the help

0
Entering edit mode

Hi,

In our institute, the tools are all installed on a cluster by the systems admin, so I don't know whether or not ./configure was run. Could you please tell me at which step did you use --enable lib curl? I'm guessing you used it with samtools view -b -T /GRCh37.p13.genome.fa -o /EGAR00001252191_13305_1.bam /EGAR00001252191_13305_1.cram, but in samtools-1.9 (the version on our cluster), there is no --enable lib curl option under samtools view.

1
Entering edit mode
14 months ago
bw. ▴ 220

For anyone else that runs into this issue, the following worked for me:

wget https://storage.googleapis.com/gcp-public-data--broad-references/hg38/v0/Homo_sapiens_assembly38.fasta

tar xzf Homo_sapiens_assembly38.ref_cache.tar.gz
export REF_PATH="$(pwd)/ref/cache/%2s/%2s/%s:http://www.ebi.ac.uk/ena/cram/md5/%s" export REF_CACHE="$(pwd)/ref/cache/%2s/%2s/%s"

...

{command that was generating the '[W::find_file_url] Failed to open reference..' error}

0
Entering edit mode

is there somemthing similar for hg19?

1
Entering edit mode

Broad seems to have removed all hg19 files from their google public bucket. Other people have asked about these and we have told them to ask Broad tech support. No one has come back and posted a response as far as I recollect. If you are willing, please do that. It would be useful for all.