Pipe output from curl to tabix to slice .vcf file
4.0 years ago
j.lunger18 ▴ 30

New to bioinformatics here.

I would like to pull a VCF from TCGA and have a curl command to do so. I don't want the whole VCF file, only a specified region. Is there a way to pipe the output from curl to tabix without having curl download the whole VCF file locally?

My current command is as follows:

module load google-cloud-sdk; curl --header "X-Auth-Token:$token" "https://api.gdc.cancer.gov/data/${file_id}" | tabix -h chr1:XXXXX-XXXXX > /destination/${file_id}.sliced.vcf; done
VCF Tabix Curl • 1.3k views
4.0 years ago
ATpoint 82k

I do not think so, at least not using curl. The idea of tabix is that it uses an associated index to pull only the slices you want from the full file, so it cannot work on a stream piped in from curl. For this you would need the respective index to be available at the source location, next to the data file. If it is, slicing is possible using tabix alone; see the tabix manual (keyword: remote data retrieval). Check whether an index is available.
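As a rough illustration of the remote retrieval mentioned above, here is a minimal sketch. It assumes the file is bgzip-compressed, that its `.tbi` index sits next to it at the same URL, and that no authentication header is required. The URL, region and output name are placeholders, and `slice_remote_vcf` is a hypothetical helper name, not part of tabix.

```shell
#!/usr/bin/env bash
# Slice a region from a remote bgzip-compressed, tabix-indexed VCF.
# Assumption: "${url}.tbi" is reachable at the source and needs no auth header.
slice_remote_vcf() {
    local url="$1"      # e.g. https://example.org/data/sample.vcf.gz (placeholder)
    local region="$2"   # e.g. chr1:100000-200000
    local out="$3"      # destination for the sliced VCF
    # -h keeps the VCF header lines in the output
    tabix -h "$url" "$region" > "$out"
}

# Example call, left commented out because the URL is a placeholder:
# slice_remote_vcf "https://example.org/data/sample.vcf.gz" "chr1:100000-200000" sample.sliced.vcf
```

With a setup like this, tabix should fetch the index first and then request only the compressed blocks that overlap the region, rather than the whole file.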


What might complicate this is HTTPS support (which might now be in tabix) and support for authentication tokens or other custom HTTP headers (which might not be). I would be interested to know the current status of that support; I am just noting that these might be issues.

It might be possible to use Node.js to set up an HTTP > HTTPS proxy service. This is basically setting up a local HTTP server that points to the remote HTTPS service running on api.gdc.cancer.gov. Requests to the HTTP service are unauthenticated, but the proxy service passes along the custom authentication token header.

In other words, one would then run tabix to point to the VCF file "hosted" on that local, unauthenticated HTTP proxy, e.g., http://localhost/${file_id}.

On the proxy side, http://localhost/${file_id} gets swapped out for https://api.gdc.cancer.gov/data/${file_id}.

Any request to the proxy is, in turn, forwarded as an authenticated request for data from the original HTTPS service. As far as tabix is concerned, it is just talking to an HTTP server.

This includes requests for the index file, which tabix then uses to request byte ranges of the original bgzip file. The proxy would need to be configured to pass along any such byte-range headers that tabix puts into its requests.
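Before building a proxy, it may be worth checking whether the remote service honours byte-range requests at all, since the approach above depends on it. The sketch below sends one small ranged request with the token header from the question and checks for HTTP 206 (Partial Content). `supports_byte_ranges` is a hypothetical helper name, and whether the GDC endpoint actually answers with 206 is something to verify, not something assumed here.

```shell
#!/usr/bin/env bash
# Return success if the server answers a ranged, token-authenticated request with 206.
supports_byte_ranges() {
    local url="$1"     # e.g. https://api.gdc.cancer.gov/data/${file_id}
    local token="$2"   # the same token used in the original curl command
    local status
    # Ask for the first 100 bytes only; --max-time bounds the transfer
    # in case the server ignores the Range header and sends the whole file.
    status="$(curl --silent --output /dev/null --max-time 30 \
        --header "X-Auth-Token: ${token}" \
        --range 0-99 \
        --write-out '%{http_code}' "$url")"
    [ "$status" = "206" ]
}

# Example call, commented out because it needs a valid token and file_id:
# if supports_byte_ranges "https://api.gdc.cancer.gov/data/${file_id}" "$token"; then
#     echo "server honours byte ranges"
# fi
```

A 200 response instead of 206 would suggest the server ignores the Range header, in which case a proxy could not avoid transferring the full file.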
