Question: Querying Gemini database by using genomic coordinates from a bed file.
eDNAuRNA wrote (13 months ago):

Hi everyone,

I have a bed file (tab separated columns) with hundreds of genomic coordinates as follows.

chr1    88833393    88834022    EXr19   1   +
chr1    22531002    22531628    EXr20   1   +
chr1    10355070    10355696    EXr21   1   +

I am trying to query a gemini database by using a genomic region based query as follows.

gemini query --header --show-samples --region 1:88833393-88834022 -q "select * from variants" gemini.db >> output.tsv

Is there a way to automatically generate one query for each genomic coordinate in the bed file? Any quick help would be appreciated.

Thanks

query database bed gemini • 305 views
modified 13 months ago by finswimmer • written 13 months ago by eDNAuRNA

finswimmer (Germany) wrote (13 months ago):

Hello,

you could use GNU parallel for this:

$ parallel --dry-run --colsep "\t" 'gemini query --header --show-samples --region {1}:{2}-{3} -q "select * from variants" gemini.db' :::: regions.bed >> output.tsv

Remove --dry-run once you're happy with the commands it generates.
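One thing to watch once the real run writes to output.tsv: because every query is called with --header, each region contributes its own header line to the combined file. Below is a minimal sketch for keeping only the first header, assuming the header line is identical across queries. The file names output_demo.tsv and output_dedup_demo.tsv and the rows in them are made up for illustration; they are not real gemini output.

```shell
# Fake combined output: two "queries", each starting with the same header line.
printf 'chrom\tstart\tend\n1\t100\t101\nchrom\tstart\tend\n1\t200\t201\n' > output_demo.tsv

# Keep line 1 (the header), then drop every later line identical to it.
awk 'NR == 1 { header = $0; print; next } $0 != header' output_demo.tsv > output_dedup_demo.tsv

cat output_dedup_demo.tsv
```

If the order of the regions in the output matters, parallel's -k (--keep-order) option prints the results in input order.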

fin swimmer

modified 13 months ago • written 13 months ago by finswimmer

Hi Fin,

thanks a bunch. This worked like a charm. I need one quick modification, though: the query shouldn't include the "chr" prefix from the bed file. The code you shared puts "chr" into the generated --region argument, and the query won't work like that. Can you please suggest how to leave "chr" out? Right now the following query is being generated:

gemini query --header --show-samples --region chr1:88833393-88834022 -q "select * from variants" gemini.db >> output.tsv

Secondly, can you please explain how the code you suggested actually works? If you don't have time, please point me to a tutorial. Thirdly, what does --dry-run do, and what will happen if I remove it?

Thanks again, I am very close to solving a problem I have been facing for two months.

Cheers,

modified 13 months ago • written 13 months ago by eDNAuRNA

Hello,

a good introduction to parallel is here in biostars :)

What my code does is run the command between the single quotes once for each line in regions.bed. With --colsep "\t" we also tell parallel that each line contains multiple arguments separated by tabs. That way we can use the placeholders {n} in the command: {1} is the first column, {2} the second, and so on.

With --dry-run we tell parallel not to execute the commands but just to print them. This is good for checking whether all of our input is parsed correctly. To actually execute the commands we need to remove the option.
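To make the per-line behaviour concrete, here is roughly the same idea written as a plain shell loop that only prints the commands, much like --dry-run does. It is a sketch and not what parallel does internally; the file names regions_demo.bed and commands_demo.txt are made up for the example, and the variables chrom, start and end play the role of {1}, {2} and {3}.

```shell
# Demo BED file with the three lines from the question (tab separated).
printf 'chr1\t88833393\t88834022\tEXr19\t1\t+\n'  > regions_demo.bed
printf 'chr1\t22531002\t22531628\tEXr20\t1\t+\n' >> regions_demo.bed
printf 'chr1\t10355070\t10355696\tEXr21\t1\t+\n' >> regions_demo.bed

# A literal tab character to split the columns on.
tab=$(printf '\t')

# Read one line at a time, split it on tabs, and print (not run) the command.
# The first three columns go into chrom/start/end; the remainder is ignored.
while IFS="$tab" read -r chrom start end _rest; do
    printf 'gemini query --header --show-samples --region %s:%s-%s -q "select * from variants" gemini.db\n' \
        "$chrom" "$start" "$end"
done < regions_demo.bed > commands_demo.txt

cat commands_demo.txt
```

The loop runs one line after another, whereas parallel can run several jobs at the same time.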

To get rid of the chr we can use sed and pipe the result to parallel:

$ sed 's/^chr//' regions.bed | parallel --dry-run --colsep "\t" 'gemini query --header --show-samples --region {1}:{2}-{3} -q "select * from variants" gemini.db' >> output.tsv
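A quick way to check what the sed step does before involving parallel or gemini: the expression 's/^chr//' only removes a "chr" at the very start of a line, so the other columns are left alone. The sketch below builds the region strings from a small demo file; the names chr_demo.bed and regions_demo.txt, and the chrX line, are invented for illustration.

```shell
# Demo input: one line from the question plus a made-up chrX line.
printf 'chr1\t88833393\t88834022\tEXr19\t1\t+\n' >  chr_demo.bed
printf 'chrX\t100\t200\tdemo\t1\t-\n'            >> chr_demo.bed

# Strip the leading "chr", then print columns 1-3 as chrom:start-end.
sed 's/^chr//' chr_demo.bed | awk -F'\t' '{ print $1 ":" $2 "-" $3 }' > regions_demo.txt

cat regions_demo.txt
```

The resulting strings have the same form as the --region value used in the original query (1:88833393-88834022).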

fin swimmer

modified 13 months ago • written 13 months ago by finswimmer

Hi Fin,

You are amazing. It's working perfectly :)

Thanks a bunch. Have a great weekend.

Best,

written 13 months ago by eDNAuRNA