Tutorial: Managing your data (BAM, VCF, sample, phenotype) with RDF and SPARQL
7 months ago

13 years after "How Do You Manage Your Files & Directories For Your Projects?", I wrote a tutorial about how I now manage my data (BAM, VCF, sample, phenotype, reference, etc.), how I link everything with RDF, and how I query it with SPARQL.

See https://github.com/lindenb/hts-rdf/

An example for Biostars:

Find the BAM files along with their references, samples, etc.

SPARQL query:

PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX foaf: <http://xmlns.com/foaf/0.1/>
PREFIX u: <https://umr1087.univ-nantes.fr/rdf/>
PREFIX obo: <http://purl.obolibrary.org/obo/>
PREFIX owl: <http://www.w3.org/2002/07/owl#>
PREFIX rel: <http://purl.org/vocab/relationship/> 
PREFIX dc: <http://purl.org/dc/elements/1.1/>


SELECT DISTINCT ?bamPath
        (SAMPLE(?fasta) as ?colFasta)
        (SAMPLE(?taxonName) as ?colTaxon)
        (SAMPLE(?sampleName) as ?colSampleName )
        (GROUP_CONCAT(DISTINCT ?groupName; SEPARATOR=";") as ?colGroups )
        (GROUP_CONCAT(DISTINCT ?gender; SEPARATOR=";") as ?colGender )
        (GROUP_CONCAT(DISTINCT ?diseaseName; SEPARATOR=";") as ?colDiseases)
        (SAMPLE(?fatherName) as ?colFather )
        (SAMPLE(?motherName)  as ?colMother)
        (GROUP_CONCAT(DISTINCT ?childName; SEPARATOR="; ") as ?colChildren)
WHERE {
  ?bam a u:Bam .
  ?bam u:filename ?bamPath .

  OPTIONAL {
    ?bam u:reference ?ref .
    ?ref a u:Reference .
    ?ref u:filename ?fasta .

    OPTIONAL {
        ?ref u:taxon ?taxon .
        ?taxon a u:Taxon .
        ?taxon dc:title ?taxonName .
        }
    }

  OPTIONAL {
    ?bam u:sample ?sample .
    ?sample a foaf:Person .
    OPTIONAL {?sample foaf:name ?sampleName .}
    OPTIONAL {?sample foaf:gender ?gender .}
    OPTIONAL {
        ?group foaf:member ?sample .
        ?group a foaf:Group .
        ?group foaf:name ?groupName .
        } .
    OPTIONAL {
        ?sample u:has-disease ?disease .
        ?disease a owl:Class .
        ?disease rdfs:label ?diseaseName .
        } .
    OPTIONAL {
        ?father a foaf:Person .
        ?sample rel:childOf ?father .
        ?father foaf:gender "male" .
        ?father foaf:name ?fatherName .
        } .
    OPTIONAL {
        ?mother a foaf:Person .
        ?sample rel:childOf ?mother .
        ?mother foaf:gender "female" .
        ?mother foaf:name ?motherName .
        } .
    OPTIONAL {
        ?child a foaf:Person .
        ?child rel:childOf ?sample .
        ?child foaf:name ?childName .
        } .
    }.
}
GROUP BY  ?bamPath
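
The query above only matches a BAM when the graph contains a `u:Bam` node with a `u:filename`, optionally linked through `u:reference` and `u:sample`. As a rough illustration of that shape, here is a minimal Python sketch that emits Turtle for one BAM record. The function names and the example IRIs are hypothetical; the hts-rdf repository may build its graph quite differently.

```python
# Hypothetical helper: prints Turtle for one BAM record, using only the
# predicates that the query matches (u:Bam, u:filename, u:reference, u:sample).
U = "https://umr1087.univ-nantes.fr/rdf/"


def turtle_literal(text):
    """Quote a string as a Turtle literal, escaping backslashes and quotes."""
    return '"' + text.replace("\\", "\\\\").replace('"', '\\"') + '"'


def bam_to_turtle(bam_iri, path, reference_iri=None, sample_iri=None):
    """Return Turtle text describing one BAM file and its optional links."""
    lines = [f"<{bam_iri}> a u:Bam ;", f"    u:filename {turtle_literal(path)}"]
    if reference_iri:
        lines[-1] += " ;"
        lines.append(f"    u:reference <{reference_iri}>")
    if sample_iri:
        lines[-1] += " ;"
        lines.append(f"    u:sample <{sample_iri}>")
    lines[-1] += " ."  # close the last predicate of the subject
    return "\n".join([f"@prefix u: <{U}> ."] + lines)


if __name__ == "__main__":
    # The IRIs below are made up for illustration.
    print(bam_to_turtle(
        "urn:example:bam:S1.grch38",
        "data/S1.grch38.bam",
        reference_iri="urn:example:ref:hg38",
        sample_iri="urn:example:sample:S1",
    ))
```

For the query to return the reference and sample columns, the linked nodes also need their own triples (`a u:Reference` with a `u:filename`, and `a foaf:Person`), which this sketch leaves out.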

Execute:

arq --data=knowledge.rdf --query=data/query.bams.01.sparql > TMP/bams.01.out

Output:

-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
| bamPath                                        | colFasta               | colTaxon       | colSampleName | colGroups | colGender | colDiseases                | colFather | colMother | colChildren |
=======================================================================================================================================================================================================
| "/home/lindenb/src/hts-rdf/data/S1.grch38.bam" | "data/hg38.fasta"      | "Homo Sapiens" | "S1"          | "Fam01"   | "female"  | "Turner Syndrome;COVID-19" | "S3"      | "S2"      |             |
| "/home/lindenb/src/hts-rdf/data/S2.grch37.bam" | "data/hg19.fasta"      | "Homo Sapiens" | "S2"          | "Fam01"   | "female"  |                            |           |           | "S1"        |
| "/home/lindenb/src/hts-rdf/data/S4.RF.bam"     | "data/rotavirus_rf.fa" | "Rotavirus"    | "S4"          |           |           |                            |           |           |             |
| "/home/lindenb/src/hts-rdf/data/S5.grch38.bam" | "data/hg38.fasta"      | "Homo Sapiens" | "S5"          | "Fam01"   |           |                            |           |           |             |
| "/home/lindenb/src/hts-rdf/data/S3.grch38.bam" | "data/hg38.fasta"      | "Homo Sapiens" | "S3"          | "Fam01"   | "male"    | "Severe COVID-19"          |           |           | "S1"        |
| "/home/lindenb/src/hts-rdf/data/S1.grch37.bam" | "data/hg19.fasta"      | "Homo Sapiens" | "S1"          | "Fam01"   | "female"  | "Turner Syndrome;COVID-19" | "S3"      | "S2"      |             |
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
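
The bordered text table above is easy to read but awkward to feed into another script. The following Python sketch turns that kind of output into a list of dictionaries. It is not part of the tutorial: `parse_arq_table` is a hypothetical helper, and it assumes that no cell value contains a `|` character.

```python
def parse_arq_table(text):
    """Parse a bordered text table into a list of dicts keyed by column name."""
    rows = []
    for line in text.splitlines():
        line = line.strip()
        # Data and header lines start and end with '|'; border lines do not.
        if not (line.startswith("|") and line.endswith("|")):
            continue
        cells = [cell.strip() for cell in line[1:-1].split("|")]
        # Literals are printed in double quotes; unbound values are blank.
        rows.append([cell.strip('"') for cell in cells])
    if not rows:
        return []
    header, *body = rows
    return [dict(zip(header, row)) for row in body]


if __name__ == "__main__":
    with open("TMP/bams.01.out") as handle:
        for record in parse_arq_table(handle.read()):
            print(record["bamPath"], record["colSampleName"], sep="\t")
```

Unbound variables come back as empty strings, so a check such as `if record["colFather"]:` is enough to tell whether a father was found for a sample.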