12 days ago
Pierre Lindenbaum
Thirteen years after "How Do You Manage Your Files & Directories For Your Projects?", I wrote a tutorial about how I now manage my data (BAM, VCF, sample, phenotype, reference, etc.), how to link everything with RDF, and how to query it with SPARQL.
See https://github.com/lindenb/hts-rdf/
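To make the "link everything with RDF" idea concrete, here is a minimal sketch that writes the triples for one BAM file as N-Triples, using only the classes and predicates that appear in the SPARQL query of this post (`u:Bam`, `u:filename`, `u:reference`, `u:sample`, `foaf:Person`, `foaf:name`). The `BASE` namespace for instance URIs, the identifiers, and the function names are assumptions made for illustration; they are not the tutorial's own files or naming scheme.

```python
# Namespaces copied from the PREFIX lines of the query in this post.
U = "https://umr1087.univ-nantes.fr/rdf/"
FOAF = "http://xmlns.com/foaf/0.1/"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"

# Hypothetical namespace for instance URIs (an assumption, not from the tutorial).
BASE = "http://example.org/hts/"


def iri(value):
    """Wrap an absolute IRI in angle brackets, as N-Triples requires."""
    return "<" + value + ">"


def literal(text):
    """Serialize a plain string literal, escaping backslashes and quotes."""
    escaped = text.replace("\\", "\\\\").replace('"', '\\"')
    return '"' + escaped + '"'


def triple(subject, predicate, obj):
    """Format one N-Triples statement."""
    return f"{subject} {predicate} {obj} ."


def describe_bam(bam_id, path, ref_id, fasta, sample_id, sample_name):
    """Return the statements linking a BAM to its reference and its sample."""
    bam = iri(BASE + bam_id)
    ref = iri(BASE + ref_id)
    sample = iri(BASE + sample_id)
    return [
        triple(bam, iri(RDF_TYPE), iri(U + "Bam")),
        triple(bam, iri(U + "filename"), literal(path)),
        triple(bam, iri(U + "reference"), ref),
        triple(bam, iri(U + "sample"), sample),
        triple(ref, iri(RDF_TYPE), iri(U + "Reference")),
        triple(ref, iri(U + "filename"), literal(fasta)),
        triple(sample, iri(RDF_TYPE), iri(FOAF + "Person")),
        triple(sample, iri(FOAF + "name"), literal(sample_name)),
    ]


lines = describe_bam(
    "bam/S1.grch38", "data/S1.grch38.bam",
    "ref/hg38", "data/hg38.fasta",
    "sample/S1", "S1",
)
print("\n".join(lines))
```

Because every resource is identified by a URI, the same sample or reference can be reused by any number of BAM or VCF records, which is what lets a single query join them.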
An example for Biostars:
find the BAM files, their references, samples, etc.
SPARQL query:
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX foaf: <http://xmlns.com/foaf/0.1/>
PREFIX u: <https://umr1087.univ-nantes.fr/rdf/>
PREFIX obo: <http://purl.obolibrary.org/obo/>
PREFIX owl: <http://www.w3.org/2002/07/owl#>
PREFIX rel: <http://purl.org/vocab/relationship/>
PREFIX dc: <http://purl.org/dc/elements/1.1/>
SELECT DISTINCT ?bamPath
  (SAMPLE(?fasta) as ?colFasta)
  (SAMPLE(?taxonName) as ?colTaxon)
  (SAMPLE(?sampleName) as ?colSampleName )
  (GROUP_CONCAT(DISTINCT ?groupName; SEPARATOR=";") as ?colGroups )
  (GROUP_CONCAT(DISTINCT ?gender; SEPARATOR=";") as ?colGender )
  (GROUP_CONCAT(DISTINCT ?diseaseName; SEPARATOR=";") as ?colDiseases)
  (SAMPLE(?fatherName) as ?colFather )
  (SAMPLE(?motherName) as ?colMother)
  (GROUP_CONCAT(DISTINCT ?childName; SEPARATOR="; ") as ?colChildren)
WHERE {
  ?bam a u:Bam .
  ?bam u:filename ?bamPath .
  OPTIONAL {
    ?bam u:reference ?ref .
    ?ref a u:Reference .
    ?ref u:filename ?fasta
    OPTIONAL {
      ?ref u:taxon ?taxon .
      ?taxon a u:Taxon .
      ?taxon dc:title ?taxonName .
    }
  }
  OPTIONAL {
    ?bam u:sample ?sample .
    ?sample a foaf:Person .
    OPTIONAL {?sample foaf:name ?sampleName .}
    OPTIONAL {?sample foaf:gender ?gender .}
    OPTIONAL {
      ?group foaf:member ?sample .
      ?group a foaf:Group .
      ?group foaf:name ?groupName .
    } .
    OPTIONAL {
      ?sample u:has-disease ?disease .
      ?disease a owl:Class .
      ?disease rdfs:label ?diseaseName .
    } .
    OPTIONAL {
      ?father a foaf:Person .
      ?sample rel:childOf ?father .
      ?father foaf:gender "male" .
      ?father foaf:name ?fatherName .
    } .
    OPTIONAL {
      ?mother a foaf:Person .
      ?sample rel:childOf ?mother .
      ?mother foaf:gender "female" .
      ?mother foaf:name ?motherName .
    } .
    OPTIONAL {
      ?child a foaf:Person .
      ?child rel:childOf ?sample .
      ?child foaf:name ?childName .
    } .
  }.
}
GROUP BY ?bamPath
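The `GROUP BY ?bamPath` clause is what collapses the many solution rows produced by the nested `OPTIONAL` blocks (one per combination of disease, group, child, and so on) into a single row per BAM. The following sketch imitates that step in plain Python to show the idea. It is only an illustration of the semantics, not how ARQ implements it: the function name is hypothetical, the file names are abbreviated, `SAMPLE` is modelled as "first bound value seen", and `GROUP_CONCAT(DISTINCT ...)` keeps first-seen order, whereas SPARQL itself does not guarantee any particular choice or order.

```python
def group_rows(rows, key, sample_cols, concat_cols, sep=";"):
    """Collapse flat solution rows into one row per distinct value of `key`."""
    groups = {}
    for row in rows:
        groups.setdefault(row[key], []).append(row)

    result = []
    for value, members in groups.items():
        out = {key: value}
        for col in sample_cols:
            # SAMPLE(?x): any bound value from the group (here: the first one).
            out[col] = next(
                (m.get(col) for m in members if m.get(col) is not None), None
            )
        for col in concat_cols:
            # GROUP_CONCAT(DISTINCT ?x; SEPARATOR=sep): unique bound values joined.
            seen = []
            for m in members:
                v = m.get(col)
                if v is not None and v not in seen:
                    seen.append(v)
            out[col] = sep.join(seen)
        result.append(out)
    return result


# Flat rows as they would look before grouping; None stands for an unbound variable.
rows = [
    {"bamPath": "S1.grch38.bam", "sampleName": "S1", "diseaseName": "Turner Syndrome"},
    {"bamPath": "S1.grch38.bam", "sampleName": "S1", "diseaseName": "COVID-19"},
    {"bamPath": "S4.RF.bam", "sampleName": "S4", "diseaseName": None},
]

grouped = group_rows(rows, "bamPath", ["sampleName"], ["diseaseName"])
for row in grouped:
    print(row)
```

A sample with two diseases therefore yields one row whose disease column holds both labels separated by `;`, and a sample with none yields an empty string.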
execute:
arq --data=knowledge.rdf --query=data/query.bams.01.sparql > TMP/bams.01.out
output:
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
| bamPath                                        | colFasta               | colTaxon       | colSampleName | colGroups | colGender | colDiseases                | colFather | colMother | colChildren |
=======================================================================================================================================================================================================
| "/home/lindenb/src/hts-rdf/data/S1.grch38.bam" | "data/hg38.fasta"      | "Homo Sapiens" | "S1"          | "Fam01"   | "female"  | "Turner Syndrome;COVID-19" | "S3"      | "S2"      |             |
| "/home/lindenb/src/hts-rdf/data/S2.grch37.bam" | "data/hg19.fasta"      | "Homo Sapiens" | "S2"          | "Fam01"   | "female"  |                            |           |           | "S1"        |
| "/home/lindenb/src/hts-rdf/data/S4.RF.bam"     | "data/rotavirus_rf.fa" | "Rotavirus"    | "S4"          |           |           |                            |           |           |             |
| "/home/lindenb/src/hts-rdf/data/S5.grch38.bam" | "data/hg38.fasta"      | "Homo Sapiens" | "S5"          | "Fam01"   |           |                            |           |           |             |
| "/home/lindenb/src/hts-rdf/data/S3.grch38.bam" | "data/hg38.fasta"      | "Homo Sapiens" | "S3"          | "Fam01"   | "male"    | "Severe COVID-19"          |           |           | "S1"        |
| "/home/lindenb/src/hts-rdf/data/S1.grch37.bam" | "data/hg19.fasta"      | "Homo Sapiens" | "S1"          | "Fam01"   | "female"  | "Turner Syndrome;COVID-19" | "S3"      | "S2"      |             |
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
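If the text table has to be consumed by another script, it can be turned back into records with a few lines of Python. This is a minimal sketch assuming the layout shown above (rows delimited by `|`, rule lines made of `-` or `=`, literals wrapped in double quotes); `parse_arq_table` and `unquote` are hypothetical helper names, and a value that itself contains `|` would break this simple split. The embedded sample is a two-column cut-down of the output above.

```python
def unquote(cell):
    """Strip the double quotes around a literal; map an empty cell to None."""
    if not cell:
        return None
    if len(cell) >= 2 and cell.startswith('"') and cell.endswith('"'):
        return cell[1:-1]
    return cell


def parse_arq_table(text):
    """Parse a '|'-delimited text table into a list of dicts keyed by header."""
    header = None
    records = []
    for raw in text.splitlines():
        line = raw.strip()
        # Rule lines (---- / ====) and blank lines do not start and end with '|'.
        if not (line.startswith("|") and line.endswith("|")):
            continue
        cells = [unquote(c.strip()) for c in line[1:-1].split("|")]
        if header is None:
            header = cells  # the first '|' row holds the column names
        else:
            records.append(dict(zip(header, cells)))
    return records


sample = """
--------------------------------------------------------------
| bamPath                                        | colFather |
==============================================================
| "/home/lindenb/src/hts-rdf/data/S1.grch38.bam" | "S3"      |
| "/home/lindenb/src/hts-rdf/data/S4.RF.bam"     |           |
--------------------------------------------------------------
"""

records = parse_arq_table(sample)
for record in records:
    print(record["bamPath"], record["colFather"])
```

Unbound variables come back as `None`, so downstream code can tell a missing father from an empty string.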