Hi all,
I would like to evaluate pygr for annotation mapping work that I have been doing, but after reading through the recipes and docs I'm quite lost.
My questions revolve around the datasets below. How can I use pygr to map annotations in one dataset to overlapping annotations in another? From the pygr documentation, it seems that an annotation has to be tied to a sequence database, but I'm not sure I've understood the docs correctly.
Any help from someone who has used this library (or a similar one) for these kinds of tasks would be much appreciated.
I'm listing the data before the goals so you have an idea of what I'm working with.
Datasets:
- TCGA datasets (48 columns, MAF-like format)
- SSM mutations
- gene expression
- PolymiRTS (diverse columns and formats)
- seed mutations miRNA
- miRNA target mutations
- miRTarBase (various formats)
- 3' UTR database
- miRNA targets from Sanger
- Disease-specific SNPs
- SNPdb
The goals are:
- To determine whether TCGA mutations fall within any miRNAs, in particular their seed regions
- To determine whether TCGA mutations fall within any miRNA targets
- To map TCGA mutations to 3' UTRs
- To map TCGA mutations to miRNA transcript info
- To determine whether any known prostate cancer SNPs fall within a miRNA or a miRNA target
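
To make the kind of mapping concrete: every goal above comes down to the same interval-overlap test between two sets of genomic features. Below is a minimal pure-Python sketch of that test. It is not pygr's API, and the `Feature` tuple, helper names, and coordinates are all made up for illustration. It assumes both datasets have been reduced to (chromosome, start, end) on the same genome assembly, with 0-based half-open coordinates.

```python
from collections import defaultdict, namedtuple

# Hypothetical record type: 0-based, half-open [start, end) coordinates.
Feature = namedtuple("Feature", "chrom start end name")


def build_index(features):
    """Group features by chromosome and sort each group by start position."""
    by_chrom = defaultdict(list)
    for feature in features:
        by_chrom[feature.chrom].append(feature)
    for group in by_chrom.values():
        group.sort(key=lambda f: f.start)
    return by_chrom


def overlapping(index, query):
    """Return the indexed features that overlap the query interval."""
    hits = []
    for feature in index.get(query.chrom, []):
        if feature.start >= query.end:
            break  # sorted by start, so nothing further can overlap
        if feature.end > query.start:
            hits.append(feature)
    return hits


# Made-up example data standing in for miRNA seeds and TCGA mutations.
mirna_seeds = [
    Feature("chr1", 100, 107, "miR-A seed"),
    Feature("chr1", 500, 507, "miR-B seed"),
    Feature("chr2", 100, 107, "miR-C seed"),
]
mutations = [
    Feature("chr1", 103, 104, "mut1"),
    Feature("chr1", 107, 108, "mut2"),
    Feature("chr2", 100, 101, "mut3"),
]

index = build_index(mirna_seeds)
seed_hits = {m.name: [f.name for f in overlapping(index, m)] for m in mutations}
```

The same two helpers would apply to the miRNA target, 3' UTR, and SNP comparisons by swapping in a different feature list. The linear scan per chromosome is fine for small annotation sets; an interval tree or a dedicated library would be the better choice for genome-scale data.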