I am looking for a way to find the SNPs (rsIDs such as "rs559632360") in the 3' and 5' UTRs of the BRCA1 and BRCA2 mRNAs. I know how to extract the 3' and 5' UTRs using XSLT. Now I want to get the SNPs and their positions within the extracted 3' and 5' UTRs. I would really appreciate it if you could share a solution.
Thanks much, Pierre! I understand the parameters used in the command line: snp150 is the table for dbSNP build 150, the chromosome is 17, and then come the chromosome start and end. In my case the start and end (from the XML output for BRCA1) are:
chromosome="17" start="26935980" end="81742541"
When I ran the command, it returned a total of 144107 rsIDs.
Please help me understand the result better by answering a few questions:
a) Among these rsIDs, am I right that for the 3' and 5' UTRs I should extract only the entries annotated "ncRNA,untranslated-3" and "ncRNA,untranslated-5"? I ask because there are many other entries, such as "intron,near-gene-5,untranslated-5", "untranslated-3", etc.
Let's say I have extracted, using XSLT, the 3' and 5' UTRs for the RefSeq accession "NM_007299.3" of BRCA1 (shown below), and that this accession has rsIDs associated with it, one of which is "rs863224421".
b) When I searched for "rs863224421" in the list of 144107 rsIDs obtained from the UCSC MySQL command, I could not find it. If I am right, this rsID should be in the list. Please let me know if I am missing something.
c) From the XSLT output for the BRCA1 RefSeq accessions, as shown below, could you please tell me how to find which rsIDs belong to these 5' and 3' UTRs, and at what positions? For instance, information like this:
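To make question (a) concrete, here is a minimal Python sketch of the filter I have in mind. It assumes the annotation comes back as a comma-separated list of terms, and instead of matching the whole string it keeps any entry whose list contains "untranslated-3" or "untranslated-5". The rsIDs ("rsA" to "rsD") and the function names are made up for illustration; only the annotation strings are taken from my output.

```python
# Hypothetical helper names; the annotation strings mirror the ones quoted above.
UTR_TERMS = {"untranslated-3", "untranslated-5"}

def utr_classes(func_field):
    """Return the UTR terms found in a comma-separated annotation value."""
    terms = {t.strip() for t in func_field.split(",")}
    return terms & UTR_TERMS

def filter_utr_snps(rows):
    """Keep (rsid, func) rows whose annotation mentions a 3' or 5' UTR term."""
    return [(rsid, func) for rsid, func in rows if utr_classes(func)]

# Made-up rows standing in for the real query output.
rows = [
    ("rsA", "ncRNA,untranslated-3"),
    ("rsB", "intron,near-gene-5,untranslated-5"),
    ("rsC", "untranslated-3"),
    ("rsD", "intron"),
]
kept = filter_utr_snps(rows)
for rsid, func in kept:
    print(rsid, func)
```

With this approach "intron,near-gene-5,untranslated-5" and plain "untranslated-3" would both be kept, which is exactly the part I am unsure about.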
refseq-gene-id              mutated-allele         position
NM_007299.3|-195|5' UTR     c to t (let's say)     200 (let's say)
>NM_007299.3|-195|5' UTR
cttagcggtagccccttggtttccgtggcaacggaaaagcgcgggaattacagataaatt
aaaactgcgactgcgcggcgtgagctcgctgagacttcctggacgggggacaggctgtgg
ggtttctcagataactgggcccctgcgctcaggaggccttcaccctctgctctggttcat
tggaacagaaagaa
>NM_007299.3|2294-|3' UTR
ggcacctgtggtgacccgagagtgggtgttggacagtgtagcactctaccagtgccagga
gctggacacctacctgataccccagatcccccacagccactactgactgcagccagccac
aggtacagagccacaggaccccaagaatgagcttacaaagtggcctttccaggccctggg
agctcctctcactcttcagtccttctactgtcctggctactaaatattttatgtacatca
gcctgaaaaggacttctggctatgcaagggtcccttaaagattttctgcttgaagtctcc
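To illustrate what I mean by "position" in question (c), here is a rough Python sketch of the calculation I imagine is needed: turning a genomic SNP coordinate into an offset within a UTR sequence like the ones above. It is only a sketch under my own assumptions: the UTR is described by one or more exonic blocks given as 1-based inclusive genomic intervals, the function name `utr_offset` is hypothetical, and the coordinates in the example are invented, not real BRCA1 coordinates.

```python
def utr_offset(pos, blocks, strand):
    """Map a 1-based genomic position to a 1-based offset within a UTR.

    blocks: (start, end) genomic intervals of the UTR, 1-based inclusive.
    strand: "+" or "-"; on "-" the UTR sequence runs from high to low coordinates.
    Returns None when pos falls outside every block (e.g. in an intron).
    """
    # Walk the blocks in transcript order: ascending on "+", descending on "-".
    ordered = sorted(blocks, reverse=(strand == "-"))
    consumed = 0  # UTR bases already covered by earlier blocks
    for start, end in ordered:
        if start <= pos <= end:
            inside = pos - start if strand == "+" else end - pos
            return consumed + inside + 1
        consumed += end - start + 1
    return None

# Invented coordinates: a UTR split over two exonic blocks.
blocks = [(100, 109), (200, 204)]
print(utr_offset(200, blocks, "+"))  # first base of the second block
print(utr_offset(204, blocks, "-"))  # first base of the UTR on the reverse strand
print(utr_offset(150, blocks, "+"))  # between the blocks, so not in the UTR
```

If the SNP coordinate comes from a UCSC table, I believe the chromStart value is 0-based and would need 1 added before calling this, and for a reverse-strand gene the alleles would also need to be complemented to match the mRNA sequence.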
Thanks much! :-) DK