Dear All,
I have the following SNP-related data:
1) The chromosome regions of the 5' and 3' UTRs, obtained from Ensembl. In the example below, both UTRs are on chromosome 17, and their regions are shown.
5'-UTR: 17:43124097-43124115
3'-UTR: 17:43044295-43045677
2) The rsIDs of the SNPs within the 5' and 3' UTR regions above:
17 43124097 43124097 rs587781565 five_prime_UTR
17 43124098 43124098 rs273899693 five_prime_UTR
17 43124099 43124099 rs273900720 five_prime_UTR
17 43124106 43124106 rs748057929 five_prime_UTR
17 43124107 43124107 rs273897654 five_prime_UTR
17 43044494 43044494 rs931310802 three_prime_UTR
17 43044518 43044517 rs34214126 three_prime_UTR
17 43044518 43044518 rs1010615361 three_prime_UTR
17 43044539 43044539 CR1212932 three_prime_UTR
17 43044539 43044539 rs746187092 three_prime_UTR
17 43044560 43044560 rs1021990558 three_prime_UTR
17 43044561 43044561 rs571007748 three_prime_UTR
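bedtools works on BED intervals, so the region strings in 1) usually need converting first. The sketch below does that and also checks which SNPs from 2) fall inside a region. It assumes the Ensembl regions are 1-based and inclusive, while BED is 0-based and half-open. The function names and the optional `chr` prefix are illustrative. Using strand `-` is also an assumption, inferred from the 5' UTR lying at higher coordinates than the 3' UTR.

```python
import re

# Ensembl-style region strings, e.g. '17:43124097-43124115' (1-based, inclusive).
REGION_RE = re.compile(r"^(\w+):(\d+)-(\d+)$")


def region_to_bed(region, name, strand=".", chrom_prefix=""):
    """Convert a chrom:start-end region into one BED6 line (0-based, half-open)."""
    m = REGION_RE.match(region.strip())
    if m is None:
        raise ValueError(f"not a chrom:start-end region: {region!r}")
    chrom, start, end = m.group(1), int(m.group(2)), int(m.group(3))
    # BED start = 1-based start - 1; BED end = the 1-based inclusive end.
    fields = [chrom_prefix + chrom, str(start - 1), str(end), name, "0", strand]
    return "\t".join(fields)


def snps_in_region(snp_lines, region):
    """Yield (variant_id, start) for SNP lines whose start lies in the region."""
    chrom, rest = region.split(":")
    lo, hi = (int(x) for x in rest.split("-"))
    for line in snp_lines:
        fields = line.split()
        if len(fields) < 4:
            continue  # skip blank or malformed lines
        c, start, _end, var_id = fields[:4]
        if c == chrom and lo <= int(start) <= hi:
            yield var_id, int(start)
```

For example, `region_to_bed("17:43124097-43124115", "5_UTR", strand="-", chrom_prefix="chr")` returns `"chr17\t43124096\t43124115\t5_UTR\t0\t-"`. The prefix is only needed when the FASTA file names chromosomes `chr17` rather than `17`.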
Can someone please tell me:
a) how to extract the sequence of each region in 1), formatted like this:
>NM_007299.3 | 43124097-43124115 | 5' UTR
cttagcggtagccccttggtttccgtggcaacggaaaagcgcgggaattacagataaatt
aaaactgcgactgcgcggcgtgagctcgctgagacttcctggacgggggacaggctgtgg
ggtttctcagataactgggcccctgcgctcaggaggccttcaccctctgctctggttcat
tggaacagaaagaa
>NM_007299.3 | 43044295-43045677 | 3' UTR
ggcacctgtggtgacccgagagtgggtgttggacagtgtagcactctaccagtgccagga
gctggacacctacctgataccccagatcccccacagccactactgactgcagccagccac
aggtacagagccacaggaccccaagaatgagcttacaaagtggcctttccaggccctggg
agctcctctcactcttcagtccttctactgtcctggctactaaatattttatgtacatca
gcctgaaaaggacttctggctatgcaagggtcccttaaagattttctgcttgaagtctcc
b) how to get the reference and mutant alleles for each rsID, like this:
17 43124097 43124097 rs587781565 five_prime_UTR C(reference allele) T(mutant allele)
17 43124098 43124098 rs273899693 five_prime_UTR A(reference allele) G(mutant allele)
17 43044560 43044560 rs1021990558 three_prime_UTR C(reference allele) G(mutant allele)
17 43044561 43044561 rs571007748 three_prime_UTR T(reference allele) G(mutant allele)
Thanks much!
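For question b), one option, not mentioned in the thread, is the Ensembl REST API's `variation` endpoint. The sketch below assumes the public server `rest.ensembl.org` (GRCh38) and a JSON response containing a `mappings` list. Each mapping is assumed to have `seq_region_name` and an `allele_string` such as `C/T`, normally with the reference allele first. The helper names are hypothetical.

```python
import json
import urllib.request

# Assumption: public Ensembl REST server (GRCh38). For GRCh37 coordinates,
# https://grch37.rest.ensembl.org would be the matching server.
ENSEMBL_REST = "https://rest.ensembl.org"


def variation_url(rsid, species="human"):
    """URL of the Ensembl REST 'variation' endpoint for one variant ID."""
    return f"{ENSEMBL_REST}/variation/{species}/{rsid}?content-type=application/json"


def fetch_variation(rsid, species="human", timeout=30):
    """Download and decode the JSON record for one rsID (needs network access)."""
    with urllib.request.urlopen(variation_url(rsid, species), timeout=timeout) as resp:
        return json.load(resp)


def alleles_from_record(record, chrom):
    """Return (ref, [alts]) from the first mapping on `chrom`, or None.

    Filtering on the chromosome name skips mappings to patch or
    haplotype sequences, which have other seq_region_name values.
    """
    for mapping in record.get("mappings", []):
        if str(mapping.get("seq_region_name")) == str(chrom):
            alleles = mapping["allele_string"].split("/")
            return alleles[0], alleles[1:]
    return None


def annotate(line, record):
    """Append 'REF(reference allele) ALT(mutant allele)' to a SNP table line."""
    result = alleles_from_record(record, line.split()[0])
    if result is None:
        return line + " NA"
    ref, alts = result
    return f"{line} {ref}(reference allele) {','.join(alts)}(mutant allele)"
```

To use it, loop over the SNP lines. Take `rsid = line.split()[3]` and skip IDs that are not dbSNP rsIDs, such as `CR1212932`. Then print `annotate(line, fetch_variation(rsid))`. Each call is one HTTP request, so a long list will take a while.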
Part of the answer to question a) is bedtools getfasta.
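With a local FASTA file and a BED file, the call would look roughly like `bedtools getfasta -fi chr17.fa -bed utrs.bed -s -name`. Here `-s` reverse-complements minus-strand intervals, using the BED strand column. The file names are hypothetical. For readers who want to see what that step does, the pure-Python sketch below is an illustrative equivalent, not bedtools itself. It takes 1-based inclusive coordinates, reverse-complements for strand `-`, and wraps output at 60 bases per line as in the question.

```python
# Complement table for upper- and lower-case bases (soft-masked FASTA uses lower case).
COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def parse_fasta(lines):
    """Return {name: sequence} from an iterable of FASTA lines (e.g. an open file)."""
    seqs, name, chunks = {}, None, []
    for line in lines:
        line = line.strip()
        if line.startswith(">"):
            if name is not None:
                seqs[name] = "".join(chunks)
            # Keep only the first word of the header, e.g. 'chr17'.
            name, chunks = line[1:].split()[0], []
        elif line:
            chunks.append(line)
    if name is not None:
        seqs[name] = "".join(chunks)
    return seqs


def revcomp(seq):
    """Reverse complement, preserving case."""
    return seq.translate(COMPLEMENT)[::-1]


def get_region(seqs, chrom, start1, end1, strand="+"):
    """Sequence of a 1-based inclusive region; reverse-complemented for '-'."""
    sub = seqs[chrom][start1 - 1:end1]
    return revcomp(sub) if strand == "-" else sub


def to_fasta(header, seq, width=60):
    """Format one FASTA record with `width` bases per line."""
    body = [seq[i:i + width] for i in range(0, len(seq), width)]
    return "\n".join([">" + header] + body)
```

For example, after `with open("chr17.fa") as fh: seqs = parse_fasta(fh)`, call `print(to_fasta("NM_007299.3 | 43124097-43124115 | 5' UTR", get_region(seqs, "chr17", 43124097, 43124115, "-")))`. This prints one record in the requested layout. A whole chromosome is held in memory, which is fine for a single chromosome.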
Thanks! I guess I need to download the chromosome FASTA file locally to run bedtools? Can it be done without downloading it?
Yes, but it's slower. See: How To Get The Sequence Of A Genomic Region From UCSC?
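One way to fetch regions without a local FASTA is the UCSC Genome Browser REST API. This may differ from the approach in the linked post. The sketch assumes the `getData/sequence` endpoint with `genome`, `chrom`, `start` and `end` parameters, 0-based half-open coordinates, and a `dna` field in the JSON reply. The `hg38`/`chr17` values are assumptions and must match the assembly the coordinates come from.

```python
import json
import urllib.request

# UCSC Genome Browser REST API sequence endpoint (assumed as described above).
UCSC_API = "https://api.genome.ucsc.edu/getData/sequence"


def ucsc_sequence_url(genome, chrom, start1, end1):
    """Build the request URL for a 1-based inclusive region.

    The API takes 0-based, half-open coordinates (like BED), so only
    the start is shifted by one.
    """
    params = f"genome={genome};chrom={chrom};start={start1 - 1};end={end1}"
    return f"{UCSC_API}?{params}"


def fetch_ucsc_sequence(genome, chrom, start1, end1, timeout=30):
    """Return the forward-strand sequence from the JSON 'dna' field (needs network)."""
    url = ucsc_sequence_url(genome, chrom, start1, end1)
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return json.load(resp)["dna"]
```

The sequence comes back on the forward strand, so minus-strand UTRs still need reverse-complementing. Each region costs one HTTP round trip, which is why this is slower than reading a local file.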
bedtools getfasta needs the file locally, so without downloading you would need another method. That is possible, but I don't see why you would want it.
Thanks! Can you please point me to a reliable source for downloading the chromosome FASTA files for bedtools?
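http://hgdownload.cse.ucsc.edu/goldenPath/hg19/chromosomes/
Note that hg19 is GRCh37. The coordinates in the question (e.g. 17:43044295-43045677) appear to be GRCh38, the assembly used by the main Ensembl site, so the matching files are in http://hgdownload.cse.ucsc.edu/goldenPath/hg38/chromosomes/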
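The per-chromosome files in that directory are named like `chr17.fa.gz`. UCSC uses `chr`-prefixed names, while Ensembl uses plain `17`, so the names in the BED file have to match the FASTA headers. The sketch below downloads and decompresses one chromosome and maps Ensembl names to UCSC names. The assembly is a parameter: `hg19` as in the link, or `hg38` if the coordinates are GRCh38. The helper names are hypothetical.

```python
import gzip
import shutil
import urllib.request

# Base URL from the reply above; the assembly ('hg19' or 'hg38') is a parameter.
UCSC_DOWNLOAD = "http://hgdownload.cse.ucsc.edu/goldenPath"


def chromosome_url(assembly, chrom):
    """URL of one gzipped chromosome FASTA, e.g. .../chromosomes/chr17.fa.gz."""
    return f"{UCSC_DOWNLOAD}/{assembly}/chromosomes/{chrom}.fa.gz"


def ucsc_chrom(ensembl_name):
    """Map Ensembl chromosome names ('17', 'MT') to UCSC names ('chr17', 'chrM')."""
    if ensembl_name.startswith("chr"):
        return ensembl_name
    return "chrM" if ensembl_name == "MT" else "chr" + ensembl_name


def download_chromosome(assembly, chrom, out_path):
    """Stream-download one chromosome and decompress it to a plain FASTA file."""
    with urllib.request.urlopen(chromosome_url(assembly, chrom)) as resp, \
            gzip.open(resp) as gz, open(out_path, "wb") as out:
        shutil.copyfileobj(gz, out)
```

For example, `download_chromosome("hg38", ucsc_chrom("17"), "chr17.fa")` writes an uncompressed `chr17.fa` that bedtools can read. Decompressing while streaming avoids keeping a separate `.gz` copy on disk.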