Hi everyone, I want to build a virtual library (a FASTA file) of all possible exon combinations in a gene, and also a FASTA file of all possible exon-intron combinations. For example, I have the gene SDF4, which contains 7 exons. Three of them are:
ENSE00003851528:
GCGACCGCTTCCGCCCGGAGGAGAGATGGTGGCGCCGGCGTCCCCTCCGTGAGGTCGCGC
CCGTTCGCACCGCCCCCGCCCGCAAGAAAGATGGCAGTGGCCTGATCCGGGCCCGTTGGC
GGCGTCACTGACGCTTCGCTCCGGTCCTCGGATCCCGAGCGCGGGGAGGCAGACCG
ENSE00003459646:
ACTGTGAGCTGCTTGTCCCCATCCTGCGGCCGTCCTGGGGACACAGAGCCCTCCGTGGTG
CCCGGGGATTGGATTGGAGCCAGGACCTCACTTCCTCCTCTGCCCCTGCCCCTGCCCCTC
CCAGCACCTGGCCCACACCCTGCAGCCCGCCCCATGGTCTGGCCCTGGGTGGCGATGGCG
TCCAGGTGGGGTCCCCTCATTGGCCTGGCTCCGTGCTGCCTCTGGCTCCTGGGGGCAGTC
CTTCTGATGGACGCGTCTGCACGGCCTGCCAACCACTCGTCCACTCGAGAGAGAGTAGCC
AACAGGGAGGAGAATGAGATCCTGCCCCCAGACCACCTGAACGGGGTGAAGCTGGAGATG
GACGGGCACCTCAATCGCGGCTTCCACCAGGAGGTCTTCCTAGGCAAGGACCTGGGTGGC
TTTGATGAGGACGCGGAGCCGCGGCGGAGCCGGAGGAAGCTGATGGTCATCTTTTCCAA
ENSE00000869640:
GGTGGATGTGAACACTGACCGGAAGATCAGTGCCAAGGAGATGCAGCGCTGGATCATGGA
GAAGACGGCCGAGCACTTCCAGGAGGCCATGGAGGAGAGCAAGACACACTTCCGCGCCGT
GGACCCTGACGGGGACG
So I want to create all possible combinations, like this:
ENSE00003851528+ENSE00003459646
GCGACCGCTTCCGCCCGGAGGAGAGATGGTGGCGCCGGCGTCCCCTCCGTGAGGTCGCGC
CCGTTCGCACCGCCCCCGCCCGCAAGAAAGATGGCAGTGGCCTGATCCGGGCCCGTTGGC
GGCGTCACTGACGCTTCGCTCCGGTCCTCGGATCCCGAGCGCGGGGAGGCAGACCGACTGTGAGCTGCTTGTCCCCATCCTGCGGCCGTCCTGGGGACACAGAGCCCTCCGTGGTG
CCCGGGGATTGGATTGGAGCCAGGACCTCACTTCCTCCTCTGCCCCTGCCCCTGCCCCTC
CCAGCACCTGGCCCACACCCTGCAGCCCGCCCCATGGTCTGGCCCTGGGTGGCGATGGCG
TCCAGGTGGGGTCCCCTCATTGGCCTGGCTCCGTGCTGCCTCTGGCTCCTGGGGGCAGTC
CTTCTGATGGACGCGTCTGCACGGCCTGCCAACCACTCGTCCACTCGAGAGAGAGTAGCC
AACAGGGAGGAGAATGAGATCCTGCCCCCAGACCACCTGAACGGGGTGAAGCTGGAGATG
GACGGGCACCTCAATCGCGGCTTCCACCAGGAGGTCTTCCTAGGCAAGGACCTGGGTGGC
TTTGATGAGGACGCGGAGCCGCGGCGGAGCCGGAGGAAGCTGATGGTCATCTTTTCCAA
ENSE00003851528+ENSE00000869640
GCGACCGCTTCCGCCCGGAGGAGAGATGGTGGCGCCGGCGTCCCCTCCGTGAGGTCGCGC
CCGTTCGCACCGCCCCCGCCCGCAAGAAAGATGGCAGTGGCCTGATCCGGGCCCGTTGGC
GGCGTCACTGACGCTTCGCTCCGGTCCTCGGATCCCGAGCGCGGGGAGGCAGACCGGGTGGATGTGAACACTGACCGGAAGATCAGTGCCAAGGAGATGCAGCGCTGGATCATGGAGAAGACGGCCGAGCACTTCCAGGAGGCCATGGAGGAGAGCAAGACACACTTCCGCGCCGTGGACCCTGACGGGGACG
I want to build a library of all possible combinations in this way. Can anyone help me with how to download all the exons of a gene and create virtual RNA sequences from all possible exon combinations? Thanks in advance.
You are probably looking for GTF files, which store this information. Once you have extracted the coordinates of the features you want, you can use
bedtools getfasta
to pull the sequences at those coordinates from a reference genome FASTA file.
No, I want to make a library of mRNAs from all possible combinations of the exons and introns present in one gene. In the same way, I want to prepare a library for all human genes.
Can anyone help me make an mRNA library? I don't know how to start. Thanks in advance.
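As a starting point for the coordinate-extraction step suggested above, here is a minimal Python sketch that turns the exon rows of a GTF file into BED lines. It assumes Ensembl-style attributes (gene_id and exon_id in the ninth column); the function name gtf_exons_to_bed and the demo rows (GENE_A, EXON_1, the coordinates) are made up for illustration and are not real annotation.

```python
import re

def gtf_exons_to_bed(gtf_lines):
    """Yield one BED6 line per unique (gene, exon) found in GTF exon rows."""
    seen = set()
    for line in gtf_lines:
        if line.startswith("#"):
            continue  # skip header/comment lines
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 9 or fields[2] != "exon":
            continue  # keep only well-formed exon features
        # Assumption: attributes look like gene_id "..."; exon_id "...";
        gene = re.search(r'gene_id "([^"]+)"', fields[8])
        exon = re.search(r'exon_id "([^"]+)"', fields[8])
        if gene is None or exon is None:
            continue
        key = (gene.group(1), exon.group(1))
        if key in seen:
            continue  # the same exon is listed once per transcript
        seen.add(key)
        # GTF is 1-based inclusive, BED is 0-based half-open: shift the start only.
        start = int(fields[3]) - 1
        yield "\t".join(
            [fields[0], str(start), fields[4], f"{key[0]}|{key[1]}", "0", fields[6]]
        )

# Made-up demo rows (not real coordinates), in GTF column order.
demo_gtf = [
    '1\tdemo\tgene\t101\t400\t.\t-\t.\tgene_id "GENE_A";',
    '1\tdemo\texon\t301\t400\t.\t-\t.\tgene_id "GENE_A"; transcript_id "TX_1"; exon_id "EXON_1";',
    '1\tdemo\texon\t101\t200\t.\t-\t.\tgene_id "GENE_A"; transcript_id "TX_1"; exon_id "EXON_2";',
    '1\tdemo\texon\t301\t400\t.\t-\t.\tgene_id "GENE_A"; transcript_id "TX_2"; exon_id "EXON_1";',
]
bed_lines = list(gtf_exons_to_bed(demo_gtf))
for bed_line in bed_lines:
    print(bed_line)
```

The resulting BED file could then be passed to something like `bedtools getfasta -fi genome.fa -bed exons.bed -s -name` (genome.fa and exons.bed are placeholder file names). The -s option makes minus-strand exons come out reverse-complemented, and -name takes the FASTA header from the BED name column; depending on the bedtools version, the header may also include the coordinates.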
As @ATPoint suggested above, it should be possible to get the sequences of all exons, but creating all possible combinatorial sequences from those exons (pairwise? You seem to have only pairs in the example above) will require writing some custom code. Conceptually, you would create a list of exons for a particular gene and then write out the combinations. Then move on to gene 2, gene 3, and so on.
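The "list of exons, then write out combinations" idea could look roughly like the Python sketch below. It assumes pairwise combinations only, with the exons kept in their original order within each pair (as in the example in the question). The names exon_pairs and to_fasta, and the short toy sequences, are placeholders for illustration.

```python
from itertools import combinations

def exon_pairs(exons):
    """exons: list of (exon_id, sequence) in transcript order.

    Yields (name, sequence) for every pair, upstream exon first.
    """
    for (id_a, seq_a), (id_b, seq_b) in combinations(exons, 2):
        yield f"{id_a}+{id_b}", seq_a + seq_b

def to_fasta(name, seq, width=60):
    """Format one record as FASTA text, wrapping the sequence at `width`."""
    wrapped = [seq[i:i + width] for i in range(0, len(seq), width)]
    return f">{name}\n" + "\n".join(wrapped)

# Toy sequences standing in for real exons, listed in transcript order.
toy_exons = [("E1", "AAAA"), ("E2", "CCCC"), ("E3", "GGGG")]
pairs = list(exon_pairs(toy_exons))
for name, seq in pairs:
    print(to_fasta(name, seq))
```

With n exons this gives n(n-1)/2 pairs, so 21 for a 7-exon gene. If "all possible" is meant to cover every in-order subset of exons rather than only pairs, the count becomes 2^n - 1 (127 for 7 exons), which grows very quickly for genes with many exons.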
But I want to make a library for all human genes. Do I have to save different genes in different files, or is there another way to do this? I am new to writing scripts. Is there no software available that does this easily? If you know how to write the custom code for this, can you help me make the library? Thanks in advance.
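On the question of separate files: one file per gene should not be needed if each exon record carries its gene name in the header. The Python sketch below assumes a hypothetical header layout of >GENE|EXON and assumes that the exons of each gene appear in transcript order in the input; read_fasta and pairs_per_gene are illustrative names, and the sequences are toy data.

```python
from collections import defaultdict
from itertools import combinations

def read_fasta(text):
    """Parse FASTA text into a list of (header, sequence) tuples."""
    records, header, chunks = [], None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:], []
        else:
            chunks.append(line)
    if header is not None:
        records.append((header, "".join(chunks)))
    return records

def pairs_per_gene(records):
    """Group exons by gene, then yield in-order exon pairs within each gene."""
    by_gene = defaultdict(list)
    for header, seq in records:
        # Assumption: headers look like GENE|EXON (hypothetical layout).
        gene, exon = header.split("|", 1)
        by_gene[gene].append((exon, seq))
    for gene, exons in by_gene.items():
        for (id_a, seq_a), (id_b, seq_b) in combinations(exons, 2):
            yield f"{gene}|{id_a}+{id_b}", seq_a + seq_b

# Toy input with two genes in a single FASTA text.
toy_fasta = """>GENE1|E1
AAAA
>GENE1|E2
CCCC
>GENE2|E1
GGGG
>GENE2|E2
TTTT
>GENE2|E3
ACGT
"""
library = list(pairs_per_gene(read_fasta(toy_fasta)))
for name, seq in library:
    print(f">{name}\n{seq}")
```

Exons from different genes are never joined here, because the pairs are generated inside each gene group. All records can be written to a single output FASTA, with the gene name in each header keeping them distinguishable.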
If you don't want to elaborate, that's fine.
But I was just wondering why you would want to do this. I can somewhat see the point of building a set of different exon/exon combinations in the same order as in the original gene, but not of making all possible combinations of exons in a gene. There is a biological 'order' to those exons, so randomly combining them does not make much sense to me ...
And why you would then even throw in all the introns makes even less sense :/