Creating a virtual library of all possible exon combinations for a human gene
3.3 years ago
harry ▴ 30

Hi everyone, I want to build a virtual library: a FASTA file containing all possible exon combinations for a gene, and another containing all possible exon-intron combinations. For example, the gene SDF4 contains 7 exons; three of them are:

ENSE00003851528:
GCGACCGCTTCCGCCCGGAGGAGAGATGGTGGCGCCGGCGTCCCCTCCGTGAGGTCGCGC
CCGTTCGCACCGCCCCCGCCCGCAAGAAAGATGGCAGTGGCCTGATCCGGGCCCGTTGGC
GGCGTCACTGACGCTTCGCTCCGGTCCTCGGATCCCGAGCGCGGGGAGGCAGACCG  

ENSE00003459646:
ACTGTGAGCTGCTTGTCCCCATCCTGCGGCCGTCCTGGGGACACAGAGCCCTCCGTGGTG
CCCGGGGATTGGATTGGAGCCAGGACCTCACTTCCTCCTCTGCCCCTGCCCCTGCCCCTC
CCAGCACCTGGCCCACACCCTGCAGCCCGCCCCATGGTCTGGCCCTGGGTGGCGATGGCG
TCCAGGTGGGGTCCCCTCATTGGCCTGGCTCCGTGCTGCCTCTGGCTCCTGGGGGCAGTC
CTTCTGATGGACGCGTCTGCACGGCCTGCCAACCACTCGTCCACTCGAGAGAGAGTAGCC
AACAGGGAGGAGAATGAGATCCTGCCCCCAGACCACCTGAACGGGGTGAAGCTGGAGATG
GACGGGCACCTCAATCGCGGCTTCCACCAGGAGGTCTTCCTAGGCAAGGACCTGGGTGGC
TTTGATGAGGACGCGGAGCCGCGGCGGAGCCGGAGGAAGCTGATGGTCATCTTTTCCAA

ENSE00000869640:
GGTGGATGTGAACACTGACCGGAAGATCAGTGCCAAGGAGATGCAGCGCTGGATCATGGA
GAAGACGGCCGAGCACTTCCAGGAGGCCATGGAGGAGAGCAAGACACACTTCCGCGCCGT
GGACCCTGACGGGGACG

So I want to create all possible combinations, like this:

ENSE00003851528+ENSE00003459646
GCGACCGCTTCCGCCCGGAGGAGAGATGGTGGCGCCGGCGTCCCCTCCGTGAGGTCGCGC
CCGTTCGCACCGCCCCCGCCCGCAAGAAAGATGGCAGTGGCCTGATCCGGGCCCGTTGGC
GGCGTCACTGACGCTTCGCTCCGGTCCTCGGATCCCGAGCGCGGGGAGGCAGACCGACTGTGAGCTGCTTGTCCCCATCCTGCGGCCGTCCTGGGGACACAGAGCCCTCCGTGGTG
CCCGGGGATTGGATTGGAGCCAGGACCTCACTTCCTCCTCTGCCCCTGCCCCTGCCCCTC
CCAGCACCTGGCCCACACCCTGCAGCCCGCCCCATGGTCTGGCCCTGGGTGGCGATGGCG
TCCAGGTGGGGTCCCCTCATTGGCCTGGCTCCGTGCTGCCTCTGGCTCCTGGGGGCAGTC
CTTCTGATGGACGCGTCTGCACGGCCTGCCAACCACTCGTCCACTCGAGAGAGAGTAGCC
AACAGGGAGGAGAATGAGATCCTGCCCCCAGACCACCTGAACGGGGTGAAGCTGGAGATG
GACGGGCACCTCAATCGCGGCTTCCACCAGGAGGTCTTCCTAGGCAAGGACCTGGGTGGC
TTTGATGAGGACGCGGAGCCGCGGCGGAGCCGGAGGAAGCTGATGGTCATCTTTTCCAA


ENSE00003851528+ENSE00000869640
GCGACCGCTTCCGCCCGGAGGAGAGATGGTGGCGCCGGCGTCCCCTCCGTGAGGTCGCGC
CCGTTCGCACCGCCCCCGCCCGCAAGAAAGATGGCAGTGGCCTGATCCGGGCCCGTTGGC 
GGCGTCACTGACGCTTCGCTCCGGTCCTCGGATCCCGAGCGCGGGGAGGCAGACCGGGTGGATGTGAACACTGACCGGAAGATCAGTGCCAAGGAGATGCAGCGCTGGATCATGGAGAAGACGGCCGAGCACTTCCAGGAGGCCATGGAGGAGAGCAAGACACACTTCCGCGCCGTGGACCCTGACGGGGACG

I want to build the library of all possible combinations in this way. Can anyone help me with how to download all the exons of a gene and create virtual RNA sequences from all possible combinations of those exons? Thanks in advance.

exons fasta human library preparation • 1.1k views
ADD COMMENT
3

You are probably looking for GTF files, which store this information. Once you have extracted the coordinates of the features you want, you can use bedtools getfasta to pull the sequences at those coordinates from a reference genome FASTA file.
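A minimal sketch of the coordinate-extraction step, assuming an Ensembl- or GENCODE-style GTF with `transcript_id` and `exon_number` attributes. The helper name `gtfExonsToBed` and the `<transcript>_exon<number>` naming scheme are made up for illustration; the function turns `exon` records into BED6 lines that a tool such as `bedtools getfasta` can consume.

```javascript
// Convert GTF exon records to BED6 lines (illustrative helper, not part of any library).
function gtfExonsToBed(gtfText) {
    var bed = [];
    gtfText.split("\n").forEach(function (line) {
        if (line === "" || line.charAt(0) === "#") return;  // skip blanks and header comments
        var f = line.split("\t");
        if (f.length < 9 || f[2] !== "exon") return;        // keep exon features only
        var tx = /transcript_id "([^"]+)"/.exec(f[8]);
        var num = /exon_number "?(\d+)"?/.exec(f[8]);       // quoted (Ensembl) or bare (GENCODE)
        var name = (tx ? tx[1] : "NA") + "_exon" + (num ? num[1] : "NA");
        // GTF is 1-based inclusive, BED is 0-based half-open: only the start shifts by one.
        bed.push([f[0], parseInt(f[3], 10) - 1, f[4], name, "0", f[6]].join("\t"));
    });
    return bed;
}
```

With the lines saved to a file, something like `bedtools getfasta -fi genome.fa -bed exons.bed -s -name -fo exons.fa` would then fetch strand-aware sequences; `genome.fa`, `exons.bed` and `exons.fa` are placeholder file names.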

ADD REPLY
0

No, I want to make a library of mRNAs covering all possible combinations of the exons and introns present in one gene. I then want to prepare the same kind of library for all human genes.

ADD REPLY
0

Can anyone help me to make an mRNA library? I don't know how to start it. Thanks in advance

ADD REPLY
0

As @ATPoint suggested above, it should be possible to get the sequences of all exons, but creating all possible combinatorial sequences from those exons (pair-wise? your example above only shows pairs) will require writing some custom code. Conceptually, you would build a list of exons for a particular gene and write out the combinations, then move on to gene 2, gene 3, and so on.
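If pairs are all that is needed, the idea can be sketched as below. This assumes the exons of one gene are already in an array of `{name, seq}` objects in transcript order; `pairwiseFasta` is a hypothetical helper, and the `name1+name2` header simply mirrors the labels used in the question.

```javascript
// Build one FASTA record for every exon pair (i < j), keeping the original exon order.
function pairwiseFasta(exons) {
    var records = [];
    for (var i = 0; i < exons.length; i++) {
        for (var j = i + 1; j < exons.length; j++) {
            records.push(">" + exons[i].name + "+" + exons[j].name + "\n" +
                         exons[i].seq + exons[j].seq);
        }
    }
    return records;
}
```

A gene with n exons yields n(n-1)/2 such records, so the 7 exons of the example gene would give 21 pairs.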

ADD REPLY
0

But I want to make a library for all human genes. Do I have to save different genes in different files, or is there another way to do this? I am new to writing scripts. Is there no software available that does this easily? If you know how to write the custom code for this, could you help me make the library? Thanks in advance.

ADD REPLY
0

If you don't want to elaborate, that's fine.

But I was just wondering why you would want to do this? I can somewhat see the point of generating different exon/exon combinations in the same order as in the original gene, but not of making all possible combinations of exons in a gene. There is a biological 'order' to those exons, so randomly combining them does not make much sense to me...

And why you would then even throw in all the introns makes even less sense :/

ADD REPLY
3
3.3 years ago

So anyone can help me how do I download all Exons of genes and create a virtual RNA sequence with all possible exons of genes.

As was said above: use a GTF and getfasta.

Using JavaScript (jjs, the Nashorn command-line shell bundled with JDK 8 through 14):

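// Toy exon list; replace with the real exon names and sequences of one gene.
var exons=[
    {"name":"exon1","seq":"AATCTGATGCTA"},
    {"name":"exon2","seq":"TCTGATGCTACC"},
    {"name":"exon3","seq":"CATGCTG"},
    {"name":"exon4","seq":"CCC"}
    ];


// Walk over the exons in order; at each index i, branch on skipping or keeping exon i.
// 'array' holds the indexes of the exons kept so far, so the exon order is preserved.
function recursive(i,array) {
    if(i==exons.length) {
        // every exon has been decided: ignore the empty set, otherwise print one FASTA record
        if(array.length==0) return;
        var t="";
        var s="";
        for(var j=0;j<array.length;j++) {
            t+=exons[array[j]].name+"|";
            s+=exons[array[j]].seq;
            }
        print(">"+t+"\n"+s);
        return;
        }

    // branch 1: skip exon i
    var copy = array.slice();
    recursive(i+1,copy);
    // branch 2: keep exon i
    copy = array.slice();
    copy.push(i);
    recursive(i+1,copy);
    }


// start with no exon selected
var array=[];
recursive(0,array);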

invoke:

$ jjs ~/jeter.js
>exon4|
CCC
>exon3|
CATGCTG
>exon3|exon4|
CATGCTGCCC
>exon2|
TCTGATGCTACC
>exon2|exon4|
TCTGATGCTACCCCC
>exon2|exon3|
TCTGATGCTACCCATGCTG
>exon2|exon3|exon4|
TCTGATGCTACCCATGCTGCCC
>exon1|
AATCTGATGCTA
>exon1|exon4|
AATCTGATGCTACCC
>exon1|exon3|
AATCTGATGCTACATGCTG
>exon1|exon3|exon4|
AATCTGATGCTACATGCTGCCC
>exon1|exon2|
AATCTGATGCTATCTGATGCTACC
>exon1|exon2|exon4|
AATCTGATGCTATCTGATGCTACCCCC
>exon1|exon2|exon3|
AATCTGATGCTATCTGATGCTACCCATGCTG
>exon1|exon2|exon3|exon4|
AATCTGATGCTATCTGATGCTACCCATGCTGCCC
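Before running this on real genes it may help to estimate the output size. Every exon is either kept or skipped and the empty set is dropped, so n exons give 2^n - 1 records. The helper below is only an illustration of that count (`countExonSubsets` is a made-up name).

```javascript
// Number of non-empty, order-preserving exon subsets for n exons: 2^n - 1.
function countExonSubsets(n) {
    return Math.pow(2, n) - 1;
}
```

For the 4 toy exons this gives 15, matching the 15 records printed above; a transcript with 17 exons would already produce 131071 records.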
ADD COMMENT
0

Thanks for your reply. This example works fine, but only for one gene. I want to write a script that covers all human genes. I downloaded the coordinates of all exons from BioMart. The file has 5 columns: the first three are chromosome, exon start, and exon end; the 4th is the transcript ID and the 5th is the exon rank.

10      100009838       100009947       ENST00000324109 1
10      99875577        99877336        ENST00000324109 17
10      99879811        99880361        ENST00000324109 16
10      99884011        99884209        ENST00000324109 15
10      99885687        99885866        ENST00000324109 14
10      99886300        99886632        ENST00000324109 13
10      99888825        99888953        ENST00000324109 12
10      99894946        99895050        ENST00000324109 11
10      99896267        99896397        ENST00000324109 10
10      99898086        99898285        ENST00000324109 9
10      99898743        99898760        ENST00000324109 8
10      99899919        99900066        ENST00000324109 7
10      99907995        99908094        ENST00000324109 6
10      99908953        99909146        ENST00000324109 5
10      99955214        99957205        ENST00000324109 4
10      99969115        99969237        ENST00000324109 3
10      99971980        99972134        ENST00000324109 2
10      100042193       100042573       ENST00000370418 9
10      100048758       100048876       ENST00000370418 8
10      100054347       100054446       ENST00000370418 7
10      100057013       100057152       ENST00000370418 6
10      100063614       100063725       ENST00000370418 5
10      100065188       100065370       ENST00000370418 4
10      100069714       100069869       ENST00000370418 3
10      100075911       100076107       ENST00000370418 2
10      100081403       100081869       ENST00000370418 1

So, from this coordinates file, can I make all possible combinations for all genes and write them in FASTA format under different names? Thanks in advance.
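One possible first step, sketched under the assumption that the table is whitespace-separated with exactly the five columns described above: group the rows by transcript ID and sort each group by exon rank, so that every transcript becomes an ordered exon list. The function name `groupExonsByTranscript` and the field names are hypothetical.

```javascript
// Group a 5-column table (chromosome, start, end, transcript ID, exon rank)
// by transcript ID, with each transcript's exons sorted by rank.
function groupExonsByTranscript(tableText) {
    var byTranscript = {};
    tableText.split("\n").forEach(function (line) {
        var f = line.trim().split(/\s+/);
        if (f.length < 5) return;                    // skip blank or malformed lines
        var exon = {
            chrom: f[0],
            start: parseInt(f[1], 10),
            end: parseInt(f[2], 10),
            rank: parseInt(f[4], 10)
        };
        if (!byTranscript[f[3]]) byTranscript[f[3]] = [];
        byTranscript[f[3]].push(exon);
    });
    Object.keys(byTranscript).forEach(function (id) {
        byTranscript[id].sort(function (a, b) { return a.rank - b.rank; });
    });
    return byTranscript;
}
```

Each ordered list could then be given sequences and fed to a combination routine, one transcript at a time. Note that the table carries no strand column; in the rows shown, rank 1 has the highest coordinates, which suggests reverse-strand transcripts, so strand would have to be exported as well before extracting sequences.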

ADD REPLY
0

Thanks for your reply. This example works fine but it works only for 1 gene. But I want to write a script for the whole human genes. I download the coordinates of all exons from biomart

Feel free to reformat the BioMart output and run a loop. Will I write it for you? No.

ADD REPLY
0

OK, thanks for your suggestions, I will have a look. I only want your advice on which steps to follow, because I have never written a script: should I do this with the BioMart output, or with a GTF file?

ADD REPLY
