Question: 3' UTR sequence
3.2 years ago, leo1985.arnab wrote:

Hello, this is my first post here!

I have a pair of FASTA files: one contains several CDS sequences (file X) and the other contains the corresponding full-length transcript sequences (file Y).

The CDS file (file X) has entries as follows:

>ABCD (40..120)
…………the sequence……………….

This goes on for another 200 gene entries.

The file with the full-length transcript sequences (file Y) has entries as follows:

>ABCD (1..700)
…………the sequence……………….

And this also goes on for another 200 different gene entries.

ABCD is a gene name, which is identical in both FASTA files, so the transcript sequence between coordinates 40..120 is identical to the CDS entry. I want to extract the 3' UTR sequence for every entry in the full transcript file, that is, the part of each transcript that comes right after the end of the CDS. In the example above, I want the sequence between coordinates 121..700 for gene ABCD from the full transcript FASTA file. The script should loop through the whole file and do this for all 200 gene entries.

I mainly use bash and a little Perl. Any help would be much appreciated.

Thanks and regards!
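
The extraction described in this post can be sketched without external tools. What follows is a minimal Python sketch (not the bash or Perl asked for), and it rests on assumptions: every header looks exactly like `>GENE (start..end)` with 1-based inclusive coordinates, gene names contain no spaces, and the CDS coordinates are given relative to the transcript numbering. The function names `read_fasta` and `extract_3prime_utrs` are made up for illustration.

```python
import re

# Header shape assumed from the post: ">GENE (start..end)", 1-based inclusive.
HEADER_RE = re.compile(r"^>(\S+)\s+\((\d+)\.\.(\d+)\)")


def read_fasta(text):
    """Return {gene: (start, end, sequence)} parsed from FASTA text."""
    records = {}
    gene = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            match = HEADER_RE.match(line)
            if match is None:
                raise ValueError(f"unexpected header: {line}")
            gene = match.group(1)
            records[gene] = [int(match.group(2)), int(match.group(3)), []]
        elif gene is not None:
            # Sequence lines may be wrapped, so collect and join them later.
            records[gene][2].append(line)
    return {g: (s, e, "".join(parts)) for g, (s, e, parts) in records.items()}


def extract_3prime_utrs(cds_text, transcript_text):
    """Return {gene: sequence after the CDS end} for genes present in both inputs."""
    cds = read_fasta(cds_text)
    transcripts = read_fasta(transcript_text)
    utrs = {}
    for gene, (_, cds_end, _) in cds.items():
        if gene not in transcripts:
            continue  # no matching transcript; skip rather than guess
        tx_start, _, tx_seq = transcripts[gene]
        # 1-based CDS end -> 0-based index of the first base after the CDS.
        offset = cds_end - tx_start + 1
        utrs[gene] = tx_seq[offset:]
    return utrs
```

With real files, the two arguments would be the contents of file X and file Y (for example, read with `open(...).read()`), and the returned dictionary can be written back out as FASTA. A gene whose CDS ends at the last base of its transcript yields an empty string.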


Hello leo1985.arnab!

It appears that your post has been cross-posted to another site: http://seqanswers.com/forums/showthread.php?p=203861#post203861

This is typically not recommended as it runs the risk of annoying people in both communities.

written 3.2 years ago by WouterDeCoster

Ok, thanks! I appreciate your comment.

written 3.2 years ago by leo1985.arnab

Hello, I am trying to do the same for a bunch of ORFs. Were you successful in extracting the 3' UTRs? I have FASTA sequences of my transcripts and I have run TransDecoder to identify the 3' UTR regions. I'm wondering how to extract these sequences. Please help!

written 12 months ago by akeethpinto

Please use ADD COMMENT or ADD REPLY to respond to earlier posts, so that the thread stays logically structured and easy to follow. I have now moved your post, but as you can see the result is not optimal. Adding an answer should only be used for providing a solution to the question asked.

written 12 months ago by WouterDeCoster
3.2 years ago, geek_y (Barcelona) wrote:

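Do the headers always follow the same pattern, i.e. >GENE (start..end), and do both files contain the same genes? If so, you may want to look into bedtools getfasta.

First build a BED file with one line per gene giving the 3' UTR interval, for example:

ABCD    120    700
EFGH    500    900
....
....

Then run:

bedtools getfasta -fi transcript.fasta -bed in.bed

This asks bedtools to pull the sequence over the given interval from each full-length transcript. Be careful with 0-based versus 1-based coordinates: BED starts are 0-based and BED ends are exclusive, so a start of 120 and an end of 700 returns bases 121..700 in the 1-based numbering used in your headers.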
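
The coordinate caveat is easy to get wrong by one base, so here is a small Python sketch of the arithmetic. It assumes the header coordinates are 1-based and inclusive and that the transcript numbering starts at 1; the function name `utr3_bed_line` is hypothetical.

```python
def utr3_bed_line(name, cds_end, transcript_end):
    """Build a BED line covering the 3' UTR of one transcript.

    cds_end and transcript_end are 1-based inclusive, as in the FASTA headers.
    BED starts are 0-based and BED ends are exclusive, so the first UTR base
    (position cds_end + 1 in 1-based terms) has the 0-based start cds_end,
    and the end column is simply the transcript length.
    """
    if cds_end >= transcript_end:
        return None  # the CDS runs to the transcript end, so there is no 3' UTR
    return f"{name}\t{cds_end}\t{transcript_end}"


# Example from the thread: CDS 40..120 inside a 700 bp transcript.
print(utr3_bed_line("ABCD", 120, 700))
```

The printed line has the tab-separated columns ABCD, 120 and 700, an interval of 580 bases that corresponds to positions 121..700 in the question's numbering.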


Goutham,

Thanks so much for your response, and sorry I could not follow up sooner. To answer your question: yes, the genes appear in the same order in the two FASTA files. However, where you wrote "you should get a bed format:....", I looked into bedtools getfasta but I am a bit confused about where the BED file comes from, since all I have are two FASTA files. I could run bowtie to generate a BAM file and then use bedtools to convert the BAM to BED. Is that what you meant?

written 3.2 years ago by leo1985.arnab

No alignment is needed. You need to convert the FASTA headers of your CDS file into BED format.
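
One way that header conversion might look, as a rough Python sketch rather than a tested pipeline: it assumes every header has the form `>GENE (start..end)` with 1-based inclusive coordinates, that the transcript numbering starts at 1, and that gene names contain no whitespace. The names `header_coords` and `headers_to_bed` are made up for this example.

```python
import re

# Header shape assumed from the question: ">GENE (start..end)".
HEADER_RE = re.compile(r"^>(\S+)\s+\((\d+)\.\.(\d+)\)")


def header_coords(fasta_lines):
    """Map gene name -> (start, end) using only the FASTA header lines."""
    coords = {}
    for line in fasta_lines:
        match = HEADER_RE.match(line)
        if match:  # sequence lines do not start with '>' and are ignored
            coords[match.group(1)] = (int(match.group(2)), int(match.group(3)))
    return coords


def headers_to_bed(cds_lines, transcript_lines):
    """Return BED lines (0-based start, exclusive end) for each 3' UTR."""
    cds = header_coords(cds_lines)
    transcripts = header_coords(transcript_lines)
    bed = []
    for gene, (_, cds_end) in cds.items():
        if gene not in transcripts:
            continue  # gene missing from the transcript file
        _, tx_end = transcripts[gene]
        if cds_end < tx_end:  # skip transcripts with no bases after the CDS
            bed.append(f"{gene}\t{cds_end}\t{tx_end}")
    return bed
```

Both arguments can be any iterables of lines, including open file handles for file X and file Y. Writing the returned lines to a file gives the input for the `-bed` option of bedtools getfasta.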

written 3.2 years ago by geek_y

Okay, thanks.

written 3.2 years ago by leo1985.arnab