Question: transcriptome shotgun assembly annotation
2.8 years ago by
najibveto40
Korea, Republic Of
najibveto40 wrote:

Hello, I am working on fish, specifically the fathead minnow. Few of its genes are deposited in the NCBI database, but there is a published transcriptome shotgun assembly (TSA) without annotation (https://www.ncbi.nlm.nih.gov/nuccore/GCVQ00000000.1). Can someone tell me how to annotate it and obtain the CDS and UTR sequences, based on the closest relative of this fish, which is zebrafish? Thanks a lot for your help.

transcriptome assembly • 1.2k views
ADD COMMENTlink modified 2.8 years ago by Farbod3.3k • written 2.8 years ago by najibveto40
2.8 years ago by
EVR560
Earth
EVR560 wrote:

Hi,

Use TransDecoder to identify the candidate coding regions (ORFs) in your transcripts as a first annotation step. It is quite useful and reliable.

ADD COMMENTlink written 2.8 years ago by EVR560
2.8 years ago by
Farbod3.3k
Toronto
Farbod3.3k wrote:

Dear najibveto, Hi

You could begin by running BLASTX with your transcripts against SwissProt, or against the Danio rerio reference proteins.

~ Take Care

ADD COMMENTlink written 2.8 years ago by Farbod3.3k

Thanks for your advice. However, I am a newbie to bioinformatics, so could you please point me to a tutorial on how to run BLASTX against zebrafish on Linux? Thanks a lot for your help.

ADD REPLYlink written 2.8 years ago by najibveto40

Hi Najib,

You can find some simple scripts for making a BLAST database and running BLASTX here (of course, you need to use a more relaxed e-value, e.g. 1e-6).

You can add other parameters to your BLAST run, of course.

You can use the "-outfmt 5" parameter to get the BLAST results in XML format and then load them into Blast2GO for annotation (which is very time-consuming for a huge number of transcripts).
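As a rough illustration of the two steps above, here is a minimal Python sketch that assembles the makeblastdb and blastx command lines and runs them only if BLAST+ is installed. The file names (zebrafish_proteins.fasta, fathead_tsa.fasta, blastx_results.xml), the database name, the thread count, and the helper function names are all placeholders I made up; only the BLAST+ flags themselves are standard.

```python
import shlex
import shutil
import subprocess


def makeblastdb_cmd(protein_fasta, db_name):
    """Command that builds a protein BLAST database from a FASTA file."""
    return ["makeblastdb", "-in", protein_fasta, "-dbtype", "prot", "-out", db_name]


def blastx_cmd(transcripts_fasta, db_name, out_path,
               evalue="1e-6", outfmt="5", threads=4):
    """Command that runs BLASTX (translated transcripts vs. protein database)."""
    return [
        "blastx",
        "-query", transcripts_fasta,
        "-db", db_name,
        "-evalue", str(evalue),      # relaxed cutoff suggested in the thread
        "-outfmt", str(outfmt),      # 5 = XML, which Blast2GO can import
        "-num_threads", str(threads),
        "-out", out_path,
    ]


def run_pipeline(protein_fasta, transcripts_fasta, db_name, out_path):
    """Run both steps; raises if BLAST+ is missing or a step fails."""
    commands = (
        makeblastdb_cmd(protein_fasta, db_name),
        blastx_cmd(transcripts_fasta, db_name, out_path),
    )
    for cmd in commands:
        if shutil.which(cmd[0]) is None:
            raise RuntimeError(f"{cmd[0]} not found on PATH; install NCBI BLAST+")
        print("running:", shlex.join(cmd))
        subprocess.run(cmd, check=True)


if __name__ == "__main__":
    # Hypothetical file names; this only prints the commands for inspection.
    print(shlex.join(makeblastdb_cmd("zebrafish_proteins.fasta", "zebrafish_db")))
    print(shlex.join(blastx_cmd("fathead_tsa.fasta", "zebrafish_db",
                                "blastx_results.xml")))
```

The printed lines can also be pasted directly into a Linux terminal. Calling run_pipeline(...) with your real file names would execute them from Python instead.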

As @Thamizh has said, you can also use TransDecoder to predict the ORFs of your transcripts and integrate the BLAST results into it for more accuracy.

~ Best

NOTE: It would be good to check the original paper on the fathead minnow assembly to see which annotation strategy the authors followed.

ADD REPLYlink modified 2.8 years ago • written 2.8 years ago by Farbod3.3k

Thanks a lot for your help. I did the following: I looked for the longest ORFs using TransDecoder, then BLASTed them against the zebrafish reference proteins. Now I need the 3' UTR of each gene for miRNA target prediction.

ADD REPLYlink written 2.8 years ago by najibveto40

Nice to hear there is some progress ;-)

If you have a GFF3 file generated by TransDecoder for your transcriptome, then you could use: grep "three_prime_UTR" your_gff3 > output.txt. This command outputs only the three_prime_UTR features (with their start and end coordinates) for every transcript in your transcriptome.
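If you need the UTR sequences and not just the coordinates, a small Python sketch along these lines could pull them out of the transcript FASTA. It assumes that the first column of the GFF3 matches the first word of each FASTA header, that coordinates are 1-based and inclusive (as in the GFF3 format), and that a "-" strand means the ORF lies on the reverse strand of the transcript. The function names and the output header layout are my own invention, not something TransDecoder produces.

```python
def read_fasta(path):
    """Return {sequence_id: sequence}; the ID is the first word of each header."""
    seqs, name, chunks = {}, None, []
    with open(path) as handle:
        for line in handle:
            line = line.strip()
            if line.startswith(">"):
                if name is not None:
                    seqs[name] = "".join(chunks)
                name, chunks = line[1:].split()[0], []
            elif line:
                chunks.append(line)
    if name is not None:
        seqs[name] = "".join(chunks)
    return seqs


def reverse_complement(seq):
    """Reverse-complement a DNA sequence (A, C, G, T, N in either case)."""
    return seq.translate(str.maketrans("ACGTNacgtn", "TGCANtgcan"))[::-1]


def three_prime_utrs(gff3_path, transcripts):
    """Yield (transcript_id, start, end, strand, sequence) per three_prime_UTR row."""
    with open(gff3_path) as handle:
        for line in handle:
            if line.startswith("#"):
                continue
            cols = line.rstrip("\n").split("\t")
            if len(cols) < 9 or cols[2] != "three_prime_UTR":
                continue
            seqid, start, end, strand = cols[0], int(cols[3]), int(cols[4]), cols[6]
            if seqid not in transcripts:
                continue  # transcript missing from the FASTA file
            seq = transcripts[seqid][start - 1:end]  # 1-based inclusive -> slice
            if strand == "-":
                seq = reverse_complement(seq)
            yield seqid, start, end, strand, seq


def write_utr_fasta(gff3_path, fasta_path, out_path):
    """Write every 3' UTR as a FASTA record and return how many were written."""
    transcripts = read_fasta(fasta_path)
    count = 0
    with open(out_path, "w") as out:
        for seqid, start, end, strand, seq in three_prime_utrs(gff3_path, transcripts):
            out.write(f">{seqid}|3UTR|{start}-{end}|{strand}\n{seq}\n")
            count += 1
    return count
```

For example, write_utr_fasta("your_gff3", "fathead_tsa.fasta", "utr3.fasta") would write one record per 3' UTR; the file names here are placeholders.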

ADD REPLYlink modified 2.8 years ago • written 2.8 years ago by Farbod3.3k

Thanks a lot for your help^^ Now I have the 3' UTR start and end points and the BLAST results, and I have integrated them with the TransDecoder results. How can I get the 3' UTR sequences with their respective gene IDs based on the BLAST results?

ADD REPLYlink modified 2.8 years ago • written 2.8 years ago by najibveto40
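
One possible way to link each 3' UTR to a BLAST hit is sketched below in Python. It rests on several assumptions that should be checked against your own files: that the BLAST search was saved in tabular form ("-outfmt 6", default 12 columns), that the BLAST query names are either the TransDecoder ORF IDs or the transcript IDs, and that each three_prime_UTR row in the GFF3 carries a "Parent=" attribute holding the ORF ID. The function names are hypothetical.

```python
import csv


def best_hits(blast_tsv_path):
    """Return {query_id: (subject_id, bitscore)}, keeping the top bitscore per query.

    Expects BLAST tabular output (-outfmt 6) with the default 12 columns.
    """
    best = {}
    with open(blast_tsv_path, newline="") as handle:
        for row in csv.reader(handle, delimiter="\t"):
            if len(row) < 12:
                continue
            query, subject, bitscore = row[0], row[1], float(row[11])
            if query not in best or bitscore > best[query][1]:
                best[query] = (subject, bitscore)
    return best


def parse_attributes(field):
    """Turn 'ID=a;Parent=b' into {'ID': 'a', 'Parent': 'b'}."""
    pairs = (item.split("=", 1) for item in field.strip().split(";") if "=" in item)
    return {key: value for key, value in pairs}


def annotate_utrs(gff3_path, blast_tsv_path):
    """Return (transcript, orf_id, start, end, strand, best_subject) per 3' UTR."""
    hits = best_hits(blast_tsv_path)
    rows = []
    with open(gff3_path) as handle:
        for line in handle:
            cols = line.rstrip("\n").split("\t")
            if line.startswith("#") or len(cols) < 9 or cols[2] != "three_prime_UTR":
                continue
            orf_id = parse_attributes(cols[8]).get("Parent", "")
            # Try the ORF ID first, then the transcript ID, as the BLAST query name.
            hit = hits.get(orf_id) or hits.get(cols[0])
            subject = hit[0] if hit else "no_hit"
            rows.append((cols[0], orf_id, int(cols[3]), int(cols[4]), cols[6], subject))
    return rows
```

Each returned row pairs a UTR's coordinates with the zebrafish protein ID of its best hit, so the subject ID can be written into the FASTA header of the matching UTR sequence. Mapping protein IDs to gene names would still need a separate lookup table.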