Question: Annotating RNA-Seq Data Using a Reference Genome
3
gravatar for Linda
7.6 years ago by
Linda160
Linda160 wrote:

I have RNA-seq reads from a non-model organism, and I used Cufflinks to identify transcripts. Is there an existing pipeline for BLASTing these transcripts against a model organism's proteins to identify orthologs?

Zhidkov (Israel) wrote (7.6 years ago):

Hi Linda, regarding BLAST usage: you can download a local copy of BLAST from here: http://blast.ncbi.nlm.nih.gov/Blast.cgi?CMD=Web&PAGE_TYPE=BlastDocs&DOC_TYPE=Download

The available databases can be found here: ftp://ftp.ncbi.nlm.nih.gov/blast/db/

Most probably you will want to run blastx against nr, or you can download UniRef90 and search against that instead. I suggest using the tabular output format (it saves some time and space). You can then filter the results by alignment length and/or by E-value, for example keeping only hits with an E-value below 1e-5.

Ilia
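
The filtering step suggested above could be sketched in Python roughly as follows. This is only a minimal sketch, assuming the default 12-column tabular layout produced by -outfmt 6 (column 4 is the alignment length, column 11 the E-value). The function name filter_hits, the thresholds, and the sample rows are made up for illustration and are not part of the original answer.

```python
from typing import Dict, Iterable, Iterator

# Column order of the default BLAST+ tabular output (-outfmt 6).
COLUMNS = ("qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
           "qstart", "qend", "sstart", "send", "evalue", "bitscore")


def filter_hits(lines: Iterable[str], max_evalue: float = 1e-5,
                min_length: int = 0) -> Iterator[Dict[str, str]]:
    """Yield hits whose E-value and alignment length pass both thresholds."""
    for line in lines:
        line = line.rstrip("\n")
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment lines
        fields = line.split("\t")
        if len(fields) < len(COLUMNS):
            continue  # skip malformed rows
        hit = dict(zip(COLUMNS, fields))
        if (float(hit["evalue"]) <= max_evalue
                and int(hit["length"]) >= min_length):
            yield hit


# Made-up rows, for illustration only.
sample = [
    "tx1\tprotA\t85.0\t250\t30\t2\t1\t750\t1\t250\t1e-80\t300",
    "tx2\tprotB\t40.0\t45\t25\t1\t10\t144\t5\t49\t2e-06\t50",
    "tx3\tprotC\t35.0\t120\t70\t4\t1\t360\t1\t120\t0.5\t30",
]
kept = [h["qseqid"] for h in filter_hits(sample, max_evalue=1e-5, min_length=100)]
print(kept)
```

With these sample rows only tx1 survives: tx2 passes the E-value cutoff but is too short, and tx3 fails the E-value cutoff.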


Hi @Zhidkov, I have a similar question. To assess the sequence conservation of our de novo assembled transcriptome of a non-model plant, we ran blastx (NCBI BLAST+ 2.2.26) with our transcriptome against the proteome databases of several plants, using the tabular output format. As you mentioned, the results can be filtered by alignment length and/or E-value. Since I already set the E-value to 1e-5, how do I set the alignment length in the filter, and what value is generally used for it? Thank you. Regards

Reply written 6.2 years ago by lzsph

Hi, I don't really understand where the problem is... If you used the default tabular output, column 4 corresponds to the alignment length. Do you run BLAST from the command line (terminal) or from the web interface? In any case, I'm not sure you can set a "minimum alignment length" filter as a parameter of the BLAST search itself. If you run BLAST from the command line, you can use something like:

blastx -query <File_In> -db <your_database> -evalue <your_favorite> -outfmt 6 | awk -F "\t" '{if ($4>=yourlength) print}' > Tabularblastx.txt

Here yourlength stands for the minimum alignment length you want to keep; replace it with a number.
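
The same pipe could also be driven from Python instead of the shell. The following is a hedged sketch, assuming blastx is installed and on the PATH; the function names (build_blastx_cmd, keep_long_alignments, run_and_filter) and any file names passed to them are hypothetical. The blastx flags are the ones used in the command above.

```python
import subprocess
from typing import Iterable, Iterator, List


def build_blastx_cmd(query: str, db: str, evalue: str = "1e-5") -> List[str]:
    """Build the blastx argument list with tabular output (-outfmt 6)."""
    return ["blastx", "-query", query, "-db", db,
            "-evalue", evalue, "-outfmt", "6"]


def keep_long_alignments(lines: Iterable[str], min_length: int) -> Iterator[str]:
    """Keep rows whose 4th column (alignment length) is >= min_length."""
    for line in lines:
        row = line.rstrip("\n")
        fields = row.split("\t")
        if len(fields) >= 4 and int(fields[3]) >= min_length:
            yield row


def run_and_filter(query: str, db: str, out_path: str,
                   min_length: int, evalue: str = "1e-5") -> int:
    """Run blastx, stream its output through the length filter, write a file.

    Returns the number of rows written. Requires blastx on the PATH.
    """
    written = 0
    cmd = build_blastx_cmd(query, db, evalue)
    with subprocess.Popen(cmd, stdout=subprocess.PIPE, text=True) as proc, \
            open(out_path, "w") as out:
        for row in keep_long_alignments(proc.stdout, min_length):
            out.write(row + "\n")
            written += 1
    if proc.returncode != 0:
        raise RuntimeError(f"blastx exited with status {proc.returncode}")
    return written


# Hypothetical usage (not executed here because it needs blastx and a database):
# run_and_filter("transcripts.fa", "plant_proteins", "Tabularblastx.txt", 100)
```

Streaming the output line by line avoids holding the whole BLAST result in memory, which matters for query sets with hundreds of thousands of sequences.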

Reply written 6.2 years ago by Zhidkov

Hi @Zhidkov, sorry for the late reply. I did run blastx from the command line. I only set the E-value to 1e-5 and left the alignment length undefined. Is that OK?

Reply written 6.2 years ago by lzsph
Zhidkov (Israel) wrote (6.1 years ago):

Hi,
the alignment length is an additional filter. If you get too many results using the E-value cutoff alone, you can tighten your filter by removing alignments that are too short, by requiring a minimum query coverage, etc.
Your data, your goals, your filters.
Just as an example: you have transcript-Y, 2 kb long. After BLASTX you got a hit to X-protein with an E-value of 1e-7, and the alignment length was 200 bp with several indels. Can you conclude that transcript-Y is X-protein?
What would your filters be for a reliable annotation in that case?

Ilia
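
The query-coverage idea mentioned above can be made concrete with a small calculation. This is a sketch under stated assumptions: the coverage is taken as the aligned span on the query (from the qstart and qend columns) divided by the query length, and the coordinates 1 to 200 are an assumed reading of the 200 bp alignment in the example. The default tabular output does not include the query length; it can be read from the FASTA file or, as far as I know, requested with -outfmt "6 std qlen".

```python
def query_coverage(qstart: int, qend: int, qlen: int) -> float:
    """Fraction of the query covered by one alignment.

    abs() is used because query coordinates can appear in reverse order
    (qstart > qend) for hits on the reverse strand.
    """
    if qlen <= 0:
        raise ValueError("query length must be positive")
    span = abs(qend - qstart) + 1
    return span / qlen


# The example from the post: a 2 kb transcript with about 200 bp aligned.
# Coordinates 1..200 are assumed for illustration.
cov = query_coverage(qstart=1, qend=200, qlen=2000)
print(f"{cov:.0%}")
```

Under these assumptions only a tenth of transcript-Y is supported by the hit, which is the kind of number that helps decide whether the annotation is reliable.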


Hi Ilia,

Thank you very much.

My query file contains ~200,000 sequences of various lengths, and the tabular blastx output also contains many hits of different lengths. Is it really possible to use a single alignment length value in the command "...($4>=yourlength)"? If I am wrong, please correct me.

Regards, lzsph

Reply written 6.1 years ago by lzsph

Yes, it is possible (you set a minimum length, >= some value). If that doesn't feel right to you, you can filter on coverage instead; for example, you can require that at least 50% of your query sequence is covered by the alignment. I suggest you perform a small test (you'll feel much more confident after that): take several known transcripts, run BLASTX against all plant proteins, and check which alignments give you unreliable results (i.e. you used SOD1 as the query but got p53 as a hit). Then set your filters accordingly.

Ilia
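
The small test suggested above could be tallied along these lines. This is a hypothetical sketch: the Hit record, the score_filter function, the transcript names, and all coverage values are invented for illustration, and the coverage of each hit is assumed to have been computed beforehand.

```python
from typing import Dict, Iterable, NamedTuple


class Hit(NamedTuple):
    query: str          # known transcript used as query
    subject_gene: str   # gene name of the protein that was hit
    coverage: float     # fraction of the query covered (0..1)


def score_filter(hits: Iterable[Hit], expected: Dict[str, str],
                 min_coverage: float) -> Dict[str, int]:
    """Count correct, wrong, and dropped hits for one coverage threshold."""
    counts = {"correct": 0, "wrong": 0, "dropped": 0}
    for hit in hits:
        if hit.coverage < min_coverage:
            counts["dropped"] += 1
        elif expected.get(hit.query) == hit.subject_gene:
            counts["correct"] += 1
        else:
            counts["wrong"] += 1
    return counts


# Invented test set: two known transcripts and their hits.
expected = {"tx_SOD1": "SOD1", "tx_TP53": "TP53"}
hits = [
    Hit("tx_SOD1", "SOD1", 0.92),
    Hit("tx_SOD1", "TP53", 0.12),
    Hit("tx_TP53", "TP53", 0.80),
    Hit("tx_TP53", "SOD1", 0.55),
]
for threshold in (0.0, 0.5, 0.7):
    print(threshold, score_filter(hits, expected, threshold))
```

In this invented data set, a 50% coverage cutoff still lets one wrong hit through, while 70% removes both wrong hits and keeps both correct ones. The right threshold for real data has to come from running the test on your own known transcripts.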

Reply written 6.1 years ago by Zhidkov

OK, Ilia, I'll give it a try.

Thank you!

Regards, lzsph

Reply written 6.1 years ago by lzsph