Analyzing expression of a gene from the SRA database
5.1 years ago

Hi all, I am very new to bioinformatics. I want to see the expression of a single gene in transcriptomic data from different organisms, without downloading the SRA data and without doing an assembly. What would be the simplest way to find the expression? Thanks in advance.

RNA-Seq • 2.2k views

The simplest method is to email the various authors and ask them to send that data to you...


I tried this once: I asked the corresponding author for the TSA assembly (made with CLC Genomics) described in their published paper, because I wanted to check for one gene only. In return, the person asked for co-authorship on the paper we were preparing, because it had taken them a long time to collect the RNA. As the raw reads were available in SRA, I assembled those myself using Trinity instead :/


What if I run SRA BLAST against a particular SRA experiment with my target gene and then calculate RPKM from the hits? Count up the total reads in the sample and divide that number by 1,000,000; this is the "per million" scaling factor. Divide the read count for the gene by the "per million" scaling factor. This normalizes for sequencing depth, giving reads per million (RPM). Divide the RPM value by the length of the gene in kilobases, giving RPKM.
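To make the arithmetic concrete, here is a minimal sketch of that calculation as a shell function. The function name `rpkm` and the example numbers (500 hits, 20 million reads, a 2,000 bp gene) are made up for illustration; they do not come from any real SRA experiment, and the hit count is simply whatever number you decide to treat as "reads for the gene".

```shell
#!/bin/sh
# Hypothetical helper: rpkm HITS TOTAL_READS GENE_LENGTH_BP
# Prints RPKM with three decimals; exits non-zero on invalid input.
rpkm() {
  awk -v hits="$1" -v total="$2" -v len="$3" 'BEGIN {
    if (total <= 0 || len <= 0) {
      print "total reads and gene length must be positive" > "/dev/stderr"
      exit 1
    }
    per_million = total / 1000000        # "per million" scaling factor
    rpm = hits / per_million             # reads per million (RPM)
    printf "%.3f\n", rpm / (len / 1000)  # divide by gene length in kb: RPKM
  }'
}

# Example with invented numbers: 500 hits, 20,000,000 reads, 2,000 bp gene
rpkm 500 20000000 2000
```

With these numbers the scaling factor is 20, the RPM is 25, and the gene is 2 kb long, so the function prints 12.500.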


There are too many confounding factors that cannot be corrected for.


Please use ADD COMMENT or ADD REPLY to reply to earlier comments, so that this thread remains logically structured and easy to follow. I have moved this post now, but as you can see the result is not optimal.


Do I understand correctly that you want to quantitatively compare the expression of single genes across several experiments, very different organisms, and different protocols, and all of that without an assembly? This can't be done.


Let me explain what I am trying to do. I want to see the expression of a gene family. I have identified the genes of this superfamily in different plant pathogens, and now I am looking for the expression level of those genes in the corresponding organisms using SRA experiments (pathogen-infected host samples or pathogen transcriptomes) at different time intervals. The intervals are the same in all experiments and the stress is similar (biotic stress); only the host organism and the pathogen change. I am not interested in showing exact expression values, as those can only be calculated from assembled data. I only want to give an idea of how the expression of these genes varies between pathogens at similar time intervals.


I am sorry, but I do not like the idea of generating some semi-quantitative estimate (that is possibly what your use of "idea" implies). You cannot base any conclusion on such a procedure, and it would not convince anyone. Yes, you could use k-mer based methods instead of alignment, but every approach requires a transcriptome. There is a way to solve this problem: generate assemblies using e.g. Trinity and then map the reads back. So why not take it?

5.0 years ago

SRA BLAST will not help you because it can only do blastn, but to find a gene in distantly related organisms you need at least tblastn. In my opinion there is no good alternative to downloading the data: ideally draft assemblies, or otherwise the raw reads, which you then assemble yourself using e.g. Trinity on transcriptome shotgun reads. Here is a script that automatically downloads all draft assemblies for a given taxid using eutils and sratools.

#!/bin/sh

## usage: fetchAllAssembliesByTaxid.sh <taxid>
## saves the query result in <taxid>.esearch.xml for your reference and further processing

set -u
if [ $# -ne 1 ]; then
  echo "usage: $0 <taxid>" >&2
  exit 1
fi
TAX=$1

# Find all TSA and WGS master records below the given taxon and save the XML.
esearch -db nuccore -query "(txid${TAX}[Organism:exp]) AND (\"tsa master\"[Properties] OR \"wgs master\"[Properties])" | \
  efetch -format xml > "${TAX}.esearch.xml"

# Extract the master record names from the saved XML.
ID=$(xtract -pattern Seq-entry -element Textseq-id_name < "${TAX}.esearch.xml")

for I in $ID ; do
  echo "Downloading $I ..."
  if [ -e "$I.fasta" ]
  then
    echo " skipping because file exists."
    continue # skip if the file has been downloaded already
  fi
  # --fasta 0 writes FASTA without line wrapping
  fastq-dump --fasta 0 -F "$I"
done

If you tried the same for raw reads, the data volume and computational requirements would certainly grow considerably. Therefore, when working from raw reads, I would recommend doing a de novo assembly only for a few hand-picked transcriptomes. Transcriptomes can be heavily contaminated with RNA from symbionts, ingested material, etc. Therefore, if you find a hit to your gene of interest, you still need to do a phylogenetic analysis to exclude this possibility.
