Analyzing expression of a gene from the SRA database
6.2 years ago

Hi all, I am very new to bioinformatics. I want to see the expression of a single gene in transcriptomic data from different organisms, without downloading the SRA data and without doing an assembly. What would be the simplest way to find the expression? Thanks in advance.

RNA-Seq • 2.6k views

The simplest method is to email the various authors and ask them to send that data to you...


I tried this once, asking the corresponding author for the TSA assembly (made with CLC Genomics) described in their published paper; I wanted to check for one gene only. In return, the person asked me for co-authorship on the paper we are preparing, because it took them a long time to collect the RNA. As the raw read data was available in SRA, I assembled it myself using Trinity instead :/


What if I run an SRA BLAST against a particular SRA experiment for the target gene and then do the following? Count up the total reads in a sample and divide that number by 1,000,000; this is the "per million" scaling factor. Divide the read counts by the "per million" scaling factor; this normalizes for sequencing depth, giving reads per million (RPM). Divide the RPM values by the length of the gene in kilobases, giving RPKM.
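The arithmetic in this comment can be sketched as a small shell function. This is only an illustration of the RPM and RPKM steps as described here: the function name `rpkm` and the numbers in the usage line are made up, and it does not address the confounding factors raised in the replies.

```shell
#!/bin/sh
# Hypothetical helper: rpkm READS_ON_GENE TOTAL_READS GENE_LENGTH_BP
# Prints the RPKM value with three decimal places.
rpkm() {
    awk -v n="$1" -v total="$2" -v len="$3" 'BEGIN {
        scale = total / 1000000      # "per million" scaling factor
        rpm   = n / scale            # reads per million (depth-normalized)
        printf "%.3f\n", rpm / (len / 1000)   # divide by gene length in kb
    }'
}

# Illustrative values only: 500 reads hit the gene, 20 million reads in
# the sample, gene length 2000 bp.
rpkm 500 20000000 2000
```

With these example values the scaling factor is 20, the RPM is 25, and the gene is 2 kb long, so the function prints 12.500.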


There are too many confounding factors that cannot be corrected for.


Please use ADD COMMENT or ADD REPLY to reply to earlier comments, so that this thread remains logically structured and easy to follow. I have moved this post now, but as you can see it's not optimal.


Do I understand correctly that you want to compare gene expression quantitatively for single genes across several experiments, very different organisms, and different protocols, and all that without an assembly? This can't be done.


Let me explain what I am trying to do. I want to see the expression of a gene family. I have identified the genes of this superfamily in different plant pathogens, and I am now looking for the expression level of those genes in the corresponding organisms using SRA experiments (pathogen-infected host samples or pathogen transcriptomes) at different time intervals. The intervals are the same in all cases and the stress is similar (biotic stress); only the host organism and the pathogen change. I am not interested in showing the exact expression value, as that can only be calculated from assembled data. I only want to give an idea of how the expression of these genes varies in different pathogens at similar time intervals.


I am sorry, but I do not like the idea of generating some semi-quantitative estimate (which is possibly what your use of "idea" implies). You cannot base any conclusion on such a procedure, and it would not convince anyone. Yes, you could use k-mer-based methods instead of alignment, but everything requires a transcriptome. There is a way to solve this problem: generate assemblies using e.g. Trinity and then map the reads back. So why not take it?

6.2 years ago

#!/bin/sh

## usage: fetchAllAssembliesByTaxid.sh <taxid>
## saves the query result in <taxid>.esearch.xml for your reference and further processing

set -u
TAX=$1

# Query nuccore for the TSA/WGS master records of the taxon; keep a copy of the XML
RESULT=$(esearch -db nuccore -query '((txid'${TAX}'[Organism:exp]) AND ( "tsa master"[Properties] OR "wgs master"[Properties] ))' | \
efetch -format xml | tee "${TAX}.esearch.xml")

# Extract the accession names from the XML
ID=$(echo "$RESULT" | xtract -pattern Seq-entry -element Textseq-id_name)

for I in $ID ; do
    echo "Downloading $I ..."
    if [ -e "$I.fasta" ]
    then
        echo " skipping because file exists."
        continue # skip if the file has been downloaded already
    fi
    fastq-dump --fasta -F "$I"
done


If you tried the same for raw reads, the data volume and computational requirements would certainly grow. Therefore I would recommend doing a de novo assembly from the raw reads only for a few hand-picked transcriptomes. Transcriptomes can be heavily contaminated with RNA from symbionts, ingested material, etc. Therefore, if you find a hit to your gene of interest, you would still need to do a phylogenetic analysis to exclude this possibility.