Question: New to RNA-Seq: Reads assembled...now what? Non-sequenced organism. So lost!!
4.7 years ago by
United States
samantha_jeschonek50 wrote:

Hello! Thanks for taking the time to read. I'm very new to RNA-Seq, and have been banging my head against a wall for a few weeks now trying to analyze my data.

We performed RIP-Seq in our lab, with IPs against 3 different antibodies (Antibody-A, Antibody-B, and an IgG negative control). Each sample also had three biological replicates. The IP'd samples were processed for paired-end RNA-Seq.
Our goal is to determine the transcripts that were pulled down with both Antibody-A and Antibody-B, but NOT the IgG.  

Currently, I have assembled the paired-end reads (using a Trinity-based program, Agalma). I am now stuck at the post-assembly and data analysis stage. I have tried a few different approaches, but I have not had much luck with any of them.

The main issue is that I am working in Xenopus laevis (oocytes), for which only an incomplete draft genome exists.  The closest well-sequenced cousin is Xenopus tropicalis. 

Initially, I ran the post-assembly step in Agalma, in which transcripts were annotated against a subset of the Swiss-Prot database (based on the GI numbers associated with Xenopus laevis). This step also used RSEM to give FPKM values. The problem I encountered was that a control transcript had much higher FPKM values in our negative IgG controls than in our experimental IPs (in all replicates). This is unexpected, as we KNOW that this transcript should be abundant in the Antibody-A and Antibody-B IPs.

I thought the issue might be comparing FPKM across samples, and that a different approach would be better. I did some reading, and edgeR seemed like the program I wanted to use for differential expression across samples.

Following the edgeR manual, I first tried to set up a table of read counts using the featureCounts function of the Subread package. This program takes a BAM file and assigns mapped reads to genomic features in a GTF file. The output gives read counts for each gene. Since I could not find a gene annotation file (GTF) for Xenopus laevis, I used one for the related species, Xenopus tropicalis. Unfortunately, this resulted in 0 counts for every gene and no assigned reads, so I couldn't move on to the edgeR analysis.

I think that the problem lies with referencing my samples to laevis sequences in one step and tropicalis sequences in another. I think it would be better if I had a GTF file for Xenopus laevis, but I'm not sure if this is possible.

What I have that might be useful is assembled reads for the laevis oocytes. In addition to RIP-Seq, I did straight RNA-seq on the laevis oocytes. I'm not sure if this data can somehow be used as a comparison or baseline for anything in the RIP-seq experiment. 

Does anyone have any insight into what I am doing wrong, or whether there's a better way to approach my question? Or is there a way to use my whole-oocyte RNA-seq data to help with the RIP-seq data?

I'm sorry if this was such a long read or if anything is confusing. If you read through it all, thank you so much. Any and all insight is very much appreciated. I feel like I'm just hitting a giant wall here!

Thanks,

-Sam

xenopus rip-seq rna-seq • 2.8k views
4.7 years ago by
Sam2.2k
London
Sam2.2k wrote:

In my experience with RSEM: we ran RNA-seq on one of our samples and validated 5 genes by RT-PCR as differentially expressed, but RSEM reported that they were not. So I am a bit skeptical of the RSEM pipeline. However, you can use the RSEM pipeline to generate the read count table that edgeR requires: http://trinityrnaseq.sourceforge.net/analysis/diff_expression_analysis.html
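
To give a rough idea of what such a read table looks like, here is a minimal Python sketch that merges the expected_count column of several RSEM gene-level result tables into one genes x samples matrix. It is only an illustration, not the Trinity tooling described at the link above. The sample names, gene IDs, and values are made up, and the in-memory tables stand in for real *.genes.results files; the sketch assumes the usual tab-separated RSEM layout with gene_id and expected_count columns.

```python
import csv
import io


def read_expected_counts(handle):
    """Return {gene_id: expected_count} from one RSEM gene-level results table."""
    reader = csv.DictReader(handle, delimiter="\t")
    return {row["gene_id"]: float(row["expected_count"]) for row in reader}


def build_count_matrix(samples):
    """samples maps sample name -> open text handle of an RSEM results table.

    Returns (sample_names, {gene_id: [count for each sample, in order]}).
    Genes missing from a sample are filled with 0.0.
    """
    names = list(samples)
    per_sample = {name: read_expected_counts(samples[name]) for name in names}
    genes = sorted(set().union(*(counts.keys() for counts in per_sample.values())))
    matrix = {g: [per_sample[n].get(g, 0.0) for n in names] for g in genes}
    return names, matrix


# Hypothetical example data; with real files you would pass open("....genes.results").
HEADER = "gene_id\ttranscript_id(s)\tlength\teffective_length\texpected_count\tTPM\tFPKM\n"
demo = {
    "IgG_rep1": io.StringIO(
        HEADER
        + "geneA\tt1\t1000\t900\t12.00\t1.0\t1.0\n"
        + "geneB\tt2\t800\t700\t3.50\t0.5\t0.5\n"
    ),
    "AbA_rep1": io.StringIO(HEADER + "geneA\tt1\t1000\t900\t40.25\t3.0\t3.0\n"),
}

names, matrix = build_count_matrix(demo)

# Print a tab-separated table: one row per gene, one column per sample.
print("\t".join(["gene_id"] + names))
for gene, counts in matrix.items():
    print("\t".join([gene] + [str(c) for c in counts]))
```

A table in this shape (genes as rows, samples as columns) is what count-based tools such as edgeR take as input, so it can replace the featureCounts step when no usable GTF is available.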

 

As for the GTF: if you use an annotation from a different organism, I would expect the chromosome names and coordinates not to match your alignment file, and as a result no reads will be assigned to any gene features.
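
One quick way to test for this kind of mismatch is to compare the sequence names in the first column of the GTF with the reference names listed in the BAM header (the @SQ lines, which you can print with samtools view -H). The Python sketch below does that on small in-memory examples. The names used here (scaffold_1, comp100_c0_seq1, and so on) are hypothetical placeholders, not taken from your data.

```python
def gtf_seqnames(gtf_lines):
    """Collect the sequence names from column 1 of GTF lines."""
    names = set()
    for line in gtf_lines:
        if line.startswith("#") or not line.strip():
            continue  # skip comments and blank lines
        names.add(line.split("\t", 1)[0])
    return names


def sam_header_seqnames(header_lines):
    """Collect reference names from the SN: field of @SQ header lines."""
    names = set()
    for line in header_lines:
        if not line.startswith("@SQ"):
            continue
        for field in line.rstrip("\n").split("\t")[1:]:
            if field.startswith("SN:"):
                names.add(field[3:])
    return names


# Hypothetical inputs: an annotation on genome scaffolds and an alignment
# made against assembled contigs.
gtf_example = [
    "#!genome-build example\n",
    'scaffold_1\tsource\texon\t100\t500\t.\t+\t.\tgene_id "g1"; transcript_id "t1";\n',
]
header_example = [
    "@HD\tVN:1.0\tSO:coordinate\n",
    "@SQ\tSN:comp100_c0_seq1\tLN:1500\n",
    "@SQ\tSN:comp101_c0_seq1\tLN:900\n",
]

gtf_names = gtf_seqnames(gtf_example)
bam_names = sam_header_seqnames(header_example)
shared = gtf_names & bam_names

print("sequence names in GTF:", len(gtf_names))
print("sequence names in BAM header:", len(bam_names))
print("names present in both:", len(shared))
```

If no names are shared, a read counter has nothing to match reads against, which would be consistent with every gene getting a count of 0.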

I suggest reading through the Trinity manual: http://trinityrnaseq.sourceforge.net/index.html

Their resources are relatively comprehensive and are very helpful. Hope that will help.

 
