Question: RNA-Seq Polymorphism Detection in a Non-Model Plant Organism
9.8 years ago by
Jlw wrote:

Ten individuals (samples) will be pooled and normalized into a single library for transcriptome sequencing. The reads will be paired-end and 85 bp in length.

This is a non-model plant organism (so no close relative for alignment purposes).

Is this design the best strategy for SNP detection, or should barcoding and multiplexing be employed?

What are the best tools for SNP detection in this design? De novo assembly with a tool such as Velvet will be necessary, but what comes after that?

Tags: illumina, rna, snp, velvet
9.8 years ago by
Brad Chapman (Boston, MA) wrote:

What was your plan for re-assigning reads back to each of the 10 individuals that went into the pool? If your reason for pooling is to save lanes, it would make sense to use barcoding so you can unambiguously associate reads with individuals after sequencing.

Here's one general workflow:

  • Barcode and sequence
  • Identify the barcode in each read, remove it, and assign the read to one of the 10 individuals.
  • Use an assembly tool like Velvet to generate a set of consensus contigs from all reads.
  • For each of the 10 individuals, align the reads separately to your consensus sequences, and identify SNPs with software like GATK or GigaBayes.
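
The barcode-splitting step in the workflow above could look roughly like the sketch below. It assumes fixed-length barcodes at the 5' end of each read and exact matching only; the barcode sequences and sample names are made up for illustration. A real pipeline would normally read FASTQ files and tolerate a mismatch or two.

```python
from collections import defaultdict

# Hypothetical 6-base barcodes mapped to sample names (illustrative only).
BARCODES = {
    "ACGTAC": "individual_01",
    "TGCATG": "individual_02",
}


def demultiplex(reads, barcodes):
    """Assign each read to a sample by its leading barcode and trim the barcode.

    `reads` is an iterable of (name, sequence, quality) tuples.
    Reads whose prefix matches no barcode exactly go to "unassigned" untrimmed.
    """
    lengths = {len(b) for b in barcodes}
    if len(lengths) != 1:
        raise ValueError("this sketch assumes all barcodes have the same length")
    bc_len = lengths.pop()

    by_sample = defaultdict(list)
    for name, seq, qual in reads:
        sample = barcodes.get(seq[:bc_len])
        if sample is None:
            by_sample["unassigned"].append((name, seq, qual))
        else:
            # Trim the barcode from both the bases and the quality string.
            by_sample[sample].append((name, seq[bc_len:], qual[bc_len:]))
    return dict(by_sample)


# Toy reads standing in for parsed FASTQ records.
reads = [
    ("r1", "ACGTACGGGTTTAAA", "IIIIIIIIIIIIIII"),
    ("r2", "TGCATGCCCAAATTT", "IIIIIIIIIIIIIII"),
    ("r3", "NNNNNNCCCAAATTT", "IIIIIIIIIIIIIII"),
]
binned = demultiplex(reads, BARCODES)
for sample, sample_reads in sorted(binned.items()):
    print(sample, len(sample_reads))
```

Each per-individual bin can then be aligned separately to the assembled contigs for SNP calling.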

Some things to consider:

  • How big is the projected transcriptome? Will you have enough reads from each sample to both assemble and call SNPs? For instance, if you end up with 2x coverage for each individual, would you feel confident calling SNPs with only 2 supporting reads? Titus has a good post on estimating coverage for ChIP-seq, which is a similar exercise.

  • How much duplication do you expect to see in the genome? If your plant species is highly polyploid, differentiating individual alleles may be a challenge.
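
To put a rough number on the 2x coverage concern, here is a minimal sketch of how likely it is to see both alleles at a heterozygous site for a given read depth. It assumes a diploid individual, equal expression of both alleles, independent reads, and no sequencing error. RNA-seq data often violates these assumptions (allele-specific expression in particular), so treat the result as an optimistic upper bound. The function name and the `min_reads_per_allele` parameter are my own, not from any SNP caller.

```python
from math import comb


def prob_both_alleles_seen(depth, min_reads_per_allele=1):
    """Probability that each allele at a heterozygous site is covered by at
    least `min_reads_per_allele` reads, when every read samples either
    allele with probability 0.5 (simple binomial model)."""
    if depth < 2 * min_reads_per_allele:
        return 0.0
    # Count the outcomes where the first allele gets i reads and the
    # second allele gets depth - i reads, with both above the minimum.
    favorable = sum(
        comb(depth, i)
        for i in range(min_reads_per_allele, depth - min_reads_per_allele + 1)
    )
    return favorable / 2 ** depth


for depth in (2, 5, 10, 20):
    print(depth, round(prob_both_alleles_seen(depth, 2), 3))
```

Under this model, at a depth of 2 you see both alleles even once only half the time, and requiring two reads per allele is impossible.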

Best of luck with the experiment; it sounds fun.

modified 9.7 years ago • written 9.8 years ago by Brad Chapman

Thank you Brad - this is very helpful. This project is for sequencing in a conifer species. The genome itself is extremely large (seven times that of human). The transcriptome, however, is thought to be about the same size as that of other sequenced tree species (such as Populus trichocarpa, poplar). Luckily it is diploid rather than polyploid, but we would of course want to do better than 2x coverage for SNP calling. I'll do more digging around with resources, but if you have any further advice given this information, I would love to hear it. Thanks again!

written 9.7 years ago by Jlw

My very first research project in undergrad was on Austrian pines; of course I was counting seeds and you're sequencing a transcriptome. It's great that you don't have to deal with any tricky polyploid issues. The next step would be to use the number of reads you expect from a lane to estimate the total number of bases you will sequence. Dividing that by the transcriptome size gives an estimate of coverage. Based on your budget, you can then dial in how many barcoded samples to run per lane, so that you get a level of coverage you are comfortable with for answering your biological questions.
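
As a back-of-the-envelope version of that calculation, here is a small sketch. The lane yield and transcriptome size below are placeholders I made up, not figures for any particular instrument or species; only the 85 bp paired-end read length and the 10 samples come from the question. It also reports mean coverage only, and real transcript coverage is very uneven even with a normalized library.

```python
def mean_coverage(read_pairs_per_lane, read_length, transcriptome_bp,
                  samples_per_lane, paired=True):
    """Average per-sample coverage: sequenced bases per sample divided by
    the transcriptome size."""
    reads_per_fragment = 2 if paired else 1
    total_bases = read_pairs_per_lane * reads_per_fragment * read_length
    return total_bases / samples_per_lane / transcriptome_bp


def max_samples_per_lane(read_pairs_per_lane, read_length, transcriptome_bp,
                         target_coverage, paired=True):
    """Largest number of barcoded samples that still reaches the target
    mean coverage on a single lane."""
    single_sample = mean_coverage(read_pairs_per_lane, read_length,
                                  transcriptome_bp, 1, paired)
    return int(single_sample // target_coverage)


# Placeholder inputs: 20 million read pairs per lane and a 50 Mb
# transcriptome are assumptions for illustration only.
READ_PAIRS = 20_000_000
READ_LENGTH = 85
TRANSCRIPTOME_BP = 50_000_000

per_sample = mean_coverage(READ_PAIRS, READ_LENGTH, TRANSCRIPTOME_BP, 10)
samples_at_20x = max_samples_per_lane(READ_PAIRS, READ_LENGTH,
                                      TRANSCRIPTOME_BP, 20)
print(f"10 samples per lane: about {per_sample:.1f}x each")
print(f"samples per lane at 20x: {samples_at_20x}")
```

Swapping in your own lane yield and transcriptome estimate shows quickly whether 10 individuals per lane is realistic or whether you need more lanes.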

written 9.7 years ago by Brad Chapman
9.8 years ago by
Rm (Danville, PA) wrote:

A very good review: Konrad Paszkiewicz and David J. Studholme, "De novo assembly of short sequence reads", Brief Bioinform (2010) 11(5): 457-472.

"A consistency-based consensus algorithm for de novo and reference-guided sequence assembly of short reads", Bioinformatics (2009) 25(9): 1118-1124.

"De novo assembly and analysis of RNA-seq data", Nat Methods, 2010 Oct 10.

These should address some of your doubts.

written 9.8 years ago by Rm

The Trans-ABySS paper looks interesting. I am about to give it a whirl for another non-model plant RNA-Seq experiment we've done.

written 9.8 years ago by Daniel Swan