RNA-Seq Polymorphism Detection in a Non-Model Plant Organism
11.0 years ago
Jlw ▴ 190

Ten individuals (samples) will be pooled and normalized into a single library for transcriptome sequencing. The reads will be paired-end and 85 bp in length.

This is a non-model plant organism, so there is no close relative to use as an alignment reference.

Is this design the best strategy for SNP detection? Should barcoding and multiplexing be employed?

What are the best tools for SNP detection in this design? De novo assembly with a tool such as Velvet will be necessary, but what should come after that?

rna illumina snp velvet • 3.6k views
10.9 years ago

What was your plan for reassigning reads to each of the 10 individuals that went into the pool? If your reason for pooling is to save lanes, it would make sense to use barcoding so you can unambiguously associate reads with individuals after sequencing.

Here's one general workflow:

  • Barcode and sequence
  • Identify the barcode in each read, remove it, and assign the read to one of the 10 individuals.
  • Use an assembly tool like Velvet to generate a set of consensus contigs from all reads.
  • For each of the 10 individuals, align the reads separately to your consensus sequences, and identify SNPs with software like GATK or GigaBayes.
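The barcode-assignment step in the workflow above can be sketched in a few lines of Python. This is only an illustration: the barcode sequences and sample names are hypothetical, the barcode is assumed to sit at the start of the read, and only exact matches are accepted (real demultiplexers usually tolerate a mismatch or two). Assembly, alignment, and SNP calling would follow with the tools named in the list.

```python
from collections import defaultdict

# Hypothetical barcode-to-individual table; the real barcodes come from
# whatever indexing scheme is used during library preparation.
BARCODES = {
    "ACGT": "individual_01",
    "TGCA": "individual_02",
}


def demultiplex(reads, barcodes=BARCODES):
    """Group (name, sequence, quality) reads by their leading barcode.

    The barcode is trimmed from both the sequence and the quality string.
    Reads without an exact barcode match go into the "unassigned" bin.
    """
    bins = defaultdict(list)
    for name, seq, qual in reads:
        for barcode, sample in barcodes.items():
            if seq.startswith(barcode):
                n = len(barcode)
                bins[sample].append((name, seq[n:], qual[n:]))
                break
        else:
            # No barcode matched this read.
            bins["unassigned"].append((name, seq, qual))
    return dict(bins)


if __name__ == "__main__":
    example = [
        ("r1", "ACGTAAAA", "IIIIIIII"),
        ("r2", "TGCACCCC", "IIIIHHHH"),
        ("r3", "GGGGTTTT", "IIIIIIII"),
    ]
    for sample, sample_reads in demultiplex(example).items():
        print(sample, len(sample_reads))
```

Keeping an explicit "unassigned" bin makes it easy to check how many reads are lost to barcode errors before committing to the assembly.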

Some things to consider:

  • How big is the projected transcriptome? Will you have enough reads from each sample to both assemble and call SNPs? For instance, if you end up with 2x coverage for each individual, would you feel confident assigning SNPs with only 2 supporting reads? Titus has a good post on estimating coverage for ChIP-seq, which is a similar exercise.

  • How much duplication do you expect to see in the genome? If your plant species is very polyploid, differentiating individual alleles may be a challenge.
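To put a rough number on the 2x coverage concern, here is a small sketch of the chance that the reads covering a heterozygous site include both alleles. It assumes a diploid individual, independent reads, equal expression of the two alleles (the `allele_fraction` default of 0.5), and no sequencing errors; all of these are simplifications, and the function name is my own.

```python
def prob_both_alleles_seen(depth, allele_fraction=0.5):
    """Probability that `depth` reads at a heterozygous site show both alleles.

    Each read is assumed to come from one allele with probability
    `allele_fraction` and from the other with 1 - allele_fraction.
    """
    if depth < 2:
        return 0.0  # one read can never show two alleles
    p = allele_fraction
    # Subtract the two cases where every read comes from the same allele.
    return 1.0 - p ** depth - (1.0 - p) ** depth


if __name__ == "__main__":
    for depth in (2, 4, 8, 10):
        print(depth, prob_both_alleles_seen(depth))
```

Under these assumptions, 2 reads show both alleles only half the time, and each allele is then supported by a single read, so a real variant cannot be told apart from a sequencing error.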

Best of luck with the experiment; it sounds fun.

Thank you Brad - this is very helpful. This project is for sequencing in a conifer species. The genome itself is extremely large (seven times that of human). The transcriptome, however, is thought to be about the same size as that of other sequenced tree species (such as Populus trichocarpa, poplar). Luckily the species is diploid rather than highly polyploid, but we would of course want to do better than 2x coverage for SNP calling. I'll do more digging around with resources, but if you have any further advice given this information, I would love to hear it. Thanks again!

My very first research project in undergrad was on Austrian pines; of course I was counting seeds and you're sequencing a genome. It's great that you don't have to deal with any tricky polyploid issues. The next step would be to use the number of reads you'll expect from a lane to estimate your projected number of sequenced bases. Relative to the transcriptome size, this gives an estimate of coverage. Based on your budget, you can then dial in how many barcoded samples to run per lane so you'll get a level of coverage you feel comfortable will let you answer your biological questions.
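As a back-of-the-envelope sketch of that estimate: the figures in the example (20 million reads per lane, a 50 Mb transcriptome) are placeholders I made up, not measurements for any conifer, and the function names are my own. Only the 85 bp read length and the 10 samples come from the question.

```python
def per_sample_coverage(reads_per_lane, read_length_bp,
                        transcriptome_size_bp, samples_per_lane):
    """Average fold-coverage per sample when a lane is split evenly."""
    total_bases = reads_per_lane * read_length_bp
    return total_bases / transcriptome_size_bp / samples_per_lane


def max_samples_per_lane(reads_per_lane, read_length_bp,
                         transcriptome_size_bp, target_coverage):
    """Largest number of barcoded samples per lane that still reaches
    `target_coverage` for each sample (0 if one sample cannot reach it)."""
    total_bases = reads_per_lane * read_length_bp
    return int(total_bases // (transcriptome_size_bp * target_coverage))


if __name__ == "__main__":
    # Placeholder inputs: substitute the real lane yield and transcriptome size.
    reads = 20_000_000          # reads per lane, counting both mates of a pair
    read_len = 85               # bp, from the question
    transcriptome = 50_000_000  # bp, a made-up size

    print(per_sample_coverage(reads, read_len, transcriptome, 10))
    print(max_samples_per_lane(reads, read_len, transcriptome, 10))
```

Keep in mind that this is an average. Transcripts are expressed at very different levels, so even in a normalized library the coverage of individual transcripts will vary around it.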

10.9 years ago
Rm 8.1k

A very good review: "De novo assembly of short sequence reads" by Konrad Paszkiewicz and David J. Studholme. Brief Bioinform (2010) 11(5): 457-472.

"A consistency-based consensus algorithm for de novo and reference-guided sequence assembly of short reads." Bioinformatics (2009) 25(9): 1118-1124.

"De novo assembly and analysis of RNA-seq data." Nat Methods. 2010 Oct 10.

These should address some of your doubts.

The Trans-ABySS paper looks interesting. I am about to give it a whirl for another non-model plant RNA-Seq experiment we've done.
