Question: Reference genome for mapping RNA-seq spike-in dataset
6.4 years ago, sarahmanderni (70) wrote:

Spike-in controls are a common way to evaluate statistical methods for finding differentially expressed genes. Given FASTQ files that contain ERCC control reads and the corresponding GTF file for the ERCCs, how does one do the alignment step with TopHat? For instance, if the samples are human, we have the FASTQ files, the hg19 reference, and the ERCC.gtf file. How can TopHat align the FASTQ files to the reference genome when they include ERCC reads? Should we combine the hg19 reference genome with the ERCC.gtf file? The following article is an example of this situation:

http://www.ncbi.nlm.nih.gov/pmc/articles/PMC3166838/

How should we add the ERCC control information to the reference genome used by TopHat?

Thanks for the help.

rna-seq alignment • 7.6k views
6.4 years ago, karl.stamm (3.9k, United States) wrote:

You're on the right track. The ERCC sequences should be available as a FASTA file, which you can append to your reference genome as additional chromosomes. TopHat/Bowtie will then place the reads that belong to the spike-ins on those chromosomes. If you're using a GTF, append the ERCC entries there too, keeping in mind that they are unspliced, single-exon transcripts.
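As a rough sketch of the "append as extra chromosomes" step, something like the following Python could build the combined files. The function names and every file name are made up for illustration, and the name-clash check is an extra precaution of mine rather than something TopHat requires.

```python
def fasta_names(path):
    """Return the sequence names (first word after '>') found in a FASTA file."""
    names = []
    with open(path) as handle:
        for line in handle:
            if line.startswith(">"):
                fields = line[1:].split()
                names.append(fields[0] if fields else "")
    return names


def concatenate(parts, out_path):
    """Write the files in `parts` to `out_path` in order.

    A newline is added after any file that does not end with one, so the
    first record of the next file never gets glued onto the previous line.
    """
    with open(out_path, "w") as out:
        for part in parts:
            last = ""
            with open(part) as handle:
                for line in handle:
                    out.write(line)
                    last = line
            if last and not last.endswith("\n"):
                out.write("\n")


def build_combined_reference(genome_fa, ercc_fa, genes_gtf, ercc_gtf, out_prefix):
    """Create <out_prefix>.fa and <out_prefix>.gtf with the ERCC entries appended."""
    clash = set(fasta_names(genome_fa)) & set(fasta_names(ercc_fa))
    if clash:
        raise ValueError("sequence names present in both FASTA files: %s" % sorted(clash))
    combined_fa = out_prefix + ".fa"
    combined_gtf = out_prefix + ".gtf"
    concatenate([genome_fa, ercc_fa], combined_fa)
    concatenate([genes_gtf, ercc_gtf], combined_gtf)
    return combined_fa, combined_gtf
```

With hypothetical inputs, `build_combined_reference("hg19.fa", "ERCC92.fa", "genes.gtf", "ERCC.gtf", "hg19_ercc")` would write `hg19_ercc.fa` and `hg19_ercc.gtf`. For TopHat2 you would then index the combined FASTA with something along the lines of `bowtie2-build hg19_ercc.fa hg19_ercc` and align with `tophat -G hg19_ercc.gtf -o tophat_out hg19_ercc reads.fastq`. The chromosome names in the GTF have to match the FASTA headers exactly.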

Be aware that some reads from the human genome will also align to some of the ERCC chromosomes. It's not many, but it's not zero.
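One way to get a feel for how big that effect is: align a sample that has no spike-ins to the combined reference and count how many primary alignments still land on the ERCC chromosomes. The sketch below tallies SAM-format text lines. It assumes the spike-in sequence names start with `ERCC-` (check your own FASTA headers), and the function name is made up for illustration.

```python
from collections import Counter


def count_reads_by_origin(sam_lines, spike_prefix="ERCC-"):
    """Tally primary alignments as 'spike_in', 'genome', or 'unmapped'.

    `sam_lines` is any iterable of SAM text lines. Header lines are ignored,
    and secondary or supplementary alignments are skipped so that a
    multi-mapped read is counted only once.
    """
    counts = Counter()
    for line in sam_lines:
        if line.startswith("@"):
            continue  # SAM header line
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 3:
            continue  # not a valid alignment record
        flag = int(fields[1])
        rname = fields[2]
        if flag & 0x100 or flag & 0x800:
            continue  # secondary (0x100) or supplementary (0x800) alignment
        if flag & 0x4 or rname == "*":
            counts["unmapped"] += 1
        elif rname.startswith(spike_prefix):
            counts["spike_in"] += 1
        else:
            counts["genome"] += 1
    return counts
```

TopHat writes BAM, so the lines would come from something like `samtools view accepted_hits.bam` piped into a script that passes `sys.stdin` to this function.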

 
