Question: Align RNASeq reads to combined genomes
ddzhangzz (United States) wrote, 3.6 years ago:

I am trying to align RNA-Seq data to both the human and mouse genomes, but I seem to be running into some trouble. I merged the human genome.fa with the mouse genome.fa into a combined genome.fa, and then used only the human genes.gtf for transcriptome assembly. I have previously aligned these RNA-Seq data to the human genome alone without problems, but my client said the RNA may have been contaminated with mouse RNA. Given that more than 50% of the reads could not be mapped to human, I would like to align the data to the combined human and mouse genome. Any suggestions on how to do this with TopHat2? With my procedure, I got the errors below:

[2016-04-13 14:06:17] Beginning TopHat run (v2.0.9)
-----------------------------------------------
[2016-04-13 14:06:17] Checking for Bowtie
          Bowtie version:    2.1.0.0
[2016-04-13 14:06:17] Checking for Samtools
        Samtools version:    0.1.19.0
[2016-04-13 14:06:17] Checking for Bowtie index files (genome)..
[2016-04-13 14:06:17] Checking for reference FASTA file
[2016-04-13 14:06:17] Generating SAM header for /data1/workspace/DCI/Sarantopoulos/RNASeq/Data/human_and_mouse_genome_combined/human_mouse_combined/Bowtie2Index/genome
    format:      fastq
    quality scale:   phred33 (default)
[2016-04-13 14:06:22] Reading known junctions from GTF file
[2016-04-13 14:06:27] Pre-filtering multi-mapped left reads
[2016-04-13 14:06:27] Mapping JP01_S9_L003_R1_001-trimmed_1 to genome genome with Bowtie2 
[2016-04-13 14:41:46] Pre-filtering multi-mapped right reads
[2016-04-13 14:41:46] Mapping JP01_S9_L003_R2_001-trimmed_2 to genome genome with Bowtie2 
[2016-04-13 15:16:00] Preparing reads
     left reads: min. length=114, max. length=126, 32220590 kept reads (1960224 discarded)
    right reads: min. length=107, max. length=126, 32456688 kept reads (1724126 discarded)
[2016-04-13 15:40:07] Building transcriptome data files..
    [FAILED]
 Error: gtf_to_fasta returned an error.

But there was no error when I previously aligned to the human genome only.
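A likely cause, consistent with the fix posted further down this thread though not confirmed by the log itself, is that both FASTA files use the same sequence names (e.g. both contain chr1), so the merged genome.fa ends up with duplicate names. The minimal Python sketch below checks a FASTA file for repeated sequence names; the helpers `fasta_names` and `find_duplicate_names` are hypothetical and not part of TopHat.

```python
from collections import Counter
from typing import Iterable, List


def fasta_names(lines: Iterable[str]) -> List[str]:
    """Return the sequence name (first word after '>') of every FASTA header."""
    names = []
    for line in lines:
        if line.startswith(">"):
            fields = line[1:].split()
            # A header consisting of '>' alone yields an empty name.
            names.append(fields[0] if fields else "")
    return names


def find_duplicate_names(lines: Iterable[str]) -> List[str]:
    """Return, sorted, every sequence name that occurs more than once."""
    counts = Counter(fasta_names(lines))
    return sorted(name for name, n in counts.items() if n > 1)


if __name__ == "__main__":
    # Toy stand-in for a naively merged human + mouse genome.fa.
    merged = [">chr1 human\n", "ACGT\n", ">chr1 mouse\n", "TTGA\n", ">chrM\n", "A\n"]
    print(find_duplicate_names(merged))
```

On a real file you would pass an open handle, e.g. `with open("genome.fa") as fh: find_duplicate_names(fh)`; any name it reports would need to be made unique before building the index.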

Tags: rna-seq

You should try binning the reads before aligning them; BBSplit is a good tool for this.
As you discovered, trying to align to both genomes at the same time gets very messy unless you create custom GTF files, indexes, etc.

(reply by genomax, 3.6 years ago)
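The idea behind binning can be illustrated with a toy Python sketch. This is not BBSplit's actual algorithm or interface; it assumes you already have, for each read, its best alignment score against each genome (for example from Bowtie2's AS:i tag), stored in hypothetical dictionaries where a missing entry means the read did not align. Each read is assigned to the genome it matches better, and ties are flagged as ambiguous.

```python
from typing import Dict, Iterable, List


def bin_reads(read_ids: Iterable[str],
              human_scores: Dict[str, int],
              mouse_scores: Dict[str, int]) -> Dict[str, List[str]]:
    """Assign each read to human, mouse, ambiguous or unmapped.

    Scores are assumed to be 'higher is better'; a read absent from a
    dictionary is treated as not aligned to that genome.
    """
    bins: Dict[str, List[str]] = {"human": [], "mouse": [], "ambiguous": [], "unmapped": []}
    for rid in read_ids:
        h = human_scores.get(rid)
        m = mouse_scores.get(rid)
        if h is None and m is None:
            bins["unmapped"].append(rid)
        elif m is None or (h is not None and h > m):
            bins["human"].append(rid)
        elif h is None or m > h:
            bins["mouse"].append(rid)
        else:
            # Equal scores: the read cannot be attributed to either genome.
            bins["ambiguous"].append(rid)
    return bins


if __name__ == "__main__":
    result = bin_reads(["r1", "r2", "r3", "r4"],
                       {"r1": 10, "r3": 5},
                       {"r2": 8, "r3": 5})
    print(result)
```

Only the reads in the human bin would then go on to the human-only TopHat2 run, so the human GTF and index can be used unchanged.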
ddzhangzz (United States) wrote, 3.6 years ago:

I just figured out how to do it, in case anyone needs it:

  1. Rename the chromosomes in the mouse .fa (e.g. from chr1 to mchr1) so that chromosome names are unique across the human .fa and the mouse .fa.
  2. Rename the chromosomes in the mouse .gtf file accordingly.
  3. Build a Bowtie2 index of the mouse genome (bowtie2-build) from the renamed mouse .fa.
  4. Merge the renamed mouse .gtf with the human .gtf.
  5. Put the rebuilt mouse Bowtie2 index and the human index into one directory.
  6. Run TopHat2 for alignment and assembly, using the combined index and the merged .gtf.
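Steps 1, 2 and 4 are plain text edits. A minimal Python sketch, assuming the "m" prefix from step 1 and standard tab-separated GTF lines whose first column is the chromosome name; all function and file names here are hypothetical. What matters is that the renamed names in the FASTA headers and in the GTF first column match exactly.

```python
from typing import Callable, Iterable, Iterator

PREFIX = "m"  # assumed prefix from step 1: chr1 -> mchr1


def rename_fasta(lines: Iterable[str], prefix: str = PREFIX) -> Iterator[str]:
    """Step 1: prefix every FASTA header, e.g. '>chr1 ...' -> '>mchr1 ...'."""
    for line in lines:
        if line.startswith(">"):
            yield ">" + prefix + line[1:]
        else:
            yield line  # sequence lines are copied unchanged


def rename_gtf(lines: Iterable[str], prefix: str = PREFIX) -> Iterator[str]:
    """Step 2: prefix the seqname (first tab-separated column) of each GTF record."""
    for line in lines:
        if line.startswith("#") or not line.strip():
            yield line  # header comments and blank lines are copied unchanged
            continue
        seqname, rest = line.split("\t", 1)
        yield prefix + seqname + "\t" + rest


def rename_file(in_path: str, out_path: str,
                renamer: Callable[[Iterable[str], str], Iterator[str]],
                prefix: str = PREFIX) -> None:
    """Apply rename_fasta or rename_gtf to a whole file."""
    with open(in_path) as src, open(out_path, "w") as dst:
        dst.writelines(renamer(src, prefix))


def concatenate(paths: Iterable[str], out_path: str) -> None:
    """Step 4: merge files (e.g. human .gtf + renamed mouse .gtf) by concatenation."""
    with open(out_path, "w") as out:
        for path in paths:
            last = ""
            with open(path) as src:
                for line in src:
                    out.write(line)
                    last = line
            if last and not last.endswith("\n"):
                out.write("\n")  # keep the next file's first line on its own line
```

Hypothetical usage: `rename_file("mouse.fa", "mouse_renamed.fa", rename_fasta)`, `rename_file("mouse.gtf", "mouse_renamed.gtf", rename_gtf)`, then `concatenate(["human.gtf", "mouse_renamed.gtf"], "merged.gtf")`.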
Sandeep (Manipal, India) wrote, 3.6 years ago:

I would have aligned the reads to the human genome first and then aligned the unmapped reads against the mouse genome.

That way you do not have to deal with merging GTF files, and you also find the contamination, if any.

Also, try the STAR aligner; it is much faster than TopHat.
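The second pass of this approach only needs the reads that did not align to human. Many aligners can write those out directly (Bowtie2, for instance, has --un and --un-conc options), but the filtering itself is simple. The Python sketch below assumes you have collected the names of reads that aligned to human (a hypothetical `mapped_ids` set, e.g. the QNAME values from the human alignment) and keeps only the FASTQ records not in that set. Using the same ID set for the R1 and R2 files keeps mates in sync: a pair is dropped if either mate aligned.

```python
from itertools import islice
from typing import Iterable, Iterator, List, Set


def read_id(header: str) -> str:
    """Read name from a FASTQ header: first word, minus a trailing /1 or /2."""
    rid = header[1:].split()[0]
    return rid[:-2] if rid.endswith(("/1", "/2")) else rid


def unmapped_records(fastq_lines: Iterable[str],
                     mapped_ids: Set[str]) -> Iterator[List[str]]:
    """Yield 4-line FASTQ records whose read name is not in mapped_ids."""
    it = iter(fastq_lines)
    while True:
        record = list(islice(it, 4))
        if not record:
            return
        if len(record) < 4:
            raise ValueError("truncated FASTQ record")
        if read_id(record[0]) not in mapped_ids:
            yield record


if __name__ == "__main__":
    fq = ["@r1/1\n", "ACGT\n", "+\n", "IIII\n",
          "@r2/1\n", "GGCC\n", "+\n", "IIII\n"]
    # r1 aligned to human, so only r2 is kept for the mouse alignment.
    for rec in unmapped_records(fq, {"r1"}):
        print("".join(rec), end="")
```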


Thanks, but I would rather know how to align reads to both human and mouse using TopHat2. What are the steps?

(reply by ddzhangzz, 3.6 years ago)