Align RNASeq reads to combined genomes
8.0 years ago
ddzhangzz ▴ 90

I am trying to align RNA-Seq data to both the human and mouse genomes but have run into trouble. I merged the human genome.fa with the mouse genome.fa into a combined genome.fa and then used only the human genes.gtf for transcriptome assembly. I previously aligned these RNA-Seq data to the human genome alone without problems, but my client said the RNA may have been contaminated with mouse RNA. Since more than 50% of the reads could not be mapped to human, I would like to align the data to a combined human and mouse genome. Any suggestions on how to do this with tophat2? With my procedure, I got the errors below:

[2016-04-13 14:06:17] Beginning TopHat run (v2.0.9)
-----------------------------------------------
[2016-04-13 14:06:17] Checking for Bowtie
          Bowtie version:    2.1.0.0
[2016-04-13 14:06:17] Checking for Samtools
        Samtools version:    0.1.19.0
[2016-04-13 14:06:17] Checking for Bowtie index files (genome)..
[2016-04-13 14:06:17] Checking for reference FASTA file
[2016-04-13 14:06:17] Generating SAM header for /data1/workspace/DCI/Sarantopoulos/RNASeq/Data/human_and_mouse_genome_combined/human_mouse_combined/Bowtie2Index/genome
    format:      fastq
    quality scale:   phred33 (default)
[2016-04-13 14:06:22] Reading known junctions from GTF file
[2016-04-13 14:06:27] Pre-filtering multi-mapped left reads
[2016-04-13 14:06:27] Mapping JP01_S9_L003_R1_001-trimmed_1 to genome genome with Bowtie2 
[2016-04-13 14:41:46] Pre-filtering multi-mapped right reads
[2016-04-13 14:41:46] Mapping JP01_S9_L003_R2_001-trimmed_2 to genome genome with Bowtie2 
[2016-04-13 15:16:00] Preparing reads
     left reads: min. length=114, max. length=126, 32220590 kept reads (1960224 discarded)
    right reads: min. length=107, max. length=126, 32456688 kept reads (1724126 discarded)
[2016-04-13 15:40:07] Building transcriptome data files..
    [FAILED]
 Error: gtf_to_fasta returned an error.

There was no error when I previously aligned to the human genome only.

RNA-Seq • 4.3k views

You should try binning the reads before aligning them; give BBSplit a try for this.
As you discovered, it gets very messy (unless you create custom GTF files, indexes, etc.) if you try to align to both genomes at the same time.
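
To make the binning suggestion concrete, here is a rough sketch of how the BBSplit call could be assembled from Python. The file names and the `bbsplit_command` helper are made up for illustration, and the key=value arguments (`in1`, `in2`, `ref`, `basename`, `outu1`, `outu2`) are written from memory of the BBTools help text, so check them against `bbsplit.sh --help` for your installed version before running anything.

```python
import shlex


def bbsplit_command(reads_1, reads_2, human_fa, mouse_fa, prefix="binned"):
    """Build (but do not run) a BBSplit command line for paired reads.

    `bbsplit_command` and all file names are hypothetical; only the
    argument names are meant to mirror BBSplit's own key=value options.
    """
    return [
        "bbsplit.sh",
        f"in1={reads_1}",
        f"in2={reads_2}",
        # Comma-separated list of references to bin the reads against.
        f"ref={human_fa},{mouse_fa}",
        # '%' is substituted with each reference name by BBSplit.
        f"basename={prefix}_%.fq",
        # Reads that match neither reference.
        f"outu1={prefix}_unmatched_1.fq",
        f"outu2={prefix}_unmatched_2.fq",
    ]


if __name__ == "__main__":
    cmd = bbsplit_command("R1.fq", "R2.fq", "human.fa", "mouse.fa")
    # Print the command so it can be inspected or pasted into a shell.
    print(shlex.join(cmd))
```

The human bin can then go through the human-only TopHat run that already worked, with no merged GTF or combined index needed.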

8.0 years ago
ddzhangzz ▴ 90

I just figured out how to do it, in case it is necessary.

  1. Rename the chromosomes in the mouse .fa (for example, chr1 becomes mchr1) so that chromosome names are unique across the human .fa and the mouse .fa.
  2. Rename the chromosomes in the mouse .gtf file to match.
  3. Build the mouse bowtie2 index from the renamed mouse .fa.
  4. Merge the renamed mouse .gtf with the human .gtf.
  5. Put the rebuilt mouse bowtie2 index and the human index into one directory.
  6. Run bowtie2 for alignment and assembly, using the combined index and the merged .gtf.
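
The renaming and merging steps above could be sketched roughly as follows. This is only an illustration: the function names and file names are hypothetical, the `m` prefix just follows the mchr1 example, and concatenating the two FASTA files is an assumption about how a combined reference would be prepared. `duplicate_names` is an extra sanity check to confirm that no sequence name occurs twice in the merged FASTA.

```python
def prefix_fasta(src, dst, prefix="m"):
    """Copy a FASTA file, prepending `prefix` to every sequence name."""
    with open(src) as fin, open(dst, "w") as fout:
        for line in fin:
            if line.startswith(">"):
                line = ">" + prefix + line[1:]
            fout.write(line)


def prefix_gtf(src, dst, prefix="m"):
    """Copy a GTF file, prepending `prefix` to column 1 (the seqname)."""
    with open(src) as fin, open(dst, "w") as fout:
        for line in fin:
            # Leave blank lines and '#' comment lines untouched.
            if line.strip() and not line.startswith("#"):
                line = prefix + line
            fout.write(line)


def concatenate(sources, dst):
    """Concatenate text files, making sure each one ends with a newline."""
    with open(dst, "w") as fout:
        for src in sources:
            with open(src) as fin:
                for line in fin:
                    fout.write(line if line.endswith("\n") else line + "\n")


def duplicate_names(fasta):
    """Return the set of sequence names that occur more than once."""
    seen, dups = set(), set()
    with open(fasta) as fin:
        for line in fin:
            if line.startswith(">"):
                parts = line[1:].split()
                name = parts[0] if parts else ""
                if name in seen:
                    dups.add(name)
                seen.add(name)
    return dups


# Hypothetical usage (file names are placeholders):
#   prefix_fasta("mouse.fa", "mouse_renamed.fa")
#   prefix_gtf("mouse.gtf", "mouse_renamed.gtf")
#   concatenate(["human.fa", "mouse_renamed.fa"], "combined.fa")
#   concatenate(["human.gtf", "mouse_renamed.gtf"], "combined.gtf")
#   assert not duplicate_names("combined.fa")
```

Running `duplicate_names` on a merged FASTA made without renaming would list the names shared between the two genomes (chr1 and so on).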
8.0 years ago
Sandeep ▴ 260

I would have aligned the reads to the human genome first and then aligned the unmapped reads against the mouse genome.

That way, you avoid the complication of merging GTF files and can still detect the contamination, if any.

Also, try the STAR aligner; it is much faster than TopHat.
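
A rough sketch of that two-pass idea, written as a small Python driver. Several things here are assumptions rather than a tested recipe: the index and GTF paths are placeholders, `two_pass_commands` and `run` are made-up helpers, `samtools fastq` needs a newer samtools than the 0.1.19 shown in the log above, and the mouse pass treats the unmapped reads as single-end for simplicity. TopHat does write an `unmapped.bam` into its output directory, which is what the second step picks up.

```python
import subprocess
from pathlib import Path


def two_pass_commands(reads, human_index, human_gtf,
                      mouse_index, mouse_gtf, workdir="two_pass"):
    """Return (argv, stdout_path) pairs for a human-then-mouse alignment.

    All paths are placeholders; nothing is executed here.
    """
    work = Path(workdir)
    human_out = work / "human"
    mouse_out = work / "mouse"
    unmapped_bam = human_out / "unmapped.bam"  # produced by the human pass
    unmapped_fq = work / "unmapped.fq"
    return [
        # Pass 1: align everything to human with the human annotation.
        (["tophat2", "-o", str(human_out), "-G", human_gtf,
          human_index, *reads], None),
        # Convert the leftover reads back to FASTQ (written to stdout).
        (["samtools", "fastq", str(unmapped_bam)], unmapped_fq),
        # Pass 2: align the leftovers to mouse (single-end, an assumption).
        (["tophat2", "-o", str(mouse_out), "-G", mouse_gtf,
          mouse_index, str(unmapped_fq)], None),
    ]


def run(commands):
    """Execute the commands in order, redirecting stdout where requested."""
    for argv, stdout_path in commands:
        if stdout_path is None:
            subprocess.run(argv, check=True)
        else:
            with open(stdout_path, "w") as out:
                subprocess.run(argv, check=True, stdout=out)
```

Comparing the mapping rate of the second pass with the first gives a quick estimate of how much mouse material is in the sample.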


Thanks, but I would rather know how to align reads to both human and mouse using tophat2. What are the steps?
