Question: Tophat - poor mapping in control vs treament
0
gravatar for pveraw
3.6 years ago by
pveraw0
pveraw0 wrote:

Hi everyone!

I'm analysing results from a RNA-seq experiment. My organism is an annotated yeast that has some effect on vitis. I have around 15 millions of reads per sample obtained with illumina pair-end, 90 bp length.

I have 2 strains, c13 and c24 which were cultivated in a control condition and one treatment, the experiment was performed with three replicates.

In summary I have:

C13_control (R1,R2,R3), C13_treat (R1,R2,R3)

C24_control (R1,R2,R3), C24_treat (R1,R2,R3)

I want to cheack the effect of my treatment over the yeast with Tuxedo protocol. At this point I have done Tophat alignment with my 12 samples. with my fasta genome indexed as DA and the following pipeline:

tophat -G annotation.gff3 -o out_folder DA c*_control_R*.fastq c*_treatment_R*.fastq

Checking my results I get something curious. Reads from my c13_control sample aligned only in a 4%, while the same strain, but on treatment map a 60% of the reads. In the other hand, read from my c24 strain map in both cases a 60%. 

If I ignore this result and I continue the protocol, when I observe my differencial expression in c13 the fold change is huge (even 30 times) and almost all the genes are significant, because obviously is comparing high expression in the treatment over my poor mapping in control, so I have the feeling that my control is not being a control. c24 instead, show a fewer number of genes with fold changes lower that 8 times.

I'm trying to find an explanation for this behaviour or find another approach to raise this problem, so I would appreciate any comment!

Thanks guys!

 

rna-seq • 1.4k views
ADD COMMENTlink modified 3.6 years ago • written 3.6 years ago by pveraw0

I've updated your summary slightly, since I think you had the C13 samples listed twice originally.

Have you trimmed adapaters?

ADD REPLYlink written 3.6 years ago by Devon Ryan89k

Can you do wildcarding like that with tophat? What do your `align_summary.txt` `tophat.log` and `run.log` files look like?

ADD REPLYlink written 3.6 years ago by mbio.kyle330

Thank you for replying:

Yes Devon, I trimmed adapters and QC.

Sorry if I confused you mbio.kyle, I just used * to show that I did the same pipeline for every sample. For example for my control/treatment samples c13 replicate1  I did:

tophat -G annotation.gff3 -o out_folder DA c13_control_R1.1.fastq c13_control_R1.2.fastq

tophat -G annotation.gff3 -o out_folder DA c13_treatment_R1.1.fastq c13_treatment_R1.2.fastq

So far my tophat and run log files does not display any errors or warnings, my align summary are for example:

---------------------------------------------------- C13_control_R1:

Left reads:
          Input     :  13349110
           Mapped   :    468836 ( 3.5% of input)
            of these:       664 ( 0.1%) have multiple alignments (2 have >20)
Right reads:
          Input     :  13349110
           Mapped   :    468082 ( 3.5% of input)
            of these:       587 ( 0.1%) have multiple alignments (2 have >20)
 3.5% overall read mapping rate.

Aligned pairs:    165921
     of these:       193 ( 0.1%) have multiple alignments
                     141 ( 0.1%) are discordant alignments
 1.2% concordant pair alignment rate.

---------------------------------------------------- C13_treatment_R1:

Left reads:
          Input     :  14380800
           Mapped   :   8700067 (60.5% of input)
            of these:     15961 ( 0.2%) have multiple alignments (318 have >20)
Right reads:
          Input     :  14380800
           Mapped   :   8684246 (60.4% of input)
            of these:     15953 ( 0.2%) have multiple alignments (318 have >20)
60.4% overall read mapping rate.

Aligned pairs:   6724170
     of these:     12113 ( 0.2%) have multiple alignments
                   12362 ( 0.2%) are discordant alignments
46.7% concordant pair alignment rate.

-------------------------------------------------------------------------------------

Thanks!

ADD REPLYlink written 3.6 years ago by pveraw0
2

You should blast a few of the unmapped reads and see if you received the wrong samples.

ADD REPLYlink written 3.6 years ago by Devon Ryan89k

Thanks Devon, I patiently did a BLAST against nr database with the 12 million reads that didn't map and I only got ~95 reads mappig a bacteria, which seems to be contation, but I still guessing what is the other huge percentage lost.

ADD REPLYlink written 3.6 years ago by pveraw0
2

Maybe throw them at https://onecodex.com/ and see if that reveals any more contamination

ADD REPLYlink written 3.6 years ago by mbio.kyle330

:D, you just save me!

ADD REPLYlink written 3.6 years ago by pveraw0

Have you run FastQC on your raw data? Do the read qualities look OK?

ADD REPLYlink written 3.6 years ago by Chris Cole680

Hey! Yes, I did, actually the person that did this sequencing removed adapters and the quality is very good. But thank you!

ADD REPLYlink written 3.6 years ago by pveraw0
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.3.0
Traffic: 1038 users visited in the last hour