Question

concordant pair alignment rate is too low

8.2 years ago

zizigolu ★ 4.3k

Hi,

I received my fq files from the company and started the analysis with tophat. Until now I had only practiced on other data sets, but this data is my own and I have to report on it. For one sample I ran the following:

tophat \
  -p 8 \
  -G gene.gtf \
  -o pri2h \
  genome \
  FCH3YYFBBXX-HKARAexkEAAARABPEI-204_L6_1-trimmed.fq FCH3YYFBBXX-HKARAexkEAAARABPEI-204_L6_2-trimmed.fq

The alignment summary:

Left reads:
          Input     :  19209324
           Mapped   :  17883849 (93.1% of input)
            of these:   2113360 (11.8%) have multiple alignments (432 have >20)
Right reads:
          Input     :  19209324
           Mapped   :  17116529 (89.1% of input)
            of these:   2064295 (12.1%) have multiple alignments (431 have >20)
Unpaired reads:
          Input     :    458630
           Mapped   :    427128 (93.1% of input)
            of these:     27342 ( 6.4%) have multiple alignments (2 have >20)
91.1% overall read mapping rate.

Aligned pairs:  15935420
     of these:   1988045 (12.5%) have multiple alignments
                15613891 (98.0%) are discordant alignments
 1.7% concordant pair alignment rate.

Did I do this right? I mean can I follow this for other samples?

For the trimming step I ran the following, using the contaminant adapter sequence (Arabidopsis, Ensembl):

cutadapt \
  -a AATGATACGGCGACCACCGAGATCTACACTCTTTCCCTACACGACGCTCTTCCGATCT \
  -q 35 \
  -m 15 \
  FCH3YYFBBXX-HKARAexkEAAFRAAPEI-209_L6_1.fq > FCH3YYFBBXX-HKARAexkEAAFRAAPEI-209_L6_1-trimmed.fq

Thank you

RNA-Seq sequencing alignment • 6.1k views

ADD COMMENT • link updated 21 months ago by Ram 43k • written 8.2 years ago by zizigolu ★ 4.3k

Since the individual reads are mapping well, the first thing to check is whether the order of the R1/R2 reads in your source files is messed up. Based on the file names, it appears that you trimmed this data. Did you trim both files using a PE-aware trimmer?

ADD REPLY • link 8.2 years ago by GenoMax 141k
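
One way to check the R1/R2 order suggested above is to compare the read IDs of the two files position by position. The following is a minimal sketch, not part of the original reply. It assumes standard 4-line FASTQ records, takes the ID as the first whitespace-delimited token of each header, and strips a trailing `/1` or `/2`. The function names `read_ids` and `count_mismatched_pairs` are hypothetical.

```shell
#!/bin/sh
# Print one read ID per FASTQ record (assumes 4 lines per record).
# The ID is the first token of the header line, minus a trailing /1 or /2.
read_ids() {
    awk 'NR % 4 == 1 { id = $1; sub(/\/[12]$/, "", id); print id }' "$1"
}

# Count positions at which the IDs in the two files disagree.
# A read with no partner (files of unequal length) also counts as a mismatch.
count_mismatched_pairs() {
    tmp1=$(mktemp) && tmp2=$(mktemp) || return 1
    read_ids "$1" > "$tmp1"
    read_ids "$2" > "$tmp2"
    # paste lines the two ID lists up side by side, separated by a tab
    paste "$tmp1" "$tmp2" | awk -F '\t' '$1 != $2 { n++ } END { print n + 0 }'
    rm -f "$tmp1" "$tmp2"
}
```

Calling `count_mismatched_pairs sample_1-trimmed.fq sample_2-trimmed.fq` (file names are placeholders) prints the number of out-of-sync positions. Anything other than 0 suggests the pairing was broken.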

Thank you. I'm not sure, but I ran cutadapt on each file separately, with a specific adapter sequence.

ADD REPLY • link 8.2 years ago by zizigolu ★ 4.3k

That likely threw the order of reads off. See my reply about 3-4 posts down from the main question in this thread: de novo transcriptome assembly work flow, paired end reads.

You can either use the "repair.sh" from BBMap to "re-pair" the reads in the proper order or redo the trimming using trimmomatic/bbduk/cutadapt using both files for a sample at the same time. Either way you will need to do tophat mapping again once you fix the problem.

ADD REPLY • link 8.2 years ago by GenoMax 141k
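
The two options above could look roughly like the sketch below. It is an illustration under stated assumptions, not the exact commands used in this thread. The `sample` prefix and all output file names are placeholders. The adapter is the one that appears later in this thread, and using it for both reads is an assumption. `-A` is cutadapt's option for a 3' adapter on the second read. The `run` helper is hypothetical and only prints the command when `DRY_RUN=1`.

```shell
#!/bin/sh
# Placeholder sample prefix and adapter; substitute your own values.
sample=${sample:-sample}
adapter=${adapter:-AGATCGGAAGAGC}

# Print the command when DRY_RUN=1, otherwise execute it.
run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then echo "$*"; else "$@"; fi
}

# Option 1: re-pair reads that were trimmed separately (BBMap repair.sh).
# Reads that lost their mate go to the singleton file.
repair_pairs() {
    run repair.sh in1="${sample}_1-trimmed.fq" in2="${sample}_2-trimmed.fq" \
        out1="${sample}_1-fixed.fq" out2="${sample}_2-fixed.fq" \
        outs="${sample}_singletons.fq"
}

# Option 2: redo the trimming with both files in a single cutadapt call,
# so that a pair is kept or discarded as a unit and the order stays in sync.
trim_pairs() {
    run cutadapt -a "$adapter" -A "$adapter" -q 35 -m 15 \
        -o "${sample}_1-trimmed.fq" -p "${sample}_2-trimmed.fq" \
        "${sample}_1.fq" "${sample}_2.fq"
}
```

Running `DRY_RUN=1 sample=mysample; trim_pairs` shows the command without executing it, which is a cheap way to check the file names first. Either way, the tophat run has to be repeated on the corrected files.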

Thank you so much, you were right. I realigned with the untrimmed fq files:

Left reads:
          Input     :  19897899
           Mapped   :  18436480 (92.7% of input)
            of these:   1102696 ( 6.0%) have multiple alignments (169 have >20)
Right reads:
          Input     :  19897899
           Mapped   :  17303775 (87.0% of input)
            of these:   1025015 ( 5.9%) have multiple alignments (160 have >20)
89.8% overall read mapping rate.

Aligned pairs:  17084898
     of these:   1011946 ( 5.9%) have multiple alignments
                   28481 ( 0.2%) are discordant alignments
85.7% concordant pair alignment rate.

FastQC reports nothing under overrepresented sequences. So what happens if I align the fq files from the company directly, without any processing?

Thanks again

ADD REPLY • link updated 21 months ago by Ram 43k • written 8.2 years ago by zizigolu ★ 4.3k

While FastQC will find over-represented sequences, it may miss small fragments of adapters at the ends of reads (if they are present). Your data looks clean, but there is no harm in passing it through a scan/trimming program. If there are no adapters, there will be no change to the data. The only thing to keep in mind is to trim PE data with a PE-aware trimmer.

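Looking at the numbers above, it appears that at least 689,000 reads were removed from the R1 file during the first round of trimming (19,897,899 input reads for the untrimmed file versus 19,209,324 after trimming), so there is a little bit of adapter there.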

ADD REPLY • link 8.2 years ago by GenoMax 141k

Thank you. I used this command and everything is OK. This is the result:

$CUT/cutadapt \
  -q 35 \
  -m 15 \
  -a AGATCGGAAGAGC \
  -g AGATCGGAAGAGC \
  -o out.1.fq \
  -p out.2.fq \
  FCH3YYFBBXX-HKARAexkEAAARABPEI-204_L6_1.fq FCH3YYFBBXX-HKARAexkEAAARABPEI-204_L6_2.fq

Left reads:
          Input     :  19667353
           Mapped   :  18672982 (94.9% of input)
            of these:   1127159 ( 6.0%) have multiple alignments (939 have >20)
Right reads:
          Input     :  19667353
           Mapped   :  17146234 (87.2% of input)
            of these:   1019710 ( 5.9%) have multiple alignments (848 have >20)
91.1% overall read mapping rate.

Aligned pairs:  16994149
     of these:   1010648 ( 5.9%) have multiple alignments
                   25673 ( 0.2%) are discordant alignments
86.3% concordant pair alignment rate.

The concordant pair alignment rate increased.

ADD REPLY • link updated 21 months ago by Ram 43k • written 8.2 years ago by zizigolu ★ 4.3k
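
For comparing the three runs, the concordant pair alignment rates in this thread can be reproduced from the summary numbers as (aligned pairs - discordant pairs) / input pairs. The sketch below is a check on the reported figures. That formula is inferred from the numbers rather than taken from tophat's documentation, and the function name `concordant_rate` is hypothetical.

```shell
#!/bin/sh
# Concordant rate in percent, rounded to one decimal place:
# 100 * (aligned pairs - discordant pairs) / input pairs
concordant_rate() {
    awk -v aligned="$1" -v discordant="$2" -v input="$3" \
        'BEGIN { printf "%.1f\n", 100 * (aligned - discordant) / input }'
}

# Numbers copied from the three alignment summaries in this thread.
concordant_rate 15935420 15613891 19209324   # files trimmed separately: 1.7
concordant_rate 17084898 28481 19897899      # untrimmed files: 85.7
concordant_rate 16994149 25673 19667353      # paired-end trimming: 86.3
```

The first run aligned almost as many pairs as the others, but nearly all of them were discordant, which is consistent with mates being out of order rather than with poor read quality.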