Question: concordant pair alignment rate is too low
F (Iran) wrote, 3.1 years ago:

Hi,

I received my FASTQ files from the sequencing company and started the analysis with TopHat. Before this I had only been practicing on other datasets, but this is my own data and I have to report the results. For one sample I ran:

tophat -p 8 -G gene.gtf -o pri2h genome FCH3YYFBBXX-HKARAexkEAAARABPEI-204_L6_1-trimmed.fq FCH3YYFBBXX-HKARAexkEAAARABPEI-204_L6_2-trimmed.fq

The alignment summary:

Left reads:
          Input     :  19209324
           Mapped   :  17883849 (93.1% of input)
            of these:   2113360 (11.8%) have multiple alignments (432 have >20)
Right reads:
          Input     :  19209324
           Mapped   :  17116529 (89.1% of input)
            of these:   2064295 (12.1%) have multiple alignments (431 have >20)
Unpaired reads:
          Input     :    458630
           Mapped   :    427128 (93.1% of input)
            of these:     27342 ( 6.4%) have multiple alignments (2 have >20)
91.1% overall read mapping rate.

Aligned pairs:  15935420
     of these:   1988045 (12.5%) have multiple alignments
                15613891 (98.0%) are discordant alignments
 1.7% concordant pair alignment rate.
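Both percentages in this summary can be recomputed from the counts. This sketch assumes that the overall mapping rate is total mapped reads over total input reads, and that the concordant pair alignment rate is (aligned pairs - discordant alignments) over input pairs. These formulas are inferred from the numbers in this thread (they reproduce all three runs posted here), not taken from TopHat's source; the function names are hypothetical. For this run only 15,935,420 - 15,613,891 = 321,529 of 19,209,324 pairs are concordant.

```shell
#!/usr/bin/env bash
# Recompute TopHat align_summary percentages from raw counts.
# ASSUMPTION: formulas inferred from the numbers in this thread,
# not from TopHat's source code.

# overall_rate INPUT1 MAPPED1 [INPUT2 MAPPED2 ...]
# One INPUT/MAPPED pair per read class (left, right, unpaired).
overall_rate() {
  awk 'BEGIN {
    for (i = 1; i < ARGC; i += 2) { inp += ARGV[i]; map += ARGV[i + 1] }
    printf "%.1f\n", 100 * map / inp
  }' "$@"
}

# concordant_rate INPUT_PAIRS ALIGNED_PAIRS DISCORDANT_PAIRS
concordant_rate() {
  awk -v n="$1" -v a="$2" -v d="$3" 'BEGIN { printf "%.1f\n", 100 * (a - d) / n }'
}

# First run (each file trimmed separately)
overall_rate 19209324 17883849 19209324 17116529 458630 427128   # 91.1
concordant_rate 19209324 15935420 15613891                       # 1.7
```

The overall rate looks fine because it counts reads one at a time; only the pair-level number exposes that mates no longer line up.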

Did I do this correctly? That is, can I follow the same procedure for the other samples?

For the trimming step I ran the following, with the Arabidopsis Ensembl contaminant adapter sequence:

cutadapt -a AATGATACGGCGACCACCGAGATCTACACTCTTTCCCTACACGACGCTCTTCCGATCT -q 35 -m 15 FCH3YYFBBXX-HKARAexkEAAFRAAPEI-209_L6_1.fq > FCH3YYFBBXX-HKARAexkEAAFRAAPEI-209_L6_1-trimmed.fq


Thank you.


Since the individual reads are mapping well, the first thing to check is whether the order of the R1/R2 reads in your source files has been messed up. Based on the file names, it appears that you trimmed this data. Did you trim both files together using a PE-aware trimmer?
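One way to do that check, sketched under the assumption of plain 4-line FASTQ records: mates from the sequencer appear in the same order with the same read ID in R1 and R2, apart from a /1 or /2 suffix or the "1:N:..."/"2:N:..." comment after the space. The function below (check_pairing is a hypothetical name, not an existing tool) compares the IDs record by record and prints how many disagree.

```shell
#!/usr/bin/env bash
# check_pairing R1.fq R2.fq
# Prints the number of records whose read IDs differ between R1 and R2
# (0 means the files are still in sync). Assumes 4-line FASTQ records.
# A trailing /1 or /2 is stripped; text after the first space is ignored.
check_pairing() {
  local ids1 ids2
  ids1=$(mktemp) && ids2=$(mktemp) || return 1
  # Keep only the ID from each header line (line 1 of every 4)
  awk 'NR % 4 == 1 { sub(/\/[12]$/, "", $1); print $1 }' "$1" > "$ids1"
  awk 'NR % 4 == 1 { sub(/\/[12]$/, "", $1); print $1 }' "$2" > "$ids2"
  # Compare line by line; paste leaves an empty column for the shorter
  # file, so unequal record counts are also counted as mismatches
  paste "$ids1" "$ids2" | awk -F'\t' '$1 != $2 { bad++ } END { print bad + 0 }'
  rm -f "$ids1" "$ids2"
}
```

For example, `check_pairing FCH3YYFBBXX-HKARAexkEAAARABPEI-204_L6_1-trimmed.fq FCH3YYFBBXX-HKARAexkEAAARABPEI-204_L6_2-trimmed.fq`; a non-zero count means the pairs drifted apart, which is expected if reads were dropped from each file independently.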

Written 3.1 years ago by genomax

Thank you. I'm not sure, but I ran cutadapt on each file separately with a specific adapter sequence.

Written 3.1 years ago by F

That likely threw off the order of the reads. See my reply about 3-4 posts below the main question in this thread: "de novo transcriptome assembly work flow, paired end reads".

You can either use repair.sh from BBMap to re-pair the reads in the proper order, or redo the trimming with trimmomatic/bbduk/cutadapt, giving the trimmer both files of a sample at the same time. Either way, you will need to redo the TopHat mapping once you fix the problem.
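A sketch of the repair.sh route, assuming BBMap is installed and repair.sh is on PATH. repair.sh takes both files (in1/in2), writes mates in matching order to out1/out2, and writes reads whose mate was lost to outs. The wrapper name repair_pair, the output file naming, and the run/DRY_RUN helper (which only prints the command when DRY_RUN is set) are hypothetical conveniences, not part of BBMap.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the command instead of running it when DRY_RUN is set.
run() {
  if [ -n "${DRY_RUN:-}" ]; then printf '%s\n' "$*"; else "$@"; fi
}

# repair_pair R1 R2 PREFIX  (hypothetical wrapper around BBMap's repair.sh)
# PREFIX_1-repaired.fq / PREFIX_2-repaired.fq: mates in the same order
# PREFIX_singletons.fq: reads whose mate was removed during trimming
repair_pair() {
  run repair.sh in1="$1" in2="$2" \
    out1="${3}_1-repaired.fq" out2="${3}_2-repaired.fq" \
    outs="${3}_singletons.fq"
}
```

After repairing, TopHat would be rerun on the two `_repaired` files; the singletons file can be set aside.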

Written 3.1 years ago by genomax

Thank you so much, you were right. I realigned with the untrimmed FASTQ files:

Left reads:
          Input     :  19897899
           Mapped   :  18436480 (92.7% of input)
            of these:   1102696 ( 6.0%) have multiple alignments (169 have >20)
Right reads:
          Input     :  19897899
           Mapped   :  17303775 (87.0% of input)
            of these:   1025015 ( 5.9%) have multiple alignments (160 have >20)
89.8% overall read mapping rate.

Aligned pairs:  17084898
     of these:   1011946 ( 5.9%) have multiple alignments
                   28481 ( 0.2%) are discordant alignments
85.7% concordant pair alignment rate.

FastQC shows nothing under overrepresented sequences, so what happens if I align the FASTQ files from the company directly, without any preprocessing?

Thanks again.

Written 3.1 years ago by F

While FastQC will find over-represented sequences, it may miss small fragments of adapters at the ends of reads (if they are present). Your data looks clean, but there is no harm in passing it through a scanning/trimming program: if there are no adapters, the data will not change. The only thing to keep in mind is to trim PE data with a PE-aware trimmer.
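To illustrate the kind of fragment FastQC can miss, here is a rough sketch that counts reads whose last few bases equal the first few bases of an adapter. The function name, the adapter AGATCGGAAGAGC (the prefix used later in this thread) and a minimum overlap of 5 are example choices; it uses exact matching only, whereas real trimmers such as cutadapt use error-tolerant alignment, so treat the count as a quick check, not a replacement for trimming.

```shell
#!/usr/bin/env bash
# count_adapter_tails FASTQ ADAPTER MIN_OVERLAP
# Counts reads whose 3' end exactly equals the first L bases of ADAPTER,
# for some L >= MIN_OVERLAP (a partial adapter at the very end of the read).
count_adapter_tails() {
  awk -v ad="$2" -v k="$3" '
    NR % 4 == 2 {                      # sequence line of each record
      n = length($0)
      for (len = length(ad); len >= k; len--) {
        if (len > n) continue
        if (substr($0, n - len + 1) == substr(ad, 1, len)) { hits++; break }
      }
    }
    END { print hits + 0 }
  ' "$1"
}
```

For example, `count_adapter_tails sample_1.fq AGATCGGAAGAGC 5` (file name hypothetical). Short fragments like these are too varied and too short to show up as one over-represented sequence.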

Looking at the numbers above, it appears that at least 689,000 reads were trimmed from the R1 file during the first round, so there is a little bit of adapter there.
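The 689,000 figure presumably comes from comparing the left-read input of the untrimmed run with that of the first (separately trimmed) run. Reads that were only shortened still count as input, so the difference covers only reads dropped entirely (for example by `-m 15`), which is why it is a lower bound. The function name is hypothetical.

```shell
#!/usr/bin/env bash
# reads_removed UNTRIMMED_INPUT TRIMMED_INPUT
reads_removed() { echo $(( $1 - $2 )); }

# Left reads: untrimmed run vs first trimmed run
reads_removed 19897899 19209324   # 688575, i.e. about 689,000
```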

Written 3.1 years ago by genomax

Thank you. I used the command below, everything is OK, and this is the result:

$CUT/cutadapt -q 35 -m 15 -a AGATCGGAAGAGC -g AGATCGGAAGAGC -o out.1.fq -p out.2.fq FCH3YYFBBXX-HKARAexkEAAARABPEI-204_L6_1.fq FCH3YYFBBXX-HKARAexkEAAARABPEI-204_L6_2.fq

Left reads:
          Input     :  19667353
           Mapped   :  18672982 (94.9% of input)
            of these:   1127159 ( 6.0%) have multiple alignments (939 have >20)
Right reads:
          Input     :  19667353
           Mapped   :  17146234 (87.2% of input)
            of these:   1019710 ( 5.9%) have multiple alignments (848 have >20)
91.1% overall read mapping rate.

Aligned pairs:  16994149
     of these:   1010648 ( 5.9%) have multiple alignments
                   25673 ( 0.2%) are discordant alignments
86.3% concordant pair alignment rate.

The concordant rate increased.
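A note on the command above: in cutadapt's paired-end mode (`-o`/`-p`), `-a` and `-g` apply to R1 only; adapters on R2 are given with the uppercase `-A` (3') and `-G` (5'). As written, R2 was quality-trimmed and length-filtered but not adapter-trimmed. In standard TruSeq libraries AGATCGGAAGAGC is the start of the 3' adapter on both reads, so the sketch below uses `-a` and `-A` and keeps `-q 35 -m 15` from the original command. It assumes cutadapt is on PATH (the thread used `$CUT/cutadapt`); trim_pair and the run/DRY_RUN helper are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the command instead of running it when DRY_RUN is set.
run() {
  if [ -n "${DRY_RUN:-}" ]; then printf '%s\n' "$*"; else "$@"; fi
}

# trim_pair R1 R2 OUT1 OUT2  (hypothetical wrapper)
# -a: 3' adapter on R1; -A: 3' adapter on R2 (same TruSeq prefix).
# Both reads are processed together, so OUT1 and OUT2 stay in the same order.
trim_pair() {
  run cutadapt -q 35 -m 15 -a AGATCGGAAGAGC -A AGATCGGAAGAGC \
    -o "$3" -p "$4" "$1" "$2"
}
```

Since the R1/R2 order is already preserved in paired mode, the remaining gain from adding `-A` would mostly show up in the right-read mapping rate.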


Written 3.1 years ago by F