8.2 years ago
zizigolu
Hi,
I received my fq files from the company and started the analysis with tophat. Before this I was only practicing on other data sets, but this data is mine and I have to report on it. For one sample I ran the following:
tophat \
-p 8 \
-G gene.gtf \
-o pri2h \
genome \
FCH3YYFBBXX-HKARAexkEAAARABPEI-204_L6_1-trimmed.fq FCH3YYFBBXX-HKARAexkEAAARABPEI-204_L6_2-trimmed.fq
The alignment summary:
Left reads:
Input : 19209324
Mapped : 17883849 (93.1% of input)
of these: 2113360 (11.8%) have multiple alignments (432 have >20)
Right reads:
Input : 19209324
Mapped : 17116529 (89.1% of input)
of these: 2064295 (12.1%) have multiple alignments (431 have >20)
Unpaired reads:
Input : 458630
Mapped : 427128 (93.1% of input)
of these: 27342 ( 6.4%) have multiple alignments (2 have >20)
91.1% overall read mapping rate.
Aligned pairs: 15935420
of these: 1988045 (12.5%) have multiple alignments
15613891 (98.0%) are discordant alignments
1.7% concordant pair alignment rate.
Did I do this right? I mean can I follow this for other samples?
For the trimming step I ran the command below, with the Arabidopsis Ensembl contaminant adapter sequence:
cutadapt \
-a AATGATACGGCGACCACCGAGATCTACACTCTTTCCCTACACGACGCTCTTCCGATCT \
-q 35 \
-m 15 \
FCH3YYFBBXX-HKARAexkEAAFRAAPEI-209_L6_1.fq > FCH3YYFBBXX-HKARAexkEAAFRAAPEI-209_L6_1-trimmed.fq
Thank you
Since the individual reads are mapping well, the first thing to check is whether the order of the R1/R2 reads in your source files is messed up. Based on the file names, it appears that you did trim this data. Did you trim both files using a PE-aware trimmer?
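A quick way to check the read order is to compare the read names record by record. The following is only a minimal sketch: the function name `count_name_mismatches` is made up for illustration, and it assumes plain 4-line FASTQ records whose names end in `/1` and `/2` or carry the mate information after a space. It prints how many record positions have different names in the two files; anything other than 0 suggests the files are out of sync.

```shell
# Hypothetical helper (not part of any tool): count record positions where
# the R1 and R2 read names disagree. Assumes uncompressed 4-line FASTQ.
count_name_mismatches() {
    awk -v r1="$1" -v r2="$2" '
        # Drop the mate suffix ("/1", "/2") or anything after the first space.
        function read_name(line) { sub("[ /].*$", "", line); return line }
        BEGIN {
            mismatches = 0
            line_no = 0
            while (1) {
                got1 = (getline l1 < r1)
                got2 = (getline l2 < r2)
                if (got1 <= 0 && got2 <= 0) break    # both files finished
                line_no++
                if (line_no % 4 != 1) continue       # only header lines carry names
                # A missing mate (one file is shorter) also counts as a mismatch.
                if (got1 <= 0 || got2 <= 0 || read_name(l1) != read_name(l2))
                    mismatches++
            }
            print mismatches
        }'
}
```

Usage would look like `count_name_mismatches sample_1-trimmed.fq sample_2-trimmed.fq`, where the file names are placeholders for your own pair of files.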
Thank you. I don't know, but I ran cutadapt on each file separately, with a specific adapter sequence.
That likely threw the order of reads off. See my reply about 3-4 posts down from the main question in this thread: de novo transcriptome assembly work flow, paired end reads.
You can either use "repair.sh" from BBMap to "re-pair" the reads in the proper order, or redo the trimming with trimmomatic/bbduk/cutadapt, giving it both files for a sample at the same time. Either way you will need to do the tophat mapping again once you fix the problem.
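As a rough sketch of the two options, the wrappers below show what the calls could look like. The function names `trim_pair` and `repair_pair` are made up for illustration, and the adapter sequences are passed in as arguments because the right adapter for each mate depends on your library prep (that part is an assumption you need to fill in yourself). The `-q 35 -m 15` settings are simply carried over from the command in the question. When cutadapt gets both files in one call, a pair that fails the `-m` length filter is dropped from both outputs, so the files stay in sync.

```shell
# Option 1: trim both mates in a single cutadapt call.
# $1/$2: R1/R2 input, $3/$4: R1/R2 output, $5/$6: adapter for R1/R2 (assumed, set your own)
trim_pair() {
    cutadapt -a "$5" -A "$6" -q 35 -m 15 -o "$3" -p "$4" "$1" "$2"
}

# Option 2: re-pair files that were already trimmed separately (BBMap repair.sh).
# $1/$2: R1/R2 input, $3/$4: re-paired R1/R2 output, $5: reads left without a mate
repair_pair() {
    repair.sh in1="$1" in2="$2" out1="$3" out2="$4" outs="$5"
}
```

For example, `trim_pair s_1.fq s_2.fq s_1-trimmed.fq s_2-trimmed.fq "$ADAPTER_R1" "$ADAPTER_R2"`, with placeholder file names and adapter variables. The trimmed (or re-paired) files then go into tophat as before.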
Thank you so much, you were right. I realigned with the untrimmed fq files.
In FastQC there is nothing under overrepresented sequences, so what happens if I align the fq files from the company directly, without any processing?
Thanks again
While FastQC will find over-represented sequences, it may miss small fragments of adapters at the ends of reads (if they are present). Your data looks clean, but there is no harm in passing it through a scan/trimming program. If there are no adapters, there will be no change to the data. The only thing to keep in mind is to trim PE data with a PE-aware trimmer.
Looking at the numbers above, it appears that at least 689,000 reads were trimmed in the R1 file during the first round, so there is a little bit of adapter there.
Thank you. I used this command and everything is OK. This is the result:
The concordant rate increased.