Is the read quality of my miRNA-seq data (FastQC report) good enough for DE analysis?
1
0
Entering edit mode
4 months ago

Hi, I'm analyzing miRNA-seq data to find differentially expressed (DE) miRNAs.

As a first step, I wanted to check the quality of the reads, so I ran FastQC.

The FastQC report is shown in the images below, followed by its table of overrepresented sequences.

[Two images: FastQC report plots]

Sequence    Count   Percentage  Possible Source
GGCTGGTCCGATGGTAGTGGGTTATCAGAACTAGATCGGAAGAGCACACG  3243049 10.389700800039725  No Hit
CGCGACCTCAGATCAGACGTGGCGACCCGCTGAAAGATCGGAAGAGCACA  2634822 8.441134328023496   No Hit
CGCGACCTCAGATCAGACGTGGCGACCCGCTGAATAGATCGGAAGAGCAC  1626222 5.20989970069592    No Hit
TCAGTGCACTACAGAACTTTGTAGATCGGAAGAGCACACGTCTGAACTCC  1582461 5.06970333094926    Illumina Multiplexing PCR Primer 2.01 (100% over 28bp)
CGCGACCTCAGATCAGACGTGGCGACCCGCTGAATTAGATCGGAAGAGCA  1455372 4.662550468018034   No Hit
CGCGACCTCAGATCAGACGTGGCGACCCGCTGAAGATCGGAAGAGCACAC  1083586 3.4714659973105086  No Hit
GCGGGTGATGCGAACTGGAGTCTGAGCAGATCGGAAGAGCACACGTCTGA  853807  2.735326931840844   Illumina Multiplexing PCR Primer 2.01 (100% over 23bp)
TAGCTTATCAGACTGATGTTGACAGATCGGAAGAGCACACGTCTGAACTC  812844  2.6040944669992623  Illumina Multiplexing PCR Primer 2.01 (100% over 27bp)
GGCTGGTCCGATGGTAGTGGGTTATCAGAACAGATCGGAAGAGCACACGT  637999  2.04394652092045    No Hit
TGACTGTGCTGAGTCTGTTCAATCCAACCCTGAGCAGATCGGAAGAGCAC  624090  1.9993864947143236  No Hit
TGAGGTAGTAGTTTGTGCTGTTAGATCGGAAGAGCACACGTCTGAACTCC  509802  1.6332439772762768  Illumina Multiplexing PCR Primer 2.01 (100% over 28bp)
CGCGACCTCAGATCAGACGTAGATCGGAAGAGCACACGTCTGAACTCCAG  399415  1.279599027041487   Illumina Multiplexing PCR Primer 2.01 (100% over 30bp)
TTTCTGTGTGGAATTTGAATATCTGAAAAGATCGGAAGAGCACACGTCTG  385161  1.2339337302162567  Illumina Multiplexing PCR Primer 2.01 (100% over 22bp)
AATGTGTGACTGAAAGGTATTTTCTGAGCAGATCGGAAGAGCACACGTCT  306783  0.9828354676536146  Illumina Multiplexing PCR Primer 2.01 (100% over 21bp)
ATGTGTGACTGAAAGGTATTTTCTGAGCAGATCGGAAGAGCACACGTCTG  268355  0.8597243391002296  Illumina Multiplexing PCR Primer 2.01 (100% over 22bp)
GGCTGGTCCGATGGTAGTGGGTTATCAGAACTTAGATCGGAAGAGCACAC  261890  0.839012528803112   No Hit
GGCTGGTCCGATGGTAGTGGGTTATCAGAACAAGATCGGAAGAGCACACG  251323  0.8051592110289989  No Hit
TATTGCACTTGTCCCGGCCTGTAAGATCGGAAGAGCACACGTCTGAACTC  227493  0.7288154462369941  Illumina Multiplexing PCR Primer 2.01 (100% over 27bp)
AGTGATGATGACCCCAGGTAACTCTGAGTAGATCGGAAGAGCACACGTCT  206590  0.6618488614511242  Illumina Multiplexing PCR Primer 2.01 (100% over 21bp)
TTCAAGTAATCCAGGATAGGCTAGATCGGAAGAGCACACGTCTGAACTCC  197002  0.6311319492889025  Illumina Multiplexing PCR Primer 2.01 (100% over 28bp)
TGAGGTAGTAGTTTGTACAGTTAGATCGGAAGAGCACACGTCTGAACTCC  182039  0.5831952412493403  Illumina Multiplexing PCR Primer 2.01 (100% over 28bp)
TGAGGTAGTAGATTGTATAGTTAGATCGGAAGAGCACACGTCTGAACTCC  143628  0.460138575306172   Illumina Multiplexing PCR Primer 2.01 (100% over 28bp)
CGCGACCTCAGATCAGACGTGGCGACCCGCTGAGATCGGAAGAGCACACG  143088  0.4584085865110531  No Hit
CGCGGGTGATGCGAACTGGAGTCTGAGCAGATCGGAAGAGCACACGTCTG  142534  0.4566337461545793  Illumina Multiplexing PCR Primer 2.01 (100% over 22bp)
ACTGCTGACGCGGGTGATGCGAACTGGAGTCTGAGCAGATCGGAAGAGCA  138132  0.44253113379140663 No Hit
CTAGACTGAAGCTCCTTGAGGAGATCGGAAGAGCACACGTCTGAACTCCA  133930  0.42906925801901863 Illumina Multiplexing PCR Primer 2.01 (100% over 29bp)
TTACTTGATGACAATAAAATATCTGATAAGATCGGAAGAGCACACGTCTG  117470  0.3763366365974324  Illumina Multiplexing PCR Primer 2.01 (100% over 22bp)
AGTAATGATGAATGCCAACCGCTCTGATGAGATCGGAAGAGCACACGTCT  113252  0.3628235018977817  Illumina Multiplexing PCR Primer 2.01 (100% over 21bp)
TATTGCACTTGTCCCGGCCTGTAGATCGGAAGAGCACACGTCTGAACTCC  107254  0.3436078115401466  Illumina Multiplexing PCR Primer 2.01 (100% over 28bp)
CGCGACCTCAGATCAGACGAGATCGGAAGAGCACACGTCTGAACTCCAGT  101379  0.32478617418584405 Illumina Multiplexing PCR Primer 2.01 (100% over 31bp)
GGCTGGTCCGATGGTAGTGGGTTATCAGAACTTATTAGATCGGAAGAGCA  101164  0.3240973823507504  No Hit
AAACCGTTACCATTACTGAGTAGATCGGAAGAGCACACGTCTGAACTCCA  97457   0.31222132964055477 Illumina Multiplexing PCR Primer 2.01 (100% over 29bp)
TGAGATGAAGCACTGTAGCTCAGATCGGAAGAGCACACGTCTGAACTCCA  90148   0.28880561093032553 Illumina Multiplexing PCR Primer 2.01 (100% over 29bp)
TGTAAACATCCCCGACTGGAAGCAGATCGGAAGAGCACACGTCTGAACTC  88912   0.28484585879927565 Illumina Multiplexing PCR Primer 2.01 (100% over 27bp)
TAGCTTATCAGACTGATGTTGACAAGATCGGAAGAGCACACGTCTGAACT  88543   0.28366369978927775 Illumina Multiplexing PCR Primer 2.01 (100% over 26bp)
CGCGACCTCAGATCAGACGTGGCGACCCGCTGAATTTAAGATCGGAAGAG  88388   0.28316712893141954 No Hit
CATTGCACTTGTCTCGGTCTGAAGATCGGAAGAGCACACGTCTGAACTCC  85185   0.27290573243000155 Illumina Multiplexing PCR Primer 2.01 (100% over 28bp)
TAGCTTATCAGACTGATGTTGACTAGATCGGAAGAGCACACGTCTGAACT  82921   0.265652594222318   Illumina Multiplexing PCR Primer 2.01 (100% over 26bp)
GCTGAAATCCAGAGGCTGTTTCTGAGCAGATCGGAAGAGCACACGTCTGA  81671   0.26164799052991317 Illumina Multiplexing PCR Primer 2.01 (100% over 23bp)
CTTTGGTGACTCTAGATAACCTCGGGCCGATCGCACAGATCGGAAGAGCA  80727   0.25862371382140914 No Hit
TGTCGGTGCTGAAATCCAGAGGCTGTTTCTGAGCAGATCGGAAGAGCACA  77418   0.2480227269268751  No Hit
TGGAAAACTAATGACTGAGCACAAGATCGGAAGAGCACACGTCTGAACTC  76233   0.2442263626264754  Illumina Multiplexing PCR Primer 2.01 (100% over 27bp)
GGCTGGTCCGATGGTAGTGGGTTATCAGAACTTATTAAGATCGGAAGAGC  74943   0.24009361161591364 No Hit
GGGACTGACCTGAAATGAAGAGAATACTCATTGCTGATCAGATCGGAAGA  74327   0.2381201429162966  No Hit
TCACAAAGATGAGTGGTGAAAATCTGATCAGATCGGAAGAGCACACGTCT  65230   0.2089762390844515  Illumina Multiplexing PCR Primer 2.01 (100% over 21bp)
TGAGGTAGTAGGTTGTATAGTTAGATCGGAAGAGCACACGTCTGAACTCC  63913   0.2047569886341338  Illumina Multiplexing PCR Primer 2.01 (100% over 28bp)
TTCACTGATGAGAGCATTGTTCTGAGCAGATCGGAAGAGCACACGTCTGA  63767   0.2042892509228609  Illumina Multiplexing PCR Primer 2.01 (100% over 23bp)
CGACTCTTAGCAGATCGGAAGAGCACACGTCTGAACTCCAGTCACCAGAT  62624   0.20062744130652596 TruSeq Adapter, Index 7 (100% over 38bp)
AGAACGTGTGGAAAACTAATGACTGAGCACAAGATCGGAAGAGCACACGT  62292   0.19956381856582328 No Hit
ACTGGACTTGGAGTCAGAAGGCAGATCGGAAGAGCACACGTCTGAACTCC  61555   0.19720270422878142 Illumina Multiplexing PCR Primer 2.01 (100% over 28bp)
AGAACGTGTGGAAAACTAATGACTGAGCAGATCGGAAGAGCACACGTCTG  58242   0.18658890260243177 Illumina Multiplexing PCR Primer 2.01 (100% over 22bp)
GGCTGGTCCGAAGGTAGTGAGTTATCTCAATTAGATCGGAAGAGCACACG  57667   0.18474678490392554 No Hit
TGTAAACATCCCCGACTGGAAGCTAGATCGGAAGAGCACACGTCTGAACT  56233   0.18015270354799878 Illumina Multiplexing PCR Primer 2.01 (100% over 26bp)
CGCGACCTCAGATCAGACGTGGCGACCCGCTGAATTTAGATCGGAAGAGC  55159   0.1767119480554846  No Hit
CGCGACCTCAGATCAGACGTGGCGACCCGCTGAATTTAAAGATCGGAAGA  54608   0.17494671874787254 No Hit
AGATCGGAAGAGCACACGTCTGAACTCCAGTCACCAGATCATCTCGTATG  52704   0.1688469064036016  TruSeq Adapter, Index 7 (100% over 49bp)
GGCTGGTCCGAAGGTAGTGAGTTATCTCAATAGATCGGAAGAGCACACGT  50353   0.16131504777892666 No Hit
TGGAGGTGATGAACTGTCTGAGCCTGACCAGATCGGAAGAGCACACGTCT  48586   0.15565413999934324 Illumina Multiplexing PCR Primer 2.01 (100% over 21bp)
CGCAGTGATGACCCTCATCTATCACCCTTGACTGATGAGATCGGAAGAGC  48480   0.15531454960622731 No Hit
AGCAGCATTGTACAGGGCTATGAAGATCGGAAGAGCACACGTCTGAACTC  47484   0.15212368138411916 Illumina Multiplexing PCR Primer 2.01 (100% over 27bp)
CTTCGTGATCGATGTGGTGACGTCGTGCTCTAGATCGGAAGAGCACACGT  46765   0.14982023334024797 No Hit
TGAGGTAGTAGTTTGTGCTGTTAAGATCGGAAGAGCACACGTCTGAACTC  46619   0.14935249562897507 Illumina Multiplexing PCR Primer 2.01 (100% over 27bp)
TGAGAACTGAATTCCATAGGCTGAGATCGGAAGAGCACACGTCTGAACTC  45896   0.14703623285328815 Illumina Multiplexing PCR Primer 2.01 (100% over 27bp)
TTACTTGATGACAATAAAATATCTGATAGATCGGAAGAGCACACGTCTGA  45190   0.14477443268781792 Illumina Multiplexing PCR Primer 2.01 (100% over 23bp)
CTCGCTGCGATCTATTGAAAGTCAGCCCTCGACACAAGGGTTTGTAGATC  44862   0.1437236246789309  No Hit
TGAGGTAGTAGATTGTATAGTAGATCGGAAGAGCACACGTCTGAACTCCA  44712   0.14324307223584232 Illumina Multiplexing PCR Primer 2.01 (100% over 29bp)
AGAACGTGTGGAAAACTAATGACTGAGCAAGATCGGAAGAGCACACGTCT  43747   0.14015151818530583 Illumina Multiplexing PCR Primer 2.01 (100% over 21bp)
ACGCGACCTCAGATCAGACGTGGCGACCCGCTGAAAGATCGGAAGAGCAC  42260   0.1353876416328211  No Hit
CTCACTGATTACTTGATGACAATAAAATATCTGATAAGATCGGAAGAGCA  42158   0.13506086597152087 No Hit
TGAGGTAGTAGTTTGTACAGTAGATCGGAAGAGCACACGTCTGAACTCCA  40252   0.12895464626134204 Illumina Multiplexing PCR Primer 2.01 (100% over 29bp)
TCTCCCAACCCTTGTACCAGTGAGATCGGAAGAGCACACGTCTGAACTCC  38868   0.12452074905311146 Illumina Multiplexing PCR Primer 2.01 (100% over 28bp)
GTTTCCGTAGTGTAGTGGTTATCAGATCGGAAGAGCACACGTCTGAACTC  38804   0.12431571334406033 Illumina Multiplexing PCR Primer 2.01 (100% over 27bp)
GGCTGGTCCGATGGTAGTGGGTTATCAGAACTTAAGATCGGAAGAGCACA  38735   0.12409465922023957 No Hit
CTGAAATCCAGAGGCTGTTTCTGAGCAGATCGGAAGAGCACACGTCTGAA  38017   0.12179441485932227 Illumina Multiplexing PCR Primer 2.01 (100% over 24bp)
TCCCTGGTGGTCTAGTGGTTAGGATTCGGAGATCGGAAGAGCACACGTCT  37546   0.12028548018802417 Illumina Multiplexing PCR Primer 2.01 (100% over 21bp)
ATTACTTGATGACAATAAAATATCTGATAAGATCGGAAGAGCACACGTCT  35760   0.11456370243231619 Illumina Multiplexing PCR Primer 2.01 (100% over 21bp)
CGCGACCTCAGATCAGACAGATCGGAAGAGCACACGTCTGAACTCCAGTC  35225   0.11284973205196694 Illumina Multiplexing PCR Primer 2.01 (100% over 32bp)
CATTCTGAAAGAACGTGTGGAAAACTAATGACTGAGCAGATCGGAAGAGC  33657   0.10782635718021437 No Hit
GTGGAAAACTAATGACTGAGCACAAGATCGGAAGAGCACACGTCTGAACT  33388   0.10696456646560885 Illumina Multiplexing PCR Primer 2.01 (100% over 26bp)
TATCTGTGATGATCTTATCCCGAACCTGAACTTCTGTTGAAAAAAAAAAA  32037   0.10263639079485776 No Hit
TCCCTGGTGGTCTAGTGGTTAGGATTCGGCGCTAGATCGGAAGAGCACAC  31825   0.10195721000862593 No Hit
CCGTGATCGTATAGTGGTTAGTACTCTGCAGATCGGAAGAGCACACGTCT  31510   0.1009480498781399  Illumina Multiplexing PCR Primer 2.01 (100% over 21bp)
ACGCGACCTCAGATCAGACGTGGCGACCCGCTGAAGATCGGAAGAGCACA  31503   0.10092562409746245 No Hit

[Image: FastQC report plot]

My question: based on the FastQC report, do I need to trim the reads?

If so, what is the best tool or command for that?

transcriptome Fastqc miRNA-seq differential-expression-analysis RNA-seq • 539 views
4 months ago
GenoMax 142k

It is best practice to trim reads so that the adapter is removed. The adapter is kit-specific, or you can probably work out what it is by looking at the common sequence that follows the first 21-26 bp of the reads (see an example here: How to trim miRNA reads?). Aligners may soft-clip, but you want to be certain. Your data looks reasonable (not sure why it was sequenced to 150 cycles), but you can see the read-through into the Illumina adapter at the 3' end after 20-ish bp. You may also have long non-coding RNA, based on that second curve.
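To make the "common sequence after 21-26 bp" point concrete, here is a minimal sketch that reports how many bases come before the adapter in a few sequences copied from the overrepresented-sequences table. The `ADAPTER` value is an assumption read off that table (the stretch shared by the listed sequences), not taken from a kit manual, and `insert_lengths` is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Report how many bases precede the adapter in each sequence read from stdin.
# ADAPTER is an assumption: the stretch shared by the overrepresented
# sequences in the FastQC table, not a value taken from a kit manual.
ADAPTER="AGATCGGAAGAGC"

insert_lengths() {
  awk -v adapter="$ADAPTER" '
    {
      pos = index($1, adapter)      # 1-based start of the adapter, 0 if absent
      if (pos > 0) print pos - 1    # bases before the adapter = insert length
      else         print "NA"       # adapter not found in this sequence
    }'
}

# Three sequences copied from the overrepresented-sequences table.
printf '%s\n' \
  TCAGTGCACTACAGAACTTTGTAGATCGGAAGAGCACACGTCTGAACTCC \
  TGAGGTAGTAGTTTGTGCTGTTAGATCGGAAGAGCACACGTCTGAACTCC \
  GGCTGGTCCGATGGTAGTGGGTTATCAGAACTAGATCGGAAGAGCACACG \
  | insert_lengths
```

Only exact matches are found here, so treat the numbers as a rough guide. If most inserts come out in the low twenties, that fits miRNA-sized fragments followed by adapter read-through.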

You can trim the data with any trimming program (bbduk.sh, fastp, etc.) and then align the trimmed reads with bowtie v1.x or any other aligner that does ungapped alignments (miRNAs are short). Then count the aligned reads before going into DE analysis.
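For illustration only, this is roughly what the trimming step does to each read, written as a toy awk filter. It is not a replacement for bbduk.sh or fastp: it only finds exact adapter matches, assumes single-end 4-line FASTQ records, and drops reads with no adapter at all (a design choice for this sketch, not something the tools necessarily do). `ADAPTER`, `MIN_LEN`, `MAX_LEN`, and `trim_fastq` are hypothetical names and values.

```shell
#!/usr/bin/env bash
# Toy 3'-adapter trimmer: exact match only, single-end, 4-line FASTQ records.
ADAPTER="AGATCGGAAGAGC"   # assumed adapter start, as seen in the FastQC table
MIN_LEN=15                # hypothetical lower bound for a small-RNA insert
MAX_LEN=35                # hypothetical upper bound

trim_fastq() {
  awk -v adapter="$ADAPTER" -v min="$MIN_LEN" -v max="$MAX_LEN" '
    NR % 4 == 1 { header = $0; next }    # @read-name line
    NR % 4 == 2 { seq = $0; next }       # bases
    NR % 4 == 3 { plus = $0; next }      # "+" separator line
    {                                    # fourth line: quality string
      qual = $0
      pos = index(seq, adapter)
      if (pos == 0) next                 # no adapter found: discard the read
      len = pos - 1                      # insert length before the adapter
      if (len < min || len > max) next   # keep only plausible small-RNA sizes
      print header
      print substr(seq, 1, len)
      print plus
      print substr(qual, 1, len)         # trim qualities to the same length
    }'
}

# Usage (file names are placeholders):
#   trim_fastq < reads.fastq > trimmed.fastq
```

A real trimmer also handles sequencing errors in the adapter and partial adapters at the read end, so use this only to sanity-check what the trimmed output should look like.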

I am sure there are dedicated workflows that can do something similar and you could follow one of those.

Thank you. Where can I get the adapter sequence? The trimming software needs the adapter sequence as a parameter.

If you know which kit was used, take the adapter sequence from there and use a program like fastp; otherwise see this link to guess the adapter: How to trim miRNA reads?

fastp with the command below detects and removes the adapters automatically from PE reads:

fastp -c --detect_adapter_for_pe -i SRR_1.fastq -I SRR_2.fastq -o out.R1.fastq -O out.R2.fastq


fastp version:  0.23.4 (https://github.com/OpenGene/fastp)
sequencing: paired end (150 cycles + 150 cycles)
mean length before filtering:   150bp, 150bp
mean length after filtering:    30bp, 30bp
duplication rate:   67.302975%
Insert size peak:   32
Detected read1 adapter: AGATCGGAAGAGCACACGTCTGAACTCCAGTCA
Detected read2 adapter: GATCGTCGGACTGTAGAACTCTGAACGTGTAGA
Before filtering
total reads:    62.428150 M
total bases:    9.364223 G
Q20 bases:  8.828036 G (94.274096%)
Q30 bases:  8.318279 G (88.830428%)
GC content: 67.499669%
After filtering
total reads:    61.659588 M
total bases:    1.883521 G
Q20 bases:  1.862841 G (98.902059%)
Q30 bases:  1.811489 G (96.175644%)
GC content: 53.303135%
Filtering result
reads passed filters:   61.659588 M (98.768886%)
reads corrected:    434.773000 K (0.696437%)
bases corrected:    812.517000 K (0.008677%)
reads with low quality: 145.824000 K (0.233587%)
reads with too many N:  1.636000 K (0.002621%)
reads too short:    621.102000 K (0.994907%)
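
As a quick cross-check of the report above, the mean read length can be recomputed as total bases divided by total reads. This is just arithmetic on the fastp totals (converted from G and M to plain counts); `mean_len` is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Mean read length = total bases / total reads, using the fastp totals above.
mean_len() {   # usage: mean_len <total_bases> <total_reads>
  awk -v b="$1" -v r="$2" 'BEGIN { printf "%.1f\n", b / r }'
}

mean_len 9364223000 62428150   # before filtering
mean_len 1883521000 61659588   # after filtering
```

The two values should agree with the 150 bp and roughly 30 bp means that fastp prints, which means most of each 150-cycle read was adapter read-through rather than insert.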