Hi everyone,
I have microRNA libraries. I trim them with cutadapt, but it reports different results.
This is the FastQC report for the raw data:
Filename SRR837437.fastq.gz
File type Conventional base calls
Encoding Sanger / Illumina 1.9
Total Sequences 17830678
Filtered Sequences 0
Sequence length 50
%GC 57
Basic statistics, per base sequence quality, per sequence quality scores, per base N content, and sequence length distribution are OK. Per base sequence content, per base GC content, and sequence duplication levels are not OK. The overrepresented sequences (sequence, count, percentage, possible source) are:
TCCTGTACTGAGCTGCCCCGAGTGGAATTCTCGGGTGCCAAGGAACTCCA 6084670 34.124725935828124 RNA PCR Primer, Index 1 (100% over 28bp)
TCCTGTACTGAGCTGCCCCGATGGAATTCTCGGGTGCCAAGGAACTCCAG 5105541 28.63346531186307 RNA PCR Primer, Index 1 (100% over 29bp)
TCCTGTACTGAGCTGCCCCGAGATGGAATTCTCGGGTGCCAAGGAACTCC 1847951 10.363885209524842 RNA PCR Primer, Index 1 (100% over 27bp)
TCCTGTACTGAGCTGCCCCGTGGAATTCTCGGGTGCCAAGGAACTCCAGT 509674 2.858410656061424 RNA PCR Primer, Index 1 (100% over 30bp)
TCCTGTACTGAGCTGCCCCGAGTTGGAATTCTCGGGTGCCAAGGAACTCC 484975 2.719890965447304 RNA PCR Primer, Index 1 (100% over 27bp)
TATTGCACTTGTCCCGGCCTGTTGGAATTCTCGGGTGCCAAGGAACTCCA 143730 0.806082640267521 RNA PCR Primer, Index 1 (100% over 28bp)
TCCTGTACTGAGCTGCCCCGATGGAATTCTCGGGGGCCAAGGAACTCCAG 121167 0.6795423034390503 RNA PCR Primer, Index 1 (96% over 29bp)
TATTGCACTTGTCCCGGCCTGTGGAATTCTCGGGTGCCAAGGAACTCCAG 114725 0.6434135594843898 RNA PCR Primer, Index 1 (100% over 29bp)
TCCTGTACTGAGCTGCCCCGAGGGGAATTCTCGGGTGCCAAGGAACTCCA 112651 0.631781921023979 RNA PCR Primer, Index 1 (96% over 28bp)
TCCTGTACTGAGCTGCCCCGAATGGAATTCTCGGGTGCCAAGGAACTCCA 111827 0.6271606721853201 RNA PCR Primer, Index 1 (100% over 28bp)
TCCTGTACTGAGCTGCCCCGAGGGGAATTCTCGGGGGCCAAGGAACTCCA 102622 0.5755361630107392 No Hit
TCCTGTACTGAGCTGCCCCGAGTGGAATTCTCGGGGGCCAAGGAACTCCA 83983 0.4710028412828722 RNA PCR Primer, Index 1 (96% over 28bp)
TCCTGTACTGAGCTGCCCCGGTGGAATTCTCGGGTGCCAAGGAACTCCAG 68349 0.3833224962056967 RNA PCR Primer, Index 1 (100% over 29bp)
TATTGCACTTGTCCCGGCCTGTATGGAATTCTCGGGTGCCAAGGAACTCC 52227 0.29290529502018936 RNA PCR Primer, Index 1 (100% over 27bp)
TCCTGTACTGAGCTGCCCCGAGATGGAATTCTCGGGGGCCAAGGAACTCC 50813 0.28497514228006365 RNA PCR Primer, Index 1 (96% over 27bp)
TATTGCACTTGTCCCGGCCTGATGGAATTCTCGGGTGCCAAGGAACTCCA 41374 0.2320382881682906 RNA PCR Primer, Index 1 (100% over 28bp)
CAACGGAATCCCAAAAGCAGCTGTGGAATTCTCGGGTGCCAAGGAACTCC 35506 0.19912871512793848 RNA PCR Primer, Index 1 (100% over 27bp)
TCCTGTACTGAGCTGCCCCGAGAATGGAATTCTCGGGTGCCAAGGAACTC 27244 0.1527928438840071 RNA PCR Primer, Index 1 (100% over 26bp)
TCCTGTACTGAGCTGCCCCGATTGGAATTCTCGGGTGCCAAGGAACTCCA 26773 0.150151329074531 RNA PCR Primer, Index 1 (100% over 28bp)
TATTGCACTTGTCCCGGCCTGTTTGGAATTCTCGGGTGCCAAGGAACTCC 23100 0.12955200020997518 RNA PCR Primer, Index 1 (100% over 27bp)
TCCTGTACTGAGCTGCCCTGGAATTCTCGGGTGCCAAGGAACTCCAGTCA 20684 0.11600231914905311 RNA PCR Primer, Index 1 (100% over 32bp)
TCCTGTACTGAGCTGCCCCGAGATTGGAATTCTCGGGTGCCAAGGAACTC 20448 0.11467875758846635 RNA PCR Primer, Index 1 (100% over 26bp)
AAACCGTTACCATTACTGAGTTGGAATTCTCGGGTGCCAAGGAACTCCAG 19780 0.11093240537460214 RNA PCR Primer, Index 1 (100% over 29bp)
TTCCTGTACTGAGCTGCCCCGATGGAATTCTCGGGTGCCAAGGAACTCCA 19030 0.10672617160155098 RNA PCR Primer, Index 1 (100% over 28bp)
CCACGTTCCCGTGGTGGAATTCTCGGGTGCCAAGGAACTCCAGTCACCGA 18687 0.10480252068934227 RNA PCR Primer, Index 2 (100% over 36bp)
TCCTGTACTGAGCTGCCCCGGGTGGAATTCTCGGGTGCCAAGGAACTCCA 18033 0.10113468483924166 RNA PCR Primer, Index 1 (100% over 28bp)
I ran cutadapt:
cutadapt -a GGAATTCTCGGGTGCCAAGG SRR837437.fastq.gz -m 17 --length-tag 'length=' -o SRR837437.cutadapt1.fastq
The basic statistics are:
Filename SRR837437.cutadapt1.fastq
File type Conventional base calls
Encoding Sanger / Illumina 1.9
Total Sequences 17698809
Filtered Sequences 0
Sequence length 17-50
%GC 59
After trimming, per base sequence quality changed to not OK, and the overrepresented sequences are:
TCCTGTACTGAGCTGCCCCGAGT 6470528 36.55911536194328 No Hit
TCCTGTACTGAGCTGCCCCGAT 5532068 31.25672467565473 No Hit
TCCTGTACTGAGCTGCCCCGAGAT 2005215 11.329660656827247 No Hit
TCCTGTACTGAGCTGCCCCGT 570466 3.2231886337662607 No Hit
TCCTGTACTGAGCTGCCCCGAGTT 530257 2.996003855400666 No Hit
TCCTGTACTGAGCTGCCCCGAGG 317138 1.7918606839590168 No Hit
TATTGCACTTGTCCCGGCCTGTT 151241 0.8545264260436959 No Hit
TCCTGTACTGAGCTGCCCCGAAT 121369 0.6857467075891943 No Hit
TATTGCACTTGTCCCGGCCTGT 120442 0.6805090670225323 No Hit
TCCTGTACTGAGCTGCCCCGGT 72406 0.4091009739694914 No Hit
TATTGCACTTGTCCCGGCCTGTAT 54738 0.30927504782948956 No Hit
TATTGCACTTGTCCCGGCCTGAT 43588 0.24627645848938196 No Hit
CAACGGAATCCCAAAAGCAGCTGT 37011 0.20911576592526648 No Hit
TCCTGTACTGAGCTGCCCCGAGAAT 30626 0.1730398921192946 No Hit
TTCCTGTACTGAGCTGCCCCGAT 29613 0.1673163431505476 No Hit
TCCTGTACTGAGCTGCCCCGATT 29239 0.1652032066112471 No Hit
TCCTGTACTGAGCTGCCCCGAGAG 28483 0.16093173275105688 No Hit
TATTGCACTTGTCCCGGCCTGTTT 24326 0.13744427661770914 No Hit
TCCTGTACTGAGCTGCCCCGAG 23792 0.13442712444662236 No Hit
TCCTGTACTGAGCTGCCCT 22984 0.12986184550610155 No Hit
TCCTGTACTGAGCTGCCCCGAGATT 22301 0.1260028287779138 No Hit
TTCCTGTACTGAGCTGCCCCGAGT 21007 0.11869160235584214 No Hit
TCCTGTACTGAGCTGCCCCGAGGT 20634 0.11658411591424034 No Hit
AAACCGTTACCATTACTGAGTT 20628 0.11655021532804835 No Hit
TCCTGTACTGAGCTGCCCCGGGT 19676 0.11117132231891988 No Hit
TCCTGTACTGAGCTGCCCCT 19513 0.11025035639403759 No Hit
TCCTGTACTGAGCTGCCCCGAGA 19409 0.1096627462333765 No Hit
Because I have several files, I used a loop:
for i in *fastq.gz
do
echo "$i"
cutadapt -a TGGAATTCTCGGGTGCCAAGG -m 17 -o "${i/.fastq/}.cutadapt.fastq" --length-tag 'length=' "$i"
done
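A side note on the output name built in the loop, as a small sketch that only manipulates strings and does not run cutadapt. The file name is the one from this post. The variable names out_loop and out_alt are illustrative, and sed is used here only as a stand-in for the bash substitution ${i/.fastq/} so that the sketch also runs in a plain POSIX shell.

```shell
# File name from the post; nothing is read from or written to disk here.
i="SRR837437.fastq.gz"

# Stand-in for bash's ${i/.fastq/}: delete only the first ".fastq",
# so the ".gz" part stays in the middle of the new name.
out_loop="$(printf '%s\n' "$i" | sed 's/\.fastq//').cutadapt.fastq"

# Alternative: strip the whole ".fastq.gz" suffix before adding the new one.
out_alt="${i%.fastq.gz}.cutadapt.fastq"

echo "$out_loop"   # SRR837437.gz.cutadapt.fastq
echo "$out_alt"    # SRR837437.cutadapt.fastq
```

The first name matches the Filename that FastQC reports for the loop output. The second form would give the same naming pattern as the single-file command.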
and these are the basic statistics:
Filename SRR837437.gz.cutadapt.fastq
File type Conventional base calls
Encoding Sanger / Illumina 1.9
Total Sequences 17683138
Filtered Sequences 0
Sequence length 17-50
%GC 62
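For comparison, the number of reads missing from each trimmed file can be worked out from the Total Sequences values reported so far. This is just arithmetic on the numbers in this post. The variable names are illustrative, and attributing the loss to the -m 17 minimum length filter is an assumption based on the options used.

```shell
# Total Sequences values copied from the three FastQC reports in this post.
raw=17830678          # raw file
run_single=17698809   # after the single-file cutadapt command
run_loop=17683138     # after the cutadapt loop

# Reads absent from each trimmed file (presumably shorter than 17 nt after trimming).
dropped_single=$((raw - run_single))
dropped_loop=$((raw - run_loop))

echo "single-file run: $dropped_single reads dropped"   # 131869
echo "loop run:        $dropped_loop reads dropped"     # 147540
```

The two runs do not drop the same number of reads, which suggests that they did not trim the reads in exactly the same way.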
Per base sequence quality again changed to not OK, although it was OK before cutadapt, and the overrepresented sequences are:
TCCTGTACTGAGCTGCCCCGAG 6766671 38.26623419440599 No Hit
TCCTGTACTGAGCTGCCCCGA 5552096 31.3976851846092 No Hit
TCCTGTACTGAGCTGCCCCGAGA 2029103 11.474790277607967 No Hit
TCCTGTACTGAGCTGCCCCG 574490 3.2488012025919835 No Hit
TCCTGTACTGAGCTGCCCCGAGT 534294 3.0214886068298514 No Hit
TATTGCACTTGTCCCGGCCTGT 151437 0.8563921177338547 No Hit
TCCTGTACTGAGCTGCCCCGAA 121949 0.6896343850282682 No Hit
TATTGCACTTGTCCCGGCCTG 120821 0.6832554267234695 No Hit
TCCTGTACTGAGCTGCCCCGG 74513 0.4213788299339178 No Hit
TATTGCACTTGTCCCGGCCTGTA 54785 0.30981492085850376 No Hit
TATTGCACTTGTCCCGGCCTGA 43643 0.24680574228397698 No Hit
CAACGGAATCCCAAAAGCAGCTG 37189 0.21030769538754943 No Hit
TCCTGTACTGAGCTGCCCCGAGAA 30837 0.17438646918889622 No Hit
TCCTGTACTGAGCTGCCCCGAT 30562 0.17283131534685756 No Hit
TTCCTGTACTGAGCTGCCCCGA 30326 0.17149671059514437 No Hit
TTCCTGTACTGAGCTGCCCCGAG 27068 0.15307237889564623 No Hit
TATTGCACTTGTCCCGGCCTGTT 24360 0.13775835488022545 No Hit
TCCTGTACTGAGCTGCCC 23470 0.13272531153690031 No Hit
TCCTGTACTGAGCTGCCCCGAGG 22859 0.12927004245513438 No Hit
TCCTGTACTGAGCTGCCCCGAGAT 22565 0.12760744162037305 No Hit
TCCTGTACTGAGCTGCCCCGGG 21446 0.1212793792594957 No Hit
AAACCGTTACCATTACTGAGT 20642 0.11673267493586263 No Hit
TCCTGTACTGAGCTGCCCC 20219 0.11434056557156315 No Hit
Why are the results different if it is the same command? Is it necessary to remove all these overrepresented sequences, and why does the per base sequence quality decrease?
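One thing that may be worth checking: the adapter passed to -a is not identical in the two calls above, because the one in the loop has an extra leading T. The sketch below is plain shell, not cutadapt itself. It assumes exact adapter matching with no error tolerance, uses the most abundant raw read from the first table, and the variable names are illustrative.

```shell
# Most abundant raw read from the first FastQC table in this post.
read_seq="TCCTGTACTGAGCTGCCCCGAGTGGAATTCTCGGGTGCCAAGGAACTCCA"

adapter_single="GGAATTCTCGGGTGCCAAGG"   # adapter in the single-file command
adapter_loop="TGGAATTCTCGGGTGCCAAGG"    # adapter in the loop (one extra leading T)

# Remove the first exact occurrence of the adapter and everything after it.
trim_single=${read_seq%%"$adapter_single"*}
trim_loop=${read_seq%%"$adapter_loop"*}

echo "$trim_single ${#trim_single}"   # TCCTGTACTGAGCTGCCCCGAGT 23
echo "$trim_loop ${#trim_loop}"       # TCCTGTACTGAGCTGCCCCGAG 22
```

The two results correspond to the top overrepresented sequence in each trimmed file, so the one-base difference between the adapters could account for the different lists.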
Thanks for your help
Hi, is it normal for the per base sequence quality to decrease after adapter trimming? I have used cutadapt before and it did not decrease the per base sequence quality.
Thank you so much,
Adriana