I have several FASTQ files of Illumina paired-end reads, prepared with the NEBNext kit (Prep Master Mix Set for Illumina, E6040, New England BioLabs) and sequenced on a HiSeq 2000. According to FastQC, in every sample the reads in one file of each pair are 100 bp long and the reads in the other file are 80 bp long. Why do the two reads of a pair have different lengths? Is this normal, or is something wrong?
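One way to confirm the read lengths without relying on FastQC is to tabulate them straight from each FASTQ file. The function below is a minimal sketch. The name `fastq_length_hist` is hypothetical, and it assumes uncompressed FASTQ with exactly four lines per record, so that the sequence is line 2 of every record.

```sh
# Hypothetical helper: print "read_length count" for every length found in
# the given FASTQ file(s). Assumes plain, uncompressed 4-line FASTQ records.
fastq_length_hist() {
  # FNR restarts at 1 for each input file, so FNR % 4 == 2 is the sequence line
  awk 'FNR % 4 == 2 { count[length($0)]++ }
       END { for (len in count) print len, count[len] }' "$@" | sort -n
}

# Usage:
#   fastq_length_hist file_1.fastq
#   fastq_length_hist file_2.fastq
```

If one file reports only 100 and the other only 80, the length difference is already present in the raw files.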
For quality filtering and adapter trimming, I used BBDuk from the BBMap package (version 37.17) with the following command:
./bbduk.sh in=file_1.fastq in2=file_2.fastq out=out1.fastq out2=out2.fastq ref=adapters.fa ktrim=r k=23 mink=11 hdist=1 tpe tbo qtrim=rl trimq=20 ftl=20 ftr=90 minlen=40
When I re-ran FastQC on the trimmed output, every module passed except "Per base sequence content" and "Sequence length distribution" (see the attached images). Even after removing bases from the start and end of the reads, "Per base sequence content" still fails (Image 1). The read length changed from 100 bp to a range of 41-70 bp (Image 2). What is wrong with my command, and how can I fix it?
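The forced-trimming options in the command set a hard ceiling on read length, independent of quality. As I read the BBDuk documentation, `ftl=20` trims bases to the left of 0-based position 20 (removing the first 20 bases) and `ftr=90` trims bases to the right of 0-based position 90 (keeping positions 0 to 90). Under that assumption, this sketch (the function name `max_len_after_force_trim` is hypothetical) computes the longest read that can survive both options:

```sh
# Hypothetical helper: longest read left after forced trimming, ASSUMING
# ftl=L removes 0-based positions 0..L-1 and ftr=R keeps positions 0..R.
# A negative value (e.g. -1) means the option is not used.
max_len_after_force_trim() {
  read_len=$1; ftl=$2; ftr=$3
  last=$((read_len - 1))                 # last 0-based position in the read
  if [ "$ftr" -ge 0 ] && [ "$ftr" -lt "$last" ]; then last=$ftr; fi
  first=0                                # first 0-based position kept
  if [ "$ftl" -gt 0 ]; then first=$ftl; fi
  n=$((last - first + 1))
  if [ "$n" -lt 0 ]; then n=0; fi
  echo "$n"
}

# Usage:
#   max_len_after_force_trim 100 20 90   # 100 bp reads
#   max_len_after_force_trim 80 20 90    # 80 bp reads
```

With these settings it gives 71 for a 100 bp read and 60 for an 80 bp read, which would explain why no trimmed read is much longer than about 70 bp; quality trimming and `minlen=40` then determine the lower end of the distribution.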
Also, 40% of the bases were removed by trimming and the reads became much shorter, which is not what I want. How can I keep as much of each read as possible while still getting data suitable for downstream analysis?
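To put an exact number on how much sequence trimming removes, you can compare the total number of bases before and after. A minimal sketch, again assuming plain 4-line FASTQ; both function names are hypothetical:

```sh
# Hypothetical helper: total number of sequenced bases across FASTQ file(s).
fastq_total_bases() {
  # n + 0 prints 0 instead of an empty line when there are no reads
  awk 'FNR % 4 == 2 { n += length($0) } END { print n + 0 }' "$@"
}

# Hypothetical helper: percentage of bases removed, given the two input files
# of a pair followed by the two trimmed output files.
percent_bases_removed() {
  before=$(fastq_total_bases "$1" "$2")
  after=$(fastq_total_bases "$3" "$4")
  awk -v b="$before" -v a="$after" \
    'BEGIN { printf "%.1f\n", (b > 0 ? 100 * (b - a) / b : 0) }'
}

# Usage:
#   percent_bases_removed file_1.fastq file_2.fastq out1.fastq out2.fastq
```

For comparison, if `ftl=20 ftr=90` behave as documented, forced trimming alone removes 29 of 100 bases from each 100 bp read and 20 of 80 bases from each 80 bp read (49 of 180 bases per pair, about 27%) before any adapter or quality trimming is applied.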
Thanks in advance