I downloaded some sra datasets using ascp as recommended (https://www.ncbi.nlm.nih.gov/books/NBK158899/) and additionally used fastq-dump on the downloaded sra-files
fastq-dump --gzip --split-files file.sra
and got two files (file_1.fastq.gz, file_2.fastq.gz), each 7.4 GB, as output.
First thing I did after this, was to check read quality with fastqc:
As you can see, the whiskers go down to ~15, so I wanted to discard the low quality reads, using trimmomatics. Actually, I was not sure whether the adapters were already removed from the reads or not, so I just added the standard ILLUMINACLIP and other options recommened to use:
java -jar trimmomatic-0.36.jar PE -phred33 ../file_1.fastq.gz ../file_2.fastq.gz ../file_1_clean.fastq.gz ../file_2_clean.fastq.gz ILLUMINACLIP:TruSeq3-PE.fa:2:30:10 LEADING:3 TRAILING:3 SLIDINGWINDOW:4:18 MINLEN:50 Multiple cores found: Using 4 threads Input Read Pairs: 73338588 Both Surviving: 53742129 (73,28%) Forward Only Surviving: 4076266 (5,56%) Reverse Only Surviving: 7872077 (10,73%) Dropped: 7648116 (10,43%) TrimmomaticPE: Completed successfully
finally I got the "clean" reads: file_1_clean.fastq.gz (4.8 GB) file_2_clean.fastq.gz (0.4 GB)
I don't know why there is this huge difference. Is it possible, that the second read pair is that bad? After this, I checked quality of reads again with fastqc. file_1_clean.fastq.gz looks ok, I think but file_2_clean.fastq.gz looks really strange and not really "clean".
Does anyone know what happend here?
Thanks in advance!