Hi, I am using COMPSRA tools for small RNA-seq data analysis. One of the samples I am using is SRR14272450. It is a single-end library sequenced with 1 x 50 bp chemistry on an Illumina HiSeq 2500. After downloading the SRA file and splitting it to obtain the fastq file, I ran FastQC for quality analysis and got the adapter content as follows:
I used COMPSRA tools to remove adapters, using the following code as suggested by the COMPSRA manual:
java -jar COMPSRA.jar -ref hg38 -qc -ra TGGAATTCTCGGGTGCCAAGG -rb 4 -rh 20 -rt 20 -rr 20 -rlh 8,17 -in SRR14272450_1.fastq -out SRR14272450_qc
This generates the following files:
According to the manual, we should use the 17to50_FitReads file for further analysis. But I went ahead and checked the post-trimming FastQC report and saw the following result:
As you can see, adapters have not been completely removed, and for the life of me I cannot figure it out. Someone, please guide me on how to proceed. I tried fastp, sickle, cutadapt, with similar or worse results.
Since you have tried different tools with similar results, you may as well go ahead and do the analysis.
If you are referring to this pipeline, then it seems to be using STAR for analysis, which should be able to soft-clip the adapters. With public data you always run the risk of not fully understanding how the sample was prepared. You may need to spend some time on the data to see if you can figure it out.

Yes, that is the pipeline I am following. And thank you, your response helps!
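
For anyone who wants to spend that time on the data before aligning, here is a minimal Python sketch that counts how many reads in a FASTQ file still carry the adapter. It is not part of COMPSRA or any of the trimmers mentioned above. The function names, the toy records, and the `min_overlap=8` threshold are assumptions made for illustration; the adapter sequence is the one passed to `-ra` in the command in the question. It reports reads containing the full adapter separately from reads whose 3' end matches only the start of the adapter, which is roughly the residue FastQC can still flag after trimming.

```python
import io

# Adapter passed to -ra in the question's COMPSRA command.
ADAPTER = "TGGAATTCTCGGGTGCCAAGG"


def read_sequences(handle):
    """Yield the sequence line of each 4-line FASTQ record."""
    for i, line in enumerate(handle):
        if i % 4 == 1:
            yield line.strip()


def adapter_hit(seq, adapter=ADAPTER, min_overlap=8):
    """Return 'full', 'partial', or None for one read.

    'full'    : the whole adapter occurs somewhere in the read.
    'partial' : the read's 3' end equals an adapter prefix of at
                least min_overlap bases (hypothetical threshold).
    """
    if adapter in seq:
        return "full"
    longest = min(len(adapter) - 1, len(seq))
    for k in range(longest, min_overlap - 1, -1):
        if seq.endswith(adapter[:k]):
            return "partial"
    return None


def summarize(handle, adapter=ADAPTER, min_overlap=8):
    """Count total reads and reads with full or partial adapter."""
    counts = {"total": 0, "full": 0, "partial": 0}
    for seq in read_sequences(handle):
        counts["total"] += 1
        hit = adapter_hit(seq, adapter, min_overlap)
        if hit is not None:
            counts[hit] += 1
    return counts


def fastq_record(name, seq):
    """Build one toy FASTQ record with a dummy quality string."""
    return "@{}\n{}\n+\n{}\n".format(name, seq, "I" * len(seq))


# Toy data: one read with the full adapter, one with a 10-base
# adapter prefix at its 3' end, and one clean read.
DEMO_FASTQ = (
    fastq_record("r1", "ACGTACGTACGTACGTAC" + ADAPTER)
    + fastq_record("r2", "ACGTACGTACGTACGTACGT" + ADAPTER[:10])
    + fastq_record("r3", "ACGTACGTACGTACGTACGTAC")
)

if __name__ == "__main__":
    print(summarize(io.StringIO(DEMO_FASTQ)))
    # {'total': 3, 'full': 1, 'partial': 1}
```

To run it on real data, pass an open handle to the trimmed FASTQ file to `summarize` instead of the toy string. If most of the remaining hits are short partial matches, that would be consistent with the suggestion above that the aligner's soft-clipping can absorb them, but that is something to check on your own data rather than assume.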