Hi All,
Background: I have Illumina NextSeq miRNA single-end reads (75 bp) on which I completed adapter trimming and checked QC. I wanted to run umi_tools to extract the UMI information before aligning the reads to the reference. UMI extraction failed on the trimmed reads, and I was advised to run UMI extraction first and then do adapter trimming. That worked fine!
Current scenario: When I run UMI extraction first and adapter trimming afterwards, there is a strange long run of N's (approx. 20 N's) at the end of every read. To see whether UMI extraction had changed something drastically, I redid the adapter trimming more thoroughly (command below) and then ran UMI extraction on the trimmed reads.
Cutadapt command:
cutadapt -a AACTGTAGGCACCATCAAT -g GTTCAGAGTTCTACAGTCCGACGATC --discard-untrimmed --minimum-length=15 -a AGATCGGAAGAG -e 0.1 -o XYZ-3adapters-trimmed.fastq.gz ../XYZ.fastq
UMI extraction command:
umi_tools extract --stdin=XYZ-3adapters-trimmed.fastq --bc-pattern=NNNNNNNNNNNN -L XYZ-UMIextract.log --stdout=XYZ-3adapters-trimmed-UMIextracted.fastq
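To make clear what I expect this step to do, here is a minimal Python sketch of my understanding of the 12 nt `--bc-pattern`: the first 12 bases of each read are removed and appended to the read name. This is only an illustration and not the umi_tools code; the function name `extract_umi` and the example read are hypothetical.

```python
# Sketch of how a 12 nt all-N barcode pattern is applied to the 5' end of a read.
# Assumption: the UMI is appended to the read identifier with an underscore.
PATTERN = "N" * 12  # same length as the --bc-pattern used above


def extract_umi(name, seq, qual, pattern=PATTERN):
    """Move the leading len(pattern) bases from the read into its name."""
    n = len(pattern)
    umi, rest_seq, rest_qual = seq[:n], seq[n:], qual[n:]
    # Keep any comment after the first space in the header line.
    read_id, _, comment = name.partition(" ")
    new_name = f"{read_id}_{umi}" + (f" {comment}" if comment else "")
    return new_name, rest_seq, rest_qual


# Hypothetical 23 nt trimmed read: 12 nt UMI followed by an 11 nt insert.
name = "@read1"
seq = "ACGTACGTACGT" + "TGAGGTAGTAG"
qual = "I" * len(seq)

new_name, new_seq, new_qual = extract_umi(name, seq, qual)
print(new_name, new_seq, len(new_seq))
```

With a 23 nt input read, only 11 nt remain for alignment after this step.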
This is the situation I am asking about:
Situation 1: Adapter-trimmed reads only: 0.03% of reads align to PhiX with Bowtie1. Average read length: 23 bp.
Situation 2: Adapter-trimmed reads after UMI extraction: 78.72% of reads align to PhiX with Bowtie1 (mindbogglingly high). Average read length: 11 bp.
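In case anyone wants to check the average read lengths on their own files, this is a minimal Python sketch of the calculation, assuming a plain four-line-per-record FASTQ. The helper name `mean_read_length` and the two demo records are hypothetical. Note that the 12 nt difference between the two averages above matches the length of the `--bc-pattern`.

```python
import io


def mean_read_length(handle):
    """Average sequence length over a 4-line-per-record FASTQ stream."""
    total = count = 0
    for i, line in enumerate(handle):
        if i % 4 == 1:  # the second line of each record is the sequence
            total += len(line.rstrip("\r\n"))
            count += 1
    return total / count if count else 0.0


# Hypothetical two-record FASTQ with reads of 11 nt and 13 nt.
demo = io.StringIO(
    "@r1\nACGTACGTACG\n+\nIIIIIIIIIII\n"
    "@r2\nACGTACGTACGTA\n+\nIIIIIIIIIIIII\n"
)
print(mean_read_length(demo))
```

For a real file, the same function can be called on an open handle, for example `open(path)` for uncompressed FASTQ or `gzip.open(path, "rt")` for a gzipped one.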
I no longer see many trailing N's in my data, but this level of PhiX alignment is bothering me. Please help me understand this behaviour. I know the reads become shorter after UMI extraction, but this change is too drastic and suggests that something is not right with the UMI extraction. I confirmed with the wet lab technician that 10% PhiX was spiked in, as expected.
Thank you, Rituriya.