I’d really appreciate help with an issue in creating FASTQ files after an RNA-Seq run using the mcSCRB-seq protocol (which uses UMIs) on an Illumina NextSeq.
Here is the head of the RunInfo.xml file:
<RunInfo xmlns:xsd="http://www.w3.org/2001/XMLSchema" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" Version="4">
<Run Id="191126_NS500640_0561_AHM5CGBGXC" Number="561">
<Flowcell>HM5CGBGXC</Flowcell>
<Instrument>NS500640</Instrument>
<Date>191126</Date>
<Reads>
<Read Number="1" NumCycles="16" IsIndexedRead="N"/>
<Read Number="2" NumCycles="8" IsIndexedRead="Y"/>
<Read Number="3" NumCycles="66" IsIndexedRead="N"/>
The way I run bcl2fastq is:
bsub -q new-all.q -n 10 -R 'rusage[mem=10000] span[hosts=1]' bcl2fastq -R [my input folder] --output-dir [my output folder] -p 40 --no-lane-splitting --mask-short-adapter-reads 6 --barcode-mismatches 1 --minimum-trimmed-read-length 16 --create-fastq-for-index-reads --use-bases-mask y16,i8,y66
My forward reads look fine (header):
@NS500640:561:HM5CGBGXC:1:11101:22084:1067 1:N:0:0
CTAGCNAGGGATCCGG
+
6AAA/#EEEE/EAEEA
@NS500640:561:HM5CGBGXC:1:11101:20572:1067 1:N:0:0
CCAACNAACGATGTTG
+
AAA6A#6EEAE6EE/E
@NS500640:561:HM5CGBGXC:1:11101:18984:1067 1:N:0:0
ATGAGNGCCATGGTGC
+
AAAAA#EEEAEEEEEE
@NS500640:561:HM5CGBGXC:1:11101:23391:1067 1:N:0:0
CGGAANTCAATGTTAA
Also, my barcodes file:
@NS500640:561:HM5CGBGXC:1:11101:22084:1067 1:N:0:0
GGCGCTAC
+
/A/AAA/A
@NS500640:561:HM5CGBGXC:1:11101:20572:1067 1:N:0:0
ACTCGCTA
+
6A6AAEEE
@NS500640:561:HM5CGBGXC:1:11101:18984:1067 1:N:0:0
ACTCGCTA
+
AAAAAEEE
@NS500640:561:HM5CGBGXC:1:11101:23391:1067 1:N:0:0
GGAGCTAC
However, I have this strange sequence of N’s in all of my reverse reads:
@NS500640:561:HM5CGBGXC:1:11101:22084:1067 2:N:0:0
CTCTNNNNNNNNNNNNNNNNNNNNNNNNNAAAAAGCTAAGCAGGTGTTGAAAATCATAGCCAGCTA
+
AAAA#########################AEEE//EEEEEEA/AE/6EEA/EEA/A<<AEE/<AEE
@NS500640:561:HM5CGBGXC:1:11101:20572:1067 2:N:0:0
AGTANNNNNNNNNNNNNNNNNNNNNNNNNTAATCCACACACCAAAAAGGACGATCCTGAACCCTAA
+
A/A/#########################E6<EAEEE/EE/<EEEE<///EE/AEE/<//EE/AEE
@NS500640:561:HM5CGBGXC:1:11101:18984:1067 2:N:0:0
GGAANNNNNNNNNNNNNNNNNNNNNNNNNAGACAAGCAGGAAGCTGCTCCTGCCCAGAAACCGGTG
+
AAAA#########################/EEEEEEEEAAEEEEEEEEEEEEEEEEE/EEEE<E/E
@NS500640:561:HM5CGBGXC:1:11101:23391:1067 2:N:0:0
TGTANNNNNNNNNNNNNNNNNNNNNNNNNTGCTGCCTGCAGTACCGGCTGGCCATTTGTGAATTTT
+
A/</#########################//AE<A6E<AE/EE/<//<E//<<AE/<<</EEEA6A
@NS500640:561:HM5CGBGXC:1:11101:24930:1067 2:N:0:0
CATTNNNNNNNNNNNNNNNNNNNNNNNNNAGCTCCAGCGGTAGCCCTGGTTCCTGCCCCAAAGGTG
+
AAA/#########################E/EEEEEEEEAEEEEEEAAEEEEEEEEEEEEEE<EEE
@NS500640:561:HM5CGBGXC:1:11101:24231:1067 2:N:0:0
GGAGNNNNNNNNNNNNNNNNNNNNNNNNNCAGGGAAGAGGCGGAGCAGGGGGTCGGGGGGAGACAG
Does anybody have a clue?
Thanks in advance!
A small educational note: I added (code) markup to your post for increased readability. You can do this by selecting the text and clicking the 101010 button, which is in your toolbar when you compose or edit a post.
Thanks, good to know.
Thanks for the quick response. Yes, this is a NextSeq run. Actually, I didn’t notice the “N” in Read 1. A single “N” is actually present in each sequence somewhere in the middle.
The most important point is: you are convinced that this is a purely technical issue concerning the sequencer, and not a problem which can be solved by tweaking the bcl2fastq parameters? (I have already made several unsuccessful attempts.)
Please use ADD COMMENT/ADD REPLY when responding to existing posts to keep threads logically organized. This comment should go below my answer. SUBMIT ANSWER is for new answers to the original question.
As for your question: Yes. Check with support first.
An N somewhere in the middle of each read should not be acceptable either, but N's at the same specific cycles in every read are definitely a problem.
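One way to check whether the N's really sit at fixed cycles, rather than at scattered positions, is to tally the fraction of N calls per cycle across many reads. The following is a minimal Python sketch, assuming an uncompressed FASTQ with four lines per record. The function names `read_sequences` and `n_fraction_per_cycle` are made up for illustration and are not part of bcl2fastq or any tool mentioned in this thread.

```python
from itertools import islice


def read_sequences(path, limit=100000):
    """Yield up to `limit` sequence lines from an uncompressed 4-line FASTQ."""
    with open(path) as handle:
        # The sequence is the 2nd line of every 4-line record.
        for line in islice(handle, 1, limit * 4, 4):
            yield line.strip()


def n_fraction_per_cycle(sequences):
    """Return a list where element i is the fraction of reads with 'N' at cycle i+1."""
    n_counts = []
    totals = []
    for seq in sequences:
        # Grow the counters if this read is longer than any seen so far.
        if len(seq) > len(totals):
            extra = len(seq) - len(totals)
            n_counts.extend([0] * extra)
            totals.extend([0] * extra)
        for i, base in enumerate(seq):
            totals[i] += 1
            if base == "N":
                n_counts[i] += 1
    return [n / t for n, t in zip(n_counts, totals)]


# Small demo using the first two Read 2 sequences posted above.
demo_reads = [
    "CTCT" + "N" * 25 + "AAAAAGCTAAGCAGGTGTTGAAAATCATAGCCAGCTA",
    "AGTA" + "N" * 25 + "TAATCCACACACCAAAAAGGACGATCCTGAACCCTAA",
]
fractions = n_fraction_per_cycle(demo_reads)
# 1-based cycles at which every read has an N
bad_cycles = [i + 1 for i, f in enumerate(fractions) if f == 1.0]
print(bad_cycles)
```

On a real file one would call `n_fraction_per_cycle(read_sequences(path))` with the path to the Read 2 FASTQ. A block of consecutive cycles with an N fraction near 1.0 points to a problem during the run itself rather than to the conversion settings.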
I see. I suppose my idea is not kosher, but I’d still like to know your opinion: for the time being, is there a way to excise those N’s and work with the remaining data if read 1 and read 2 still overlap? Or is this approach totally biased?
If all of your Read 2's have those N's, then you could hard trim the left part of that read by doing something like forcetrimleft=29 with bbduk.sh from the BBMap suite. A guide for bbduk.sh is here. I am not familiar with mcSCRB-seq (is this some kind of single-cell protocol?), but if the remaining read has enough information you could use, then there is no harm in trying it out.
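To make the effect of such a hard trim concrete, here is a minimal Python sketch of the same idea: drop the first 29 bases (the 4 leading bases plus the 25 N's in the reads shown above) from both the sequence and the quality string. This is not bbduk.sh and does not call it. The function names `hard_trim_left` and `trim_fastq` are hypothetical, and the sketch assumes an uncompressed FASTQ with four lines per record.

```python
def hard_trim_left(record, n_bases=29):
    """Remove the first `n_bases` from a FASTQ record.

    `record` is a (header, sequence, plus, quality) tuple of strings.
    """
    header, seq, plus, qual = record
    if len(seq) != len(qual):
        raise ValueError("sequence and quality lengths differ: " + header)
    return header, seq[n_bases:], plus, qual[n_bases:]


def trim_fastq(in_path, out_path, n_bases=29):
    """Stream an uncompressed 4-line FASTQ and write hard-trimmed records."""
    with open(in_path) as src, open(out_path, "w") as dst:
        while True:
            lines = [src.readline().rstrip("\n") for _ in range(4)]
            if not lines[0]:
                break  # end of file
            dst.write("\n".join(hard_trim_left(tuple(lines), n_bases)) + "\n")


# Demo on the first Read 2 record posted above.
example = (
    "@NS500640:561:HM5CGBGXC:1:11101:22084:1067 2:N:0:0",
    "CTCT" + "N" * 25 + "AAAAAGCTAAGCAGGTGTTGAAAATCATAGCCAGCTA",
    "+",
    "AAAA" + "#" * 25 + "AEEE//EEEEEEA/AE/6EEA/EEA/A<<AEE/<AEE",
)
trimmed = hard_trim_left(example)
print(trimmed[1])
print(len(trimmed[1]))
```

Because no records are dropped and their order is unchanged, the trimmed Read 2 file stays in sync with the Read 1 and index files. What remains is 37 bases of cDNA read per record, so whether that is enough depends on the downstream mapping.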