Question

Issue with bcl2fastq and mcSCRB-seq protocol.

4.2 years ago

liwinskt • 0

I’d really appreciate help with a problem creating FASTQ files after an RNA-Seq run using the mcSCRB-seq protocol (which uses UMIs) on an Illumina NextSeq.

Here is the head of the RunInfo.xml file:


<RunInfo xmlns:xsd="http://www.w3.org/2001/XMLSchema" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" Version="4">
  <Run Id="191126_NS500640_0561_AHM5CGBGXC" Number="561">
    <Flowcell>HM5CGBGXC</Flowcell>
    <Instrument>NS500640</Instrument>
    <Date>191126</Date>
    <Reads>
      <Read Number="1" NumCycles="16" IsIndexedRead="N"/>
      <Read Number="2" NumCycles="8" IsIndexedRead="Y"/>
      <Read Number="3" NumCycles="66" IsIndexedRead="N"/>

The way I run bcl2fastq is:

bsub -q new-all.q -n 10 -R 'rusage[mem=10000] span[hosts=1]' bcl2fastq -R [my input folder] --output-dir [my output folder] -p 40 --no-lane-splitting --mask-short-adapter-reads 6 --barcode-mismatches 1 --minimum-trimmed-read-length 16 --create-fastq-for-index-reads --use-bases-mask y16,i8,y66
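
One thing that can be ruled out quickly is a mismatch between the `--use-bases-mask` value and the cycles listed in RunInfo.xml. The following is a minimal sketch, not part of bcl2fastq: the helper names are hypothetical, the XML excerpt from the post is closed by hand so that it parses, and the mask parser only understands plain `y<N>` and `i<N>` fields (real masks allow more, such as `n` and `*`).

```python
import re
import xml.etree.ElementTree as ET

# Excerpt of the RunInfo.xml shown in the post; closing tags added by hand
# (assumption) so that the fragment is well-formed.
RUNINFO = """<RunInfo Version="4">
  <Run Id="191126_NS500640_0561_AHM5CGBGXC" Number="561">
    <Reads>
      <Read Number="1" NumCycles="16" IsIndexedRead="N"/>
      <Read Number="2" NumCycles="8" IsIndexedRead="Y"/>
      <Read Number="3" NumCycles="66" IsIndexedRead="N"/>
    </Reads>
  </Run>
</RunInfo>"""


def reads_from_runinfo(xml_text):
    """Return [(is_index_read, num_cycles), ...] ordered by read number."""
    root = ET.fromstring(xml_text)
    reads = sorted(root.iter("Read"), key=lambda r: int(r.get("Number")))
    return [(r.get("IsIndexedRead") == "Y", int(r.get("NumCycles"))) for r in reads]


def parse_simple_mask(mask):
    """Parse a mask made only of y<N> / i<N> fields, e.g. 'y16,i8,y66'."""
    parsed = []
    for field in mask.split(","):
        match = re.fullmatch(r"([yYiI])(\d+)", field.strip())
        if match is None:
            raise ValueError(f"unsupported mask field: {field!r}")
        parsed.append((match.group(1).lower() == "i", int(match.group(2))))
    return parsed


def mask_matches_runinfo(mask, xml_text):
    """True if every mask field has the same type and length as its read."""
    return parse_simple_mask(mask) == reads_from_runinfo(xml_text)


print(mask_matches_runinfo("y16,i8,y66", RUNINFO))
```

For the values in this post the check passes, which suggests the mask itself is not what produces the N's.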

My forward reads look fine (header):

@NS500640:561:HM5CGBGXC:1:11101:22084:1067 1:N:0:0
CTAGCNAGGGATCCGG
+
6AAA/#EEEE/EAEEA
@NS500640:561:HM5CGBGXC:1:11101:20572:1067 1:N:0:0
CCAACNAACGATGTTG
+
AAA6A#6EEAE6EE/E
@NS500640:561:HM5CGBGXC:1:11101:18984:1067 1:N:0:0
ATGAGNGCCATGGTGC
+
AAAAA#EEEAEEEEEE
@NS500640:561:HM5CGBGXC:1:11101:23391:1067 1:N:0:0
CGGAANTCAATGTTAA

Also, my barcodes file:

@NS500640:561:HM5CGBGXC:1:11101:22084:1067 1:N:0:0
GGCGCTAC
+
/A/AAA/A
@NS500640:561:HM5CGBGXC:1:11101:20572:1067 1:N:0:0
ACTCGCTA
+
6A6AAEEE
@NS500640:561:HM5CGBGXC:1:11101:18984:1067 1:N:0:0
ACTCGCTA
+
AAAAAEEE
@NS500640:561:HM5CGBGXC:1:11101:23391:1067 1:N:0:0
GGAGCTAC

However, I have this strange sequence of N’s in all of my reverse reads:

@NS500640:561:HM5CGBGXC:1:11101:22084:1067 2:N:0:0
CTCTNNNNNNNNNNNNNNNNNNNNNNNNNAAAAAGCTAAGCAGGTGTTGAAAATCATAGCCAGCTA
+
AAAA#########################AEEE//EEEEEEA/AE/6EEA/EEA/A<<AEE/<AEE
@NS500640:561:HM5CGBGXC:1:11101:20572:1067 2:N:0:0
AGTANNNNNNNNNNNNNNNNNNNNNNNNNTAATCCACACACCAAAAAGGACGATCCTGAACCCTAA
+
A/A/#########################E6<EAEEE/EE/<EEEE<///EE/AEE/<//EE/AEE
@NS500640:561:HM5CGBGXC:1:11101:18984:1067 2:N:0:0
GGAANNNNNNNNNNNNNNNNNNNNNNNNNAGACAAGCAGGAAGCTGCTCCTGCCCAGAAACCGGTG
+
AAAA#########################/EEEEEEEEAAEEEEEEEEEEEEEEEEE/EEEE<E/E
@NS500640:561:HM5CGBGXC:1:11101:23391:1067 2:N:0:0
TGTANNNNNNNNNNNNNNNNNNNNNNNNNTGCTGCCTGCAGTACCGGCTGGCCATTTGTGAATTTT
+
A/</#########################//AE<A6E<AE/EE/<//<E//<<AE/<<</EEEA6A
@NS500640:561:HM5CGBGXC:1:11101:24930:1067 2:N:0:0
CATTNNNNNNNNNNNNNNNNNNNNNNNNNAGCTCCAGCGGTAGCCCTGGTTCCTGCCCCAAAGGTG
+
AAA/#########################E/EEEEEEEEAEEEEEEAAEEEEEEEEEEEEEE<EEE
@NS500640:561:HM5CGBGXC:1:11101:24231:1067 2:N:0:0
GGAGNNNNNNNNNNNNNNNNNNNNNNNNNCAGGGAAGAGGCGGAGCAGGGGGTCGGGGGGAGACAG

Does anybody have a clue?

Thanks in advance!

A small educational note: I added (code) markup to your post for increased readability. You can do this by selecting the text and clicking the 101010 button, which is in your toolbar when you compose or edit a post.

ADD REPLY • link 4.2 years ago by lieven.sterck 15k

Thanks, good to know.

ADD REPLY • link 4.2 years ago by liwinskt • 0

Thanks for the quick response. Yes, this is a NextSeq run. Actually, I didn’t notice the “N” in Read 1. A single “N” is actually present in each sequence somewhere in the middle.

The most important point is: are you convinced that this is a purely technical issue with the sequencer, and not a problem that can be solved by tweaking the bcl2fastq parameters? (I have already made several unsuccessful attempts.)

ADD REPLY • link 4.2 years ago by liwinskt • 0

Please use ADD COMMENT/ADD REPLY when responding to existing posts to keep threads logically organized. This comment should go below my answer.

SUBMIT ANSWER is for new answers to original question.

As for your question: Yes. Check with support first.

A single “N” is actually present in each sequence somewhere in the middle.

An N somewhere in every read should not be acceptable either, but an N at the same specific cycle in each read is definitely a problem.

ADD REPLY • link 4.2 years ago by GenoMax 141k

I see. I suppose my idea is not kosher, but I’d still like to know your opinion: for the time being, is there a way to excise those N’s and work with the remaining data if read 1 and read 2 still overlap? Or is this approach totally biased?

ADD REPLY • link 4.2 years ago by liwinskt • 0

If all of your Read 2 sequences have those N's, then you could hard-trim the left part of that read with something like forcetrimleft=29 in bbduk.sh from the BBMap suite. A usage guide for bbduk.sh is available with the BBMap documentation. I am not familiar with mcSCRB-seq (is this some kind of single-cell protocol?), but if the remaining read has enough information for you to use, then there is no harm in trying it out.
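
To make the effect of such a hard trim concrete, here is a small Python sketch. It is not bbduk.sh and does not reproduce its options; it only illustrates dropping the first 29 bases (and their quality values) from each 4-line FASTQ record. The function name is hypothetical, and the example record is modeled on the first Read 2 in the question, assuming 4 called bases followed by a run of 25 N's.

```python
import io


def hard_trim_left(fastq_in, fastq_out, n_bases=29):
    """Drop the first n_bases of sequence and quality in every 4-line record."""
    while True:
        header = fastq_in.readline()
        if not header:
            break  # end of input
        seq = fastq_in.readline().rstrip("\n")
        plus = fastq_in.readline()
        qual = fastq_in.readline().rstrip("\n")
        fastq_out.write(header)
        fastq_out.write(seq[n_bases:] + "\n")
        fastq_out.write(plus)
        fastq_out.write(qual[n_bases:] + "\n")


# Example record modeled on the first Read 2 in the question
# (assumption: 4 called bases, then 25 N's, then 37 called bases).
seq = "CTCT" + "N" * 25 + "AAAAAGCTAAGCAGGTGTTGAAAATCATAGCCAGCTA"
qual = "AAAA" + "#" * 25 + "AEEE//EEEEEEA/AE/6EEA/EEA/A<<AEE/<AEE"
record = f"@NS500640:561:HM5CGBGXC:1:11101:22084:1067 2:N:0:0\n{seq}\n+\n{qual}\n"

out = io.StringIO()
hard_trim_left(io.StringIO(record), out)
trimmed = out.getvalue().splitlines()
print(trimmed[1])
```

Under those assumptions 37 bases of Read 2 remain per read; whether that is enough for mapping depends on the downstream pipeline.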

ADD REPLY • link 4.2 years ago by GenoMax 141k

score 1 · Answer 1 · 2020-02-06

Considering that you are able to demultiplex the data and the other two reads look reasonably OK (see below), I am going to speculate on two possibilities:

1. Your run is borderline overloaded (this looks like a NextSeq run) and a low-diversity stretch of nucleotides is causing basecaller issues, leading to N calls.
2. There was some sort of problem with the run (e.g. a bubble in the lanes) during Read 2, which led to the N calls.

In either case you should contact Illumina tech support for diagnostic help. They should be able to remotely look at this run (or will ask for some data) and decide if this was a sequencer problem.

I am concerned about Read 1

ATGAG**N**GCCATGGTGC

If that N is present in all reads then that is not good.
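
A quick way to see whether an N falls on the same cycle in every read is to tally N calls per position. The sketch below is only an illustration with a hypothetical helper name; it uses the four Read 1 sequences quoted in the question, so a real check would need to run over the whole FASTQ file.

```python
from collections import Counter


def n_fraction_per_cycle(sequences):
    """Return, for each cycle (0-based index), the fraction of reads with 'N'."""
    sequences = list(sequences)
    if not sequences:
        return []
    length = max(len(s) for s in sequences)
    n_counts = Counter()
    for seq in sequences:
        for cycle, base in enumerate(seq):
            if base == "N":
                n_counts[cycle] += 1
    return [n_counts[c] / len(sequences) for c in range(length)]


# The four Read 1 sequences shown in the question.
read1 = [
    "CTAGCNAGGGATCCGG",
    "CCAACNAACGATGTTG",
    "ATGAGNGCCATGGTGC",
    "CGGAANTCAATGTTAA",
]

fractions = n_fraction_per_cycle(read1)
# Report 1-based cycles at which every read has an N.
suspect = [c + 1 for c, f in enumerate(fractions) if f == 1.0]
print(suspect)  # → [6]
```

In the four reads shown, the N sits at cycle 6 each time, which is the pattern that points to a problem at a specific cycle rather than random low-quality calls.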