"Unrecognized data type(Cbcl)" when demultiplexing with Picard
2.1 years ago
RNG_Daemon ▴ 20

I am demultiplexing an S4 sequencing run and I am running into the following error:

    INFO 2022-04-08 17:31:08     ExtractIlluminaBarcodes Extracting barcodes for tile 2666
    INFO 2022-04-08 17:31:08     ExtractIlluminaBarcodes Extracting barcodes for tile 2674
    ERROR 2022-04-08 17:31:08     ExtractIlluminaBarcodes Error processing tile 2667
    picard.PicardException: Unrecognized data type(Cbcl) found by IlluminaDataProviderFactory!
     at picard.illumina.parser.IlluminaDataProviderFactory.makeParser(IlluminaDataProviderFactory.java:400)
     at picard.illumina.parser.IlluminaDataProviderFactory.makeDataProvider(IlluminaDataProviderFactory.java:249)
     at picard.illumina.parser.IlluminaDataProviderFactory.makeDataProvider(IlluminaDataProviderFactory.java:228)
     at picard.illumina.ExtractIlluminaBarcodes$PerTileBarcodeExtractor.run(ExtractIlluminaBarcodes.java:355)
     at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
     at java.util.concurrent.FutureTask.run(FutureTask.java:266)
     at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
     at java.lang.Thread.run(Thread.java:748)

I checked the MD5sum of the raw data several times and I am not sure where the error could be. This is the command I use:

    picard -Xmx64g -Xms64g ExtractIlluminaBarcodes \
        -B /data/BHX2/Data/Intensities/BaseCalls \
        -L 1 \
        --NUM_PROCESSORS 8 \
        -M metrices/barcode_metrices1.txt \
        -TMP_DIR /data/tmp \
        -BARCODE_FILE /my_dir/barcode1.csv \
        -RS 148T8B9M8B148T

I also ran a check on the BaseCalls dir. My Picard version is 2.26.11. Here is the RunInfo:

    <Read Number="1" NumCycles="148" IsIndexedRead="N"/>
    <Read Number="2" NumCycles="17" IsIndexedRead="Y"/>
    <Read Number="3" NumCycles="8" IsIndexedRead="Y"/>
    <Read Number="4" NumCycles="148" IsIndexedRead="N"/>

It's dual-index data with UMIs; in the read structure above, the 17-cycle index read 2 is split into an 8 bp sample barcode (8B) and a 9 bp UMI (9M).

EDIT: The first error is actually a "File not found". I checked; the file does exist, though.

    picard.PicardException: File not found: (/data/BHX2/Data/Intensities/BaseCalls/L004/C275.1/L004_2.cbcl)
     at picard.illumina.parser.readers.BaseBclReader.open(BaseBclReader.java:93)
     at picard.illumina.parser.readers.CbclReader.readHeader(CbclReader.java:127)
     at picard.illumina.parser.readers.CbclReader.readTileData(CbclReader.java:200)
     at picard.illumina.parser.readers.CbclReader.advance(CbclReader.java:275)
     at picard.illumina.parser.readers.CbclReader.hasNext(CbclReader.java:252)
     at picard.illumina.parser.NewIlluminaDataProvider.hasNext(NewIlluminaDataProvider.java:125)
     at picard.illumina.ExtractIlluminaBarcodes$PerTileBarcodeExtractor.run(ExtractIlluminaBarcodes.java:363)
     at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
     at java.util.concurrent.FutureTask.run(FutureTask.java:266)
     at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
     at java.lang.Thread.run(Thread.java:748)
    Caused by: java.io.FileNotFoundException: /data/BHX2/Data/Intensities/BaseCalls/L004/C275.1/L004_2.cbcl (Too many open files)
     at java.io.FileInputStream.open0(Native Method)
     at java.io.FileInputStream.open(FileInputStream.java:195)
     at java.io.FileInputStream.<init>(FileInputStream.java:138)
     at picard.illumina.parser.readers.BaseBclReader.open(BaseBclReader.java:90)

I also let the demultiplexing run on a single core. Same error.


I assume you don't have access to bcl-convert or bcl2fastq, which is why you are using Picard? The process appears to be reading other cbcl files before it encounters this error.


Yes, I have to use Picard since it is part of our pipeline. You are right, there is another error; I edited the question. Thanks for pointing that out!


(Too many open files)

You may need to check with your sysadmins about this. There are generic solutions (http://www.mastertheboss.com/java/hot-to-solve-the-too-many-open-files-error-in-java-applications/), but I am not sure they apply in your case, since you have tried to use a single core and still get that error.
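In case it helps, a minimal sketch of the generic approach on Linux (the limit value is only an example; raising it beyond the hard limit, or making it permanent in something like /etc/security/limits.conf, is something the sysadmins would have to do):

    # Current per-process limits on open files
    ulimit -Sn    # soft limit (what the Picard process actually gets)
    ulimit -Hn    # hard limit (ceiling the soft limit can be raised to)

    # Raise the soft limit for this shell, then re-run Picard from the same shell.
    # 65536 is just an example value and cannot exceed the hard limit;
    # the "..." stands for the rest of your original command.
    ulimit -n 65536
    picard -Xmx64g -Xms64g ExtractIlluminaBarcodes ...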


So the issue is really that Picard opens too many files. I monitored the open files of the process with lsof and it quickly exceeds 120000 files, which is the maximum that I can set with ulimit -n.

I also set the Picard parameter --MAX_RECORDS_IN_RAM 50000000 to limit the number of files written, but to no avail.
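For reference, roughly how the descriptor count can be watched while the job is running (<PID> is a placeholder for the actual java process id; the /proc paths assume Linux):

    # Number of file descriptors the Picard/java process currently holds
    watch -n 5 'lsof -p <PID> | wc -l'

    # Cheaper alternative via /proc
    ls /proc/<PID>/fd | wc -l

    # Limits the running process actually got
    grep 'open files' /proc/<PID>/limits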


Is /data/tmp (from your command line) a real directory on a file system?


The dir exists, I have write permission and enough space. But I don't see actual files written there other than libgkl_compressionXXXX.so. The bash variable $TMPDIR also points there.

I added -Djava.io.tmpdir=/data/tmp/ to the picard call. But still, I don't see any temporary files.
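For completeness, a sketch of the checks and of how the temp dir is passed (same paths as in the original command; the trailing "..." stands for the remaining options, and I am assuming the picard wrapper forwards leading -X/-D options to the JVM, as it already does for -Xmx):

    # Sanity checks on the temp directory
    ls -ld /data/tmp     # exists and writable?
    df -h /data/tmp      # enough free space?
    echo "$TMPDIR"       # the shell variable points there as well

    # Temp dir handed both to the JVM and to the Picard tool
    picard -Xmx64g -Xms64g -Djava.io.tmpdir=/data/tmp ExtractIlluminaBarcodes \
        -TMP_DIR /data/tmp \
        ...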


So in the end, I "solved" it by not using Picard. I used bcl2fastq to separate the samples into five different FASTQ files (two for the reads, two for the sequence indexes, one for the UMI) and then put them back together with fgbio FastqToBam. I then sorted the uBAM files with picard SortSam by "queryname".

From here, I could continue with my pipeline.
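Roughly, the workflow looked like this (sample names, paths and the sample sheet are placeholders; the read structure is the 148T8B9M8B148T from the question, and the bases mask is my assumption for splitting the 17-cycle index read into an 8 bp index plus a 9 bp UMI):

    # 1) Demultiplex with bcl2fastq, also writing the index reads as FASTQ
    bcl2fastq \
        --runfolder-dir /data/BHX2 \
        --output-dir fastq/ \
        --sample-sheet SampleSheet.csv \
        --create-fastq-for-index-reads \
        --use-bases-mask Y148,I8Y9,I8,Y148

    # 2) Put the five FASTQs per sample back together into an unmapped BAM;
    #    the +M segment (UMI) ends up in the RX tag by default
    fgbio FastqToBam \
        --input sampleA_R1.fastq.gz sampleA_I1.fastq.gz sampleA_UMI.fastq.gz \
                sampleA_I2.fastq.gz sampleA_R2.fastq.gz \
        --read-structures +T +B +M +B +T \
        --sample sampleA \
        --library libA \
        --output sampleA.unmapped.bam

    # 3) Sort the uBAM by queryname
    picard SortSam \
        I=sampleA.unmapped.bam \
        O=sampleA.qsorted.unmapped.bam \
        SORT_ORDER=queryname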
