Question

empty fastq files created by docker bcl2fastq2 v2.20 OSX

3.6 years ago

akh22 ▴ 110

Hi,

I installed the following Docker image:

    REPOSITORY               TAG       IMAGE ID       CREATED         SIZE
    zymoresearch/bcl2fastq   latest    037f216c2523   13 months ago   117MB

and ran the following command:

docker run -d --name bcl2fastq -v /Volumes/Aura2/bcl_NU/170720_NB501488_0132_AH5V32BGX3:/mnt/run -v /Volumes/Aura2/output:/mnt/out zymoresearch/bcl2fastq:2.20 -R /mnt/run -o /mnt/out/Data/Intensities/BaseCalls/Alignment_1  --barcode-mismatches 0 --with-failed-reads --no-lane-splitting -p 24

This generates fastq files, but they are all 0 KB. I looked at the log file, but nothing stands out as a major error. Since I am really stuck troubleshooting this issue, I'd appreciate any input and suggestions.

RNA-Seq Assembly sequence next-gen • 2.0k views

ADD COMMENT • link updated 2.4 years ago by Charles • 0 • written 3.6 years ago by akh22 ▴ 110

Are you not getting any fastq files? For example, if the data is not being demultiplexed, you should get files called Undetermined*R1*.fastq.gz and Undetermined*R2*.fastq.gz.

You also don't seem to be providing a SampleSheet.csv file, which defines the Sample_ID --> index associations. Without that there is no way to demux the data.

--with-failed-reads

That is a bad option to choose. Those reads fail initial filters for a reason and should be left alone.

ADD REPLY • link 3.6 years ago by GenoMax 141k

@genomax, it did generate all the fastqs for entries in the samplesheet.csv, except one.

I don't know where this Undetermined_S0_R1_001.fastq.gz came from; it was not in the samplesheet.csv. I forgot to mention that these are single reads.

Also, I did try running this without --with-failed-reads and --barcode-mismatches 0, but it did not make any difference. The following is the samplesheet.csv used in the run:

[Header]                                    
FileVersion 1                               
LibraryPrepKit  TruSeqHT                                
ContainerType   Plate96                             
ContainerID FoodAlelrgy_Skin_7/12/2017                              
Notes                                   

[Data]                                  
Sample_ID   Sample_Name Species Sample_Project  Sample_Well NucleicAcid i7_Index_ID index   i5_Index_ID index2
PI0001  1   Musmusculus mouse   A01 RNA D701    ATTACTCG    D501    TATAGCCT
PI0002  2   Musmusculus mouse   B01 RNA D701    ATTACTCG    D502    ATAGAGGC
PI0003  3   Musmusculus mouse   C01 RNA D701    ATTACTCG    D503    CCTATCCT
PI0004  4   Musmusculus mouse   D01 RNA D701    ATTACTCG    D504    GGCTCTGA
PI0005  8   Musmusculus mouse   E01 RNA D701    ATTACTCG    D505    AGGCGAAG
PI0006  12  Musmusculus mouse   F01 RNA D701    ATTACTCG    D506    TAATCTTA
PI0007  15  Musmusculus mouse   G01 RNA D701    ATTACTCG    D507    CAGGACGT
PI0008  16  Musmusculus mouse   H01 RNA D701    ATTACTCG    D508    GTACTGAC
PI0009  17  Musmusculus mouse   A02 RNA D702    TCCGGAGA    D501    TATAGCCT
PI0010  18  Musmusculus mouse   B02 RNA D702    TCCGGAGA    D502    ATAGAGGC
PI0011  20  Musmusculus mouse   C02 RNA D702    TCCGGAGA    D503    CCTATCCT
PI0012  22  Musmusculus mouse   D02 RNA D702    TCCGGAGA    D504    GGCTCTGA
PI0013  23  Musmusculus mouse   E02 RNA D702    TCCGGAGA    D505    AGGCGAAG
PI0014  26  Musmusculus mouse   F02 RNA D702    TCCGGAGA    D506    TAATCTTA
PI0015  28  Musmusculus mouse   G02 RNA D702    TCCGGAGA    D507    CAGGACGT
PI0016  29  Musmusculus mouse   H02 RNA D702    TCCGGAGA    D508    GTACTGAC
PI0017  30  Musmusculus mouse   A03 RNA D703    CGCTCATT    D501    TATAGCCT
PI0018  31  Musmusculus mouse   B03 RNA D703    CGCTCATT    D502    ATAGAGGC
PI0019  32  Musmusculus mouse   C03 RNA D703    CGCTCATT    D503    CCTATCCT
PI0020  33  Musmusculus mouse   D03 RNA D703    CGCTCATT    D504    GGCTCTGA
PI0021  34  Musmusculus mouse   E03 RNA D703    CGCTCATT    D505    AGGCGAAG
PI0022  35  Musmusculus mouse   F03 RNA D703    CGCTCATT    D506    TAATCTTA
PI0023  36  Musmusculus mouse   G03 RNA D703    CGCTCATT    D507    CAGGACGT
PI0024  38  Musmusculus mouse   H03 RNA D703    CGCTCATT    D508    GTACTGAC
PI0025  39  Musmusculus mouse   A04 RNA D704    TCTCGCGC    D501    TATAGCCT
PI0026  40  Musmusculus mouse   B04 RNA D704    AGCGATAG    D502    ATAGAGGC

ADD REPLY • link updated 3.6 years ago by GenoMax 141k • written 3.6 years ago by akh22 ▴ 110

It did generate all the fastqs for entries in the samplesheet.csv, except one

Those are all zero byte files. Any reads that can't be classified using indexes provided in SampleSheet.csv are put into Undetermined* file. You can look in that file to see what indexes ended up there. Don't be surprised to see a smattering of indexes you don't expect. This is normal, as long as they are < 5% of reads. Use the code I have here: C: Demultiplexing reads with index present in the labels
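For anyone without access to the linked code, here is a minimal Python sketch of the same idea (it is not the linked code): tally the index sequences recorded in the read headers of the Undetermined file. It assumes the usual Illumina header layout, where the index is the text after the last colon (e.g. `1:N:0:ATTACTCG+TATAGCCT`). The function names and the demo headers are made up for illustration.

```python
import gzip
from collections import Counter
from itertools import islice


def count_indexes(header_lines):
    """Tally the index field (text after the last ':') of FASTQ header lines."""
    counts = Counter()
    for header in header_lines:
        header = header.strip()
        if header.startswith("@") and ":" in header:
            counts[header.rsplit(":", 1)[1]] += 1
    return counts


def fastq_headers(path):
    """Yield every fourth line (the read header) of a gzipped FASTQ file."""
    with gzip.open(path, "rt") as handle:
        yield from islice(handle, 0, None, 4)


# Demo on made-up headers; real headers come from fastq_headers(path).
demo = [
    "@INSTR:1:FLOWCELL:1:11101:1000:1000 1:N:0:ATTACTCG+TATAGCCT",
    "@INSTR:1:FLOWCELL:1:11101:1000:1001 1:N:0:ATTACTCG+TATAGCCT",
    "@INSTR:1:FLOWCELL:1:11101:1000:1002 1:N:0:TCCGGAGA+ATAGAGGC",
]
for index, n in count_indexes(demo).most_common():
    print(index, n)
```

On a real run you would call `count_indexes(fastq_headers("Undetermined_S0_R1_001.fastq.gz"))` and compare the most common entries with the `index` and `index2` columns of the sample sheet. This only helps when the Undetermined file actually contains reads.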

I think the easiest mistake to make is providing an index sequence as its reverse complement. Once you compare the results from my code with the indexes you have, it should be easy to figure out.
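A quick way to test the reverse-complement hypothesis is to compute it for each sample sheet index and see which orientation matches what was observed. This is only a sketch; `reverse_complement` and `orientation` are illustrative helper names, not part of bcl2fastq.

```python
# Translation table for DNA bases; N stays N.
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq):
    """Return the reverse complement of a DNA index sequence."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def orientation(sheet_index, observed_index):
    """Report how an observed index relates to the one in the sample sheet."""
    if observed_index == sheet_index:
        return "forward"
    if observed_index == reverse_complement(sheet_index):
        return "reverse-complement"
    return None


print(reverse_complement("TATAGCCT"))  # AGGCTATA
```

If most observed second indexes come back as "reverse-complement", the `index2` column is the one to flip.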

I forgot to mention that these are single reads.

Why are you using a SampleSheet for 2D indexes then? The file does not follow Illumina's samplesheet format either.

ADD REPLY • link 3.6 years ago by GenoMax 141k

I wish I could use your script, but the "Undetermined" file is also 0 KB. Also, the single reads were done with dual indexes. So is the entry for the second index in the samplesheet wrong in this case?

ADD REPLY • link 3.6 years ago by akh22 ▴ 110

My apologies. Single reads with dual indexes are certainly fine. I should have considered that.

Note: Depending on which sequencer this run was done on, you may need to reverse complement the second index.

Can you create your samplesheet in this format? Save it as a comma-separated values (.csv) file.

[Header],,,,,,,,,
IEMFileVersion,4,,,,,,,,
Investigator Name,test,,,,,,,,
Experiment Name,EXPT_NAME,,,,,,,,
Date,9/16/2020,,,,,,,,
Application,FASTQ Only,,,,,,,,
[Reads],,,,,,,,,
[Settings],,,,,,,,,
[Data],,,,,,,,,
Lane,Sample_ID,Sample_Name,Sample_Plate,Sample_Well,I7_Index_ID,index,I5_Index_ID,index2,Sample_Project,Description
1,Sample_1,My_Sample_1,,,D704,TCTCGCGC,D501,TATAGCCT,Project_NAME,
2,Sample_2,My_Sample_2,,,D704,TCTCGCGC,D502,ATAGAGGC,Project_NAME,

If you don't have multiple lanes then you will only have entries for lane 1. If the pool ran on multiple lanes then you will need to create entries for each lane (column 1).

Also make sure you run the bcl2fastq2 command with these options:
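If the sheet is being put together from a submission form, writing it with a CSV library avoids tab or space delimiters and keeps empty columns in place. A minimal sketch, using the column names from the template above; `build_samplesheet` is an illustrative helper, and the reduced header it writes is an assumption rather than a statement of what bcl2fastq requires.

```python
import csv
import io

COLUMNS = [
    "Lane", "Sample_ID", "Sample_Name", "Sample_Plate", "Sample_Well",
    "I7_Index_ID", "index", "I5_Index_ID", "index2",
    "Sample_Project", "Description",
]


def build_samplesheet(samples):
    """Return comma-separated sample sheet text for a list of sample dicts."""
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow(["[Header]"])
    writer.writerow(["IEMFileVersion", "4"])
    writer.writerow(["[Data]"])
    writer.writerow(COLUMNS)
    for sample in samples:
        # Columns missing from a sample are written as empty fields.
        writer.writerow([sample.get(col, "") for col in COLUMNS])
    return buf.getvalue()


samples = [
    {"Lane": 1, "Sample_ID": "PI0001", "Sample_Name": "Sample_1",
     "Sample_Well": "A01", "I7_Index_ID": "D701", "index": "ATTACTCG",
     "I5_Index_ID": "D501", "index2": "TATAGCCT", "Sample_Project": "Mouse"},
]
print(build_samplesheet(samples))
```

Write the returned text to SampleSheet.csv and check it in a plain text editor: every data row should have the same number of commas as the column header row.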

bcl2fastq -R /path_to_flow_cell_folder -o /path_to_flow_cell_folder/Unaligned --sample-sheet SampleSheet.csv --barcode-mismatches 0 --no-lane-splitting

If you have more than one core available then you could also add -r N -p N -w N to the command. The three N's added together should not exceed the number of cores you have available.
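As a rough illustration of that rule, the sketch below splits a core count across the three options so that the total never exceeds what is available. The quarter/half/quarter split is an arbitrary assumption, not a recommendation from Illumina, and `thread_flags` is a made-up helper name.

```python
import shlex


def thread_flags(total_cores):
    """Split cores among -r, -p and -w so their sum equals total_cores."""
    if total_cores < 3:
        raise ValueError("need at least 3 cores to give each stage one thread")
    reading = max(1, total_cores // 4)
    writing = max(1, total_cores // 4)
    processing = total_cores - reading - writing
    return ["-r", str(reading), "-p", str(processing), "-w", str(writing)]


command = [
    "bcl2fastq",
    "-R", "/path_to_flow_cell_folder",
    "-o", "/path_to_flow_cell_folder/Unaligned",
    "--sample-sheet", "SampleSheet.csv",
    "--barcode-mismatches", "0",
    "--no-lane-splitting",
] + thread_flags(24)
print(shlex.join(command))
```

Inside Docker, the relevant number is the CPU count given to the container (or to the Docker VM on macOS), which can be lower than the host's.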

ADD REPLY • link 3.6 years ago by GenoMax 141k

I modified the samplesheet.csv as follows:

[Data]                                      
Lane    Sample_ID   Sample_Name Sample_Plate    Sample_Well i7_Index_ID index   i5_Index_ID index2  Sample_Project  Description
1   PI0001  Sample_1        A01 D701    ATTACTCG    D501    TATAGCCT    Mouse   
1   PI0002  Sample_2        B01 D701    ATTACTCG    D502    ATAGAGGC    Mouse   
1   PI0003  Sample_3        C01 D701    ATTACTCG    D503    CCTATCCT    Mouse   
1   PI0004  Sample_4        D01 D701    ATTACTCG    D504    GGCTCTGA    Mouse   
1   PI0005  Sample_5        E01 D701    ATTACTCG    D505    AGGCGAAG    Mouse   
1   PI0006  Sample_6        F01 D701    ATTACTCG    D506    TAATCTTA    Mouse   
1   PI0007  Sample_7        G01 D701    ATTACTCG    D507    CAGGACGT    Mouse   
1   PI0008  Sample_8        H01 D701    ATTACTCG    D508    GTACTGAC    Mouse   
1   PI0009  Sample_9        A02 D702    TCCGGAGA    D501    TATAGCCT    Mouse   
1   PI0010  Sample_10       B02 D702    TCCGGAGA    D502    ATAGAGGC    Mouse   
1   PI0011  Sample_11       C02 D702    TCCGGAGA    D503    CCTATCCT    Mouse   
1   PI0012  Sample_12       D02 D702    TCCGGAGA    D504    GGCTCTGA    Mouse   
1   PI0013  Sample_13       E02 D702    TCCGGAGA    D505    AGGCGAAG    Mouse   
1   PI0014  Sample_14       F02 D702    TCCGGAGA    D506    TAATCTTA    Mouse   
1   PI0015  Sample_15       G02 D702    TCCGGAGA    D507    CAGGACGT    Mouse   
1   PI0016  Sample_16       H02 D702    TCCGGAGA    D508    GTACTGAC    Mouse   
1   PI0017  Sample_17       A03 D703    CGCTCATT    D501    TATAGCCT    Mouse   
1   PI0018  Sample_18       B03 D703    CGCTCATT    D502    ATAGAGGC    Mouse   
1   PI0019  Sample_19       C03 D703    CGCTCATT    D503    CCTATCCT    Mouse   
1   PI0020  Sample_20       D03 D703    CGCTCATT    D504    GGCTCTGA    Mouse   
1   PI0021  Sample_21       E03 D703    CGCTCATT    D505    AGGCGAAG    Mouse   
1   PI0022  Sample_22       F03 D703    CGCTCATT    D506    TAATCTTA    Mouse   
1   PI0023  Sample_23       G03 D703    CGCTCATT    D507    CAGGACGT    Mouse   
1   PI0024  Sample_24       H03 D703    CGCTCATT    D508    GTACTGAC    Mouse   
1   PI0025  Sample_25       A04 D704    TCTCGCGC    D501    TATAGCCT    Mouse   
1   PI0026  Sample_26       B04 D704    AGCGATAG    D502    ATAGAGGC    Mouse

Again, it generated 26 fastqs, but they are all empty. The bcl2fastq log file can be found here: bcl2fastq log

ADD REPLY • link 3.6 years ago by akh22 ▴ 110

Do you actually have access to the complete flowcell data folder? That is a requirement for bcl2fastq2 to work. Looking at the log, it appears that you don't have the full FC folder. Right after the fastq files are created is where the program should start reading the bcl files and converting them to sequence. Your log ends at that point. Are there any errors in any other log?
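A quick sanity check on the run folder can be scripted. The sketch below only looks for two items that a run folder is expected to contain, RunInfo.xml and Data/Intensities/BaseCalls; it is deliberately not an exhaustive list of what bcl2fastq2 reads, and `missing_run_items` is an illustrative name.

```python
from pathlib import Path

# Minimal, non-exhaustive set of items expected in a run folder.
REQUIRED = ("RunInfo.xml", "Data/Intensities/BaseCalls")


def missing_run_items(run_dir):
    """Return the expected items that are absent from run_dir."""
    root = Path(run_dir)
    return [item for item in REQUIRED if not (root / item).exists()]
```

For example, `missing_run_items("/mnt/run")` run inside the container shows whether the volume mount exposes the folder as expected; an empty list means both items were found.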

ADD REPLY • link 3.6 years ago by GenoMax 141k

The input folder appears to be intact, containing everything from the original run except the samplesheet.csv, which I had to create based on a sample submission sheet. I think the issue may be incorrect samplesheet entries, or some corrupted files that would generate errors, but I don't see any. At this point, I have to throw in the towel.

ADD REPLY • link 3.6 years ago by akh22 ▴ 110

I encountered exactly the same problem and solved it by assigning more memory to my Docker app.
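One way to check whether memory was the culprit is to look at the stopped container's state as reported by `docker inspect`. A minimal sketch: the container name `bcl2fastq` comes from the `--name` in the original command, the helper names are illustrative, and whether the OOMKilled flag gets set in every out-of-memory situation on Docker for Mac is an assumption I have not verified.

```python
import json
import subprocess


def container_exit_info(inspect_output):
    """Extract the OOM flag and exit code from `docker inspect` JSON text."""
    state = json.loads(inspect_output)[0]["State"]
    return {
        "oom_killed": bool(state.get("OOMKilled", False)),
        "exit_code": state.get("ExitCode"),
    }


def inspect_container(name):
    """Run `docker inspect NAME` and summarise how the container ended."""
    result = subprocess.run(
        ["docker", "inspect", name],
        capture_output=True, text=True, check=True,
    )
    return container_exit_info(result.stdout)
```

Calling `inspect_container("bcl2fastq")` after the run has stopped returns a small dict. An exit code of 137 means the process received SIGKILL, which is what an out-of-memory kill typically looks like, and would fit a log that simply stops with no error message.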

ADD REPLY • link 2.4 years ago by Charles • 0