human mapping fastq second pass error, how to reuse previously generated files, STAR
3.7 years ago

While building my own mapping from the FASTQ files (reads 1 and 2) supplied by the sequencing intermediary, I have:

  • checked the quality with FastQC. The per-base quality of the reads was very good, but I am still worried about the low diversity of quality scores within the files, and FastQC does not seem to report such a figure (I have not seen it in the report, nor have I found a FastQC option to calculate it).

  • used a reference genome downloaded from NCBI (GRCh38.p13) to build a new genome index directory. I was not aware of the GTF/GFF annotation files, which are apparently available as a direct download and might have saved me time (is there a place with more information on them, notably for human genome mapping?).

  • then launched the mapping with STAR on a machine with 16 CPU threads and 60 GB of RAM, adding the option "--twopassMode Basic". The first pass completed after a couple of hours, but the second pass failed with a memory error. Here is the exit message:

Aug 15 04:35:56 ..... started sorting BAM
Max memory needed for sorting = 4537708471
EXITING because of FATAL ERROR: number of bytes expected from the BAM bin does not agree with the actual size on disk: Expected bin size=3846976240 ; size on disk=1328263168 ; bin number=47
Aug 15 04:37:12 ...... FATAL ERROR, exiting

I suppose there was not enough disk space. Once I have added space, is there a way to make STAR reuse the previously generated files and avoid a complete recalculation? I haven't seen anything about it in the documentation.

The command line used for mapping:

sudo nohup STAR \
--runThreadN 16 \
--readFilesIn ~/r1.fastq.gz ~/r2.fastq.gz \
--genomeDir ~/hg38_index \
--outFileNamePrefix polly \
--readFilesCommand zcat \
--outSAMtype BAM SortedByCoordinate \
--outSAMunmapped Within \
--outSAMattributes Standard \
--twopassMode Basic
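
On the question of reusing files from the failed run, here is a minimal sketch, not a confirmed recipe. It assumes that "--twopassMode Basic" left the first-pass junctions in polly_STARpass1/SJ.out.tab and that this file is still on disk. STAR's manual two-pass procedure passes such a junction file to a single mapping run through "--sjdbFileChrStartEnd", which should be roughly equivalent to the second pass. The output prefix "polly_rerun_" is a hypothetical name chosen so that the earlier outputs are not overwritten. The script only prints the command, it does not run STAR.

```shell
#!/bin/sh
# Assumption: the failed run used --outFileNamePrefix polly, so the first-pass
# junctions are expected at polly_STARpass1/SJ.out.tab.
PREFIX=polly
SJ_FILE="${PREFIX}_STARpass1/SJ.out.tab"

# Print (do not run) a single-pass STAR command that takes the first-pass
# junctions as input instead of repeating the first pass.
# "${PREFIX}_rerun_" is a hypothetical prefix, not taken from the original post.
second_pass_cmd() {
    printf '%s ' STAR \
        --runThreadN 16 \
        --readFilesIn "$HOME/r1.fastq.gz" "$HOME/r2.fastq.gz" \
        --genomeDir "$HOME/hg38_index" \
        --outFileNamePrefix "${PREFIX}_rerun_" \
        --readFilesCommand zcat \
        --outSAMtype BAM SortedByCoordinate \
        --outSAMunmapped Within \
        --outSAMattributes Standard \
        --sjdbFileChrStartEnd "$1"
    printf '\n'
}

if [ -s "$SJ_FILE" ]; then
    second_pass_cmd "$SJ_FILE"
else
    echo "no first-pass junctions found at $SJ_FILE; a full rerun is needed" >&2
fi
```

The reads still have to be mapped again in this run; only the first pass is skipped.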

Wishing you a nice weekend,

alignment RNA-Seq software error sequencing • 660 views
3.7 years ago

I found an old post. For those not familiar with Linux: the number of open files allowed becomes an issue if your system applies the standard limit of 1024 (check with the command "ulimit -n"). If you cannot change the limit, you will have to juggle the number of threads and the number of bins used to sort the output BAM file.

files_open = Bins * Threads
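
The relation above can be turned into a quick check before launching the run. This is a rough sketch: the helper names and the reserve of 64 descriptors for other open files are assumptions of mine, and 50 is, to my knowledge, STAR's default for "--outBAMsortingBinsN".

```shell
#!/bin/sh
# Rule of thumb from this post: files_open = Bins * Threads.

# required_fds BINS THREADS -> temporary files the BAM sort may hold open
required_fds() {
    echo $(( $1 * $2 ))
}

# max_sort_bins LIMIT THREADS [RESERVE] -> largest bin count that fits under LIMIT.
# RESERVE (assumed default: 64) leaves room for the other files the process opens.
max_sort_bins() {
    reserve=${3:-64}
    echo $(( ($1 - reserve) / $2 ))
}

threads=16   # value of --runThreadN in the question
bins=50      # assumed default of --outBAMsortingBinsN
limit=$(ulimit -n)

if [ "$limit" = "unlimited" ]; then
    echo "no open-file limit in this shell"
elif [ "$(required_fds "$bins" "$threads")" -ge "$limit" ]; then
    echo "limit $limit is too low: use at most $(max_sort_bins "$limit" "$threads") bins, fewer threads, or raise the limit"
else
    echo "limit $limit covers $bins bins x $threads threads"
fi
```

If the limit can be changed, "ulimit -n" followed by a larger number raises it for the current shell, up to the hard limit; otherwise the bin count can be lowered with "--outBAMsortingBinsN".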

Voila, to the best of my knowledge the alignment completed successfully. Wishing you a good week,
