Question: STAR alignment error: ERROR in input reads
prabin.dm (USA/Amherst/Umass) wrote, 23 months ago:

Hi,

I am using STAR to align my RNA-seq datasets and I am getting this error:

ReadAlignChunk_processChunks.cpp:115:processChunks EXITING because of FATAL ERROR in input reads: unknown file format: the read ID should start with @ or >

This is my command:

module load star/2.5.3a

STAR -- genomeDir mouse/star_genome_mm10 \
        -- readFilesIn L001_R1_001.fastq.gz \
        --outSAMtype BAM SortedByCoordinate \
        --outSAMunmapped Within \
        --twopassMode Basic \
        --outFilterMultimapNmax 1 
        --quantMode TranscriptomeSAM \
        --runThreadN 6 \
        --outFileNamePrefix "STAR_output/Test/"

The FASTQ files look like this:

@NS500540:133:HNFTLBGX5:1:11101:11802:1042 1:N:0:ACTGAT
CTCCGNTTTATTTATTTGTTCTGCAAATTCGATGCGTCTACCTTCAAATAAAGCATTCATCTTTCTCTGTGACTCT
+
AAAAA#EEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEE

Is there something wrong that I am not able to figure out? I have checked the file for each line, they start with @

Thank you.

Tags: rna-seq, star, fastq

I have checked the file for each line, they start with @

Are you certain about that? Did you do anything to this file that may have corrupted the format (e.g. improper trimming)?

You can try the validateFiles utility from Jim Kent to check whether your file is valid.

(reply by genomax)
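If validateFiles is not at hand, a rough check of the FASTQ layout is possible with standard tools. The sketch below defines a hypothetical helper, check_fastq (the name is made up for illustration; it is not part of STAR or the UCSC utilities). It assumes strict four-line records with no wrapped sequence lines, decompresses gzip input on the fly, and reports the first record whose header does not start with @, whose separator line does not start with +, or whose quality string differs in length from its sequence. It is not a substitute for a full validator.

```shell
# check_fastq: hypothetical helper for a rough structural check, not a full validator.
# Assumes exactly 4 lines per record; accepts plain or gzip-compressed input.
check_fastq() {
  # gzip -t succeeds only on valid gzip data, so decompress on the fly in that case
  if gzip -t "$1" 2>/dev/null; then
    gzip -dc "$1"
  else
    cat "$1"
  fi | awk '
    BEGIN { bad = 0 }
    NR % 4 == 1 && substr($0, 1, 1) != "@" { printf "line %d: header does not start with @\n", NR; bad = 1; exit }
    NR % 4 == 2 { seqlen = length($0) }
    NR % 4 == 3 && substr($0, 1, 1) != "+" { printf "line %d: separator does not start with +\n", NR; bad = 1; exit }
    NR % 4 == 0 && length($0) != seqlen { printf "line %d: quality and sequence lengths differ\n", NR; bad = 1; exit }
    END {
      # a file whose line count is not a multiple of 4 is truncated or malformed
      if (!bad && NR % 4 != 0) { printf "file ends mid-record after %d lines\n", NR; bad = 1 }
      if (!bad) printf "OK: %d records\n", NR / 4
      exit bad
    }'
}
```

Usage: check_fastq L001_R1_001.fastq.gz prints OK: N records on success and exits with a non-zero status otherwise. Because the helper decompresses the file itself, it can pass even when STAR, run without a decompression command, rejects the same file.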

That looks like a useful utility.

When I download it, it is saved as a text file, which I can't run. Can you please let me know how to use it?

(reply by prabin.dm)

You need to add execute permission to it by running chmod a+x validateFiles before you can run it.

(reply by genomax)

There shouldn't be a space in -- readFilesIn. Please select a title which describes your problem better than this.

(reply by WouterDeCoster)

Thank you. I will make that change. I have also changed the title.

(reply by prabin.dm)

What are the outputs of

file L001_R1_001.fastq.gz

and

gunzip -c L001_R1_001.fastq.gz | paste - - - - | cut -c 1 | uniq | sort | uniq

?

(reply by Pierre Lindenbaum)

The output of file L001_R1_001.fastq.gz is: L001_R1_001.fastq.gz: gzip compressed data, extra field

And the output of gunzip -c L001_R1_001.fastq.gz | paste - - - - | cut -c 1 | uniq | sort | uniq is: @

(reply by prabin.dm)
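These two outputs point at the cause: the reads themselves are fine, but the file is gzip-compressed. Without a decompression command, STAR reads the compressed bytes directly, and a gzip stream always begins with the two magic bytes 0x1f 0x8b rather than @ or >. The sketch below builds a tiny placeholder file (example.fastq.gz is made up for illustration) and shows the first bytes before and after decompression.

```shell
# Build a tiny gzip-compressed FASTQ file with placeholder data.
printf '@read1\nACGT\n+\nIIII\n' | gzip -c > example.fastq.gz

# First two bytes of the compressed file: the gzip magic number (1f 8b), not '@'.
head -c 2 example.fastq.gz | od -An -tx1

# First byte after decompression: the '@' that STAR expects at the start of a read ID.
gzip -dc example.fastq.gz | head -c 1
echo
```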
Answer: h.mon (Brazil) wrote, 23 months ago:

If your fastq files are gzip-compressed, you have to use the parameter --readFilesCommand zcat.

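For reference, a corrected version of the command from the question might look like the sketch below. It adds --readFilesCommand zcat, removes the spaces after -- in --genomeDir and --readFilesIn, and restores the line-continuation backslash that was missing after --outFilterMultimapNmax 1 (without it, the shell treats the remaining options as a separate command). It assumes STAR is already on the PATH, for example after module load star/2.5.3a, and keeps the paths from the question. The STAR manual also lists --readFilesCommand gunzip -c as an alternative, which helps on systems where zcat does not handle .gz files.

```shell
# Paths and options are taken from the question; adjust them to your setup.
# Create the output prefix directory in case it does not already exist.
mkdir -p STAR_output/Test

if command -v STAR >/dev/null 2>&1; then
  STAR --genomeDir mouse/star_genome_mm10 \
       --readFilesIn L001_R1_001.fastq.gz \
       --readFilesCommand zcat \
       --outSAMtype BAM SortedByCoordinate \
       --outSAMunmapped Within \
       --twopassMode Basic \
       --outFilterMultimapNmax 1 \
       --quantMode TranscriptomeSAM \
       --runThreadN 6 \
       --outFileNamePrefix "STAR_output/Test/"
else
  echo "STAR not found on PATH (try: module load star/2.5.3a)" >&2
fi
```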

That worked. Thank you!

While we are at it, can I ask another question?

I prepared the STAR genome indices with --sjdbOverhang 99, but the reads in my current FASTQ files are 75 bp long.

Should I prepare the genome indices again?

(reply by prabin.dm)

Biostars is better organized if each thread has only one question. Anyway, no, you do not need to build the indices again; see this post: Confused about sjdbOverhang.

(reply by h.mon)
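For context, the STAR manual describes the ideal --sjdbOverhang as the maximum read length minus 1 (74 for 75 bp reads) and notes that in most cases the default of 100 works as well as the ideal value, which is consistent with an index built with 99 still being usable here. A minimal sketch for computing that ideal value from a FASTQ file, assuming four-line records; the function name ideal_sjdb_overhang is hypothetical.

```shell
# ideal_sjdb_overhang: hypothetical helper that prints max(read length) - 1.
# Assumes 4-line FASTQ records; accepts plain or gzip-compressed input.
ideal_sjdb_overhang() {
  if gzip -t "$1" 2>/dev/null; then
    gzip -dc "$1"
  else
    cat "$1"
  fi | awk 'BEGIN { max = 0 }
            NR % 4 == 2 && length($0) > max { max = length($0) }  # sequence lines only
            END { print max - 1 }'
}
```

Usage: ideal_sjdb_overhang L001_R1_001.fastq.gz should print 74 if the longest read in the file is 75 bp.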

Thanks again. I will do that.

(reply by prabin.dm)