Question: Unable to map Reverse read using STAR aligner
12 months ago, bnayer26 wrote:

Hi, I am new to aligning paired-end data with STAR. I used cutadapt to trim the adapters from my raw files and then ran STAR on the trimmed output files. When I run the paired-end command as follows:

STAR --runThreadN 16 --genomeLoad NoSharedMemory --genomeDir /home/bnay2/biotools/references/UCSC/mm10/STAR_mm10 --readFilesIn /home/bnay2/JeanProject/Trimmed_reads/EBN1_L1_1_trimmed.fq.gz /home/bnay2/JeanProject/Trimmed_reads/EBN1_L1_2_trimmed.fq.gz --readFilesCommand gunzip -c --outFileNamePrefix /home/bnay2/JeanProject/STAR_Aligned/trial1_PE_alignment_sample1/EBN1_L1_aligned --outSAMtype BAM Unsorted --outSAMattributes All --alignIntronMax 1000000 --alignEndsType EndToEnd

I get the following error:

Jan 06 02:50:22 ..... started STAR run
Jan 06 02:50:22 ..... loading genome
Jan 06 02:52:37 ..... started mapping

EXITING because of FATAL ERROR in reads input: short read sequence line: 1
Read Name=@E00477:565:H7F2CCCX2:3:1108:12702:34676
Read Sequence====
DEF_readNameLengthMax=50000
DEF_readSeqLengthMax=650

Jan 06 02:53:28 ...... FATAL ERROR, exiting

However, if I run the same code by removing the second read of the pair (so I only map the first single-end read) like this:

STAR --runThreadN 16 --genomeLoad NoSharedMemory --genomeDir /home/bnay2/biotools/references/UCSC/mm10/STAR_mm10 --readFilesIn /home/bnay2/JeanProject/Trimmed_reads/EBN1_L1_1_trimmed.fq.gz --readFilesCommand gunzip -c --outFileNamePrefix /home/bnay2/JeanProject/STAR_Aligned/trial1_PE_alignment_sample1/EBN1_L1_1_aligned --outSAMtype BAM Unsorted --outSAMattributes All --alignIntronMax 1000000 --alignEndsType EndToEnd

then, surprisingly, it works, and I get this message:

Jan 06 02:09:45 ..... started STAR run
Jan 06 02:09:45 ..... loading genome
Jan 06 02:13:25 ..... started mapping
Jan 06 02:17:21 ..... finished successfully

Next, when I run the same single-end mapping on the reverse read (read 2) instead, with the following code:

STAR --runThreadN 16 --genomeLoad NoSharedMemory --genomeDir /home/bnay2/biotools/references/UCSC/mm10/STAR_mm10 --readFilesIn /home/bnay2/JeanProject/Trimmed_reads/EBN1_L1_2_trimmed.fq.gz --readFilesCommand gunzip -c --outFileNamePrefix /home/bnay2/JeanProject/STAR_Aligned/trial1_PE_alignment_sample1/EBN1_L1_2_aligned --outSAMtype BAM Unsorted --outSAMattributes All --alignIntronMax 1000000 --alignEndsType EndToEnd

I get the following error again:

Jan 06 02:45:22 ..... started STAR run
Jan 06 02:45:22 ..... loading genome
Jan 06 02:47:59 ..... started mapping

EXITING because of FATAL ERROR in reads input: short read sequence line: 1
Read Name=@E00477:565:H7F2CCCX2:3:1108:12702:34676
Read Sequence====
DEF_readNameLengthMax=50000
DEF_readSeqLengthMax=650

Jan 06 02:48:17 ...... FATAL ERROR, exiting

So somehow, my code is working only on the Forward read for my paired-end trimmed reads.

If it helps, here is the code I used for my cutadapt step:

cutadapt --cores=14 -q 10,10 -a AGATCGGAAGAGCACACGTCTGAACTCCAGTCAC -A AGATCGGAAGAGCGTCGTGTAGGGAAAGAGTGT -o EBN1_L1_1_trimmed.fq.gz -p EBN1_L1_2_trimmed.fq.gz ~/JeanProject/JeanRawData/EBN1_L1_1.fq.gz ~/JeanProject/JeanRawData/EBN1_L1_2.fq.gz

Any suggestions about what I can change would be really helpful, thanks in advance!

EDIT/UPDATE: I just tried running it with a second set of trimmed samples and it seems to have worked. Here is the code I used:

STAR --runThreadN 16 --genomeLoad NoSharedMemory --genomeDir /home/bnay2/biotools/references/UCSC/mm10/STAR_mm10 --readFilesIn /home/bnay2/JeanProject/Trimmed_reads/EBN2_L2_1_trimmed.fq.gz /home/bnay2/JeanProject/Trimmed_reads/EBN2_L2_2_trimmed.fq.gz --readFilesCommand gunzip -c --outFileNamePrefix /home/bnay2/JeanProject/STAR_Aligned/trial1_PE_alignment_sample1/EBN2_L2_aligned --outSAMtype BAM Unsorted --outSAMattributes All --alignIntronMax 1000000 --alignEndsType EndToEnd

Here is what it printed on the screen after running:

Jan 06 02:58:20 ..... started STAR run
Jan 06 02:58:20 ..... loading genome
Jan 06 03:00:42 ..... started mapping
Jan 06 03:13:40 ..... finished successfully

However, how can I use these output files to check whether the run really was successful? I know the log files contain a lot of useful information, but I am new to this. Could you kindly point me to some resources, other than the STAR manual, that would help me understand why the code worked for one set of files and not the other, even though I trimmed both with exactly the same cutadapt command?

Thank you once again and I apologise if any of these questions are too basic, I'm still starting out!
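
As a starting point for the question above: STAR writes several log files next to the BAM output, named after --outFileNamePrefix. Log.final.out holds the summary mapping statistics, while Log.out holds the detailed run messages. The sketch below pulls three headline lines out of Log.final.out; the function name star_summary is hypothetical, and the file path is simply the prefix from the EBN2_L2 command with "Log.final.out" appended.

```shell
# Print headline mapping statistics from a STAR Log.final.out file.
# The three labels are lines that STAR writes into that summary file.
star_summary() {
    grep -E 'Number of input reads|Uniquely mapped reads %|% of reads unmapped: too short' "$1"
}

# Usage, assuming the output prefix from the EBN2_L2 command above:
# star_summary /home/bnay2/JeanProject/STAR_Aligned/trial1_PE_alignment_sample1/EBN2_L2_alignedLog.final.out
```

A low uniquely mapped percentage, or a high "unmapped: too short" percentage, would suggest a problem even when the run reports "finished successfully".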

mapping rna-seq star aligner • 380 views
12 months ago, colindaven (Hannover Medical School) wrote:

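The error is most likely caused by read 2 being trimmed down to a very short length, possibly to nothing at all: the log prints an empty sequence ("Read Sequence====") right after "EXITING because of FATAL ERROR in reads input: short read sequence".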
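
One way to check this before rerunning STAR is to count zero-length reads in the trimmed file. This is a minimal sketch that assumes standard 4-line FASTQ records (no wrapped sequence lines); the function name count_empty_reads is hypothetical.

```shell
# Count total reads and zero-length reads in a gzipped FASTQ file.
# Assumes 4 lines per record, so the sequence is line 2 of each record.
count_empty_reads() {
    gunzip -c "$1" | awk '
        NR % 4 == 2 { total++; if (length($0) == 0) empty++ }
        END { printf "total=%d empty=%d\n", total, empty }'
}

# Usage with the read 2 file from the question:
# count_empty_reads /home/bnay2/JeanProject/Trimmed_reads/EBN1_L1_2_trimmed.fq.gz
```

Any non-zero "empty" count in the read 2 file would be consistent with the error above.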

Try one of the following:

  • Don't trim at all. STAR should then run through (a bioRxiv preprint from late 2019, https://www.biorxiv.org/content/10.1101/833962v1, describes why trimming is overrated and even unnecessary for modern RNA-seq aligners like STAR).
  • Set the minimum length parameter in your trimmer to something higher. In cutadapt this is -m / --minimum-length; it defaults to 0, so reads trimmed to zero length stay in the output.
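
For cutadapt specifically, the second suggestion could look like the sketch below. It reuses the adapters and quality cutoff from the command in the question and adds -m; the wrapper name trim_pair and the threshold of 20 are illustrative assumptions, not values from this thread.

```shell
# Re-trim a read pair with a minimum length filter.
# Arguments: R1 input, R2 input, R1 output, R2 output, [minimum length].
# With cutadapt's default pair filter, a pair is discarded if either read
# falls below the minimum, so the two output files stay in sync.
trim_pair() {
    cutadapt --cores=14 -q 10,10 -m "${5:-20}" \
        -a AGATCGGAAGAGCACACGTCTGAACTCCAGTCAC \
        -A AGATCGGAAGAGCGTCGTGTAGGGAAAGAGTGT \
        -o "$3" -p "$4" "$1" "$2"
}

# Usage with the file names from the question:
# trim_pair ~/JeanProject/JeanRawData/EBN1_L1_1.fq.gz ~/JeanProject/JeanRawData/EBN1_L1_2.fq.gz \
#     EBN1_L1_1_trimmed.fq.gz EBN1_L1_2_trimmed.fq.gz
```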

bnayer26 replied (12 months ago):

Hi, thanks for suggesting that; I'll give it a try. I am going to check my trimmed files with FastQC now to see what sequences remain. Other than that, do you think it could be a formatting error? What is the best way to view my trimmed fastq.gz files in the terminal, just to see how they look after trimming? Thanks!

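colindaven replied (12 months ago):

Good idea.

less x.fastq
zless x.fastq.gz

(Plain less only decompresses a .gz file if an input preprocessor such as lesspipe is configured; zless works either way.)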
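
For a non-interactive look at a trimmed file, the two helpers below may be useful. They are a sketch assuming standard 4-line FASTQ records; the function names peek_fastq and read_length_histogram are hypothetical.

```shell
# Print the first n records (default 2) of a gzipped FASTQ file.
peek_fastq() {
    gunzip -c "$1" | head -n $(( ${2:-2} * 4 ))
}

# Tabulate read lengths as "length count", shortest first.
read_length_histogram() {
    gunzip -c "$1" | awk '
        NR % 4 == 2 { n[length($0)]++ }
        END { for (len in n) print len, n[len] }' | sort -n
}

# Usage with the read 2 file from the question:
# peek_fastq EBN1_L1_2_trimmed.fq.gz 3
# read_length_histogram EBN1_L1_2_trimmed.fq.gz | head
```

A first histogram line starting with 0 means the file contains empty reads.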