STAR Mapping multiple files
3.9 years ago
ipalmisa ▴ 10

Hi, I am new to bioinformatics analysis and I would like to double-check that I am doing things right. Thanks in advance for the help.

First question: I am using STAR to map my FASTQ files. It is a paired-end RNA-seq experiment, and I have multiple runs for each sample. I am using the following command:

STAR --runThreadN 10 \
--genomeDir $RDS/projects/sequ/live/Genecode/mouseindex \
--readFilesIn $RDS/projects/Sample_SDG10/S*_R1_001.fastq.gz,S*_R1_002.fastq.gz,S*_R1_003.fastq.gz,S*_R1_004.fastq.gz,S*_R1_005.fastq.gz,S*_R1_006.fastq.gz,S*_R1_007.fastq.gz,S*_R1_008.fastq.gz $RDS/projects/Sample_SDG10/S*_R2_001.fastq.gz,S*_R2_002.fastq.gz,S*_R2_003.fastq.gz,S*_R2_004.fastq.gz,S*_R2_005.fastq.gz,S*_R2_006.fastq.gz,S*_R2_007.fastq.gz,S*_R2_008.fastq.gz \
--readFilesCommand gunzip -c \
--outFileNamePrefix $RDS/projects/sequ/live/mapped/Genecode/EEtest

But I get this error message:

gzip: S*_R1_002.fastq.gz: No such file or directory
gzip: S*_R1_003.fastq.gz: No such file or directory
gzip: S*_R1_004.fastq.gz: No such file or directory
gzip: S*_R1_005.fastq.gz: No such file or directory
gzip: S*_R1_006.fastq.gz: No such file or directory
gzip: S*_R1_007.fastq.gz: No such file or directory
gzip: S*_R1_008.fastq.gz: No such file or directory
gzip: S*_R2_002.fastq.gz: No such file or directory
gzip: S*_R2_003.fastq.gz: No such file or directory
gzip: S*_R2_004.fastq.gz: No such file or directory
gzip: S*_R2_005.fastq.gz: No such file or directory
gzip: S*_R2_006.fastq.gz: No such file or directory
gzip: S*_R2_007.fastq.gz: No such file or directory
gzip: S*_R2_008.fastq.gz: No such file or directory

The files are in the specified path. What am I doing wrong?

Second question: if I have only one R1 and one R2 file in the directory, can I use * to avoid writing out the file names? STAR would be able to match them, as they are the only R1 and R2 files in the folder, am I correct?

STAR --runThreadN 10 \
--genomeDir $RDS/projects/sequ/live/Genecode/mouseindex \
--readFilesIn $RDS/projects/*_R1_*.fastq.gz $RDS/projects/*_R2_*.fastq.gz \
--readFilesCommand gunzip -c \
--outFileNamePrefix $RDS/projects/sequ/live/mapped/Genecode/EEtest

Thanks, Ilaria

alignment • 1.3k views

A small educational note: I added (code) markup to your post for increased readability. You can do this by selecting the text and clicking the 101010 button, which is in your toolbar when you compose or edit a post.


Oh, thank you! I didn't know that!


It cannot work with '*' expansion if the tool expects a list of comma-separated files.

try

echo $RDS/projects/Sample_SDG10/S*_R1_001.fastq.gz,S*_R1_002.fastq.gz,S*_R1_003.fastq.gz,S*_R1_004.fastq.gz,S*_R1_005.fastq.gz,S*_R1_006.fastq.gz,S*_R1_007.fastq.gz,S*_R1_008.fastq.gz $RDS/projects/Sample_SDG10/S*_R2_001.fastq.gz,S*_R2_002.fastq.gz,S*_R2_003.fastq.gz,S*_R2_004.fastq.gz,S*_R2_005.fastq.gz,S*_R2_006.fastq.gz,S*_R2_007.fastq.gz,S*_R2_008.fastq.gz 

to see what is happening....
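The effect of that echo test can be reproduced in isolation. The following is a minimal sketch, assuming bash with default globbing options; the scratch directory and the file names in it are hypothetical placeholders, not the real data. It shows that a comma-joined word is treated as one pattern that matches nothing and is passed through unchanged, while separate words are each expanded. That would also be consistent with the error messages in the question, where only the entries without the directory prefix are reported as missing.

```shell
#!/usr/bin/env bash
# Hypothetical scratch directory with empty placeholder files.
demo_dir=$(mktemp -d)
touch "$demo_dir/S1_R1_001.fastq.gz" "$demo_dir/S1_R1_002.fastq.gz"

# One comma-joined word: the shell tries to match the whole word as a single
# pattern, finds no file with that name, and leaves the word unchanged.
joined=$(echo "$demo_dir"/S*_R1_001.fastq.gz,S*_R1_002.fastq.gz)

# Two separate words: each pattern is expanded on its own.
separate=$(echo "$demo_dir"/S*_R1_001.fastq.gz "$demo_dir"/S*_R1_002.fastq.gz)

echo "joined:   $joined"
echo "separate: $separate"
```

The "joined" line still contains the literal `*` characters, which is what the tool ends up receiving.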


Thank you. I have tried echo... you are right. Does this mean that I need to write out the file names each time? I have around 200 files with different names... isn't there a shortcut, please?

3.9 years ago

I have around 200 files, with different names...isn't there any short cut, please?

   --readFilesIn `ls $RDS/projects/*_R1_*.fastq.gz | sort | tr "\n" ","` `ls $RDS/projects/*_R2_*.fastq.gz | sort | tr "\n" ","`
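Note that `tr "\n" ","` also turns the final newline into a comma, so each list ends with a trailing comma. A possible variant that avoids this is sketched below, using `paste -sd,` to join the lines. The scratch directory and file names are hypothetical stand-ins for `$RDS/projects`, and the script only prints the resulting argument rather than running STAR.

```shell
#!/usr/bin/env bash
# Hypothetical scratch directory standing in for $RDS/projects.
fastq_dir=$(mktemp -d)
for i in 001 002 003; do
  touch "$fastq_dir/SDG10_R1_$i.fastq.gz" "$fastq_dir/SDG10_R2_$i.fastq.gz"
done

# The shell expands each glob into a sorted list of paths; printf prints one
# path per line and paste joins the lines with commas (no trailing comma).
r1_list=$(printf '%s\n' "$fastq_dir"/*_R1_*.fastq.gz | paste -sd, -)
r2_list=$(printf '%s\n' "$fastq_dir"/*_R2_*.fastq.gz | paste -sd, -)

# Print the argument that would be handed to STAR.
echo "--readFilesIn $r1_list $r2_list"
```

Because both globs are sorted the same way, the R1 and R2 lists stay in matching order as long as the paired files differ only in the `R1`/`R2` part of their names.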

Thanks a lot! I will do it this way!

3.9 years ago

You might also make your life easier by catting all the relevant files together first, so you can give STAR just one R1 and one R2 file.
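As a rough illustration of the concatenation route: gzip files can be joined with plain `cat`, because a sequence of gzip streams is itself a valid gzip file. The sketch below uses a hypothetical scratch directory and two tiny made-up FASTQ records so that it can run on its own.

```shell
#!/usr/bin/env bash
# Hypothetical scratch directory with two tiny gzipped FASTQ files.
work_dir=$(mktemp -d)
printf '@read1\nACGT\n+\nIIII\n' | gzip -c > "$work_dir/SDG10_R1_001.fastq.gz"
printf '@read2\nTTGA\n+\nIIII\n' | gzip -c > "$work_dir/SDG10_R1_002.fastq.gz"

# Concatenate the compressed files directly; no need to decompress first.
# The glob does not match the output file name, so the output is not re-read.
cat "$work_dir"/SDG10_R1_0*.fastq.gz > "$work_dir/SDG10_R1_merged.fastq.gz"

# Sanity check: the merged file decompresses to both records (4 lines each).
merged_lines=$(gunzip -c "$work_dir/SDG10_R1_merged.fastq.gz" | wc -l | tr -d ' ')
echo "lines in merged file: $merged_lines"
```

The R2 files would need to be concatenated in the same order as the R1 files so that read pairs stay aligned between the two merged files.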

Also, are you completely sure that you want to combine your fastqs the way you have? Usually, different numbers after the S mean totally different samples.


Hi, thanks. Yes, I knew about the cat option, but I thought this way would be easier. It's the same sample (SDG10) run more than once, so I have SDG10_GAGTGG_L006_R1_001.fastq, SDG10_GAGTGG_L006_R2_001.fastq, SDG10_GAGTGG_L006_R3_001.fastq and so on. Thanks again.
