Question: STAR mapping multiple files
5 months ago, ipalmisa (10) wrote:

Hi, I am new to bioinformatics analysis and I would like to double-check that I am doing things right. Thanks in advance for the help.

First question: I am using STAR to map my fastq files. It is a paired-end RNA-seq experiment and I have multiple runs for each sample. I am using the following command:

STAR --runThreadN 10 \
--genomeDir $RDS/projects/sequ/live/Genecode/mouseindex \
--readFilesIn $RDS/projects/Sample_SDG10/S*_R1_001.fastq.gz,S*_R1_002.fastq.gz,S*_R1_003.fastq.gz,S*_R1_004.fastq.gz,S*_R1_005.fastq.gz,S*_R1_006.fastq.gz,S*_R1_007.fastq.gz,S*_R1_008.fastq.gz $RDS/projects/Sample_SDG10/S*_R2_001.fastq.gz,S*_R2_002.fastq.gz,S*_R2_003.fastq.gz,S*_R2_004.fastq.gz,S*_R2_005.fastq.gz,S*_R2_006.fastq.gz,S*_R2_007.fastq.gz,S*_R2_008.fastq.gz \
--readFilesCommand gunzip -c \
--outFileNamePrefix $RDS/projects/sequ/live/mapped/Genecode/EEtest

But I get this error message:

gzip: S*_R1_002.fastq.gz: No such file or directory
gzip: S*_R1_003.fastq.gz: No such file or directory
gzip: S*_R1_004.fastq.gz: No such file or directory
gzip: S*_R1_005.fastq.gz: No such file or directory
gzip: S*_R1_006.fastq.gz: No such file or directory
gzip: S*_R1_007.fastq.gz: No such file or directory
gzip: S*_R1_008.fastq.gz: No such file or directory
gzip: S*_R2_002.fastq.gz: No such file or directory
gzip: S*_R2_003.fastq.gz: No such file or directory
gzip: S*_R2_004.fastq.gz: No such file or directory
gzip: S*_R2_005.fastq.gz: No such file or directory
gzip: S*_R2_006.fastq.gz: No such file or directory
gzip: S*_R2_007.fastq.gz: No such file or directory
gzip: S*_R2_008.fastq.gz: No such file or directory

The files are in the specified path. What am I doing wrong?

Second question: if I have only one R1 and one R2 file in the directory, can I use * to avoid writing out the file names? Would STAR be able to match them, as they are the only R1 and R2 files in the folder?

STAR --runThreadN 10 \
--genomeDir $RDS/projects/sequ/live/Genecode/mouseindex \
--readFilesIn $RDS/projects/*_R1_*.fastq.gz $RDS/projects/*_R2_*.fastq.gz \
--readFilesCommand gunzip -c \
--outFileNamePrefix $RDS/projects/sequ/live/mapped/Genecode/EEtest

Thanks, Ilaria

modified 5 months ago by swbarnes2 (8.9k) • written 5 months ago by ipalmisa (10)

A small educational note: I added (code) markup to your post for increased readability. You can do this by selecting the text and clicking the 101010 button. When you compose or edit a post, that button is in your toolbar.

written 5 months ago by lieven.sterck (8.7k)

Oh, thank you! I didn't know that!

written 5 months ago by ipalmisa (10)

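It cannot work with '*' expansion if the tool expects a comma-separated list of files. The shell treats each whitespace-separated word as a single pattern, so the commas become part of the pattern and nothing matches. Note also that only the first name in each list carries the directory prefix.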

Try

echo $RDS/projects/Sample_SDG10/S*_R1_001.fastq.gz,S*_R1_002.fastq.gz,S*_R1_003.fastq.gz,S*_R1_004.fastq.gz,S*_R1_005.fastq.gz,S*_R1_006.fastq.gz,S*_R1_007.fastq.gz,S*_R1_008.fastq.gz $RDS/projects/Sample_SDG10/S*_R2_001.fastq.gz,S*_R2_002.fastq.gz,S*_R2_003.fastq.gz,S*_R2_004.fastq.gz,S*_R2_005.fastq.gz,S*_R2_006.fastq.gz,S*_R2_007.fastq.gz,S*_R2_008.fastq.gz 

to see what is happening....

written 5 months ago by Pierre Lindenbaum (131k)
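To illustrate the point about '*' expansion, here is a minimal sketch, assuming bash with its default globbing options. The temporary directory and the file names in it are made up for the demo and are not the poster's real data.

```shell
#!/usr/bin/env bash
# Hypothetical demo directory and file names, created only for illustration.
demo_dir=$(mktemp -d)
touch "$demo_dir/S1_R1_001.fastq.gz" "$demo_dir/S1_R1_002.fastq.gz"

# One word containing a comma: the shell treats the whole word as a single
# pattern, finds no file name matching it, and passes it on unchanged.
unexpanded=$(echo "$demo_dir"/S*_R1_001.fastq.gz,S*_R1_002.fastq.gz)
echo "$unexpanded"

# Two separate words: each pattern is expanded on its own.
expanded=$(echo "$demo_dir"/S*_R1_001.fastq.gz "$demo_dir"/S*_R1_002.fastq.gz)
echo "$expanded"

# Remove the demo files again.
rm -rf "$demo_dir"
```

The first echo still shows the literal '*' characters, while the second prints the two real file names.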

Thank you, I have tried echo... you are right. Does this mean that I need to write the file names each time? I have around 200 files, with different names...isn't there any short cut, please?

written 5 months ago by ipalmisa (10)
5 months ago, Pierre Lindenbaum (131k), France/Nantes/Institut du Thorax - INSERM UMR1087, wrote:

I have around 200 files, with different names...isn't there any short cut, please?

   --readFilesIn `ls $RDS/projects/*_R1_*.fastq.gz | sort | tr "\n" ","` `ls $RDS/projects/*_R2_*.fastq.gz | sort | tr "\n" ","`
modified 5 months ago • written 5 months ago by Pierre Lindenbaum (131k)
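A variation on the one-liner above, as a hedged sketch: `tr "\n" ","` also converts the final newline, so the list it builds ends with a comma. The helper below (`join_by_comma`, a hypothetical name) joins the expanded file names without a trailing comma. It relies on the shell returning glob matches in sorted order, and it assumes the file names contain no commas. The demo directory and file names are invented.

```shell
#!/usr/bin/env bash
# join_by_comma is a hypothetical helper: it prints its arguments joined by commas.
# The function body runs in a subshell, so the IFS change does not leak out.
join_by_comma() (
    IFS=,
    printf '%s\n' "$*"
)

# Hypothetical demo files, created out of order on purpose.
demo_dir=$(mktemp -d)
touch "$demo_dir/b_R1_001.fastq.gz" "$demo_dir/a_R1_001.fastq.gz" \
      "$demo_dir/b_R2_001.fastq.gz" "$demo_dir/a_R2_001.fastq.gz"

# Each glob expands to a sorted list of matches, which is then comma-joined.
r1_list=$(join_by_comma "$demo_dir"/*_R1_*.fastq.gz)
r2_list=$(join_by_comma "$demo_dir"/*_R2_*.fastq.gz)
echo "$r1_list"
echo "$r2_list"

# Remove the demo files again.
rm -rf "$demo_dir"
```

The two variables could then be passed as `--readFilesIn "$r1_list" "$r2_list"`. Because both lists are sorted the same way, matching R1 and R2 files stay in the same position as long as their names differ only in the R1/R2 part.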

Thanks a lot! I will do it this way!

written 5 months ago by ipalmisa (10)
5 months ago, swbarnes2 (8.9k), United States, wrote:

You might also make your life easier by catting all the relevant files together first, so you can give STAR just one R1 and one R2 file.

Also, are you completely sure that you want to combine your fastqs the way you have? Usually, different numbers after the S mean totally different samples.

written 5 months ago by swbarnes2 (8.9k)
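The concatenation approach could look roughly like the sketch below, assuming gzip-compressed FASTQ files. Gzip files can be concatenated directly with cat, and the result decompresses to the concatenation of the original contents. The demo directory, file names and reads are invented for illustration.

```shell
#!/usr/bin/env bash
# Hypothetical demo data: two tiny gzip-compressed FASTQ files for one mate.
demo_dir=$(mktemp -d)
printf '@read1\nACGT\n+\nIIII\n' | gzip > "$demo_dir/S_R1_001.fastq.gz"
printf '@read2\nTTTT\n+\nIIII\n' | gzip > "$demo_dir/S_R1_002.fastq.gz"

# Concatenate the compressed files as they are; the glob is expanded in sorted order.
# The output goes to a separate directory so it cannot be matched by the input glob.
out_dir=$(mktemp -d)
cat "$demo_dir"/S_R1_*.fastq.gz > "$out_dir/S_R1_all.fastq.gz"

# The merged file decompresses to all records from both inputs (2 reads, 4 lines each).
n_lines=$(gunzip -c "$out_dir/S_R1_all.fastq.gz" | wc -l | tr -d ' ')
echo "$n_lines"

# Remove the demo files again.
rm -rf "$demo_dir" "$out_dir"
```

The R2 files would need to be concatenated in the same order as the R1 files, so that each read stays at the same position as its mate.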

Hi, thanks. Yes, I knew about the cat option, but I thought this way would be easier. It's the same sample (SDG10) run more than once, so I have SDG10_GAGTGG_L006_R1_001.fastq, SDG10_GAGTGG_L006_R2_001.fastq, SDG10_GAGTGG_L006_R3_001.fastq and so on. Thanks again.

written 5 months ago by ipalmisa (10)