Question: featureCounts and bash
0
gravatar for cilgaiscan
11 months ago by
cilgaiscan50
Turkey
cilgaiscan50 wrote:

Hi everyone! I have a problem with my bash code, which I added down here. I would like to automatize the my RNA-seq workflow and I tried to write this bash code. (I am pretty new in bash and programming.) I expect several steps from my code. These are;

  • changing the directory into source directory and give me a notification. (It works fine)
  • creating a file array with finding all files that is named as accepted_hits.bam in source directory and its sub-folder.(Problem occurs here. My code scans all computer and i have a memory error.)
  • and run featureCounts.

code starts

#!/bin/bash

SOURCE_DIR=$1
TARGET_DIR=$2

echo 'Going to $SOURCE_DIR'
cd $SOURCE_DIR

FILE_ARRAY=/$(locate accepted_hits.bam)

for file in $FILE_ARRAY; do

    serial_number=${file}
    echo "Working on $serial_number"

    mkdir -p $TARGET_DIR/$serial_number

      featureCounts -t $SOURCE_DIR/$file -a /home/cilga/Desktop/VPC/ref/hg19/hg19.gtf -o $TARGET_DIR$serial_number        done

I also tried these arrays, too:

find . -name $SOURCE_DIR/accepted_hits.bam

find . -path \*/$SOURCE_DIR/accepted_hits.bam

find ./ | grep '$SOURCE_DIR/accepted_hits.bam$'

find / -name 'accepted_hits.bam'

If someone helps me, I would be appreciated. Thank you :)

bash rna-seq featurecounts • 679 views
ADD COMMENTlink modified 11 months ago by jkkim30 • written 11 months ago by cilgaiscan50
1

Hello everyone, I was able to solve problem with small touches. I am sharing my working code :) Thanks for everyone for your helps.

   #!/bin/bash

SOURCE_DIR=$1
TARGET_DIR=$2

cd $SOURCE_DIR

find . -name 'accepted_hits.bam'
inputfiles=$(find . -name 'accepted_hits.bam' | xargs echo)
readlink -m $inputfiles

featureCounts -a hg19.gtf -o rnaseq.txt $inputfiles
ADD REPLYlink written 11 months ago by cilgaiscan50
1
find . -name 'accepted_hits.bam'

Just shows the files, literally does nothing else

inputfiles=$(find . -name 'accepted_hits.bam' | xargs echo)

The | xargs echo part does nothing here

readlink -m $inputfiles

Shows the path and literally does nothing else

Also, you should always quote your Bash variables..

ADD REPLYlink written 11 months ago by 5heikki8.4k

hi. thank you, for your explanations. I am pretty new in bash and also programming too. I tried to combine things together :( thank you again!

ADD REPLYlink written 11 months ago by cilgaiscan50
1
gravatar for Devon Ryan
11 months ago by
Devon Ryan91k
Freiburg, Germany
Devon Ryan91k wrote:
find $SOURCE_DIR -name accepted_hits.bam
ADD COMMENTlink written 11 months ago by Devon Ryan91k

it didn’t worked but i found another solution. thank you :)

ADD REPLYlink written 11 months ago by cilgaiscan50
1

People with similar problems may stumble upon your thread, so perhaps you could share your solution here?

ADD REPLYlink written 11 months ago by WouterDeCoster40k
1
gravatar for jkkim
11 months ago by
jkkim30
S.Korea
jkkim30 wrote:

There's some tools for building pipelines. snakemake, gnu-parallel...etc. Here's some code for star alignment using gnu-parallel. It works well for me.

STAR_INDEX="/home/mRNAseq/ref/gencode/human/GRCh37.19/idx/"  
INPUT_DIR="00_raw_fastq"
OUTPUT_DIR="01_alignment_genes_count"
GTF="/home/mRNAseq/ref/gencode/human/GRCh37.19/gtf/gencode.v19.annotation.gtf"
RSEM_INDEX="/home/mRNAseq/ref/gencode/human/GRCh37.19/idx/GRCh37.p13.genome"

mkdir -p $OUTPUT_DIR

cat samples.list | parallel --jobs 1 "mkdir -p $OUTPUT_DIR/{}"

cat samples.list | parallel --jobs 1  "STAR --genomeDir $STAR_INDEX --twopassMode Basic --quantMode GeneCounts --readFilesCommand zcat --readFilesIn $INPUT_DIR/{}_1.fastq.gz $INPUT_DIR/{}_2.fastq.gz --outFileNamePrefix $OUTPUT_DIR/{}/{}. --runThreadN 20 --outSAMtype BAM SortedByCoordinate --sjdbGTFfile $GTF --outSAMunmapped Within  --outSAMattributes Standard --runMode alignReads"

It's from my gist. The samples.list file is like ls *R1.fastq.gz | sed 's/R1.fastq.gz//' > samples.list something like this.

ADD COMMENTlink modified 11 months ago by finswimmer11k • written 11 months ago by jkkim30
1

I added code markup to your post for increased readability. You can do this by selecting the text and clicking the 101010 button. When you compose or edit a post that button is in your toolbar, see image below:

101010 Button

Note that you can also post a link to a GitHub gist here directly and it will be rendered.

ADD REPLYlink written 11 months ago by WouterDeCoster40k
1

If you have a gist then I suggest that you replace the text above with a link to your gist. Biostars will automatically get the code from gist and show it in-line.

On a different note: While this is an alternate it is not a solution to the question that OP was asking originally.

ADD REPLYlink modified 11 months ago • written 11 months ago by genomax70k
0
gravatar for 5heikki
11 months ago by
5heikki8.4k
Finland
5heikki8.4k wrote:

Maybe something like this..

#!/bin/bash
DEPTH="1"
for FILE in $(find "$1" -maxdepth 10 -name "accepted_hits.bam"); do
    echo "Working on $FILE"
    mkdir -p "$2/$(basename "$FILE").$DEPTH"
    featureCounts -t "$FILE" \
      -a /home/cilga/Desktop/VPC/ref/hg19/hg19.gtf \
      -o "$2/$(basename "$FILE").$DEPTH"
    ((DEPTH++))
done
ADD COMMENTlink modified 11 months ago • written 11 months ago by 5heikki8.4k
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.3.0
Traffic: 1025 users visited in the last hour