Question: featureCounts and bash
0
gravatar for cilgaiscan
7 months ago by
cilgaiscan50
Turkey
cilgaiscan50 wrote:

Hi everyone! I have a problem with my bash code, which I added down here. I would like to automatize the my RNA-seq workflow and I tried to write this bash code. (I am pretty new in bash and programming.) I expect several steps from my code. These are;

  • changing the directory into source directory and give me a notification. (It works fine)
  • creating a file array with finding all files that is named as accepted_hits.bam in source directory and its sub-folder.(Problem occurs here. My code scans all computer and i have a memory error.)
  • and run featureCounts.

code starts

#!/bin/bash

SOURCE_DIR=$1
TARGET_DIR=$2

echo 'Going to $SOURCE_DIR'
cd $SOURCE_DIR

FILE_ARRAY=/$(locate accepted_hits.bam)

for file in $FILE_ARRAY; do

    serial_number=${file}
    echo "Working on $serial_number"

    mkdir -p $TARGET_DIR/$serial_number

      featureCounts -t $SOURCE_DIR/$file -a /home/cilga/Desktop/VPC/ref/hg19/hg19.gtf -o $TARGET_DIR$serial_number        done

I also tried these arrays, too:

find . -name $SOURCE_DIR/accepted_hits.bam

find . -path \*/$SOURCE_DIR/accepted_hits.bam

find ./ | grep '$SOURCE_DIR/accepted_hits.bam$'

find / -name 'accepted_hits.bam'

If someone helps me, I would be appreciated. Thank you :)

bash rna-seq featurecounts • 480 views
ADD COMMENTlink modified 7 months ago by jkkim20 • written 7 months ago by cilgaiscan50
1

Hello everyone, I was able to solve problem with small touches. I am sharing my working code :) Thanks for everyone for your helps.

   #!/bin/bash

SOURCE_DIR=$1
TARGET_DIR=$2

cd $SOURCE_DIR

find . -name 'accepted_hits.bam'
inputfiles=$(find . -name 'accepted_hits.bam' | xargs echo)
readlink -m $inputfiles

featureCounts -a hg19.gtf -o rnaseq.txt $inputfiles
ADD REPLYlink written 7 months ago by cilgaiscan50
1
find . -name 'accepted_hits.bam'

Just shows the files, literally does nothing else

inputfiles=$(find . -name 'accepted_hits.bam' | xargs echo)

The | xargs echo part does nothing here

readlink -m $inputfiles

Shows the path and literally does nothing else

Also, you should always quote your Bash variables..

ADD REPLYlink written 7 months ago by 5heikki8.3k

hi. thank you, for your explanations. I am pretty new in bash and also programming too. I tried to combine things together :( thank you again!

ADD REPLYlink written 7 months ago by cilgaiscan50
1
gravatar for Devon Ryan
7 months ago by
Devon Ryan88k
Freiburg, Germany
Devon Ryan88k wrote:
find $SOURCE_DIR -name accepted_hits.bam
ADD COMMENTlink written 7 months ago by Devon Ryan88k

it didn’t worked but i found another solution. thank you :)

ADD REPLYlink written 7 months ago by cilgaiscan50
1

People with similar problems may stumble upon your thread, so perhaps you could share your solution here?

ADD REPLYlink written 7 months ago by WouterDeCoster37k
1
gravatar for jkkim
7 months ago by
jkkim20
S.Korea
jkkim20 wrote:

There's some tools for building pipelines. snakemake, gnu-parallel...etc. Here's some code for star alignment using gnu-parallel. It works well for me.

STAR_INDEX="/home/mRNAseq/ref/gencode/human/GRCh37.19/idx/"  
INPUT_DIR="00_raw_fastq"
OUTPUT_DIR="01_alignment_genes_count"
GTF="/home/mRNAseq/ref/gencode/human/GRCh37.19/gtf/gencode.v19.annotation.gtf"
RSEM_INDEX="/home/mRNAseq/ref/gencode/human/GRCh37.19/idx/GRCh37.p13.genome"

mkdir -p $OUTPUT_DIR

cat samples.list | parallel --jobs 1 "mkdir -p $OUTPUT_DIR/{}"

cat samples.list | parallel --jobs 1  "STAR --genomeDir $STAR_INDEX --twopassMode Basic --quantMode GeneCounts --readFilesCommand zcat --readFilesIn $INPUT_DIR/{}_1.fastq.gz $INPUT_DIR/{}_2.fastq.gz --outFileNamePrefix $OUTPUT_DIR/{}/{}. --runThreadN 20 --outSAMtype BAM SortedByCoordinate --sjdbGTFfile $GTF --outSAMunmapped Within  --outSAMattributes Standard --runMode alignReads"

It's from my gist. The samples.list file is like ls *R1.fastq.gz | sed 's/R1.fastq.gz//' > samples.list something like this.

ADD COMMENTlink modified 7 months ago by finswimmer11k • written 7 months ago by jkkim20
1

I added code markup to your post for increased readability. You can do this by selecting the text and clicking the 101010 button. When you compose or edit a post that button is in your toolbar, see image below:

101010 Button

Note that you can also post a link to a GitHub gist here directly and it will be rendered.

ADD REPLYlink written 7 months ago by WouterDeCoster37k
1

If you have a gist then I suggest that you replace the text above with a link to your gist. Biostars will automatically get the code from gist and show it in-line.

On a different note: While this is an alternate it is not a solution to the question that OP was asking originally.

ADD REPLYlink modified 7 months ago • written 7 months ago by genomax63k
0
gravatar for 5heikki
7 months ago by
5heikki8.3k
Finland
5heikki8.3k wrote:

Maybe something like this..

#!/bin/bash
DEPTH="1"
for FILE in $(find "$1" -maxdepth 10 -name "accepted_hits.bam"); do
    echo "Working on $FILE"
    mkdir -p "$2/$(basename "$FILE").$DEPTH"
    featureCounts -t "$FILE" \
      -a /home/cilga/Desktop/VPC/ref/hg19/hg19.gtf \
      -o "$2/$(basename "$FILE").$DEPTH"
    ((DEPTH++))
done
ADD COMMENTlink modified 7 months ago • written 7 months ago by 5heikki8.3k
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.3.0
Traffic: 2264 users visited in the last hour