Ending up with only one BAM file
1
0
Entering edit mode
6 weeks ago
salman_96 ▴ 60

Hi,

I am working on a variant calling pipeline. Currently, I am trying to generate BAM files from raw fastq files.

The files are paired end end seq files and named like this (there are more than 20 files)

  • AB03_T1_R1_001_trimmed.fastq AB03_T1_R2_001_trimmed.fastq
  • AB05_T1_2D_R1_001_trimmed.fastq AB05_T1_2D_R2_001_trimmed.fastq
  • AB09T1_2D_R1_001_trimmed.fastq AB09T1_2D_R2_001_trimmed.fastq

The code is :

cd /data/Data/Results

genome=/data/Data/Genome/GRCh38.primary_assembly.genome.fa

for fq1 in /data/Data/*_R1_001_trimmed.fastq

 do  
    echo "working with file $fq1"
    base=$(basename $fq1 _R1_001_trimmed.fastq)
    echo "basename is $base"

    fq1=/data/Data/${base}_R1_001_trimmed.fastq
    fq2=/data/Data/${base}_R2_001_trimmed.fastq
    STAR --genomeDir /dataData/Genome \
    --runThreadN 30 --readFilesIn ${fq1} ${fq2} --outSAMtype BAM SortedByCoordinate --outFileNamePrefix STARpaired    

    done

The resulting file I am getting is only one .bam file named STARpairedAligned.sortedByCoord.out.bam

Can anyone assist why do I get only one file and how to fix this problem as I need individual files for each sample as I want to run Variant call on individual samples?

Regards, Salman

bam calling star variant files • 278 views
ADD COMMENT
0
Entering edit mode
for fq1 in /data/Data/*_R1_001_trimmed.fastq

you should use a workflow manager like nextflow or snakemake

ADD REPLY
0
Entering edit mode

You should get used right away to never work with uncompressed files. It just doesn't scale and takes up disk space which can be limited on shared clusters with quotas. Gzip your fastq files, most trimmers offer that. Same with the BAM files from STAR, use --outBAMcompression 5 for the STAR command to get BAM compression, otherwise the BAM files take up just hilariously much space. STAR is so bad at providing good defaults for simple things like that, but it is great at mapping and that is what counts most.

ADD REPLY
2
Entering edit mode
6 weeks ago

you're always using the very same output file whatever is the input in the loop.

--outFileNamePrefix STARpaired   

use something like

 --outFileNamePrefix "${base}STARpaired"   
ADD COMMENT

Login before adding your answer.

Traffic: 1585 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6