Multiple commands in shell programming
2
0
Entering edit mode
8.6 years ago

Hi I am new to shell programming was reading a "chromosome coordinates" file line by line and then further want to use each line and give it to a new different commands so my "chromosome coordinates" file looks like below these are few lines but I have 2500 such lines in this file

> Chromosome:Position-Position
chr1:822979-822979
chr1:1052781-1052781
chr1:1216196-1216196
chr1:5053847-5053847
chr1:5735093-5735093
chr1:7973055-7973055
chr1:14768954-14768954
chr1:16903372-16903372
chr1:16903374-16903374
chr1:16903402-16903402
chr1:17224399-17224399
chr1:18398185-18398185
chr1:20001583-20001583
............

and I am reading this file using following script which looks like working fine

#!/bin/bash
filename="$1"
while read -r line  || [[ -n "$line" ]]; # read all the line and this reads the last line|| [[ -n "$line" ]]
do
    name="$line"
    echo "$name"
done < "$filename"

Now i want to give each line of the "chromosome coordinates" file as an input to samtools command e.g.

samtools view -f 0x02 -b CLL.bam chr1:822979-822979 > chr1_reads.bam samtools view -f 0x02 -b CLL.bam chr1:1052781-1052781 > chr1.1_reads.bam samtools view -f 0x02 -b CLL.bam chr1:1216196-1216196 > chr1.2_reads.bam and so on for 2500 lines

The above output file will contain reads from one location only but I want to make output files that contains reads from all the 2500 lines that were read from the original "chromosome coordinates" file.

Further want to use the "chr1_reads.bam" , chr1.1_reads.bam, chr1.2_reads.bam files as an input to another command

samtools index chr1_reads.bam samtools index chr1.1_reads.bam samtools index chr1.2_reads.bam

then finally give the indexed file as input to another command

samtools idxstats chr1_reads.bam > chr_count samtools idxstats chr1.1_reads.bam > chr_count samtools idxstats chr1.2_reads.bam > chr_count`

But want to store the output of the final command to be stored in one singe output file "chr_count"

Any suggestions will be highly appreciated.

Thanks

shell • 1.7k views
ADD COMMENT
4
Entering edit mode
8.6 years ago
venu 7.1k

So you want to supply the each region of your coordinate file to the samtools . As you said there are 2500 lines, you can do something like following (you get bam files with names CLL_chr1:822979-822979.bam instead of chr1.1_reads.bam, I think it doesn't make any difference)

while read region; do samtools -f 0x02 -b CLL.bam "$region" > CLL_"$region".bam; done < chr_coordinates.txt

Then samtools index, you can use a bash for loop or GNU parallel (Examples)

Using GNU parallel (and you can try for idxstats)

parallel 'samtools index {}' ::: *.bam

BTW what does -f 0x02 do in samtools? (Because when I test this on my BAM files, it is showing unrecognized command -f)

ADD COMMENT
2
Entering edit mode

-f is a flag from old samtools. My apologies for the mistake.

ADD REPLY
1
Entering edit mode
8.6 years ago
John 13k

You can use xargs for this :)

Something like:

sed '1d' file.txt | xargs -P 4 -I % sh -c "samtools view -f 2 -F 12 -b CLL.bam % > %.bam"

Gives:

samtools view -f 0x02 -b CLL.bam chr1:822979-822979 > chr1:822979-822979.bam
samtools view -f 0x02 -b CLL.bam chr1:1052781-1052781 > chr1:1052781-1052781.bam
samtools view -f 0x02 -b CLL.bam chr1:1216196-1216196 > chr1:1216196-1216196.bam
samtools view -f 0x02 -b CLL.bam chr1:5053847-5053847 > chr1:5053847-5053847.bam
...

Running 4 samtools processes at a time, because why not. I used sed '1d' to cat the file without the first line (your header). To see what it's doing before you run it in your environment, replace sh -c with echo

ADD COMMENT

Login before adding your answer.

Traffic: 3088 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6