Question: Python scripts to map multiple pair end files to the genome
1
gravatar for Bioinfonext
3.4 years ago by
Bioinfonext250
Korea
Bioinfonext250 wrote:

I read python basic codes and I am fimiliar with for loops, But not able to understand how to make a python programme to run hisat2 on multiple pair-end reads.

I want to run hisat2 mapping tool on mutiple pair end reads to map these on Genome.

I already have prepared genome index.

1)my input will be hisat2 location and genome index, pair-end read file 2) How should I give multiple hisat2 arguments and how to store sam file in a separate folder for each pair-end reads.

Can you help to make this python script ?

Location of hisat2:
 /home/yog/software/hisat2-2.0.4/hisat2

Location of genome index:
data/Analysis/Radish_index

Location of pair end files:

 /data/Analysis/mapped/

multiple pair end files like this:

216_5W_Ca1_R1.fq

216_5W_Ca1_R2.fq

218_5W_Ca1_R1.fq

218_5W_Ca1_R1.fq



216_5W_Co1_R1.fq

216_5W_Co1_R2.fq

218_5W_Co1_R1.fq

218_5W_Co1_R1.fq

Now I am running hisat2 on individual files by using this commonad:

/home/yog/software/hisat2-2.0.4/hisat2 -p 8 -x /data/Analysis/Radish_index  --fr -1 /data/Analysis/mapped/216_5W_Ca1_R1.fq -2 /data/Analysis/mapped/216_5W_Ca1_R2.fq -S 216_5W_Ca1.sam
rna-seq • 2.6k views
ADD COMMENTlink modified 3.4 years ago by Philipp Bayer6.7k • written 3.4 years ago by Bioinfonext250
7
gravatar for Philipp Bayer
3.4 years ago by
Philipp Bayer6.7k
Australia/Perth/UWA
Philipp Bayer6.7k wrote:

It would be easier to do this in bash:

for r1 in  /data/Analysis/mapped/*R1.fq
do 
  #replace R1 by R2
  r2=${r1/R1/R2}
  # snip off the fq and add sam to the output file name
  out=${r1%%_R1.fq}.sam
  /home/yog/software/hisat2-2.0.4/hisat2 -p 8 -x /data/Analysis/Radish_index  --fr -1 $r1 -2 $r2 -S $out
done

In Python it's similar:

import glob
import os
for r1 in glob.glob('/data/Analysis/mapped/*R1.fq'):
  r2 = r1.replace('R1', 'R2')
  out = r1.replace('_R1.fq', '.sam')
  os.popen('/home/yog/software/hisat2-2.0.4/hisat2 -p 8 -x /data/Analysis/Radish_index  --fr -1 %s -2 %s -S %s'%(r1, r2, out))

None of those are multithreaded - you can use the subprocess library instead of os.popen in Python so that each job runs in the background instead of it running each job one after the other, and you can end the command using & in bash to run all of the jobs at the same time, like this:

/home/yog/software/hisat2-2.0.4/hisat2 -p 8 -x /data/Analysis/Radish_index  --fr -1 $r1 -2 $r2 -S $out &

Here I'm also not doing anything special to capture the standard error or the standard output, it would be better for you to have a look at all of the standard error.

ADD COMMENTlink modified 3.4 years ago • written 3.4 years ago by Philipp Bayer6.7k

@Philipp Bayer Thanks, this bash script is awesome as I tried it on my data. But, my paired-end files are named in the following pattern:

sample1_lane1_R1_001.pass_sample1.fq     sample1_lane1_R2_001.pass_sample1.fq
sample2_lane1_R1_001.pass_sample2.fq     sample2_lane1_R2_001.pass_sample2.fq

I am trying to edit the script to get the following pattern in the name of sam and bam output, but did not work:

sample1_lane1_pass_sample1.sam
sample2_lane1_pass_sample2.sam

However, the editing of :

SAM=${r1%%_R1_.fq}.sam

/home/yog/software/hisat2-2.0.4/hisat2 -p 8 -x /data/Analysis/Radish_index  --fr -1 $r1 -2 $r2 -S $out

to

 SAM=${r1%%_R1_.fq}.sam

 BAM=${r1%%_R1_.fq}.bam

 /home/yog/software/hisat2-2.0.4/hisat2 -p 8 -x /data/Analysis/Radish_index  --fr -1 $r1 -2 $r2 -S  $SAM | samtools view -bS > $BAM

worked. :)

ADD REPLYlink modified 3.0 years ago • written 3.0 years ago by bioinfo8120

I figured out:

  SAM=${r1%.fq}.sam
  BAM=${r1%.fq}.bam

Ref: How to remove the extension of a file?

ADD REPLYlink modified 3.0 years ago • written 3.0 years ago by bioinfo8120
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.3.0
Traffic: 1534 users visited in the last hour