Question: Can I use Picard (SortSam) instead of samtools (sort) for sorting BAM files in an RNA-seq pipeline where HISAT2 is used for alignment?
shuksi1984 wrote:

My pipeline includes the following steps:

STEP-1: Alignment with HISAT2

path/to/hisat2 -f -x /path/to/in-built/genome -1 /path/to/SRR925687_1.fa -2 /path/to/SRR925687_2.fa -S /path/to/RNA.sam

STEP-2: SAM-to-BAM conversion

samtools view -S -b  /path/to/RNA.sam > /path/to/RNA.bam

STEP-3: BAM sorting

sudo java -jar /path/to/picard.jar SortSam INPUT=RNA.bam OUTPUT=RNA.sorted.bam SORT_ORDER=coordinate

STEP-4: Assemble transcripts with StringTie

/path/to/stringtie RNA.sorted.bam -A RNA.gene.abudance.tab -C RNA.cov.refs.gtf -G Homo_sapiens.GRCh38.86.gtf -B -e -o RNA.gtf -p 4

STEP-5: Prepare for DESeq2

cd /path/to/RNA   # where the ballgown subdirectory is created
python ./prepDE.py

Error: sub-directory 'ballgown' not found!

I created a subdirectory named "ballgown", placed the *.ctab files and GTF files in it, and ran the command above again. This time I got the following error:

Error: no GTF files found under ./ballgown !

I suspect the error might be caused by sorting the BAM file with SortSam, although none of the other steps reported an error.

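For reference, steps 1 to 5 above can be expressed as a small Python driver. This is a minimal sketch under stated assumptions, not the poster's actual script: the tools are assumed to be on PATH, `build_pipeline` is a hypothetical helper, and the `ballgown/<sample>/` output folder for StringTie is an assumed layout, chosen because prepDE.py reads one sub-directory per sample (see the glob line quoted later in the thread). The commands are only built here, not executed.

```python
from __future__ import annotations

from pathlib import Path


def build_pipeline(sample: str, genome_index: str, reads_1: str, reads_2: str,
                   annotation: str, picard_jar: str = "/path/to/picard.jar") -> list[list[str]]:
    """Return the command lines for steps 1-5 for one sample (nothing is executed)."""
    # Assumed layout: each sample gets its own ballgown/<sample>/ folder,
    # because prepDE.py looks for *.gtf inside sub-directories of ballgown/.
    sample_dir = Path("ballgown") / sample
    sam, bam, sorted_bam = f"{sample}.sam", f"{sample}.bam", f"{sample}.sorted.bam"
    return [
        # STEP-1: align paired FASTA reads (-f) with HISAT2
        ["hisat2", "-f", "-x", genome_index, "-1", reads_1, "-2", reads_2, "-S", sam],
        # STEP-2: SAM -> BAM; -o writes the file directly instead of a shell redirect
        ["samtools", "view", "-b", "-o", bam, sam],
        # STEP-3: coordinate sort with Picard SortSam (no sudo needed)
        ["java", "-jar", picard_jar, "SortSam", f"INPUT={bam}",
         f"OUTPUT={sorted_bam}", "SORT_ORDER=coordinate"],
        # STEP-4: StringTie; with -B the .ctab tables are written next to the -o GTF.
        # The -C coverage GTF stays outside sample_dir so it is not picked up as the
        # sample's *.gtf by prepDE.py.
        ["stringtie", sorted_bam, "-A", f"{sample}.gene.abundance.tab",
         "-C", f"{sample}.cov.refs.gtf", "-G", annotation, "-B", "-e",
         "-o", str(sample_dir / f"{sample}.gtf"), "-p", "4"],
        # STEP-5: prepDE.py scans the sub-directories of the folder given with -i
        ["python", "./prepDE.py", "-i", "ballgown"],
    ]
```

To run it, one would create the folder first (e.g. `Path("ballgown/RNA").mkdir(parents=True, exist_ok=True)`) and pass each list to `subprocess.run(cmd, check=True)`. Keeping one GTF per sample folder avoids the ambiguity of several `*.gtf` files sitting side by side.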

To me, those errors do not suggest that anything is wrong with your BAM file sorting. What is the output of

ls /path/to/RNA (the directory in which the ballgown subdirectory is created)?

(reply by WouterDeCoster)

I moved the following files into the ballgown subdirectory:

e2t.ctab
e_data.ctab
i2t.ctab
i_data.ctab
t_data.ctab
RNA.cov.refs.gtf
RNA.gtf
RNA.gene.abudance.tab

When that did not resolve the error, I also moved Homo_sapiens.GRCh38.86.gtf there.

(reply by shuksi1984, edited by RamRS)

I looked at the code of prepDE.py, and it suggests that the script indeed cannot find the GTF files in that directory (perhaps the names are not as expected). The error has nothing to do with how the BAM files were sorted.

(reply by WouterDeCoster)
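On the title question itself: Picard SortSam with `SORT_ORDER=coordinate` and `samtools sort` both produce a coordinate-sorted BAM, and both should record this as `SO:coordinate` on the `@HD` header line. A minimal sketch for checking that, given header text such as the output of `samtools view -H RNA.sorted.bam`; `header_sort_order` is a hypothetical helper.

```python
from typing import Optional


def header_sort_order(header_text: str) -> Optional[str]:
    """Return the SO: value from the @HD line of a SAM header, or None if absent."""
    for line in header_text.splitlines():
        if line.startswith("@HD"):
            # SAM header fields are TAB-separated TAG:VALUE pairs after the record type.
            for field in line.split("\t")[1:]:
                if field.startswith("SO:"):
                    return field[3:]
    return None


print(header_sort_order("@HD\tVN:1.5\tSO:coordinate\n@SQ\tSN:chr1\tLN:1000"))  # coordinate
```

If both tools report `coordinate`, the choice between them should not matter for the downstream StringTie step.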

Can the code of prepDE.py not recognize a GTF file with the .gtf extension?

Shall I perform the sorting step with samtools instead?

(reply by shuksi1984)

Can the code of prepDE.py not recognize a GTF file with the .gtf extension?

The code looks for a *.gtf file, but I'm not sure whether it imposes other naming constraints. This is the line that searches for GTF files:

samples = [(i,glob.iglob(os.path.join(opts.input,i,"*.gtf")).next()) for i in next(os.walk(opts.input))[1] if re.search(opts.pattern,i)]

Shall I perform the sorting step with samtools instead?

If that makes you happy, go for it.

(reply by WouterDeCoster)
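The quoted prepDE.py line is Python 2 (it calls `.next()` on the glob iterator). A Python 3 paraphrase makes its behaviour easy to test: `next(os.walk(input_dir))[1]` returns only the immediate sub-directories of `ballgown/`, so GTF files placed directly in `ballgown/`, as in the file listing above, are never considered. This sketch (`find_sample_gtfs` is a hypothetical name) deliberately differs from the original in two ways: it sorts the matches instead of taking whatever the glob yields first, and it skips a sub-directory without any *.gtf instead of raising `StopIteration`.

```python
import glob
import os
import re
import tempfile


def find_sample_gtfs(input_dir: str = "ballgown", pattern: str = ".*") -> list:
    """Python 3 paraphrase of prepDE.py's sample-discovery line."""
    samples = []
    # next(os.walk(d))[1] lists only the immediate sub-directories of d;
    # files sitting directly inside d are ignored.
    for sub in next(os.walk(input_dir))[1]:
        if not re.search(pattern, sub):
            continue
        gtfs = sorted(glob.glob(os.path.join(input_dir, sub, "*.gtf")))
        if gtfs:  # the original would raise StopIteration here instead
            samples.append((sub, gtfs[0]))
    return samples


# Demo: a flat layout finds nothing; one folder per sample works.
with tempfile.TemporaryDirectory() as root:
    ballgown = os.path.join(root, "ballgown")
    os.mkdir(ballgown)
    open(os.path.join(ballgown, "RNA.gtf"), "w").close()   # flat layout, as in the thread
    print(find_sample_gtfs(ballgown))                       # []
    os.mkdir(os.path.join(ballgown, "RNA"))                 # one folder per sample
    open(os.path.join(ballgown, "RNA", "RNA.gtf"), "w").close()
    print(len(find_sample_gtfs(ballgown)))                  # 1
```

In other words, moving RNA.gtf and the *.ctab files into something like `ballgown/RNA/` should let the quoted line find the sample.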

I used the following command:

samtools sort RNA.bam -o RNA.sorted.bam

It got stuck.

(reply by shuksi1984)

Stuck in what way? Depending on the size of the file sorting can take a while.

(reply by genomax)

It has been running for the past 24 hours. RNA.bam is 3.5 GB.

(reply by shuksi1984)

At that size, sorting should not take 24 h. How much memory do you have? Can you see whether the samtools process is doing anything? Are there any temporary (*tmp*) files?

(reply by genomax)

Numerous temporary files (RNA.sorted.bam.0000.bam and so on) are generated, and they disappear after some time. After that, nothing else is generated except many lines of "c62;c62;c62;c62;c62;c62;c62;" in the terminal.

My machine has 16 GB of RAM and a 1 TB HDD.

(reply by shuksi1984)
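As a sketch of the sort command with every option spelled out, the hypothetical helper below follows the samtools 1.x usage `samtools sort [options] [in.bam]`: `-o` names the output file (without it, the sorted BAM goes to standard output), `-T` sets the prefix for the temporary `.NNNN.bam` files, `-m` is approximate memory per thread, and `-@` sets sorting/compression threads. The thread does not establish whether argument order or stdout output explains the hang. The binary-looking "c62;" lines are at least consistent with data being written to a terminal, but that is an assumption worth checking, not a diagnosis.

```python
from __future__ import annotations


def samtools_sort_cmd(in_bam: str, out_bam: str, tmp_prefix: str,
                      threads: int = 0, mem_per_thread: str = "768M") -> list[str]:
    """Build a samtools sort command line with all options before the input file."""
    cmd = [
        "samtools", "sort",
        "-o", out_bam,         # write the sorted BAM to a file, not to stdout
        "-T", tmp_prefix,      # temporary files become <tmp_prefix>.NNNN.bam
        "-m", mem_per_thread,  # approximate memory per thread (samtools default: 768M)
    ]
    if threads > 0:
        cmd += ["-@", str(threads)]  # threads for sorting/compression
    cmd.append(in_bam)         # the input file comes last
    return cmd


print(" ".join(samtools_sort_cmd("RNA.bam", "RNA.sorted.bam", "tmp/RNA", threads=4, mem_per_thread="2G")))
```

The `tmp/RNA` prefix, thread count, and memory value are arbitrary choices; the `tmp/` directory must exist before running the command with `subprocess.run(cmd, check=True)`.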

I believe Ballgown is used to calculate differential expression across multiple samples, whereas I have used only a single sample. So I may need multiple samples for Ballgown to work. Maybe this helps.

(reply by shuksi1984)

Please let me know if anybody finds a solution.

(reply by shuksi1984)

Based on your hardware and file size, this should be a trivial sort that takes less than 20 minutes.

What OS are you using? Did you compile samtools yourself? What version of samtools are you using?

(reply by genomax)
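To make the hardware point concrete, the hypothetical helper below turns a RAM budget into a per-thread `-m` value for `samtools sort`. The 50% share of RAM is an arbitrary assumption that leaves headroom for the OS and other processes. Because samtools treats `-m` as an approximate per-thread limit, the real footprint grows roughly with the thread count and can differ somewhat from this estimate.

```python
def sort_mem_per_thread(total_ram_gb: float, threads: int, fraction: float = 0.5) -> str:
    """Split a share of total RAM evenly across sort threads, as a value for -m."""
    if threads < 1 or not 0 < fraction <= 1:
        raise ValueError("threads must be >= 1 and 0 < fraction <= 1")
    budget_mb = int(total_ram_gb * 1024 * fraction)  # RAM share in MiB
    return f"{budget_mb // threads}M"                # per-thread value with M suffix


# 16 GB machine, 4 threads, about half the RAM for sorting
print(sort_mem_per_thread(16, 4))  # 2048M
```

With these numbers, 4 threads × 2048M stays around 8 GB, well inside the 16 GB reported above, which is in line with the expectation that a 3.5 GB BAM should sort quickly on this machine.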

OS description:

Distributor ID: Ubuntu
Description: Ubuntu 15.04
Release: 15.04
Codename: vivid

Yes, I compiled samtools myself. samtools version: 1.2 (using htslib 1.2.1)

(reply by shuksi1984)

samtools is currently at v1.8. I suggest that you upgrade and see if that helps.

(reply by genomax)
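Before and after upgrading, the installed version can be checked programmatically. This is a minimal sketch, assuming the first line of `samtools --version` has the form `samtools 1.2` (as in 1.x releases). The helper names are hypothetical, and 1.8 is simply the version mentioned above.

```python
import re


def parse_samtools_version(version_output: str) -> tuple:
    """Extract (major, minor[, patch]) from the first line of `samtools --version`."""
    first_line = version_output.splitlines()[0]
    match = re.match(r"samtools (\d+(?:\.\d+)*)", first_line)
    if match is None:
        raise ValueError(f"unexpected version line: {first_line!r}")
    return tuple(int(part) for part in match.group(1).split("."))


def is_at_least(version_output: str, minimum: tuple) -> bool:
    # Tuples compare element-wise, so (1, 2) < (1, 8) and (1, 10) > (1, 8).
    return parse_samtools_version(version_output) >= minimum


print(is_at_least("samtools 1.2\nUsing htslib 1.2.1\n", (1, 8)))  # False
```

In practice, the input would come from `subprocess.run(["samtools", "--version"], capture_output=True, text=True).stdout`. Comparing integer tuples instead of strings avoids treating "1.10" as older than "1.8".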