Question: Can I use Picard (SortSam) instead of samtools (sort) to sort BAM files in an RNA-seq pipeline where HISAT2 is used for alignment?
shuksi1984 (30) wrote, 5 months ago:

My pipeline includes the following steps:

STEP-1: Alignment with HISAT2

path/to/hisat2 -f -x /path/to/in-built/genome -1 /path/to/SRR925687_1.fa -2 /path/to/SRR925687_2.fa -S /path/to/RNA.sam

STEP-2: SAM-to-BAM conversion

samtools view -S -b  /path/to/RNA.sam > /path/to/RNA.bam

STEP-3: BAM sorting

sudo java -jar /path/to/picard.jar SortSam INPUT=RNA.bam OUTPUT=RNA.sorted.bam SORT_ORDER=coordinate

STEP-4: Assemble transcripts with StringTie

/path/to/stringtie RNA.sorted.bam -A RNA.gene.abudance.tab -C RNA.cov.refs.gtf -G Homo_sapiens.GRCh38.86.gtf -B -e -o RNA.gtf -p 4

STEP-5: Prepare for DESeq2

cd /path/to/RNA #(where ballgown subdirectory is created)
python ./prepDE.py

Error: sub-directory 'ballgown' not found!

I created a subdirectory named "ballgown", placed the *.ctab files and GTF files in it, and ran the above command again. I then got the following error:

Error: no GTF files found under ./ballgown !

I believe the error might be due to sorting the BAM file with SortSam, but I didn't get any error message in the other steps.

modified 5 months ago • written 5 months ago by shuksi1984 (30)

To me, those errors do not suggest that anything is wrong with your BAM file sorting. What is the output of

ls /path/to/RNA (where the ballgown subdirectory is created)?

written 5 months ago by WouterDeCoster (32k)

I moved the following files into the ballgown subdirectory:

e2t.ctab
e_data.ctab
i2t.ctab
i_data.ctab
t_data.ctab
RNA.cov.refs.gtf
RNA.gtf
RNA.gene.abudance.tab

I also moved Homo_sapiens.GRCh38.86.gtf there when the error was not resolved.

modified 5 months ago by RamRS (18k) • written 5 months ago by shuksi1984 (30)

I looked at the code of prepDE.py, and it suggests that the script indeed cannot find the GTF files in that directory (perhaps the name is not what it expects). This has no connection with the sorting of BAM files.

modified 5 months ago • written 5 months ago by WouterDeCoster (32k)

Can the code of prepDE.py not recognize a GTF file with the .gtf extension?

Shall I perform the sorting step with SAMTOOLS?

written 5 months ago by shuksi1984 (30)

Can the code of prepDE.py not recognize a GTF file with the .gtf extension?

The code is looking for a *.gtf file, but I'm not sure whether it imposes other naming constraints. This is the line that searches for GTF files:

samples = [(i,glob.iglob(os.path.join(opts.input,i,"*.gtf")).next()) for i in next(os.walk(opts.input))[1] if re.search(opts.pattern,i)]
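Reading the quoted line, the *.gtf search appears to run inside each sub-directory of the input directory rather than in the input directory itself, which would be consistent with the error when the files sit directly in ./ballgown. Below is a minimal Python 3 sketch of that lookup, not the actual prepDE.py code. The names find_sample_gtfs and demo, the default pattern ".", and the choice to skip sub-directories without a GTF (the quoted line would raise StopIteration instead) are assumptions made for illustration.

```python
import glob
import os
import re
import tempfile


def find_sample_gtfs(input_dir, pattern="."):
    """Return (sample_name, first_gtf_path) pairs, one per matching sub-directory."""
    # next(os.walk(...))[1] lists only the immediate sub-directories of input_dir.
    subdirs = sorted(next(os.walk(input_dir))[1])
    samples = []
    for name in subdirs:
        if not re.search(pattern, name):
            continue
        # The GTF is looked up inside the sub-directory, not in input_dir itself.
        first_gtf = next(glob.iglob(os.path.join(input_dir, name, "*.gtf")), None)
        if first_gtf is not None:
            samples.append((name, first_gtf))
    return samples


def demo():
    """Build a throwaway directory tree and compare a flat and a nested layout."""
    with tempfile.TemporaryDirectory() as root:
        ballgown = os.path.join(root, "ballgown")
        os.makedirs(os.path.join(ballgown, "RNA"))
        # A GTF placed directly in ballgown/ is not picked up.
        open(os.path.join(ballgown, "flat.gtf"), "w").close()
        flat_only = find_sample_gtfs(ballgown)
        # A GTF placed in ballgown/RNA/ is picked up under the sample name "RNA".
        open(os.path.join(ballgown, "RNA", "RNA.gtf"), "w").close()
        nested = find_sample_gtfs(ballgown)
        return flat_only, [(n, os.path.basename(p)) for n, p in nested]
```

Under these assumptions, a layout such as ballgown/RNA/RNA.gtf would be found, while ballgown/RNA.gtf would not.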

Shall I perform the sorting step with SAMTOOLS?

If that makes you happy, go for it.

modified 5 months ago • written 5 months ago by WouterDeCoster (32k)
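For reference, the two coordinate-sorting commands discussed in this thread can be written side by side. This is only a sketch in Python that builds the argument lists without running anything. The helper names picard_sort_cmd and samtools_sort_cmd are hypothetical, the Picard arguments are copied from STEP-3 (sudo is left out on the assumption that the files are writable by the user), and the samtools form with -o and -@ assumes a reasonably recent samtools release.

```python
import shlex


def picard_sort_cmd(picard_jar, in_bam, out_bam):
    """Argument list for a Picard SortSam coordinate sort, as used in STEP-3."""
    return [
        "java", "-jar", picard_jar, "SortSam",
        "INPUT=" + in_bam,
        "OUTPUT=" + out_bam,
        "SORT_ORDER=coordinate",
    ]


def samtools_sort_cmd(in_bam, out_bam, threads=1):
    """Argument list for samtools sort; coordinate order is the samtools default."""
    # -@ sets the number of sorting/compression threads, -o names the output file.
    return ["samtools", "sort", "-@", str(threads), "-o", out_bam, in_bam]


if __name__ == "__main__":
    # Print both commands in a copy-pasteable form.
    print(shlex.join(picard_sort_cmd("/path/to/picard.jar", "RNA.bam", "RNA.sorted.bam")))
    print(shlex.join(samtools_sort_cmd("RNA.bam", "RNA.sorted.bam", threads=4)))
```

Either list could be passed to subprocess.run(cmd, check=True) so that a failing sort raises an error rather than going unnoticed.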

I used the following command:

samtools sort RNA.bam -o RNA.sorted.bam

It got stuck.

modified 5 months ago • written 5 months ago by shuksi1984 (30)

Stuck in what way? Depending on the size of the file, sorting can take a while.

written 5 months ago by genomax (57k)

It has been running for the past 24 hours. RNA.bam is 3.5 GB.

written 5 months ago by shuksi1984 (30)

At that size sorting should not take 24h. How much memory do you have? Are you able to see if the samtools process is doing anything? Are there *tmp* files?

written 5 months ago by genomax (57k)

Numerous RNA.sorted.bam.0000.bam files are generated, which disappear after some time. After that, nothing is generated except multiple lines of "c62;c62;c62;c62;c62;c62;c62;" in the terminal.

My machine has 16 GB of RAM and a 1 TB HDD.

modified 5 months ago • written 5 months ago by shuksi1984 (30)

I believe ballgown is used to calculate differential expression across multiple samples, while I have only a single sample. So I need multiple samples for ballgown to work. Maybe this can help.

modified 4 months ago • written 4 months ago by shuksi1984 (30)

Let me know if anybody can find a solution.

modified 5 months ago • written 5 months ago by shuksi1984 (30)

Based on your description of hardware and file size this should be a trivial conversion taking less than 20 min.

What OS are you using? Did you compile samtools yourself? What version of samtools are you using?

written 5 months ago by genomax (57k)

OS description:

Distributor ID: Ubuntu
Description: Ubuntu 15.04
Release: 15.04
Codename: vivid

Yes, I compiled samtools myself. SAMTOOLS: Version: 1.2 (using htslib 1.2.1)

modified 5 months ago • written 5 months ago by shuksi1984 (30)

Samtools is currently at v1.8. I suggest that you upgrade and see if that helps.
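Comparing the installed version against the current one can be done numerically rather than by eye. The sketch below is only an illustration: the helper names parse_version and is_older are hypothetical, and it assumes the version string looks like the "Version: 1.2 (using htslib 1.2.1)" text quoted above, where the first dotted number is the samtools version.

```python
import re


def parse_version(text):
    """Extract the first dotted version number in text as a tuple of ints."""
    match = re.search(r"\d+(?:\.\d+)+", text)
    if match is None:
        raise ValueError("no version number found in: " + repr(text))
    return tuple(int(part) for part in match.group(0).split("."))


def is_older(installed, reference):
    """True if the installed version is strictly older than the reference."""
    # Tuples compare element by element, so 1.10 correctly counts as newer than 1.8.
    return parse_version(installed) < parse_version(reference)


if __name__ == "__main__":
    print(is_older("Version: 1.2 (using htslib 1.2.1)", "1.8"))
```

Comparing integer tuples avoids the pitfall of plain string comparison, where "1.10" would sort before "1.8".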

written 5 months ago by genomax (57k)