6.0 years ago
shuksi1984
My pipeline includes the following steps:
STEP-1: Alignment with HISAT2
path/to/hisat2 -f -x /path/to/in-built/genome -1 /path/to/SRR925687_1.fa -2 /path/to/SRR925687_2.fa -S /path/to/RNA.sam
STEP-2: SAM to BAM conversion
samtools view -S -b /path/to/RNA.sam > /path/to/RNA.bam
STEP-3: BAM sorting
sudo java -jar /path/to/picard.jar SortSam INPUT=RNA.bam OUTPUT=RNA.sorted.bam SORT_ORDER=coordinate
STEP-4: Assemble transcripts with StringTie
/path/to/stringtie RNA.sorted.bam -A RNA.gene.abudance.tab -C RNA.cov.refs.gtf -G Homo_sapiens.GRCh38.86.gtf -B -e -o RNA.gtf -p 4
STEP-5: Prepare for DESeq2
cd /path/to/RNA #(where ballgown subdirectory is created)
python ./prepDE.py
Error: sub-directory 'ballgown' not found!
I created a subdirectory named "ballgown", placed the *.ctab and GTF files in it, and ran the command above again. This time I got the following error:
Error: no GTF files found under ./ballgown !
I believe the error might be caused by sorting the BAM with SortSam, but I didn't get any error messages in the other steps.
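For reference, here is a minimal Python sketch of STEP-1 to STEP-5 as argument lists (nothing is executed). It assumes hisat2, samtools, stringtie, picard.jar and prepDE.py are reachable from the working directory or PATH; the function name pipeline_commands is hypothetical. The one substantive change from the commands above is the StringTie -o path. Because -B writes the *.ctab tables next to the output GTF, pointing -o at ballgown/<sample>/<sample>.gtf produces the one-sub-directory-per-sample layout that prepDE.py's default input appears to expect. The -A and -C outputs are left out for brevity, and sudo is dropped because SortSam does not need root to write into your own directory.

```python
from pathlib import Path
from typing import List


def pipeline_commands(sample: str, index: str, fa1: str, fa2: str,
                      annotation: str, threads: int = 4) -> List[List[str]]:
    """Return STEP-1..STEP-5 as argv lists; nothing is executed here."""
    sam = f"{sample}.sam"
    bam = f"{sample}.bam"
    sorted_bam = f"{sample}.sorted.bam"
    # One sub-directory per sample. With -B, StringTie writes the *.ctab
    # tables next to the -o GTF, so both land in ballgown/<sample>/.
    out_dir = Path("ballgown") / sample
    return [
        # STEP-1: align FASTA reads (-f) with HISAT2
        ["hisat2", "-f", "-x", index, "-1", fa1, "-2", fa2, "-S", sam],
        # STEP-2: SAM -> BAM
        ["samtools", "view", "-b", "-o", bam, sam],
        # STEP-3: coordinate sort with Picard (no sudo needed)
        ["java", "-jar", "picard.jar", "SortSam",
         f"INPUT={bam}", f"OUTPUT={sorted_bam}", "SORT_ORDER=coordinate"],
        # STEP-4: create the sample folder, then estimate abundances of the
        # reference transcripts only (-e) and write Ballgown tables (-B)
        ["mkdir", "-p", str(out_dir)],
        ["stringtie", sorted_bam, "-e", "-B", "-p", str(threads),
         "-G", annotation, "-o", str(out_dir / f"{sample}.gtf")],
        # STEP-5: run from the directory that contains ballgown/
        ["python", "prepDE.py"],
    ]
```

Each list can then be passed, in order, to subprocess.run(cmd, check=True).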
To me, those errors do not suggest that anything is wrong with your BAM sorting. What is the output of the following command (the directory where the ballgown subdirectory is created)?
ls /path/to/RNA
I moved the following files into the ballgown subdirectory:
When the error persisted, I also moved Homo_sapiens.GRCh38.86.gtf there.
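When the files sit directly in ballgown/, one way to get a per-sample layout is to move a sample's StringTie GTF and its *.ctab tables into ballgown/<sample>/. The sketch below assumes the sample is called RNA and its StringTie output is RNA.gtf; move_into_sample_dir is a hypothetical helper, not part of StringTie. It deliberately does not move the reference annotation (Homo_sapiens.GRCh38.86.gtf). A scan that picks up every *.gtf in a sample folder would otherwise treat the annotation as sample output too.

```python
import shutil
from pathlib import Path


def move_into_sample_dir(sample: str, gtf_name: str, src_dir: str = ".",
                         base_dir: str = "ballgown") -> Path:
    """Move one sample's StringTie GTF and *.ctab tables into base_dir/<sample>/."""
    src = Path(src_dir)
    dest = Path(base_dir) / sample
    dest.mkdir(parents=True, exist_ok=True)
    # The sample's own StringTie output, not the reference annotation.
    shutil.move(str(src / gtf_name), str(dest / gtf_name))
    # Ballgown tables written by -B (e_data.ctab, t_data.ctab, ...).
    # List them first so the directory is not modified while being scanned.
    for ctab in sorted(src.glob("*.ctab")):
        shutil.move(str(ctab), str(dest / ctab.name))
    return dest
```

For files already placed flat in ballgown/, this would be called as move_into_sample_dir("RNA", "RNA.gtf", src_dir="ballgown") from /path/to/RNA.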
I looked at the code of prepDE.py, and it suggests that the script indeed cannot find the GTF files in that directory (perhaps the names are not as expected); this has nothing to do with how the BAM files were sorted.
Can prepDE.py not recognize GTF files with the .gtf extension?
Should I perform the sorting step with samtools instead?
The code is looking for a *.gtf file, but I'm not sure whether it imposes other naming constraints. This is the line that searches for GTF files:
If that makes you happy, go for it.
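Both error messages are consistent with a lookup that only searches the sub-directories of ./ballgown for *.gtf files. The following is a minimal sketch of such a lookup under that assumption; find_sample_gtfs is hypothetical and is not prepDE.py's actual code. It shows why GTF files placed directly in ballgown/ are not found, while ballgown/RNA/RNA.gtf is.

```python
import glob
import os
from typing import List, Tuple


def find_sample_gtfs(base: str = "ballgown") -> List[Tuple[str, str]]:
    """Return (sample_id, gtf_path) pairs found one level below `base`.

    Files placed directly in `base` are ignored; only *.gtf files inside
    its sub-directories count, and the sub-directory name is the sample ID.
    """
    if not os.path.isdir(base):
        raise FileNotFoundError(f"sub-directory '{base}' not found!")
    samples = []
    # next(os.walk(base))[1] lists the immediate sub-directories of base.
    for sample in sorted(next(os.walk(base))[1]):
        for gtf in sorted(glob.glob(os.path.join(base, sample, "*.gtf"))):
            samples.append((sample, gtf))
    if not samples:
        raise FileNotFoundError(f"no GTF files found under {base} !")
    return samples
```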
I used the following command:
It got stuck.
Stuck in what way? Depending on the size of the file, sorting can take a while.
It has been running for the past 24 hours. RNA.bam is 3.5 GB.
At that size, sorting should not take 24 hours. How much memory do you have? Can you see whether the samtools process is doing anything? Are there *tmp* files?
Numerous RNA.sorted.bam.0000.bam files are generated, which disappear after some time. After that, nothing is generated except multiple lines of "c62;c62;c62;c62;c62;c62;c62;" in the terminal.
My machine has 16 GB of RAM and a 1 TB HDD.
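The numbered RNA.sorted.bam.0000.bam files are consistent with how samtools sort handles large inputs. It sorts as many records as fit in its memory budget (-m, approximately per thread), writes each sorted batch to a temporary file, and finally merges those files and deletes them. The toy external merge sort below illustrates that pattern on integers. The chunk_size parameter stands in for -m, and the temporary files are plain text rather than BAM, so this is an analogy, not samtools' implementation.

```python
import heapq
import os
import tempfile


def external_sort(values, chunk_size, tmp_dir):
    """Sort `values` holding at most `chunk_size` items in memory per batch.

    Returns (sorted_list, number_of_temporary_files_used).
    """
    tmp_paths = []
    chunk = []

    def flush():
        # Sort the in-memory batch and spill it to a numbered temp file,
        # analogous to samtools writing PREFIX.0000.bam, PREFIX.0001.bam, ...
        path = os.path.join(tmp_dir, f"chunk.{len(tmp_paths):04d}.txt")
        with open(path, "w") as fh:
            fh.writelines(f"{v}\n" for v in sorted(chunk))
        tmp_paths.append(path)
        chunk.clear()

    for v in values:
        chunk.append(v)
        if len(chunk) == chunk_size:
            flush()
    if chunk:
        flush()

    # k-way merge of the sorted temp files, then delete them, which is why
    # the numbered files disappear once the merge has finished.
    handles = [open(p) for p in tmp_paths]
    try:
        merged = list(heapq.merge(*((int(line) for line in fh) for fh in handles)))
    finally:
        for fh in handles:
            fh.close()
        for p in tmp_paths:
            os.remove(p)
    return merged, len(tmp_paths)
```

A smaller memory budget per batch means more temporary files to write and then merge.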
I believe Ballgown is used to calculate differential expression across multiple samples, whereas I have used a single sample, so I need multiple samples for Ballgown to work. Maybe this helps.
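Depending on the prepDE.py version, -i can also take a text file that lists one sample ID and the path to that sample's GTF per line (the StringTie manual calls it sample_lst.txt), which sidesteps the directory layout. The sketch below writes such a file, assuming whitespace-separated columns; write_sample_list is a hypothetical helper, and the sample IDs and paths are illustrative. With more than one sample listed, the resulting count matrices have one column per sample, which is what DESeq2 needs for a comparison.

```python
from pathlib import Path
from typing import Dict


def write_sample_list(samples: Dict[str, str], out_path: str) -> Path:
    """Write one '<sample_id> <gtf_path>' line per sample."""
    out = Path(out_path)
    lines = []
    for sample_id, gtf in sorted(samples.items()):
        # Columns are assumed to be whitespace-separated, so IDs and paths
        # must not contain whitespace themselves.
        if any(ch.isspace() for ch in sample_id + gtf):
            raise ValueError(f"whitespace in {sample_id!r} or {gtf!r}")
        lines.append(f"{sample_id} {gtf}\n")
    out.write_text("".join(lines))
    return out
```

For example, write_sample_list({"ctrl_1": "ballgown/ctrl_1/ctrl_1.gtf", "treat_1": "ballgown/treat_1/treat_1.gtf"}, "sample_lst.txt") could be followed by python prepDE.py -i sample_lst.txt.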
Let me know if anybody can find a solution.
Based on your description of the hardware and file size, this should be a trivial sort taking less than 20 minutes.
What OS are you using? Did you compile samtools yourself? What version of samtools are you using?
OS description:
Distributor ID: Ubuntu
Description: Ubuntu 15.04
Release: 15.04
Codename: vivid
Yes, I compiled samtools myself. samtools version: 1.2 (using htslib 1.2.1)
Samtools is currently at v1.8. I suggest that you upgrade and see if that helps.
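After upgrading, a sort with the current option-based syntax might look like the command built below. It writes explicitly to a file with -o, sets a temporary-file prefix with -T, and derives -m from the machine's RAM. The 50% RAM fraction, the 4 threads, and the helper name sort_command are assumptions. Because -m is documented as approximate memory per thread, the sketch divides the budget by threads + 1 to leave headroom.

```python
from typing import List


def sort_command(in_bam: str, out_bam: str, ram_gb: float,
                 threads: int = 4, fraction: float = 0.5) -> List[str]:
    """Build a samtools sort argv whose per-thread -m fits a RAM budget.

    Assumption: total memory is roughly m * (threads + 1), so the budget
    (fraction of total RAM, in MB) is split across threads + 1.
    """
    budget_mb = int(ram_gb * 1024 * fraction)
    per_thread_mb = budget_mb // (threads + 1)
    return ["samtools", "sort",
            "-@", str(threads),
            "-m", f"{per_thread_mb}M",
            "-T", f"{out_bam}.tmp",  # prefix for the numbered temporary files
            "-o", out_bam,           # write the result to a file, not stdout
            in_bam]
```

For 16 GB of RAM and 4 threads this yields -m 1638M; the list can be run with subprocess.run(cmd, check=True).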