Entering edit mode
2.4 years ago
mujiangxielu
▴
10
.sam
file was generated by following code
samtools sort -n Untreated-3/accepted_hits.bam > Untreated-3_sn.bam
samtools view -o Untreated-3_sn.sam Untreated-3_sn.bam
samtools sort Untreated-3/accepted_hits.bam > Untreated-3_s.bam
samtools index Untreated-3_s.bam
.gtf
file was downloaded by:
wget ftp://ftp.ensembl.org/pub/release-70/gtf/drosophila_melanogaster/Drosophila_melanogaster.BDGP5.70.gtf.gz
gunzip Drosophila_melanogaster.BDGP5.70.gtf.gz
when I use htseq-count:
htseq-count -s no -a 10 Untreated-3_sn.sam Drosophila_melanogaster.BDGP5.70.gtf > Untreated-3.count
an error occured:
file has no sequences defined (mode='r') - is it SAM/BAM format? Consider opening with check_sq=False
[Exception type: ValueError, raised in libcalignmentfile.pyx:990]
I can use samtools view see the sorted .bam
file, but can't the .sam
file generated by .bam
file, error occured that
[E::sam_parse1] missing SAM header
[W::sam_read1] Parse error at line 1
[main_samview] truncated file.
I find I can use the .bam file to get the count file:
htseq-count -s no -a 10 Untreated-3_sn.bam Drosophila_melanogaster.BDGP5.70.gtf > Untreated-3.count
It works like this:
100000 GFF lines processed.
200000 GFF lines processed.
300000 GFF lines processed.
358027 GFF lines processed.
100000 alignment record pairs processed.
200000 alignment record pairs processed.
……
9700000 alignment record pairs processed.
9800000 alignment record pairs processed.
9900000 alignment record pairs processed.
10000000 alignment record pairs processed.
10028400 alignment pairs processed.
I'm totally new to bioinfo, please tell me is it OK to get count
file by .bam
file rather than .sam
file? thanks
It is unclear what you are doing by flipping back and forth between SAM and BAM files at the beginning. That is not needed. Counting programs need co-ordinate sorted files. You are not including the header in your SAM file which is why it is not working. Show us output of
head -5 your.sam
. BAM files are simply binary representations of SAM files and are identical in all respects except format.Thank you very much!
my .sam head 5 are:
I flipping back and forth between SAM and BAM because the article I want to reproduce did so, and the article used the generated sam file for HTSeq count…
You are missing headers in your SAM file so that is why you are unable to count. Using
samtools view -h -o Untreated-3_sn.sam Untreated-3_sn.bam
should fix that.Thank you very much!
After aligning the reads with 'hisat2' and generating aligned .bam files as output, I sort them by coordinates (not by name) and then they are ready to be used as input for
featureCounts
. I think all those steps back and forth between .sam and .bam files are unnecessary. I also don't think you necessarily need to index the files before annotation.Moreover, I prefer to perform annotation using
featureCounts
. In case you are going to perform DE analysis afterwards using DESeq2, it implements a function that imports directly the count matrix generated with featureCounts.Thank you very much!