running dexseq_count.py on the command line
1
1
Entering edit mode
6.2 years ago
sneha108ss ▴ 30

Hi,

I am trying to run this code

python dexseq_count.py -p yes -f bam -r pos Daphnia_annotation_result.gff Control.sorted.bam control.count.txt

and I get the following error:

ValueError: file has no sequences defined (mode='rb') - is it SAM/BAM format? Consider opening with check_sq=False.

Can you please tell me how to open the file with the check_sq=False option

Thanks

rna-seq • 2.7k views
ADD COMMENT
0
Entering edit mode

I'd first suggest you make sure that your bam file is a valid bam file. What does samtools flagstat Control.sorted.bam return?

ADD REPLY
0
Entering edit mode

Hi,

I ran samtools flagstat and it returned the following:

0+0 in total (QC-passed reads + QC-failed reads)
0+0 secondary
0+0 supplementary
0+0 duplicates
0+0 mapped (N/A : N/A)
0+0 paired in sequencing
0+0 read1
0+0 read2
0+0 properly paired (N/A : N/A)
0+0 with itself and mate mapped
0+0 singletons (N/A : N/A)
0+0 with mate mapped to a different chr
0+0 with mate mapped to a different chr (mapQ>=5)

I don't understand why everything is 0. The bam files are not empty.

ADD REPLY
0
Entering edit mode

I don't understand why everything is 0. The bam files are not empty.

Why do you think that?

I added markup to your post for increased readability. You can do this by selecting the text and clicking the 101010 button. When you compose or edit a post that button is in your toolbar, see image below:

101010 Button

ADD REPLY
0
Entering edit mode

Weird... what is the output of samtools view Control.sorted.bam | head ?

ADD REPLY
0
Entering edit mode
Thank you.

I was able to run dexseq_count.py using the gff file generated with dexseq_prepare_annotation.py. This is the code I used:
samtools view control.sorted.bam | python dexseq_count.py -p yes -s yes -r pos control_annotation_result.gff - control.count.txt 

But when I open the control.count.txt file it looks like this:

DAPPUDRAFT_1000009:001     0
DAPPUDRAFT_1000009:002     0
DAPPUDRAFT_1000009:003     0
DAPPUDRAFT_1000009:004     0
DAPPUDRAFT_1000009:005     0
DAPPUDRAFT_1000036:001     0

Why are all the exons 0?

This is the first few lines of the BAM file I used:

@HD VN:1.4 SO:coordinate
@SQ SN:DAPPUscaffold_15279 LN:1000
@SQ SN:DAPPUscaffold_15319 LN:1000
@SQ SN:DAPPUscaffold_15290 LN:1000
@SQ SN:DAPPUscaffold_15301 LN:1000
@SQ SN:DAPPUscaffold_15291 LN:1000
@SQ SN:DAPPUscaffold_15334 LN:1000
@SQ SN:DAPPUscaffold_15309 LN:1000
@SQ SN:DAPPUscaffold_15292 LN:1000
@SQ SN:DAPPUscaffold_15297 LN:1000
@SQ SN:DAPPUscaffold_15311 LN:1000
@SQ SN:DAPPUscaffold_6661 LN:1001
@SQ SN:DAPPUscaffold_15270 LN:1001
@SQ SN:DAPPUscaffold_15254 LN:1001
@SQ SN:DAPPUscaffold_15261 LN:1001
@SQ SN:DAPPUscaffold_15230 LN:1002
@SQ SN:DAPPUscaffold_15263 LN:1002
@SQ SN:DAPPUscaffold_15272 LN:1002
@SQ SN:DAPPUscaffold_15246 LN:1002

I really appreciate your help.

ADD REPLY
0
Entering edit mode

How did you align the FASTQ file?

The BAM data should look something like:

M00964:33:000000000-G1DPL:1:1104:18866:6025 2209    1   7909411 0   115H31M 17  60681428    0   ATTACAGGTGCCACCACGCCCAGCTAATTTT GEHHHFFGGHHHHHGHHFGGGAFHFHEHHHH SA:Z:17,60681428,+,3S110M33S,60,1;  XA:Z:5,-166217864,31M115S,0;1,-44700182,31M115S,0;12,-50320895,31M115S,0;   MD:Z:31 PG:Z:MarkDuplicates NM:i:0  AS:i:31 XS:i:31
M00964:33:000000000-G1DPL:1:2101:14805:2639 2177    1   7909411 0   115H31M =   143924024   136014614   ATTACAGGTGCCACCACGCCCAGCTAATTTT F2FHHBGF1GCHGFFEHGGFCCGFFHFHBHH SA:Z:1,143924024,-,28S118M,8,2; XA:Z:5,-166217864,31M115S,0;1,-44700182,31M115S,0;12,-50320895,31M115S,0;   MD:Z:31 PG:Z:MarkDuplicates NM:i:0  AS:i:31 XS:i:31
M00964:33:000000000-G1DPL:1:2101:14805:2639 2161    1   7909411 0   115H31M =   143924024   136014701   ATTACAGGTGCCACCACGCCCAGCTAATTTT HFGHHHGGHAEEFEEECC?FFFFFFFBBAA3 SA:Z:1,143924024,+,28S118M,8,2; MD:Z:31 PG:Z:MarkDuplicates NM:i:0  AS:i:31 XS:i:31
M00964:33:000000000-G1DPL:1:2104:11459:19082    97  1   16642512    4   111M    16  56036244    0   AAAATTAGCTGGGCGTGGTGGTGCATGCCTGTAATCCCAGCTACTCGGGAGGCTGAGGCAGGAGAATCACTTGAACCTGGGAGGCAGAGTTGCAGTGAGTCGAGATCATGC >AA?AFDFBFFFECEEAFCF2EEGCGFBF53BFHD5BG332BFFAGEE2F0EF1AGF1A1A0A??GGBFGHH3D3GCG11111E1111?B4@B@G3?3BGEEGGHE33F?F MD:Z:14A6C0A76C11   PG:Z:MarkDuplicates NM:i:4  AS:i:91 XS:i:86
M00964:33:000000000-G1DPL:1:2103:7896:25082 161 1   25804455    60  6S145M  =   25804545    240 AAAATTAGCTGGGCGTGGTGGCTCAGGCCTGTAATCCCAGCACTTTGGGAGGCCAAGGCAGGCAGATCACTTGAGTTCAGGAGTTCAAGACCACCTGGTCAACATGGTAAAACCCCATCTCTACAAAAAATACAAACATTAGCCAGGCATG AAAA1FFB1FFB11E1A000?GHGHCFCEEABGGHHHCHHHH01ABEGEGGGGGEHHFEG?CGHGC@GFHHEFDFEEGHHHHGHHFHGHFHFFGFH0G@12@BF0FBGHHHGFE>?EHEHGEFFD<GHHGFDHHFB/FD1@FCGBGEHHHH MD:Z:145    PG:Z:MarkDuplicates NM:i:0  AS:i:145    XS:i:83
ADD REPLY
0
Entering edit mode

The alignment was done using STAR

The alignment section of the BAM file does look like what you have sent.

HWI-ST913:353:C83BLACXX:8:2310:18405:24062  403 DAPPUscaffold_15319 706 0   98M =   613 -191    TAAAGTGGGCCCATTTCTCCGTCAAAGTGGGCCTATTTCTCCCAATTATTTCTAAGTTCCATTAGGTACTCTGAAATATATTCAATAAAAAATTCGAC  @CC?<<;29(3;;;A;99'<@77EEAEIE@.))@FFCAF=(<GFFB?BIIGHEFB@FA>GHGDHF:?:E9<D<IHHAC?F?>AHF?FHFHDDBDD@@@  NH:i:7 HI:i:6 AS:i:146 nM:i:8 RG:Z:ConAc_A2-16S
HWI-ST913:353:C83BLACXX:8:1102:6725:11593   419 DAPPUscaffold_15334 2   1   100M    =   31  419 CACGTAGAACGCACCTGGTCTCGTCAGCTCCCAGAAGTTAAGTTACGTCGAGTCCCGTTAGTACTTGGAAGGGTGACCGCTTGGGAATACGGGATGTCGT    <@BFDEFFHHHHGJJJJJIIJIJHGJJIJJJJJJJJJFHHHIGGIIIIJIHGGHJJG=AEECEFFFFEDECDDBBDDDBDDDD@BACDDDDDBDDDDDD?    NH:i:3 HI:i:3 AS:i:192 nM:i:4 RG:Z:ConAc_A2-16S
HWI-ST913:353:C83BLACXX:8:1109:18934:44901  419 DAPPUscaffold_15334 2   3   90M =   28  416 CACGTAGAACGCACCTGGTCTCGTCAGCTCCCAGAAGTTAAGTTACGTCGAGTCCCGTTAGTACTTGGAAGGGTGACCGCTTGGGAATAC  ?@<DADADFFFHHIIIEGFFCHD?EDEGFGHIEHHIG9??FF/BF??/=:;FFHIAGC=A57?B77;6>C>CB;;?CD?=;=(8ABC8AA  NH:i:2 HI:i:2 AS:i:182 nM:i:4 RG:Z:ConAc_A2-16S
HWI-ST913:353:C83BLACXX:8:1201:6254:18108   419 DAPPUscaffold_15334 2   3   99M =   31  419 CACGTAGAACGCACCTGGTCTCGTCAGCTCCCAGAAGTTAAGTTACGTCGAGTCCCGTTAGTACTTGGAAGGGTGACCGCTTGGGAATACGGGATGTCG ?@@DBDBDGFHHHJJJJFGHIGGGGIJIGGGIIIGIJIIAFGBFGHGDEGHHCHIJI=A;B?CCCB?@3@@AB;==;@=;BB<?A19<>CB?B50?@CA NH:i:2 HI:i:2 AS:i:191 nM:i:4 RG:Z:ConAc_A2-16S
HWI-ST913:353:C83BLACXX:8:1205:14228:50064  419 DAPPUscaffold_15334 2   3   99M =   28  416 CACGTAGAACGCACCTGGTCTCGTCAGCTCCCAGAAGTTAAGTTACGTCGAGTCCCGTTAGTACTTGGAAGGGTGACCGCTTGGGAATACGGGATGTCG @CCFFFFFHHHHGJJJJIIIJJJJJJJJJJJJJJJJJGIGIIDHGIJIJIAHDHIJJEHHFDDFFFFEEEECD=@BDDBDBDDDDDDDDDDBDDDDDDD NH:i:2 HI:i:2 AS:i:191 nM:i:4 RG:Z:ConAc_A2-16S
HWI-ST913:353:C83BLACXX:8:1302:3663:52877   419 DAPPUscaffold_15334 2   3   99M =   28  416 CACGTAGAACGCACCTGGTCTCGTCAGCTCCCAGAAGTTAAGTTACGTCGAGTCCCGTTAGTACTTGGAAGGGTGACCGCTTGGGAATACGGGATGTCG CCCFFFFFHHHHHJJJJJEHIJJJJJJJJJJIJJJIJHJJJJIIIJIIJJGHIJJJJHHHFBEFFFFEEEEDDABDDDDDDDDDDDDBDDDDDDBDDDD NH:i:2 HI:i:2 AS:i:191 nM:i:4 RG:Z:ConAc_A2-16S
ADD REPLY
0
Entering edit mode
Sorry for the crowded text, This is how the alignment section of the BAM file looks like.


HWI-ST913:353:C83BLACXX:8:2310:18405:24062  403 DAPPUscaffold_15319 706 0   98M =   613 -191    TAAAGTGGGCCCATTTCTCCGTCAAAGTGGGCCTATTTCTCCCAATTATTTCTAAGTTCCATTAGGTACTCTGAAATATATTCAATAAAAAATTCGAC  @CC?<<;29(3;;;A;99'<@77EEAEIE@.))@FFCAF=(<GFFB?BIIGHEFB@FA>GHGDHF:?:E9<D<IHHAC?F?>AHF?FHFHDDBDD@@@  NH:i:7 HI:i:6 AS:i:146 nM:i:8 RG:Z:ConAc_A2-16S
HWI-ST913:353:C83BLACXX:8:1102:6725:11593   419 DAPPUscaffold_15334 2   1   100M    =   31  419 CACGTAGAACGCACCTGGTCTCGTCAGCTCCCAGAAGTTAAGTTACGTCGAGTCCCGTTAGTACTTGGAAGGGTGACCGCTTGGGAATACGGGATGTCGT    <@BFDEFFHHHHGJJJJJIIJIJHGJJIJJJJJJJJJFHHHIGGIIIIJIHGGHJJG=AEECEFFFFEDECDDBBDDDBDDDD@BACDDDDDBDDDDDD?    NH:i:3 HI:i:3 AS:i:192 nM:i:4 RG:Z:ConAc_A2-16S
HWI-ST913:353:C83BLACXX:8:1109:18934:44901  419 DAPPUscaffold_15334 2   3   90M =   28  416 CACGTAGAACGCACCTGGTCTCGTCAGCTCCCAGAAGTTAAGTTACGTCGAGTCCCGTTAGTACTTGGAAGGGTGACCGCTTGGGAATAC  ?@<DADADFFFHHIIIEGFFCHD?EDEGFGHIEHHIG9??FF/BF??/=:;FFHIAGC=A57?B77;6>C>CB;;?CD?=;=(8ABC8AA  NH:i:2 HI:i:2 AS:i:182 nM:i:4 RG:Z:ConAc_A2-16S
HWI-ST913:353:C83BLACXX:8:1201:6254:18108   419 DAPPUscaffold_15334 2   3   99M =   31  419 CACGTAGAACGCACCTGGTCTCGTCAGCTCCCAGAAGTTAAGTTACGTCGAGTCCCGTTAGTACTTGGAAGGGTGACCGCTTGGGAATACGGGATGTCG ?@@DBDBDGFHHHJJJJFGHIGGGGIJIGGGIIIGIJIIAFGBFGHGDEGHHCHIJI=A;B?CCCB?@3@@AB;==;@=;BB<?A19<>CB?B50?@CA NH:i:2 HI:i:2 AS:i:191 nM:i:4 RG:Z:ConAc_A2-16S
HWI-ST913:353:C83BLACXX:8:1205:14228:50064  419 DAPPUscaffold_15334 2   3   99M =   28  416 CACGTAGAACGCACCTGGTCTCGTCAGCTCCCAGAAGTTAAGTTACGTCGAGTCCCGTTAGTACTTGGAAGGGTGACCGCTTGGGAATACGGGATGTCG @CCFFFFFHHHHGJJJJIIIJJJJJJJJJJJJJJJJJGIGIIDHGIJIJIAHDHIJJEHHFDDFFFFEEEECD=@BDDBDBDDDDDDDDDDBDDDDDDD NH:i:2 HI:i:2 AS:i:191 nM:i:4 RG:Z:ConAc_A2-16S
HWI-ST913:353:C83BLACXX:8:1302:3663:52877   419 DAPPUscaffold_15334 2   3   99M =   28  416 CACGTAGAACGCACCTGGTCTCGTCAGCTCCCAGAAGTTAAGTTACGTCGAGTCCCGTTAGTACTTGGAAGGGTGACCGCTTGGGAATACGGGATGTCG CCCFFFFFHHHHHJJJJJEHIJJJJJJJJJJIJJJIJHJJJJIIIJIIJJGHIJJJJHHHFBEFFFFEEEEDDABDDDDDDDDDDDDBDDDDDDBDDDD NH:i:2 HI:i:2 AS:i:191 nM:i:4 RG:Z:ConAc_A2-16S
HWI-ST913:353:C83BLACXX:8:2306:12220:82131  419 DAPPUscaffold_15334 2   1   99M =   31  419 CACGTAGAACGCACCTGGTCTCGTCAGCTCCCAGAAGTTAAGTTACGTCGAGTCCCGTTAGTACTTGGAAGGGTGACCGCTTGGAAATACGGGATGTCG CCCFFFFFHHHHHJJJJJJJJJJJJFGGGIJIJGGIIBGIIJIIIIJHIJFHIIIJIEHHFFFFFFFECCEDD=BDDDDDDDDDDDCDDDDDDDDDDDD NH:i:3 HI:i:2 AS:i:187 nM:i:6 RG:Z:ConAc_A2-16S
HWI-ST913:353:C83BLACXX:8:1113:12305:79077  419 DAPPUscaffold_15334 4   1   98M2S   =   85  471 CGTAGAACGCACCTGGTCTCGTCAGTTCCCAGAAGTTAAGTTACGTCGAGTCCCGTTAGTACTTGGAAGGGTGACCGCTTGGGAATACGGGATGTCGTCA    BCBFFFEFHHFHHJJIJJJIJIJJIHJJIJIJJJIGHIIJGGIIIIIIIJIIJJJIIIHFHHHHFFEFFBD6;=>@>B??B<AA9<A@ABB>9?CAA?@B    NH:i:3 HI:i:2 AS:i:190 nM:i:4 RG:Z:ConAc_A2-16S
ADD REPLY
0
Entering edit mode

I see, Daphnia spp.? Are the contig / chromosome names in the BAM exactly the same as in the GFF?

It's very late here in Europe. Wouter already went to sleep, as am I... catch you in the morning.

ADD REPLY
0
Entering edit mode

As indicated by @Kevin you likely have a chromosome name mismatch in your GTF file and the genome you used for the alignments. Do you see things like DAPPUscaffold_15334 in your GTF file? If you don't then that explains why you are getting 0 counts. Those two things need to match.

ADD REPLY
0
Entering edit mode

Thank you so much for your help. The BAM files looked fine on my computer but when I uploaded it onto the cluster something went wrong. Anyways I tried uploading it again and now samtools flagstat control.sorted.bam gives the following output:

36866957 + 0 in total (QC-passed reads + QC-failed reads)
10035691 + 0 secondary
0 + 0 supplementary
0 + 0 duplicates
33998723 + 0 mapped (92.22% : N/A)
26831266 + 0 paired in sequencing
13415633 + 0 read1
13415633 + 0 read2
23958640 + 0 properly paired (89.29% : N/A)
23958640 + 0 with itself and mate mapped
4392 + 0 singletons (0.02% : N/A)
0 + 0 with mate mapped to a different chr
0 + 0 with mate mapped to a different chr (mapQ>=5)

But when I run samtools view control.sorted.bam | python dexseq_count.py -p yes -s yes -r pos Daphnia_annotation_result.gff - control.txt I get the following error:

Traceback (most recent call last):
  File "dexseq_count.py", line 239, in <module>
    for a in reader( sam_file ):
  File "/home/ss75l/.local/lib/python2.7/site-packages/HTSeq/__init__.py", line 551, in __iter__
    algnt = SAM_Alignment.from_SAM_line( line )
  File "python2/src/HTSeq/_HTSeq.pyx", line 1454, in HTSeq._HTSeq.SAM_Alignment.from_SAM_line
ValueError: ("Malformed SAM line: MRNM == '*' although flag bit &0x0008 cleared", 'line 33999408')
ADD REPLY

Login before adding your answer.

Traffic: 2694 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6