Entering edit mode
                    10.1 years ago
        Kristin Muench
        
    
        ▴
    
    640
    Hello,
I'm trying to extract raw count data from a .bam file (output by Tophat2) for use in DESeq2. I'm trying to use HTSeq. Although I can get HTSeq to run, nothing aligns uniquely. The reads themselves are 100 bp long, so I'm surprised that's the case.
I input something like:
htseq-count -f bam -r pos /path/on/cluster/mySampleBam/accepted_hits.bam /path/on/cluster/homoSapGRCh38.gtf
And the output file looks like:
...
ENSG00000282813    0
ENSG00000282814    0
ENSG00000282815    0
ENSG00000282816    0
ENSG00000282817    0
ENSG00000282818    0
ENSG00000282819    0
ENSG00000282820    0
ENSG00000282821    0
ENSG00000282822    0
__no_feature    16996078
__ambiguous    0
__too_low_aQual    0
__not_aligned    0
__alignment_not_unique    2579821
Has anyone had this issue? What might be going on?
I previously sorted the .bam file I'm trying to generate counts for with the command:
samtools sort accepted_hits.bam accepted_hits_sorted
Just use the
-ooption and have a look at some of the reads that should be counted. I hope you looked at these in IGV or similar already.I haven't - what's IGV? I'm new to this, so any suggestions on how to sanity check as I build my pipeline would be appreciated.
This is IGV, it's one of a number of viewers for BAM (and other) files. It's always good to have a brief visual look at things when you get weird results. You can then find some reads that should be counted (but obviously aren't), use the -o option and then grep for the read names and see why they weren't counted.
What's the output of:
A very long list of numbers that looks like:
EDIT: It has this many lines: 1566241
sorry. This command:
Thanks! Output is 4 lines:
Output?
For the .gtf file:
For the .bam file:
I think Goutham Atla wanted to know if the chromosome-naming in your bam file and the GTF are the same: In your sam-file, you have Chromosome One written as chr1. Now you have to look if it's matching with the gtf:
and what is the output of: