Htseq-count reads with missing mate encountered
0
0
Entering edit mode
3 months ago
Bjorn • 0

Hello. I ran this HTseq command

htseq-count -r name -t gene -i gene -s yes -f bam /Volumes/cachannel/ZebraFinchBrain/CB-4a_genomemapping/sorted_alignmentcb4a.bam /Volumes/cachannel/ZebraFinchBrain/GCF_003957565.2/ncbi_dataset/data/GCF_003957565.2/genomic.gff > /Volumes/cachannel/ZebraFinchBrain/HTSEQ_withautomate/output_counts.txt

and got the error

Warning: 72583723 reads with missing mate encountered.
80015507 alignment record pairs processed.

Is there a setting I am missing to process this differently?

htseq htseq-count • 507 views
ADD COMMENT
0
Entering edit mode

Did you name re-sort your BAM file (as discussed in other thread)? You are using -r name in this command line. The error above otherwise makes no sense.

Did you by chance trim/scan the paired-end reads that went into the alignment independently? If so your paired-end files were likely out of sync.

ADD REPLY
0
Entering edit mode

Hey, I did try this changed command.

htseq-count -r pos -t gene -i gene -s yes -f bam /Volumes/cachannel/ZebraFinchBrain/CB-4a_genomemapping/sorted_alignmentcb4a.bam /Volumes/cachannel/ZebraFinchBrain/GCF_003957565.2/ncbi_dataset/data/GCF_003957565.2/genomic.gff > /Volumes/cachannel/ZebraFinchBrain/HTSEQ_withautomate/output_counts.txt

And I still got this

ZZEF1   0
ZZZ3    0
__no_feature    35623054
__ambiguous 0
__too_low_aQual 2393534
__not_aligned   3061390
__alignment_not_unique  2190975

I believe that I scanned these together. I used this code for that process

TrimGalore-0.6.10/trim_galore --fastqc --paired --length 20 --clip_R2 15 --three_prime_clip_R1 15 -o /Users/margaretsaha/Desktop/Bjorn2023/ZebraFinchBrain/CB-3a_Fastqc_Report /Users/margaretsaha/Desktop/Bjorn2023/ZebraFinchBrain/CB-3a/CB-3a_R1_001.fastq.gz /Users/margaretsaha/Desktop/Bjorn2023/ZebraFinchBrain/CB-3a/CB-3a_R2_001.fastq.gz

Would this cause them to be out of sync somehow?

ADD REPLY
0
Entering edit mode

The reads should be in sync since you did trim them as paired files.

Have you looked at these alignments in a genome viewer? Do the alignments look reasonable i.e. reads piling up under exons. Is every gene in your file 0 counts? Are the identifiers for chromosomes matching in your reference and annotation file?

ADD REPLY
0
Entering edit mode

Yes, every gene in my file has 0 counts.

I believe you have helped my identify the problem. I created a reference genome of .ht2 files from a .fna file that has the identifiers

>CM012081.2 Taeniopygia guttata isolate Black17 chromosome 1, whole genome shotgun sequence
>CM012082.2 Taeniopygia guttata isolate Black17 chromosome 1A, whole genome shotgun sequence
>CM012083.2 Taeniopygia guttata isolate Black17 chromosome 2, whole genome shotgun sequence
>CM012084.2 Taeniopygia guttata isolate Black17 chromosome 3, whole genome shotgun sequence

But the .gff file I have has NC_044211.2 as the identifiers.

NC_044211.2 Gnomon  gene    6695    12279   .   +   .   ID=gene-LOC121470777;Dbxref=GeneID:121470777;Name=LOC121470777;gbkey=Gene;gene=LOC121470777;gene_biotype=lncRNA
NC_044211.2 Gnomon  lnc_RNA 6695    12279   .   +   .   ID=rna-XR_005981978.1;Parent=gene-LOC121470777;Dbxref=GeneID:121470777,Genbank:XR_005981978.1;Name=XR_005981978.1;gbkey=ncRNA;gene=LOC121470777;model_evidence=Supporting evidence includes similarity to: 100%25 coverage of the annotated genomic feature by 

Is there a way to account for this? Or will I have to find a .fna file that has the same identifiers, make a new genome, rerun the alignment, then run the read counting?

ADD REPLY
1
Entering edit mode

This is not going to work with any program (featureCounts or htseq-count). Your chromosome ID's need to match in your reference and the annotation.

I see a good reference genome for the finch here: https://www.ncbi.nlm.nih.gov/datasets/genome/GCF_003957565.2/

Click on the Download button and get fasta sequence and the GTF file. Everything should match in this case. You will need to re-index the genome and re-do the alignments.

ADD REPLY
0
Entering edit mode

Thank you. I will focus on making a new genome index. However, I am running into a problem with the hisat2-build funtion where it is not recognizing my A nucleotides?

This is my input hisat2-build -p 16 /Volumes/cachannel/ZebraFinchBrain/GCF_003957565.2/ncbi_dataset/data/GCF_003957565.2/GCF_003957565.2_bTaeGut1.4.pri_genomic.fna /Volumes/cachannel/ZebraFinchBrain/reftwo/referencegenome2

This is my output after a while of building

Exited GFM loop fchr[A]: 0 fchr[C]: 306050462 fchr[G]: 526435244 fchr[T]: 746725372 fchr[$]: 1052636474 Exiting GFM::buildToDisk() Returning from initFromVector Wrote 355107337 bytes to primary GFM file: /Volumes/cachannel/ZebraFinchBrain/reftwo/referencegenome2.1.ht2 Wrote 263159124 bytes to secondary GFM file: /Volumes/cachannel/ZebraFinchBrain/reftwo/referencegenome2.2.ht2 Re-opening _in1 and _in2 as input streams Returning from GFM constructor Returning from initFromVector Wrote 462917771 bytes to primary GFM file: /Volumes/cachannel/ZebraFinchBrain/reftwo/referencegenome2.5.ht2 Wrote 267955020 bytes to secondary GFM file: /Volumes/cachannel/ZebraFinchBrain/reftwo/referencegenome2.6.ht2 Re-opening _in5 and _in5 as input streams Returning from HGFM constructor Headers: len: 1052636474 gbwtLen: 1052636475 nodes: 1052636475 sz: 263159119 gbwtSz: 263159119 lineRate: 6 offRate: 4 offMask: 0xfffffff0 ftabChars: 10 eftabLen: 0 eftabSz: 0 ftabLen: 1048577 ftabSz: 4194308 offsLen: 65789780 offsSz: 263159120 lineSz: 64 sideSz: 64 sideGbwtSz: 48 sideGbwtLen: 192 numSides: 5482482 numLines: 5482482 gbwtTotLen: 350878848 gbwtTotSz: 350878848 reverse: 0 linearFM: Yes

ADD REPLY
0
Entering edit mode

This did eliminate the missing mate error though.

ADD REPLY
0
Entering edit mode

Do yourself a favor and use featureCounts, it is notably faster, more flexible and can sort reads automatically if paired-end is set and sort order is position.

ADD REPLY
0
Entering edit mode

Thank you. I am looking into this now.

ADD REPLY

Login before adding your answer.

Traffic: 1052 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6