Question

Htseq-count reads with missing mate encountered

0

Entering edit mode

5 months ago

Bjorn • 0

Hello. I ran this HTseq command

htseq-count -r name -t gene -i gene -s yes -f bam /Volumes/cachannel/ZebraFinchBrain/CB-4a_genomemapping/sorted_alignmentcb4a.bam /Volumes/cachannel/ZebraFinchBrain/GCF_003957565.2/ncbi_dataset/data/GCF_003957565.2/genomic.gff > /Volumes/cachannel/ZebraFinchBrain/HTSEQ_withautomate/output_counts.txt

and got the error

Warning: 72583723 reads with missing mate encountered.
80015507 alignment record pairs processed.

Is there a setting I am missing to process this differently?

htseq htseq-count • 669 views

ADD COMMENT • link 5 months ago by Bjorn • 0

0

Entering edit mode

Did you name re-sort your BAM file (as discussed in other thread)? You are using -r name in this command line. The error above otherwise makes no sense.

Did you by chance trim/scan the paired-end reads that went into the alignment independently? If so your paired-end files were likely out of sync.

ADD REPLY • link 5 months ago by GenoMax 141k

0

Entering edit mode

Hey, I did try this changed command.

htseq-count -r pos -t gene -i gene -s yes -f bam /Volumes/cachannel/ZebraFinchBrain/CB-4a_genomemapping/sorted_alignmentcb4a.bam /Volumes/cachannel/ZebraFinchBrain/GCF_003957565.2/ncbi_dataset/data/GCF_003957565.2/genomic.gff > /Volumes/cachannel/ZebraFinchBrain/HTSEQ_withautomate/output_counts.txt

And I still got this

ZZEF1   0
ZZZ3    0
__no_feature    35623054
__ambiguous 0
__too_low_aQual 2393534
__not_aligned   3061390
__alignment_not_unique  2190975

I believe that I scanned these together. I used this code for that process

TrimGalore-0.6.10/trim_galore --fastqc --paired --length 20 --clip_R2 15 --three_prime_clip_R1 15 -o /Users/margaretsaha/Desktop/Bjorn2023/ZebraFinchBrain/CB-3a_Fastqc_Report /Users/margaretsaha/Desktop/Bjorn2023/ZebraFinchBrain/CB-3a/CB-3a_R1_001.fastq.gz /Users/margaretsaha/Desktop/Bjorn2023/ZebraFinchBrain/CB-3a/CB-3a_R2_001.fastq.gz

Would this cause them to be out of sync somehow?

ADD REPLY • link updated 5 months ago by GenoMax 141k • written 5 months ago by Bjorn • 0

0

Entering edit mode

The reads should be in sync since you did trim them as paired files.

Have you looked at these alignments in a genome viewer? Do the alignments look reasonable i.e. reads piling up under exons. Is every gene in your file 0 counts? Are the identifiers for chromosomes matching in your reference and annotation file?

ADD REPLY • link 5 months ago by GenoMax 141k

0

Entering edit mode

Yes, every gene in my file has 0 counts.

I believe you have helped my identify the problem. I created a reference genome of .ht2 files from a .fna file that has the identifiers

>CM012081.2 Taeniopygia guttata isolate Black17 chromosome 1, whole genome shotgun sequence
>CM012082.2 Taeniopygia guttata isolate Black17 chromosome 1A, whole genome shotgun sequence
>CM012083.2 Taeniopygia guttata isolate Black17 chromosome 2, whole genome shotgun sequence
>CM012084.2 Taeniopygia guttata isolate Black17 chromosome 3, whole genome shotgun sequence

But the .gff file I have has NC_044211.2 as the identifiers.

NC_044211.2 Gnomon  gene    6695    12279   .   +   .   ID=gene-LOC121470777;Dbxref=GeneID:121470777;Name=LOC121470777;gbkey=Gene;gene=LOC121470777;gene_biotype=lncRNA
NC_044211.2 Gnomon  lnc_RNA 6695    12279   .   +   .   ID=rna-XR_005981978.1;Parent=gene-LOC121470777;Dbxref=GeneID:121470777,Genbank:XR_005981978.1;Name=XR_005981978.1;gbkey=ncRNA;gene=LOC121470777;model_evidence=Supporting evidence includes similarity to: 100%25 coverage of the annotated genomic feature by

Is there a way to account for this? Or will I have to find a .fna file that has the same identifiers, make a new genome, rerun the alignment, then run the read counting?

ADD REPLY • link updated 5 months ago by GenoMax 141k • written 5 months ago by Bjorn • 0

1

Entering edit mode

This is not going to work with any program (featureCounts or htseq-count). Your chromosome ID's need to match in your reference and the annotation.

I see a good reference genome for the finch here: https://www.ncbi.nlm.nih.gov/datasets/genome/GCF_003957565.2/

Click on the Download button and get fasta sequence and the GTF file. Everything should match in this case. You will need to re-index the genome and re-do the alignments.

ADD REPLY • link 5 months ago by GenoMax 141k

0

Entering edit mode

Thank you. I will focus on making a new genome index. However, I am running into a problem with the hisat2-build funtion where it is not recognizing my A nucleotides?

This is my input hisat2-build -p 16 /Volumes/cachannel/ZebraFinchBrain/GCF_003957565.2/ncbi_dataset/data/GCF_003957565.2/GCF_003957565.2_bTaeGut1.4.pri_genomic.fna /Volumes/cachannel/ZebraFinchBrain/reftwo/referencegenome2

This is my output after a while of building

Exited GFM loop fchr[A]: 0 fchr[C]: 306050462 fchr[G]: 526435244 fchr[T]: 746725372 fchr[$]: 1052636474 Exiting GFM::buildToDisk() Returning from initFromVector Wrote 355107337 bytes to primary GFM file: /Volumes/cachannel/ZebraFinchBrain/reftwo/referencegenome2.1.ht2 Wrote 263159124 bytes to secondary GFM file: /Volumes/cachannel/ZebraFinchBrain/reftwo/referencegenome2.2.ht2 Re-opening _in1 and _in2 as input streams Returning from GFM constructor Returning from initFromVector Wrote 462917771 bytes to primary GFM file: /Volumes/cachannel/ZebraFinchBrain/reftwo/referencegenome2.5.ht2 Wrote 267955020 bytes to secondary GFM file: /Volumes/cachannel/ZebraFinchBrain/reftwo/referencegenome2.6.ht2 Re-opening _in5 and _in5 as input streams Returning from HGFM constructor Headers: len: 1052636474 gbwtLen: 1052636475 nodes: 1052636475 sz: 263159119 gbwtSz: 263159119 lineRate: 6 offRate: 4 offMask: 0xfffffff0 ftabChars: 10 eftabLen: 0 eftabSz: 0 ftabLen: 1048577 ftabSz: 4194308 offsLen: 65789780 offsSz: 263159120 lineSz: 64 sideSz: 64 sideGbwtSz: 48 sideGbwtLen: 192 numSides: 5482482 numLines: 5482482 gbwtTotLen: 350878848 gbwtTotSz: 350878848 reverse: 0 linearFM: Yes

ADD REPLY • link 5 months ago by Bjorn • 0

0

Entering edit mode

This did eliminate the missing mate error though.

ADD REPLY • link 5 months ago by Bjorn • 0

0

Entering edit mode

Do yourself a favor and use featureCounts, it is notably faster, more flexible and can sort reads automatically if paired-end is set and sort order is position.