Question: Why does the warning "reads with missing mate encountered" occur in HTSeq?
18 months ago, anc.informatics wrote:

I was following the DESeq2 manual to process my simple paired-end RNA-Seq data, which involves wild-type and stress-treated plants.

I counted reads per feature with HTSeq (version 0.9.1) using the command:

htseq-count -a 10 -s 'no' WT-CON.sam /home/exp/DESEQ2/genes.gtf > WT-DESeq.txt

I noticed the warning "53525476 reads with missing mate encountered" in the output:

100000 GFF lines processed.    
.
604523 GFF lines processed.
Warning: Read K00171:29:H2NYHBBXX:8:1128:23045:10019 claims to have an aligned mate which could not be found in an adjacent line.
100000 SAM alignment record pairs processed.   
.
.
56400000 SAM alignment record pairs processed.
56500000 SAM alignment record pairs processed.
Warning: 53525476 reads with missing mate encountered.
56509150 SAM alignment pairs processed.

A previous post and a comment by Ian highlight the sort-by-name (-n) option. Currently, I sort by position and convert to SAM:

samtools sort -o WT-CON.bam /home/exp/DESEQ2/WT/accepted_hits.bam 
samtools view WT-CON.bam > WT-CON.sam

I am not sure how to get rid of the warning. Do I need to sort the BAM with -n and run HTSeq again, or am I missing some other parameter?

rna-seq sam alignment htseq • 2.0k views
modified 18 months ago by Martombo • written 18 months ago by anc.informatics

"Do I need to sort BAM with -n and run HTSeq"

Yes, you have to do that so that the aligned mates are adjacent to each other. Alternatively, featureCounts can sort the reads automatically for you before counting. But it will be slow either way; sorting a BAM file is always a pain.

modified 18 months ago • written 18 months ago by Carlo Yague
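
As a rough sketch of the name-sort-then-count route suggested above, the two steps could be scripted as follows. The output name WT-CON.namesorted.bam, the /tmp prefix, and the DRY_RUN switch are hypothetical and chosen only for illustration; the input paths and the -a 10 -s no options are taken from the question. By default the script only prints the commands it would run.

```shell
#!/bin/sh
# Sketch: name-sort a BAM so that mates are adjacent, then count with htseq-count.
# Paths come from the question; output names are illustrative assumptions.
in_bam="/home/exp/DESEQ2/WT/accepted_hits.bam"
sorted_bam="WT-CON.namesorted.bam"      # hypothetical output name
tmp_prefix="/tmp/WT-CON.namesorted"     # hypothetical temp-file prefix for -T
gtf="/home/exp/DESEQ2/genes.gtf"
counts="WT-DESeq.txt"

# run: print a command, and execute it only when DRY_RUN=0.
run() {
    echo "+ $*"
    if [ "${DRY_RUN:-1}" = "0" ]; then
        "$@"
    fi
}

# Step 1: sort by read name (-n) so both mates of a pair end up on adjacent lines.
run samtools sort -n -T "$tmp_prefix" -o "$sorted_bam" "$in_bam"

# Step 2: count directly from the BAM.
# -f bam  : input is BAM, so no conversion to SAM is needed
# -r name : input is sorted by read name
if [ "${DRY_RUN:-1}" = "0" ]; then
    htseq-count -f bam -r name -a 10 -s no "$sorted_bam" "$gtf" > "$counts"
else
    echo "+ htseq-count -f bam -r name -a 10 -s no $sorted_bam $gtf > $counts"
fi
```

Setting DRY_RUN=0 runs the commands for real; that assumes samtools (a version that accepts the -T/-o syntax) and htseq-count are both on the PATH.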
18 months ago, Martombo (Seville, ES) wrote:

In a BAM file sorted by name, the two mates of a read pair sit on consecutive lines, since they share the same name. HTSeq can also work on position-sorted BAM files with the option -r pos (see the htseq-count documentation). That will, however, use more memory, as each read is kept until its mate is found. Also, you don't need to convert to SAM, as HTSeq reads BAM files as well (with -f bam). You may also want to check featureCounts, which is much faster than HTSeq and produces the same results.

written 18 months ago by Martombo
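
To illustrate why the sort order matters, here is a small self-contained sketch that uses made-up, SAM-like records (read name, chromosome, position) instead of a real alignment file. The read names, coordinates, and the helper count_unpaired are all hypothetical. It counts the records whose mate is not on a neighbouring line, first after sorting by position and then after sorting by name.

```shell
#!/bin/sh
# Toy records, tab-separated: read name, chromosome, position.
# Each read name appears twice, once per mate. All values are invented.
tab=$(printf '\t')
records=$(printf '%s\t%s\t%s\n' \
    readA chr1 100 \
    readB chr1 150 \
    readA chr1 300 \
    readC chr1 320 \
    readB chr1 400 \
    readC chr1 500)

# count_unpaired: read records on stdin and print how many have no record
# with the same name on the line directly before or after them.
count_unpaired() {
    awk -F '\t' '
        { name[NR] = $1 }
        END {
            n = 0
            for (i = 1; i <= NR; i++)
                if (name[i] != name[i-1] && name[i] != name[i+1]) n++
            print n
        }'
}

# Sort by chromosome, then numerically by position (like a coordinate sort).
pos_unpaired=$(printf '%s\n' "$records" | sort -t "$tab" -k2,2 -k3,3n | count_unpaired)

# Sort by read name (the effect that samtools sort -n aims for).
name_unpaired=$(printf '%s\n' "$records" | sort -t "$tab" -k1,1 | count_unpaired)

echo "position-sorted: $pos_unpaired records without an adjacent mate"
echo "name-sorted:     $name_unpaired records without an adjacent mate"
```

In the position-sorted toy data every mate is separated from its partner, which mirrors the situation behind the warning; after name sorting, none are. With real data the alternative mentioned in the answer would be to keep the position-sorted BAM and pass -r pos to htseq-count, at the cost of more memory.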

Thanks for the explanation. May I know how -n and -o are used together when sorting? At first I used -no after seeing a thread on SEQanswers, but it doesn't seem to have sorted correctly, as I get the error:

 Unsorted positions on sequence #6: 5325030 followed by 5325016
 samtools index: failed to create index for "MUT.bam"

I ran a command like:

samtools sort -no WT.bam /home/exp/DESEQ2/WT/accepted_hits.bam
modified 18 months ago • written 18 months ago by anc.informatics

Was the sorted BAM file created correctly? Depending on your version of samtools, that might be outdated syntax. You should use samtools sort -n -T /tmp/aln.sorted -o aln.sorted.bam aln.bam (see the samtools sort manual).

modified 18 months ago • written 18 months ago by Martombo
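
One way to check whether a sorted file came out as intended is to look at the SO: tag on the @HD header line, which samtools sort sets to queryname for a name sort and coordinate for a position sort. The sketch below uses a hypothetical helper, sort_order, and a made-up header so that it runs without a BAM file. Note also that the index error quoted above is consistent with running samtools index on a file that is not coordinate-sorted, so it would be expected for a name-sorted BAM and does not by itself mean the sort failed.

```shell
#!/bin/sh
# sort_order: read SAM header text on stdin and print the value of the
# SO: (sort order) tag from the @HD line, if there is one.
sort_order() {
    awk -F '\t' '
        $1 == "@HD" {
            for (i = 2; i <= NF; i++)
                if ($i ~ /^SO:/) { print substr($i, 4); exit }
        }'
}

# Demo on an invented header, so no samtools or BAM file is needed here.
demo_order=$(printf '@HD\tVN:1.5\tSO:queryname\n@SQ\tSN:chr1\tLN:1000\n' | sort_order)
echo "demo header sort order: $demo_order"

# With a real file (assuming samtools is installed), the header can be piped in:
#   samtools view -H WT.bam | sort_order
```

A result of queryname suggests the file is ready for htseq-count with -r name; coordinate suggests it is still position-sorted.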

Yes, I found out about the -T option after checking the manual. I had actually been referring to an old one. Thanks again for your help!

written 18 months ago by anc.informatics

ah, sorry, I didn't realize 5 days had passed already ;)

written 18 months ago by Martombo