Closed:HTseq error with paired end alignments
0
0
Entering edit mode
5.1 years ago
bnayer26 • 0

I am running HTseq code as follows:

python2.7 -m HTSeq.scripts.count -f bam -r pos -s no /extstor/sudhirlab/Reety/Bhavana/Set3_firstAgrigenome/Sorted_CFPAC-1_N1_aligned.bam /extstor/sudhirlab/Reety/Bhavana/Indexed_Genomes/GRCh38/ChromosomeNamed_Gtf_GRCh38.gtf > /extstor/sudhirlab/Reety/Bhavana/Set3_firstAgrigenome/Counts_Sorted_CFPAC-1_N1_aligned.txt

When I execute the command, the command starts to run showing the following:

100000 GFF lines processed.
200000 GFF lines processed.
300000 GFF lines processed.
400000 GFF lines processed.
500000 GFF lines processed.

....and so on

However, at the end it is giving me the following warning:

Warning: Mate records missing for 153562 records; first such record: <SAM_Alignment object: Paired-end read 'HWI-1KL120:151:C3350ACXX:6:2205:21118:19311' aligned to chrMT:[5593,5692)/+>.
34600000 SAM alignment record pairs processed.
Warning: Mate pairing was ambiguous for 133693 records; mate key for first such record: ('HWI-1KL120:151:C3350ACXX:6:2301:10892:62181', 'second', 'chr1', 14614, 'chr1', 14800, 285).
34697807 SAM alignment pairs processed.

When I open my output text file, I see that there are several rows of:

ENSG00000000003 0
ENSG00000000005 0
ENSG00000000419 1504
ENSG00000000457 409
ENSG00000000460 224

And in the end this is what it says:

__no_feature    3952696
__ambiguous     2186111
__too_low_aQual 1112784
__not_aligned   235455
__alignment_not_unique  1318607

So has my HTseq worked or not? I am very confused. I am guessing these warning signs mean that HTseq cannot handle any unpaired reads? How do I proceed with this?

Second, when I am running the same on another set of sorted bam files, then I am getting a different error. This second set of sorted bam files, originally were created from sam files generated after alignment with Hisat2, wherein I had input both paired and unpaired outputs (given by trimmomatic), to use for performing alignment. The error is as follows:

Error occured when processing SAM input (record #10 in file /extstor/sudhirlab/Reety/Bhavana/Set3_firstAgrigenome/Sorted_OZ-GCTM5-POSITIVE_aligned.bam):
  Sequence of paired-end alignments expected, but got single-end alignment.
  [Exception type: ValueError, raised in __init__.py:761]

And the corresponding .txt file in which I had directed my output to, is empty.

My guess after reading some other posts is that HTseq does not handle unpaired reads well. So what can I do about it ? I am very new to informatics, so any help will be great.

RNA seq RNA-Seq HTseq • 241 views
ADD COMMENT
This thread is not open. No new answers may be added
Traffic: 2145 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6