Question: HTSeq-count no_feature threshold?
MK_B (Switzerland) wrote, 2.5 years ago:

Dear all,

I'm in the middle of a standard RNA-seq analysis and have a rather basic question that I cannot answer myself for lack of experience. HTSeq-count assigns quite a high percentage of my read counts to 'no_feature': roughly 25% (up to 30% for some samples). I have checked the standard issues, such as matching chromosome names in the GTF and SAM files. I also tried -m intersection-strict instead of -m union, but then basically all counts were assigned to no_feature.

It's important to add that I'm working on a non-model organism (horse), so I guess I should expect a higher number in that regard. Still, I'm not sure whether any troubleshooting is necessary or whether this is as good as it gets. I would really appreciate your advice, and if you need any further information or clarification, please let me know.

Below are my command lines, sample lines from the GTF and SAM files, and the htseq-count output I get.

htseq-count command line:

htseq-count -m union -r pos -i transcript_id -a 10 -o ${NAME}_out.sam --stranded=no -f bam $path ref_EquCab2.0_top_level.chr.gtf > count_table.txt

htseq-count sample output (tail):

rna9995                 5
rna9996                 0
rna9997                 0
rna9998                 0
rna9999                 0
__no_feature            10342058
__ambiguous             8603905
__too_low_aQual         0
__not_aligned           0
__alignment_not_unique  1814027

HTSeq errors/output

34600000 SAM alignment record pairs processed.
34700000 SAM alignment record pairs processed.
34800000 SAM alignment record pairs processed.
Warning: Mate records missing for 1498 records; first such record: <SAM_Alignment object: Paired-end read 'J00121:58:H75JHBBXX:6:2113:6908:24261' aligned to chr22:[27095622,27095771)/->.
Warning: Mate pairing was ambiguous for 105845 records; mate key for first such record: ('J00121:58:H75JHBBXX:6:2227:26808:37501', 'second', 'chr1', 942, 'chr1', 1108, 312).
34833739 SAM alignment pairs processed.
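
To put the special counters above on a percentage scale, a small helper can be sketched as follows. It is only an illustration: the function names are made up for this example, and it assumes that every processed alignment pair ends up in exactly one row of the count table, so that the "pairs processed" total from the log can serve as the denominator.

```python
import io


def summarize_htseq_counts(handle):
    """Split an htseq-count table into assigned counts and special counters.

    Each line is 'name<whitespace>count'; names starting with '__' are
    htseq-count's special categories (e.g. __no_feature, __ambiguous).
    """
    assigned = 0
    special = {}
    for line in handle:
        parts = line.split()
        if len(parts) != 2:
            continue  # skip blank or malformed lines
        name, count = parts[0], int(parts[1])
        if name.startswith("__"):
            special[name] = count
        else:
            assigned += count
    return assigned, special


def special_percentages(special, total):
    """Express each special counter as a percentage of `total`."""
    if total <= 0:
        raise ValueError("total must be positive")
    return {name: 100.0 * count / total for name, count in special.items()}


# Tail of the count table as posted (an excerpt, not the full table).
TAIL = """\
rna9995 5
rna9996 0
rna9997 0
rna9998 0
rna9999 0
__no_feature    10342058
__ambiguous 8603905
__too_low_aQual 0
__not_aligned   0
__alignment_not_unique  1814027
"""

# Assumption: the log's "alignment pairs processed" is the right denominator.
TOTAL_PAIRS = 34833739

assigned_in_excerpt, special = summarize_htseq_counts(io.StringIO(TAIL))
for name, pct in special_percentages(special, TOTAL_PAIRS).items():
    print(f"{name}\t{pct:.2f}%")
```

With the numbers posted here this gives about 29.7% for __no_feature and about 24.7% for __ambiguous. On a complete count_table.txt the denominator could instead be the sum of all rows, i.e. the assigned total plus the special counters.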

GTF file I used (head):

chrMT   RefSeq  exon    1   70  .   +   .   transcript_id "rna43393";
chrMT   RefSeq  exon    71  1045    .   +   .   transcript_id "rna43394";
chrMT   RefSeq  exon    1046    1112    .   +   .   transcript_id "rna43395";
chrMT   RefSeq  exon    1113    2693    .   +   .   transcript_id "rna43396";
chrMT   RefSeq  exon    2694    2768    .   +   .   transcript_id "rna43397";
chrMT   RefSeq  CDS 2771    3727    .   +   0   transcript_id "gene27150"; gene_id "gene27150"; gene_name "ND1";
chrMT   RefSeq  exon    3727    3795    .   +   .   transcript_id "rna43398"; gene_id "gene27150"; gene_name "ND1";
chrMT   RefSeq  exon    3793    3865    .   -   .   transcript_id "rna43399";
chrMT   RefSeq  exon    3868    3936    .   +   .   transcript_id "rna43400";
chrMT   RefSeq  CDS 3937    4977    .   +   0   transcript_id "gene27151"; gene_id "gene27151"; gene_name "ND2";
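
By default htseq-count counts rows of type exon (-t exon) and groups them by the attribute named with -i, so it can help to see which attributes the exon rows of a GTF actually carry. The following is a rough sketch of such a tally, not part of HTSeq; the function names are hypothetical, and the parser assumes simple `key "value";` attributes like the ones in the excerpt above.

```python
from collections import Counter


def parse_attributes(field):
    """Parse a GTF attribute column such as 'transcript_id "rna1"; gene_id "g1";'."""
    attrs = {}
    for item in field.strip().split(";"):
        item = item.strip()
        if not item:
            continue
        key, _, value = item.partition(" ")
        attrs[key] = value.strip().strip('"')
    return attrs


def tally_attributes(lines, feature_type="exon"):
    """Count how often each attribute key occurs on rows of one feature type."""
    rows = 0
    keys = Counter()
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue
        # Split on any whitespace so that pasted, space-separated excerpts
        # work as well as real tab-separated files; the 9th field keeps
        # the whole attribute string.
        cols = line.split(None, 8)
        if len(cols) < 9 or cols[2] != feature_type:
            continue
        rows += 1
        keys.update(parse_attributes(cols[8]).keys())
    return rows, keys


# Three rows copied from the GTF head in this post.
GTF_HEAD = [
    'chrMT   RefSeq  exon    1   70  .   +   .   transcript_id "rna43393";',
    'chrMT   RefSeq  CDS 2771    3727    .   +   0   transcript_id "gene27150"; gene_id "gene27150"; gene_name "ND1";',
    'chrMT   RefSeq  exon    3727    3795    .   +   .   transcript_id "rna43398"; gene_id "gene27150"; gene_name "ND1";',
]

exon_rows, exon_keys = tally_attributes(GTF_HEAD)
print(exon_rows, dict(exon_keys))
```

On a full annotation the same function can be given an open file handle, for example `tally_attributes(open("ref_EquCab2.0_top_level.chr.gtf"))`. Comparing the count for transcript_id with the count for gene_id shows which of the two is available on every exon row.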

Head of a sample SAM file:

J00121:58:H75JHBBXX:6:1101:24200:42337  163 chr1    689 255 151M    =   833 294 CGGGGCCTTGCGGGGGAGGCCCGTGGAGGGCGCGACGGGCTCGGCCGCCGGGCTGGCCTTTTCCCCACTGGTCTTCCGAGTCGACCGGCTCTGGCGGTGGGGACCGGGCCCGGTCCTCGGATGCCTCCTCCTCCGTGGCAGTTTTTTGTCC AAFFFJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJFJJJJJJJJJJJJJJJJJJJJJJJFJJJJJJJJJJJJJJJJJJJJJJFJJFJJJJJJ7FJJFJ7F<AFJJJJJ NH:i:1  HI:i:1  AS:i:293    nM:i:3  NM:i:0  MD:Z:151    jM:B:c,-1   jI:B:i,-1
J00121:58:H75JHBBXX:6:1109:2595:15838   163 chr1    716 255 150M    =   818 251 GGGCGCGACGGGCTCGGCCGCCGGGCTGGCCTTTTCCCCACTGGTCTTCCGAGTCGACCGGCTCTGGCGGTGGGGACCGGGCCCGGTCCTCGGATGCCTCCTCCTCCGTGGCAGTTTTTTGTCCAAGTCCCGCCCTGGAGAAGAGCGTGG  AAAFFJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJFFJJJJJJJJJJJJJJJJJJJJJAFJJJF<JJJAFJJJJJJJJJJJJJJJJFFFFJJJJJJJJJJJJJJJJJJJJJJJJJJFJJAJJAAA-<-7FF7A-<FFJJFAFJ  NH:i:1  HI:i:1  AS:i:289    nM:i:4  NM:i:1  MD:Z:144C5  jM:B:c,-1   jI:B:i,-1
J00121:58:H75JHBBXX:6:2206:15006:18915  163 chr1    716 255 150M    =   818 251 GGGCGCGACGGGCTCGGCCGCCGGGCTGGCCTTTTCCCCACTGGTCTTCCGAGTCGACCGGCTCTGGCGGTGGGGACCGGGCCCGGTCCTCGGATGCCTCCTCCTCCGTGGCAGTTTTTTGTCCAAGTCCCGCCCTGGAGAAGAGCGTGG  AAFFFJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJFJJJJJJJJFJJJJJJJJJJJJ<JJFJJJJJJJJJJ<JJJJAJJJJJJJJJJFJF7FJJJJAFJJJJJFFFFJJJJFAJJJJJJJAJJ<FFJFJFJJ  NH:i:1  HI:i:1  AS:i:289    nM:i:4  NM:i:1  MD:Z:144C5  jM:B:c,-1   jI:B:i,-1
J00121:58:H75JHBBXX:7:1202:30107:19337  163 chr1    716 255 150M    =   818 251 GGGCGCGACGGGCTCGGCCGCCGGGCTGGCCTTTTCCCCACTGGTCTTCCGAGTCGACCGGCTCTGGCGGTGGGGACCGGGCCCGGTCCTCGGATGCCTCCTCCTCCGTGGCAGTTTTTTGTCCAAGTCCCGCCCTGGAGAAGAGCGTGG  AAAFFAJJJJJJFJJFJJJFJJJJJJJJJJJJJJJFFFJ<JJJJJJJJJJJJJFJJJJJFFJJJJJFJJJJFJJA77AJ<JJFJFJAJJJJFF-F-AFFJJJFF<F<AFFFFAAF-AJJJFF--7A<FAJ)))7<-FF-)7-<AFJF<A<  NH:i:1  HI:i:1  AS:i:289    nM:i:4  NM:i:1  MD:Z:144C5  jM:B:c,-1   jI:B:i,-1
J00121:58:H75JHBBXX:6:2115:19948:27408  163 chr1    722 255 151M    =   869 298 GACGGGCTCGGCCGCCGGGCTGGCCTTTTCCCCACTGGTCTTCCGAGTCGACCGGCTCTGGCGGTGGGGACCGGGCCCGGTCCTCGGATGCCTCCTCCTCCGTGGCAGTTTTTTGTCCAAGTCCCGCCCTGGAGAAGACCGTGGACCGGCC AAFFFJJFJFJJJJJJFJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJAJJJFJJJ<JJJJJJJJFJJJJJJJJJFJJFFJJJJAJJJJJJJJJAJJJJJ-FFJJJJJJJJJJFJJJFJJJJJFFJJFFFJJJJJJFJ7JFJJ NH:i:1  HI:i:1  AS:i:296    nM:i:2  NM:i:0  MD:Z:151    jM:B:c,-1   jI:B:i,-1
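
The chromosome-name check mentioned in the question can be scripted along these lines. This is a minimal sketch with hypothetical helper names; it reads plain-text SAM lines (for a BAM file, the output of `samtools view` could be piped in) and compares reference names against the first column of the GTF.

```python
def gtf_seqnames(lines):
    """Collect sequence names from column 1 of GTF lines."""
    names = set()
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue
        names.add(line.split(None, 1)[0])
    return names


def sam_refnames(lines):
    """Collect reference names from column 3 (RNAME) of SAM alignment lines."""
    names = set()
    for line in lines:
        if not line.strip() or line.startswith("@"):
            continue  # skip blank lines and SAM header lines
        cols = line.split(None, 3)
        if len(cols) > 2 and cols[2] != "*":
            names.add(cols[2])
    return names


def compare_names(gtf_names, sam_names):
    """Report which names are shared and which occur on one side only."""
    return {
        "shared": sorted(gtf_names & sam_names),
        "only_in_gtf": sorted(gtf_names - sam_names),
        "only_in_sam": sorted(sam_names - gtf_names),
    }


# Excerpts from this post; SEQ and QUAL are replaced by '*' and the optional
# tags are dropped to keep the lines short (an abbreviation made for this example).
GTF_HEAD = [
    'chrMT   RefSeq  exon    1   70  .   +   .   transcript_id "rna43393";',
    'chrMT   RefSeq  exon    71  1045    .   +   .   transcript_id "rna43394";',
]
SAM_HEAD = [
    "J00121:58:H75JHBBXX:6:1101:24200:42337\t163\tchr1\t689\t255\t151M\t=\t833\t294\t*\t*",
    "J00121:58:H75JHBBXX:6:1109:2595:15838\t163\tchr1\t716\t255\t150M\t=\t818\t251\t*\t*",
]

report = compare_names(gtf_seqnames(GTF_HEAD), sam_refnames(SAM_HEAD))
print(report)
```

On excerpts this short the result says nothing about the real files, since the GTF head only covers chrMT and the SAM head only chr1. Run over the complete files, names that appear only in the SAM output would point to reads that cannot overlap any annotated feature.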

STAR command line

STAR --outFileNamePrefix $SEED --outFilterMultimapNmax 50 --outFilterMismatchNmax 4 --seedSearchStartLmax 25 --alignIntronMin 20 --alignIntronMax 1000000 --alignMatesGapMax 100000 --sjdbGTFfile Equus_caballus.EquCab2.72.gtf --outFilterType BySJout --outSAMtype BAM SortedByCoordinate --outSAMstrandField intronMotif --outSAMattributes All --outTmpDir ./$SGE_TASK_ID --runThreadN 4 --genomeDir /data/references/horse/StarIdx --readFilesIn ../raw_data/${SEED}_combined_R1.fastq ../raw_data/${SEED}_combined_R2.fastq

Tags: rna-seq, no_feature, htseq-count