Question

count file having zero

0

Entering edit mode

5.7 years ago

BioBaby ▴ 20

I have some non-coding region sequence around 5000 and some tissue specific reads data. I want to do differential gene expression analysis. So I want to check which tissue specific reads having these non coding sequences for that I aligned them (where I created database for those non coding sequences) by tophat2. Then I need to have count file from these bam file, used featureCount but it gave zero only.I also used bedtool also it also not worked. I suspect because of those non-coding seq it can't generate count file...may be I don't know. I convert bam to sam file then it looks like this:

> @HD     VN:1.0  SO:coordinate
@SQ     SN:0    LN:309
@SQ     SN:0    LN:381
@SQ     SN:1    LN:1085
@SQ     SN:1    LN:353
@SQ     SN:10   LN:239
@SQ     SN:10   LN:205
@SQ     SN:10008        LN:659
@SQ     SN:1001 LN:291

Can you people tell me what should I do?

RNA-Seq next-gen alignment • 1.8k views

ADD COMMENT • link updated 15 months ago by Ram 44k • written 5.7 years ago by BioBaby ▴ 20

0

Entering edit mode

I changed "tool" to "question", since tool is meant for promotion of new tools. Also use the 101 button for code, it reads easier for your readers.

ADD REPLY • link 5.7 years ago by Benn 8.3k

0

Entering edit mode

Your post lacks any details (code) on what you did and which files you used). Please add this. Brief Reminder On How To Ask A Good Question

ADD REPLY • link 5.7 years ago by ATpoint 84k

0

Entering edit mode

You first need to figure out why you have duplicate entries with different lengths in your BAM/SAM files. For example, you an a contig 0 and length 309 and another contig 0 with length 381. That's going to break most tools. Further, make sure that your annotations has contig names like 0, 1, 10008 and so on, otherwise featureCounts will always return 0.

ADD REPLY • link 5.7 years ago by Devon Ryan 104k

0

Entering edit mode

Thanks Ryan...I removed duplicate. Here I should mention that I am working novel lincRNA so my annotation file does not have that information.Then how to calculate count? any other way to do it. It may look silly but do not have idea.

ADD REPLY • link 5.7 years ago by BioBaby ▴ 20

0

Entering edit mode

Add it to your annotation then.

ADD REPLY • link 5.7 years ago by Devon Ryan 104k

0

Entering edit mode

here I am giving my gtf file which looks like:

SL3.0ch00       pfurio  peak    0       16479   .       +       .       peak_id "SL3.0ch00_0_16479";
SL3.0ch00       pfurio  peak    17940   328351  .       +       .       peak_id "SL3.0ch00_17940_328351";
SL3.0ch00       pfurio  peak    334459  548343  .       +       .       peak_id "SL3.0ch00_334459_548343";

from bam file samid generated by following command

samtools idxstats a_sorted.bam > samid.txt

samid.txt looks like this:

SL3.0ch00:1230877-1231088       211     0       0
SL3.0ch00:1147815-1150318       2503    0       0
SL3.0ch00:1147485-1147759       274     0       0
SL3.0ch00:1150608-1151254       646     4       0

then give this command

featureCounts -a x.gtf -F GTF -o y.bed a_sorted.bam

ERROR: no features were loaded in format GTF

and I used bedtool also still it gave zero count.

ADD REPLY • link 5.7 years ago by BioBaby ▴ 20

0

Entering edit mode

That "GTF" file isn't in GTF format, you need to fix that. Also, the chromosome names don't match between the two files.

ADD REPLY • link 5.7 years ago by Devon Ryan 104k

0

Entering edit mode

I created GTF file from this command:

awk -v OFS="\t" '{print $1, "myIntergenic","intergenic", $2, $3, ".", "+", ".", sprintf("gene_id \"%s\"; transcript_id \"%s:%s-%s\"" ";", $4, $1, $2, $3 )}' l.bed > o.gtf

so it looks like this:

SL3.0ch00   myIntergenic    intergenic  1230877 1231088 .   +   .   gene_id "xxx"; transcript_id "SL3.0ch00:1230877-1231088";
SL3.0ch00   myIntergenic    intergenic  1147815 1150318 .   +   .   gene_id "xxx"; transcript_id "SL3.0ch00:1147815-1150318";
SL3.0ch00   myIntergenic    intergenic  1147485 1147759 .   +   .   gene_id "xxx"; transcript_id "SL3.0ch00:1147485-1147759";

but how to change bam file ids? in gtf file we can't give "SL3.0ch00:1230877-1231088"...so what should I do?

ADD REPLY • link 5.7 years ago by BioBaby ▴ 20

0

Entering edit mode

It looks like your reference fasta file has the contigs split apart into chunks. Honestly I'd try to fix that and realign the data. Then you'd have SL3.0ch00 and such and the alignments would probably be more accurate.

ADD REPLY • link 5.7 years ago by Devon Ryan 104k

0

Entering edit mode

thanks for giving your time Ryan...one question I have..can I create count file from idxstats output? it will be OK for further differential gene expression analysis?