count file zero
1
0
Entering edit mode
5.2 years ago
BioBaby ▴ 20

hello everyone,

I identified some lincRNA from different light treatment and tissue specific data.Now I want to see differential expression of these lincRNA among these tissue and light specific data so for that I align individual raw read (tissue and light specific) to these lincRNA(predicted) then try to generate count file by featureCount but gave only zero while in samtools idxstats gave mapped reads. lincRNA file looks like:

>SL3.0ch00:1230877-1231088
GCCTATATATAGAGTACTAAATTCCTTAAAAAGGCATCTCGGAAGTTCCATAAATAGATCAAGATATCGAATAAGAGAGGTCAAATGTAAATCATCCTAGTTCGAGAAATACGCCACTAACGACCCTCGAATCATACAAAATCATGGAGAGTAGAATTAAGGGATCAATAGAATTGTACCGCTG
>SL3.0ch00:1147815-1150318
GAGCAGCCATAGAACAAAAGCAGTTGTGGGTGAGCTGGTTTAAACCTCCTCAATAAGAGGCGTGCGCACCAACAAGCGAGGGTTTGAATCCCACCAGTAGCATTTATTTTTTTAAAAAAATT

samtools idxstats accepted_hits.sort.bam > samid.txt

which looks like:

SL3.0ch00:1230877-1231088       211     0       0
SL3.0ch00:1147815-1150318       2503    0       0
SL3.0ch00:1147485-1147759       274     0       0
SL3.0ch00:1150608-1151254       646     4       0

because these lincRNA are from intergenic region so would overlap with original gtf file (which available at solgenomic website) so from these lincRNA file itself I created GTF file which looks like this:

SL3.0ch00      myIntergenic    intergenic      1230877 1231088 .       +       .       gene_id "xxx"; trancript_id "SL3.0ch00:1230877-1231088";
SL3.0ch00       myIntergenic    intergenic      1147815 1150318 .       +       .       gene_id "xxx"; trancript_id "SL3.0ch00:1147815-1150318";

featureCounts -T 60 -F GTF -t intergenic -a intergenic_created.gtf -o feature4.bed SLY_veg06_sorted.bam

this I used for feature Count but I got zero count for all transcript. kindly guide me how to solve this.

RNA-Seq next-gen alignment • 1.3k views
ADD COMMENT
0
Entering edit mode
5.2 years ago
michael.ante ★ 3.8k

Your chromosome names in your alignment are named SL3.0ch00:1230877-1231088 , your chromosomes names in the gtf are SL3.0ch00 . The gtf describes coordinates on the genome, the index you is in aligning against are the transcriptome coordinates.

Since the names are not equal featureCounts (and any other gene read assigner) will provide 0 reads. You can use the first two columns of your idxstat as a start to generate the gtf manually.

ADD COMMENT
0
Entering edit mode

As above I showed GTF that is manually created from idxstat file only sir still it did not give.... I changed my database also:

>SL3.0ch00
GCCTATATATAGAGTACTAAATTCCTTAAAAAGGCATCTCGGAAGTTCCATAAATAGATCAAGATATCGAATAATATACAAATTGAT
>SL3.0ch00
GGATTATTCATAGAGAGAGGTCAAATGTAAATCATCCTAGTTCGAGAAATACG

even I paste SL3.0ch00:1230877-1231088 in first column of gtf to get match (I know its not valid) but this also didnot work. all combination I tried.

ADD REPLY
1
Entering edit mode

Get the reads from samtools idxstats instead of featureCounts

ADD REPLY
0
Entering edit mode

Although this would include multimapper in the counting.

ADD REPLY
0
Entering edit mode

But then your coordinates in the gtf are not 1230877 to 1231088 , but rather 1 to 211 .

ADD REPLY
0
Entering edit mode

I tried by -O parameter of featureCount still did not work. and 211 is length of transcript..can I create count file from idxstats output? it will be OK for further differential gene expression analysis?

ADD REPLY

Login before adding your answer.

Traffic: 2788 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6