12 months ago
bioinfo
Hello,
I created a reference for my species of interest using the cDNA file from Ensembl and then aligned my data to it using kallisto. Then I was asked to add the ncRNA and an EGFP sequence to the reference. I concatenated the three files using cat and then built the new reference. I aligned my samples again, used tximport to get gene counts, and now I have a few questions.
- When I aligned to the cDNA only, the gene 18SrRNA-Psi had 150560 counts. After aligning to the cDNA+ncRNA reference, that gene has 90 counts. However, the new reference contains more genes whose names start with 18SrRNA. Could some of the counts have been assigned to the other 18SrRNA genes, and is that why this gene has fewer counts with the new reference? I only checked the first 10 rows of my files and noticed that; the other 9 genes have the same counts.
- The txi.kallisto$abundance and txi.kallisto$counts tables from the cDNA+ncRNA alignment have an extra row at the top that contains only 0. This did not happen with the cDNA-only reference. What could be causing that?
Thank you
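One way to check the first question is to sum the counts over every gene whose name starts with 18SrRNA in both results and compare the totals. The sketch below uses small invented matrices as stand-ins for the two txi.kallisto$counts tables; the gene names other than 18SrRNA-Psi, the sample name, and all counts except 150560 and 90 are made up for illustration, and `sum_18s` is a hypothetical helper.

```r
# Toy gene-by-sample count matrices standing in for txi.kallisto$counts.
# Only 150560 and 90 come from the post; every other value is invented.
counts_cdna <- matrix(
  c(150560, 20),
  ncol = 1,
  dimnames = list(c("18SrRNA-Psi", "geneA"), "sample1")
)
counts_both <- matrix(
  c(90, 100000, 50000, 20),
  ncol = 1,
  dimnames = list(c("18SrRNA-Psi", "18SrRNA-1", "18SrRNA-2", "geneA"), "sample1")
)

# Hypothetical helper: per-sample total over all genes starting with "18SrRNA".
sum_18s <- function(counts) {
  rows <- grepl("^18SrRNA", rownames(counts))
  colSums(counts[rows, , drop = FALSE])
}

sum_18s(counts_cdna)
sum_18s(counts_both)
```

If the two totals are of similar size on the real data, that would be consistent with the reads having been redistributed among the 18SrRNA genes rather than lost.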
Thank you for the help. The abundance files seem fine. It is the txi.kallisto object that has this issue. The txi.kallisto$counts table looks like below when aligning to the cDNA+ncRNA reference:
The abundance file looks like this:
When I align just to the cDNA I get the below:
My code is below:
The only difference in the code between the cDNA and cDNA+ncRNA alignment is that in the latter I added
t2g2[nrow(t2g2) + 1,] <- list('EGFP', 'EGFP')
to the code.
I think you are right and the issue is with the abundance files. I ran the same tximport code, including the EGFP addition, on the abundance files from the cDNA-only alignment, and that does not produce the row of 0s. I checked txi.kallisto$length and I actually get a value of 15.666069 for all the samples, but when I grep for that value in the abundance files I do not find anything. I also noticed that in my t2g2 table there are some target_ids without ext_genes. All of those genes seem to be in the ncRNA FASTA file. Could that be causing this?
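A quick way to test that suspicion is to list the target_ids whose ext_gene is empty and see what grouping by that column does. The sketch below uses an invented t2g2 table and invented counts, and uses base R's `rowsum` only as a stand-in for gene-level summarization; tximport's own aggregation may differ in detail.

```r
# Toy transcript-to-gene table; column names follow the post, values are invented.
t2g2 <- data.frame(
  target_id = c("tx1", "tx2", "nc1", "nc2"),
  ext_gene  = c("geneA", "geneA", "", ""),
  stringsAsFactors = FALSE
)

# Which transcripts have no gene name?
no_gene <- is.na(t2g2$ext_gene) | t2g2$ext_gene == ""
t2g2$target_id[no_gene]

# Toy transcript-level counts (invented), summed per gene with base R.
tx_counts <- matrix(
  c(10, 5, 0, 0),
  ncol = 1,
  dimnames = list(t2g2$target_id, "sample1")
)
gene_counts <- rowsum(tx_counts, group = t2g2$ext_gene)

# All transcripts with an empty ext_gene collapse into one unnamed row,
# which sorts before every named gene.
rownames(gene_counts)
```

In this toy case the unnamed group ends up as the first row of the gene-level table, which resembles the extra row described above.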
I found the reason and I am adding it here. It seemed to be caused by the target_ids without ext_genes. When I used the target_id as the ext_gene wherever the ext_gene was empty, the extra row at the top no longer appeared.
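A minimal sketch of that fix, assuming t2g2 is a data frame with character columns named target_id and ext_gene (the rows shown here are invented):

```r
# Toy transcript-to-gene table; values are invented for illustration.
t2g2 <- data.frame(
  target_id = c("tx1", "tx2", "nc1", "nc2"),
  ext_gene  = c("geneA", "geneA", "", NA),
  stringsAsFactors = FALSE
)

# EGFP row added as in the original code.
t2g2[nrow(t2g2) + 1, ] <- list("EGFP", "EGFP")

# Use the target_id as the gene name wherever ext_gene is empty or NA.
missing_gene <- is.na(t2g2$ext_gene) | t2g2$ext_gene == ""
t2g2$ext_gene[missing_gene] <- t2g2$target_id[missing_gene]

t2g2
```

After this step every transcript maps to a non-empty gene name, so the ncRNA transcripts are kept as their own rows instead of being collapsed into one unnamed row.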