I am working on my thesis, in which I compare RNA-seq results between pipelines. I am a little confused about where I am working with transcripts and where with genes. Here is what I do: I have cDNA reads, which are pseudo-aligned with kallisto against a reference built from cDNA + ncRNA sequences. Up to this point everything seems OK, and here is an example output file: https://ibb.co/985hdf2
The target_id column contains transcripts (ENSTXXXXXX). However, when I pass the raw counts to DESeq2, I get an output file whose GeneID column contains transcript names (ENSTXXXXXX). Moreover, according to the DESeq2 documentation, each dot in the MA plot represents a gene. https://ibb.co/ky1VxjQ
So does DESeq2 convert transcripts into their corresponding genes? If not, why does the output file have transcript names in the GeneID column? https://ibb.co/2k0Vhf0
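To make this question concrete: if the conversion does not happen automatically, I assume it would have to look roughly like the sketch below, where the counts of all transcripts belonging to the same gene are summed. The mapping table, the IDs, and the counts are made up for illustration, and the function name is hypothetical; this is not what my pipeline currently does.

```python
from collections import defaultdict

# Hypothetical transcript-to-gene mapping (made-up IDs, not from my data).
TX2GENE = {
    "ENST0000000001": "ENSG0000000001",
    "ENST0000000002": "ENSG0000000001",
    "ENST0000000003": "ENSG0000000002",
}

# Hypothetical kallisto-style estimated counts per transcript (target_id -> est_counts).
tx_counts = {
    "ENST0000000001": 120.0,
    "ENST0000000002": 30.5,
    "ENST0000000003": 7.0,
}


def sum_to_gene_level(tx_counts, tx2gene):
    """Sum transcript-level counts into gene-level counts.

    Transcripts missing from the mapping are skipped.
    """
    gene_counts = defaultdict(float)
    for tx_id, count in tx_counts.items():
        gene_id = tx2gene.get(tx_id)
        if gene_id is None:
            continue  # no known gene for this transcript
        gene_counts[gene_id] += count
    return dict(gene_counts)


gene_counts = sum_to_gene_level(tx_counts, TX2GENE)
print(gene_counts)
```

If something like this step is skipped, I would expect every row in the DESeq2 input (and therefore every dot in the MA plot) to still be a transcript, which is what I seem to be seeing.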
Lastly, when I want to obtain differentially expressed genes and DESeq2 returns only the statistically significant genes (transcripts?), how do I know which log2FoldChange values indicate upregulated and which indicate downregulated genes? Is there a recommended threshold? I also considered treating all genes with p-value < 0.01 as differentially expressed, but then my analysis shows 50k upregulated and 50k downregulated genes, which does not seem realistic, because most publications report about 100-3k DEGs. Thanks in advance.
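P.S. To make the last question concrete, this is roughly the kind of filtering I have in mind. It is only a sketch under my own assumptions: the rows are made up, I use the adjusted p-value (the padj column of the DESeq2 results) instead of the raw p-value, I assume a positive log2FoldChange means higher expression in the tested condition than in the reference, and the cutoffs (padj < 0.01, |log2FoldChange| >= 1) are arbitrary choices, not recommended values.

```python
# Hypothetical rows shaped like a DESeq2 results table (all values are made up).
results = [
    {"id": "ENST0000000001", "log2FoldChange": 2.3, "padj": 0.0004},
    {"id": "ENST0000000002", "log2FoldChange": -1.7, "padj": 0.002},
    {"id": "ENST0000000003", "log2FoldChange": 0.4, "padj": 0.003},
    {"id": "ENST0000000004", "log2FoldChange": 3.1, "padj": 0.2},
    {"id": "ENST0000000005", "log2FoldChange": -2.0, "padj": None},  # NA in DESeq2
]

PADJ_CUTOFF = 0.01  # assumption: arbitrary significance cutoff
LFC_CUTOFF = 1.0    # assumption: |log2FC| >= 1, i.e. at least a two-fold change


def classify(row, padj_cutoff=PADJ_CUTOFF, lfc_cutoff=LFC_CUTOFF):
    """Label one row as 'up', 'down', or 'not DE' under the assumed cutoffs."""
    padj = row["padj"]
    if padj is None or padj >= padj_cutoff:
        return "not DE"  # missing or non-significant adjusted p-value
    lfc = row["log2FoldChange"]
    if lfc >= lfc_cutoff:
        return "up"
    if lfc <= -lfc_cutoff:
        return "down"
    return "not DE"  # significant, but the change is smaller than the cutoff


labels = {row["id"]: classify(row) for row in results}
for feature_id, label in labels.items():
    print(feature_id, label)
```

With only a p-value cutoff and no fold-change cutoff, the third row would also count as differentially expressed, which may be part of why I get such large numbers.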