I am working on equine embryos, and we performed unstranded paired-end RNA-seq (Illumina, NextSeq High) on individual (non-pooled) embryos.
I first mapped the data with STAR using EquCab3.0.99 from Ensembl. It worked well (around 60% of reads mapped for each embryo), but I am interested in two genes, XIST and SRY, which would allow me to determine the sex of each embryo. These two genes are not in the Ensembl annotation, but they seem to be in NCBI GenBank (although I am not sure):
So, I want to take the reads that did not map with Ensembl and map them against the NCBI GenBank reference to find XIST and SRY.
In STAR, I saw that I have to use the argument:
but I am wondering about reads 1 and 2: are they written to the same FASTQ file? Can I use these FASTQ files directly as input for the mapping against NCBI GenBank?
Moreover, I do not know how to find the right FASTA and GTF/GFF files at NCBI. Could someone point me to a tutorial on downloading the correct ones?
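One way to check whether unmapped reads can be fed back into a paired-end mapping is to confirm that the two mate files list the same reads in the same order. The following is a minimal sketch, assuming the unmapped reads were written as two separate FASTQ files, one per mate; the function names `read_names` and `mates_are_paired` are hypothetical helpers, not part of STAR.

```python
import io
from itertools import zip_longest


def read_names(handle):
    """Yield the read name of every FASTQ record (4 lines per record)."""
    for i, line in enumerate(handle):
        if i % 4 != 0:
            continue  # only the first line of each record is a header
        fields = line[1:].split()  # drop the leading "@", split off any comment
        if not fields:
            continue  # tolerate a trailing blank line
        name = fields[0]
        if name.endswith(("/1", "/2")):
            name = name[:-2]  # strip old-style mate suffixes
        yield name


def mates_are_paired(mate1, mate2):
    """Return True if both files contain the same read names in the same order."""
    pairs = zip_longest(read_names(mate1), read_names(mate2))
    return all(a == b for a, b in pairs)


# Small in-memory demonstration; the records are made up for illustration.
demo_mate1 = io.StringIO("@r1 0:N\nACGT\n+\nIIII\n@r2 0:N\nGGCC\n+\nIIII\n")
demo_mate2 = io.StringIO("@r1 1:N\nTTTT\n+\nIIII\n@r2 1:N\nAATT\n+\nIIII\n")
print(mates_are_paired(demo_mate1, demo_mate2))
```

For real data, the two handles would come from `open(...)` on the actual mate files. If the function returns `False`, the files are not synchronized and should not be passed to a paired-end aligner as they are.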
Finally, in the Ensembl annotation I have a lot of "lncRNA" genes annotated as novel genes. I am wondering: is it possible that one of the genes annotated as "novel gene" is actually XIST, for example? Is there a way to check that?