Hello Raphael, good afternoon (I speak Portuguese fluently)
1) For mapping, do you use scaffolds and chrM, or only chr1-chr22, chrX and chrY?
If you have no intention of researching chrM, other scaffolds, or the sex chromosomes, then you can justify removing them - it depends on what your aims are. However, won't StringTie then try to assemble them anyway (if reads from these chromosomes are in your data)? It depends on the behaviour of StringTie when you use a genome-guided assembly.
I note that StringTie, if you supply a reference GTF file, will normalise counts over the GTF transcripts. This normalisation process will be influenced by the presence of a chrM, X, Y, etc., but only slightly. For raw coverage (raw counts), it makes no difference, as it would then be just counting reads over each position (and not normalising them).
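To make the normalisation point concrete, here is a minimal sketch of a TPM-style calculation with and without a mitochondrial transcript. It is not StringTie's own code, and the transcript names, counts and lengths are made-up toy values. It only illustrates that dropping sequences rescales the normalised values of the remaining transcripts while their ratios to each other stay the same; how large the rescaling is depends on how much signal sits on the excluded sequences.

```python
def tpm(counts, lengths):
    """Transcripts per million from raw read counts and transcript lengths (bp)."""
    # Length-normalised rate for each transcript.
    rates = {t: counts[t] / lengths[t] for t in counts}
    total = sum(rates.values())
    # Scale so that all values sum to one million.
    return {t: rate / total * 1e6 for t, rate in rates.items()}


# Hypothetical toy values, not real data.
counts = {"geneA": 500, "geneB": 1000, "MT-CO1": 3000}
lengths = {"geneA": 1000, "geneB": 2000, "MT-CO1": 1500}

with_mt = tpm(counts, lengths)

# Same calculation after dropping the mitochondrial transcript.
nuclear_counts = {t: c for t, c in counts.items() if not t.startswith("MT-")}
without_mt = tpm(nuclear_counts, lengths)

for name in nuclear_counts:
    print(name, round(with_mt[name], 1), round(without_mt[name], 1))
```

The raw counts in `counts` are untouched by any of this, which is why the choice matters only for normalised values.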
Take a close look at the -x parameter of StringTie:
-x <seqid_list>
Ignore all read alignments (and thus do not attempt to perform transcript assembly) on the specified reference sequences. Parameter <seqid_list> can be a single reference sequence name (e.g. -x chrM) or a comma-delimited list of sequence names (e.g. -x 'chrM,chrX,chrY'). This can speed up StringTie especially in the case of excluding the mitochondrial genome, whose genes may have very high coverage in some cases, even though they may be of no interest for a particular RNA-Seq analysis. The reference sequence names are case sensitive, they must match identically the names of chromosomes/contigs of the target genome against which the RNA-Seq reads were aligned in the first place.
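As a rough sketch of how the -x option could be wired into a pipeline, the helper below assembles a StringTie command line. The function name and the file names are hypothetical; the only StringTie options used are -G (reference annotation), -o (output GTF) and -x (sequences to ignore). The sequence names must match those in your BAM header exactly, so adjust them if your genome uses, for example, "MT" rather than "chrM".

```python
import shlex


def build_stringtie_cmd(bam, out_gtf, ref_gtf=None, exclude=()):
    """Return a StringTie argument list (hypothetical helper, not part of StringTie)."""
    cmd = ["stringtie", bam, "-o", out_gtf]
    if ref_gtf:
        # Genome-guided assembly using a reference annotation.
        cmd += ["-G", ref_gtf]
    if exclude:
        # -x takes one name or a comma-delimited list of names.
        cmd += ["-x", ",".join(exclude)]
    return cmd


# File names below are placeholders for illustration only.
cmd = build_stringtie_cmd(
    "sample.sorted.bam",
    "sample.stringtie.gtf",
    ref_gtf="gencode.annotation.gtf",
    exclude=["chrM", "chrX", "chrY"],
)
print(shlex.join(cmd))  # shlex.join needs Python 3.8+
```

The list returned here can be passed to `subprocess.run(cmd, check=True)` once StringTie is installed and on your PATH.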
2) In my GENCODE GTF file I have annotations from both mRNAs and non-coding RNAs. Do you remove annotations from non-coding RNAs?
It is no problem keeping the ncRNAs. They are genes like any other; many are short, single-exon transcripts, although long non-coding RNAs are often multi-exonic. Any good transcriptome assembler will be able to distinguish the boundary between one gene and another.
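If you want to see how the non-coding entries in your own annotation are structured before deciding, one option is to tally exons per transcript, grouped by the gene_type attribute that GENCODE GTF files carry. The sketch below is only an illustration: the function name is my own, and the three example lines are invented, not taken from a real GENCODE release.

```python
import re
from collections import defaultdict

# Matches GTF attributes of the form: key "value";
ATTR_RE = re.compile(r'(\w+) "([^"]*)"')


def exon_counts_by_gene_type(gtf_lines):
    """Map gene_type -> {transcript_id: number of exon records}."""
    counts = defaultdict(lambda: defaultdict(int))
    for line in gtf_lines:
        if line.startswith("#"):
            continue  # skip header/comment lines
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 9 or fields[2] != "exon":
            continue  # only exon records are counted
        attrs = dict(ATTR_RE.findall(fields[8]))
        transcript_id = attrs.get("transcript_id")
        if transcript_id is None:
            continue
        counts[attrs.get("gene_type", "unknown")][transcript_id] += 1
    return {gene_type: dict(tx) for gene_type, tx in counts.items()}


# Invented example records (tab-separated, 9 columns).
example = [
    'chr1\tHAVANA\texon\t100\t200\t.\t+\t.\tgene_id "G1"; transcript_id "T1"; gene_type "protein_coding";',
    'chr1\tHAVANA\texon\t300\t400\t.\t+\t.\tgene_id "G1"; transcript_id "T1"; gene_type "protein_coding";',
    'chr1\tHAVANA\texon\t900\t950\t.\t-\t.\tgene_id "G2"; transcript_id "T2"; gene_type "snRNA";',
]
result = exon_counts_by_gene_type(example)
print(result)
```

For a real file you would pass an open file handle instead of the list, for example `exon_counts_by_gene_type(open("annotation.gtf"))`, where the path is whatever your GTF is called.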
3) For transcriptome assembly (in my case, StringTie), what is the minimum coverage or depth to consider a transcriptome assembled?
Do you mean average coverage across an entire transcriptome or coverage over an individual transcript? Transcriptome assembly with TopHat or StringTie is different from that of other assemblers like Velvet/Oases because you typically use a reference genome FASTA and GTF with TopHat/StringTie. The key parameters are
Good luck, man!