10.3 years ago
ancient_learner
Hi all,
I am trying to extract all the non-coding RNA data for mouse from the UCSC Genome Browser. From the Table Browser I got the full list of RefSeq genes. Is it OK to just grep for the entries whose accession starts with the NR_ prefix to get all long non-coding RNAs, or is there a better method for this?
That will also contain microRNAs.
Then how can I get only the long non-coding RNA data? Please give your suggestions.
You might download kgXref and then run
    grep "non-coding" kgXref | cut -f 2 | sort | uniq
to get a list of names, and then use that list with grep (microRNAs aren't labeled as non-coding in kgXref). A more convenient way would be to add the full name of each gene to the GTF file and then grep for "non-coding" in that (it's often convenient to have a gene_name field in GTF files anyway). I expect there are instructions elsewhere on Biostars on how to do that.
Edit: Of course, this all depends on the annotations containing "non-coding" in their names, which I can't guarantee is always the case.
Edit2: If you're willing to use the Ensembl annotation then life is easier (you can even just use BioMart). I would normally recommend the Ensembl annotation anyway, since it's much cleaner.
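For the kgXref route, here is a minimal sketch of the two grep steps run on toy data. Everything in it is an assumption made for illustration: the file names toy_kgXref.txt, toy.gtf, noncoding_ids.txt and lnc.gtf, the accessions, and the descriptions are all invented. The toy rows only mimic a tab-separated kgXref layout with the mRNA accession in column 2 and a free-text description in the last column. Check your real download before relying on those column positions.

```shell
#!/bin/sh
set -eu

# Toy stand-in for kgXref (tab-separated). All rows are made up; the assumed
# layout is: kgID, mRNA, spID, spDisplayID, geneSymbol, refseq, protAcc, description.
printf 'uc000aaa.1\tNR_000001\t\t\tLncA\tNR_000001\t\tMus musculus long non-coding RNA A\n' > toy_kgXref.txt
printf 'uc000bbb.1\tNM_000002\tP00002\tPROTB_MOUSE\tProtB\tNM_000002\tNP_000002\tMus musculus protein B, mRNA\n' >> toy_kgXref.txt
printf 'uc000ccc.1\tNR_000003\t\t\tMirC\tNR_000003\t\tMus musculus microRNA C, microRNA\n' >> toy_kgXref.txt

# Toy GTF with one exon per transcript (also invented).
printf 'chr1\trefGene\texon\t100\t200\t.\t+\t.\tgene_id "NR_000001"; transcript_id "NR_000001";\n' > toy.gtf
printf 'chr1\trefGene\texon\t300\t400\t.\t+\t.\tgene_id "NM_000002"; transcript_id "NM_000002";\n' >> toy.gtf
printf 'chr1\trefGene\texon\t500\t600\t.\t-\t.\tgene_id "NR_000003"; transcript_id "NR_000003";\n' >> toy.gtf

# Step 1: collect the accessions (column 2) of rows whose text mentions "non-coding".
grep "non-coding" toy_kgXref.txt | cut -f 2 | sort | uniq > noncoding_ids.txt

# Step 2: keep only the GTF lines that mention one of those accessions.
# -F treats each ID as a fixed string, -w requires a whole-word match,
# -f reads the patterns from the file.
grep -F -w -f noncoding_ids.txt toy.gtf > lnc.gtf

cat lnc.gtf
```

The -w flag is there so that a short accession cannot match inside a longer one. The microRNA row is dropped only because its toy description lacks the string "non-coding", which mirrors the caveat in the Edit above.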