small RNA-Seq pipeline Standstill
12 weeks ago

Okay, so I aligned my reads to the genome and used the UCSC knownGene GTF to get counts via -t transcripts. I got counts, but some rows are complete duplicates: the coordinates (same start and stop, as well as everything else) and the counts themselves are identical. From what I was reading, this may be due to isoforms that map to the same region. I have also augmented my counts using BioMart, which gives me the gene biotype. I know I can graph these biotypes to get a general view of what is in my sample. The issue is that BioMart only returns annotation for the "reviewed (Swiss-Prot)" entries, so my "unreviewed (TrEMBL)" entries go without augmented info. I guess I'm just stuck on where to go from here. Sorry for the various questions, but there is very limited info on small RNA-seq and the protocol. This is where I'm currently at:

1. Decide whether to collapse duplicated counts?

2. The gene biotypes I got are:

    biotype                              count
    artifact                                19
    IG_C_gene                               16
    IG_C_pseudogene                         11
    IG_D_gene                               46
    IG_J_gene                                4
    IG_J_pseudogene                          6
    IG_pseudogene                            1
    IG_V_gene                              207
    IG_V_pseudogene                        289
    lncRNA                               60210
    miRNA                                 1926
    misc_RNA                              2402
    Mt_rRNA                                  2
    Mt_tRNA                                 22
    non_stop_decay                          10
    nonsense_mediated_decay               3297
    processed_pseudogene                 10139
    processed_transcript                    26
    protein_coding                       49902
    protein_coding_CDS_not_defined       28795
    protein_coding_LoF                     105
    pseudogene                              20
    retained_intron                      37115
    ribozyme                                 8
    rRNA                                    71
    rRNA_pseudogene                        514
    scaRNA                                  51
    snoRNA                                1009
    snRNA                                 2071
    sRNA                                     6
    TEC                                   1162
    TR_D_gene                                5
    TR_J_gene                               21
    TR_J_pseudogene                          4
    TR_V_gene                              136
    TR_V_pseudogene                         46
    transcribed_processed_pseudogene      1216
    transcribed_unitary_pseudogene         188
    transcribed_unprocessed_pseudogene    1764
    translated_processed_pseudogene          2
    unitary_pseudogene                      87
    unprocessed_pseudogene                2675
    vault_RNA                                4

The table above counts everything except the TrEMBL entries. I'm wondering whether graphing these biotypes would be an acceptable output, and whether the TrEMBL annotations should be added at all, since they are unreviewed and low confidence.
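To make the graphing question concrete, here is a minimal sketch (Python, standard library only) of one way the fine-grained biotypes could be folded into a few broad classes before plotting, with entries that have no biotype (for example the TrEMBL-only ones) kept visible as their own "unannotated" class rather than dropped. The class names, the mapping in BROAD_CLASS, and the function names are all assumptions made for illustration, not an established scheme. The numbers in the example call are taken from the table above.

```python
from collections import Counter

# Hypothetical mapping from fine-grained biotypes to broad plotting classes.
# Both the class names and the membership are assumptions; adjust as needed.
BROAD_CLASS = {
    "miRNA": "miRNA",
    "snoRNA": "snoRNA/scaRNA",
    "scaRNA": "snoRNA/scaRNA",
    "snRNA": "snRNA",
    "rRNA": "rRNA",
    "Mt_rRNA": "rRNA",
    "Mt_tRNA": "tRNA",
    "lncRNA": "lncRNA",
    "protein_coding": "protein_coding",
}


def broad_class(biotype):
    """Return the broad class for one biotype label."""
    if biotype is None or biotype == "":
        # No biotype was returned for this entry (e.g. TrEMBL-only).
        return "unannotated"
    if biotype in BROAD_CLASS:
        return BROAD_CLASS[biotype]
    if biotype.endswith("pseudogene"):
        return "pseudogene"
    return "other"


def summarize(biotype_counts):
    """Sum values per broad class; values may be row counts or read counts."""
    totals = Counter()
    for biotype, n in biotype_counts.items():
        totals[broad_class(biotype)] += n
    return totals


if __name__ == "__main__":
    # A subset of the table from the post.
    subset = {"miRNA": 1926, "snoRNA": 1009, "scaRNA": 51,
              "snRNA": 2071, "processed_pseudogene": 10139, "vault_RNA": 4}
    for name, n in summarize(subset).most_common():
        print(name, n)
```

The same function works whether the values are numbers of annotated rows per biotype (which is what the table above appears to show) or read counts summed per biotype. For a view of what is actually in the sample, the summed read counts are probably the more informative input.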

I'm genuinely just lost on how to move forward, yet again.
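For the first question (whether to collapse duplicated counts), one option is sketched below in Python using only the standard library. It merges rows that share the same coordinates and the same per-sample counts into a single row, and keeps every transcript ID so nothing is lost. The row layout, the function name collapse_duplicates, and the transcript IDs in the example are made up for illustration; the real count table would have to be reshaped to match.

```python
from collections import defaultdict


def collapse_duplicates(rows):
    """Merge rows that share coordinates and per-sample counts.

    rows: iterable of (transcript_id, chrom, start, end, strand, counts),
    where counts is a sequence of per-sample counts (hypothetical layout).
    Returns a list of (ids, chrom, start, end, strand, counts) tuples,
    with ids being the sorted list of merged transcript IDs.
    """
    groups = defaultdict(list)
    for tid, chrom, start, end, strand, counts in rows:
        # Rows are duplicates only if both location and counts match.
        key = (chrom, start, end, strand, tuple(counts))
        groups[key].append(tid)
    # Dicts keep insertion order, so the output follows first appearance.
    return [(sorted(ids),) + key for key, ids in groups.items()]


if __name__ == "__main__":
    # Made-up example rows; the IDs and numbers are placeholders.
    rows = [
        ("tx_A", "chr1", 100, 122, "+", (5, 9)),
        ("tx_B", "chr1", 100, 122, "+", (5, 9)),
        ("tx_C", "chr2", 50, 71, "-", (5, 9)),
    ]
    for merged in collapse_duplicates(rows):
        print(merged)
```

Requiring both the location and the counts to match is a deliberately conservative choice: two features that merely happen to have equal counts at different positions stay separate. Collapsing this way also avoids counting the same reads more than once when totals are summed per biotype.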

NGS miRNA • 291 views