Question

Small RNA-seq pipeline at a standstill

4 months ago

anthony.santana.703.j

Okay, so I aligned my reads to the genome and used the UCSC knownGene GTF to get counts with -t transcripts. I do get counts, but some of them are completely duplicated: the rows are identical, counts included. From what I've read, this may be due to isoforms mapping to the same region (same start, same stop, same everything else, and therefore duplicated counts).

I have also augmented my counts with BioMart to get the gene biotype. I know I can graph these biotypes to get a general view of what is in my sample. The issue is that BioMart only returns annotations for the reviewed (Swiss-Prot) entries, so my unreviewed (TrEMBL) entries are left without any added information.

I guess I'm just stuck on where to go from here. Sorry for the various questions, but there is very limited information on small RNA-seq and its protocol. This is where I'm currently at:

1. Should I collapse the duplicated counts?
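One way to collapse them without losing track of which isoforms were merged is sketched below. It assumes a hypothetical row layout of (feature_id, chrom, start, end, strand, per-sample counts), similar to the columns a featureCounts-style table carries. Rows are merged only when coordinates and counts all match, so unrelated features that happen to share a count (for example, all zeros) stay separate. Counts are deliberately not summed: if identical isoforms were each credited with the same reads, adding them up would count those reads more than once.

```python
"""Collapse count-table rows that are exact duplicates (same coordinates, same counts).

Row layout is hypothetical: (feature_id, chrom, start, end, strand, counts),
where `counts` holds one value per sample.
"""


def collapse_duplicates(rows):
    """Keep one representative per group of rows with identical coordinates
    and identical per-sample counts, and record every member ID of the group.

    Counts are not summed: if identical isoforms were each credited with the
    same reads, adding them up would count those reads more than once.
    """
    groups = {}  # key -> list of feature IDs, in first-seen order
    for feature_id, chrom, start, end, strand, counts in rows:
        key = (chrom, start, end, strand, tuple(counts))
        groups.setdefault(key, []).append(feature_id)

    collapsed = []
    for (chrom, start, end, strand, counts), ids in groups.items():
        # The first ID seen stands in for the whole group.
        collapsed.append((ids[0], ids, chrom, start, end, strand, list(counts)))
    return collapsed


if __name__ == "__main__":
    # Toy rows with made-up IDs: txA and txB share span and counts.
    rows = [
        ("txA", "chr1", 100, 180, "+", [10, 12]),
        ("txB", "chr1", 100, 180, "+", [10, 12]),
        ("txC", "chr2", 500, 560, "-", [3, 0]),
    ]
    for row in collapse_duplicates(rows):
        print(row)
```

Keeping the full member list next to the representative ID means the merged isoforms can still be reported, or annotated separately, later on.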

The gene biotypes I got are:

    biotype                              count
    artifact                                19
    IG_C_gene                               16
    IG_C_pseudogene                         11
    IG_D_gene                               46
    IG_J_gene                                4
    IG_J_pseudogene                          6
    IG_pseudogene                            1
    IG_V_gene                              207
    IG_V_pseudogene                        289
    lncRNA                               60210
    miRNA                                 1926
    misc_RNA                              2402
    Mt_rRNA                                  2
    Mt_tRNA                                 22
    non_stop_decay                          10
    nonsense_mediated_decay               3297
    processed_pseudogene                 10139
    processed_transcript                    26
    protein_coding                       49902
    protein_coding_CDS_not_defined       28795
    protein_coding_LoF                     105
    pseudogene                              20
    retained_intron                      37115
    ribozyme                                 8
    rRNA                                    71
    rRNA_pseudogene                        514
    scaRNA                                  51
    snoRNA                                1009
    snRNA                                 2071
    sRNA                                     6
    TEC                                   1162
    TR_D_gene                                5
    TR_J_gene                               21
    TR_J_pseudogene                          4
    TR_V_gene                              136
    TR_V_pseudogene                         46
    transcribed_processed_pseudogene      1216
    transcribed_unitary_pseudogene         188
    transcribed_unprocessed_pseudogene    1764
    translated_processed_pseudogene          2
    unitary_pseudogene                      87
    unprocessed_pseudogene                2675
    vault_RNA                                4

The table above covers everything except the TrEMBL genes. I'm wondering whether graphing these would be an acceptable output, and whether the TrEMBL annotations should be added at all, since they are unreviewed and low confidence.
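One way to make the TrEMBL question concrete is to keep the unannotated rows instead of dropping them, then check what share of the reads they hold. The sketch below assumes hypothetical inputs: `counts` maps a feature ID to its per-sample counts, and `biotypes` maps a feature ID to the biotype string returned by a BioMart query. IDs with no BioMart hit get the placeholder label "unannotated". If that fraction turns out to be small, leaving those rows out of a biotype plot changes little. If it is large, showing them as their own bar may be more transparent than hiding them.

```python
"""Attach BioMart biotypes without dropping unannotated (e.g. TrEMBL-only) rows,
then measure how many reads those rows hold.

Inputs are hypothetical: `counts` maps feature ID -> per-sample counts,
`biotypes` maps feature ID -> biotype string from a BioMart query.
"""


def annotate(counts, biotypes, missing_label="unannotated"):
    """Return (feature_id, biotype, counts) rows; IDs with no BioMart hit
    get `missing_label` instead of being discarded."""
    return [
        (feature_id, biotypes.get(feature_id, missing_label), sample_counts)
        for feature_id, sample_counts in counts.items()
    ]


def unannotated_fraction(rows, missing_label="unannotated"):
    """Per-sample fraction of counted reads that sit in `missing_label` rows."""
    if not rows:
        return []
    n_samples = len(rows[0][2])
    total = [0] * n_samples
    missing = [0] * n_samples
    for _, biotype, sample_counts in rows:
        for i, c in enumerate(sample_counts):
            total[i] += c
            if biotype == missing_label:
                missing[i] += c
    # Guard against samples with zero counted reads.
    return [m / t if t else 0.0 for m, t in zip(missing, total)]


if __name__ == "__main__":
    counts = {"tx1": [120, 98], "tx2": [5, 7], "tx3": [40, 52]}  # made-up IDs
    biotypes = {"tx1": "miRNA", "tx2": "snoRNA"}  # tx3 had no BioMart hit
    rows = annotate(counts, biotypes)
    print(unannotated_fraction(rows))
```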

I'm genuinely lost on how to move forward, yet again.

NGS miRNA