Question

RNA seq alignment issue

14 months ago

Hashan • 0

Hi,

I am trying to run a bulk RNA-seq analysis, but my pipeline generates only 400,000-700,000 total counts per sample, whereas our core generates nearly 20,000,000-25,000,000 total counts per sample. We tried several things to resolve this. We ran the pipeline both with and without adaptor trimming, but QC showed minimal loss of reads in either case. We then tried changing the reference files, with no improvement. My mapping rate (using hisat2) is above 90% every time, so I do not think the alignment step is the problem.

Below is the pipeline I use

1. Run QC
2. Adaptor trimming
3. Run QC
4. Mapping (I use hisat2, wget ftp://ftp.ccb.jhu.edu/pub/infphilo/hisat2/data/grcm38_tran.tar.gz)
5. Running samtools (view, sort, flagstat, view)
6. Counting the reads with htseq; we tried two reference files:

           wget ftp://ftp.ensembl.org/pub/release-100/gtf/mus_musculus/Mus_musculus.GRCm38.100.gtf.gz and
           mm10_UCSC_genesymbolNochr.gtf
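
One way to see where the reads go at the counting step is to split the htseq-count output into reads assigned to genes and the special counters that htseq-count appends at the end (`__no_feature`, `__ambiguous`, `__too_low_aQual`, `__not_aligned`, `__alignment_not_unique`). The sketch below assumes the default tab-separated output with `__`-prefixed counters; the function name `summarize_htseq_counts` and the example numbers are made up for illustration and are not from this thread.

```python
def summarize_htseq_counts(lines):
    """Split htseq-count output into assigned reads and special counters.

    `lines` is an iterable of "feature_id<TAB>count" strings, i.e. the
    default htseq-count output (assumed format).
    """
    assigned = 0
    special = {}
    for line in lines:
        line = line.strip()
        if not line:
            continue
        name, value = line.split("\t")
        count = int(value)
        if name.startswith("__"):
            # Special counters such as __no_feature or __ambiguous
            special[name] = count
        else:
            # Regular feature (gene) rows
            assigned += count
    return assigned, special


if __name__ == "__main__":
    # Hypothetical example rows, not real data from the question.
    example = [
        "GeneA\t120",
        "GeneB\t0",
        "__no_feature\t900",
        "__ambiguous\t30",
    ]
    assigned, special = summarize_htseq_counts(example)
    print("assigned to genes:", assigned)
    for name, count in special.items():
        print(name, count)
```

On a real file this could be called as `summarize_htseq_counts(open("sample_counts.txt"))`, where the file name is a placeholder. If most reads land in `__no_feature`, likely suspects include sequence names that differ between the alignment and the GTF, or the strandedness setting (htseq-count defaults to `--stranded=yes`).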

I think the issue might be the reference file, but I do not know how to fix it. If someone can help me with this, I would really appreciate it.

Thank you

Hashan

RNA-seq • 499 views

updated 14 months ago by Ram 45k • written 14 months ago by Hashan • 0

"I think the issue might be the reference file but not know how to fix it."

You appear to be aligning against the transcripts file and not the genome. If you want to use the transcripts then consider using a program like salmon instead. If you wish to align against the genome then use https://cloud.biohpc.swmed.edu/index.php/s/grcm38/download
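
A quick way to check whether the alignment and the annotation refer to the same sequences is to compare the reference names in the BAM with the first column of the GTF. The sketch below is only an illustration: it assumes you have saved the output of `samtools idxstats` on the sorted, indexed BAM to a text file, and the function names are hypothetical helpers, not part of any tool.

```python
def gtf_seqnames(gtf_lines):
    """Collect sequence names (column 1) from GTF lines, skipping comments."""
    names = set()
    for line in gtf_lines:
        if line.startswith("#") or not line.strip():
            continue
        names.add(line.split("\t", 1)[0])
    return names


def idxstats_seqnames(idxstats_lines):
    """Collect reference names with at least one mapped read.

    `samtools idxstats` prints: name, length, mapped, unmapped (tab-separated),
    with a final "*" row for reads that have no reference.
    """
    names = set()
    for line in idxstats_lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) >= 3 and fields[0] != "*" and int(fields[2]) > 0:
            names.add(fields[0])
    return names


def compare_seqnames(gtf_lines, idxstats_lines):
    """Report names shared by both files and names unique to each."""
    gtf = gtf_seqnames(gtf_lines)
    bam = idxstats_seqnames(idxstats_lines)
    return {
        "shared": sorted(gtf & bam),
        "bam_only": sorted(bam - gtf),
        "gtf_only": sorted(gtf - bam),
    }
```

If `shared` comes back empty or nearly empty (for example `chr1` in one file and `1` in the other, or transcript IDs in the BAM and chromosomes in the GTF), htseq-count cannot assign most reads to features, which would be consistent with very low total counts despite a high mapping rate.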

14 months ago by GenoMax 153k