Question: RNA-seq data analysis help - uneven coverage of reads across a gene
deamonkinge wrote, 2.6 years ago:

Hi there, I am new to RNA-seq and to bioinformatics, and I would greatly appreciate it if someone could help me with the analysis of some of my RNA-seq data.

I am analyzing bacterial RNA-seq data using Galaxy. RNA was isolated from wild-type bacteria and from a mutant strain, and was sequenced on an Illumina NextSeq 500 system.

The first thing I did with the RNA-seq data, which came as FASTQ files, was a quality check using "FastQC". The reads were of very good quality, so I proceeded to map them to the bacterial reference genome with "Bowtie2", allowing it to perform soft clipping during the mapping step.

The mapping stage produced a BAM file, which I converted to BigWig files with the "bamCoverage" tool. I then used "IGV" to visualize how my reads map onto the reference genome. You can see this in the picture linked below:

https://www.dropbox.com/s/mv3wlgd9k6a32vl/drop%20box%20biostar%20pic.png?dl=0

  • In the picture you can see the "Wild Type" forward and reverse files and the "Mutant" forward and reverse files. The gene or genes that the reads map to are shown at the bottom of the picture. Many reads from the Mutant forward file map to the gene "NWMN_RS14115", which is on the forward strand.

My question is: why are there big blocks of reads mapping in and around the ends of the gene, and fewer reads mapping to the middle of the gene? Shouldn't the reads fall evenly within the confines of the gene? Why are the reads mapping so unevenly, especially at the ends of the gene?

Any help would be greatly appreciated! Thanks

Tags: rna-seq next-gen alignment gene
Sean Davis (National Institutes of Health, Bethesda, MD) wrote, 2.6 years ago:

Read mapping will not be uniform across the gene, in general. The uneven coverage can be introduced in the wet lab and depends on the quality of the RNA, the isolation and extraction of the RNA, the library preparation strategy, and other factors. Bioinformatic processing can also result in observed uneven coverage, as not all regions of the genome are equally "mappable". In general, uneven coverage is to be expected and is not necessarily a problem, particularly when present across samples.
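
To make "uneven coverage" concrete, here is a minimal Python sketch that computes per-base coverage over a gene from read intervals and summarizes the spread with the coefficient of variation. The read coordinates and the 20 bp gene are made up for illustration, and the function names are hypothetical; with real data the intervals would come from the BAM file.

```python
from statistics import mean, pstdev


def per_base_coverage(reads, gene_start, gene_end):
    """Count reads overlapping each base of the half-open interval [gene_start, gene_end)."""
    coverage = [0] * (gene_end - gene_start)
    for read_start, read_end in reads:
        # Clip each read to the gene boundaries before counting.
        lo = max(read_start, gene_start)
        hi = min(read_end, gene_end)
        for pos in range(lo, hi):
            coverage[pos - gene_start] += 1
    return coverage


def coefficient_of_variation(coverage):
    """Standard deviation divided by mean; 0 means perfectly even coverage."""
    avg = mean(coverage)
    return pstdev(coverage) / avg if avg else float("nan")


# Toy (start, end) read intervals on a hypothetical 20 bp gene,
# piled up at both ends with nothing in the middle.
reads = [(0, 5), (0, 5), (0, 5), (2, 7), (13, 20), (15, 20), (15, 20)]
coverage = per_base_coverage(reads, 0, 20)
print(coverage)
print(round(coefficient_of_variation(coverage), 2))
```

Comparing this kind of summary between the wild-type and mutant samples for the same gene gives a rough sense of whether the unevenness is shared across samples or specific to one of them.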

In your specific plot, one thing that seems evident is the presence of duplicate reads, which might be worth checking. That said, it is very hard to tell from the plot alone.


Thanks very much for answering my question! Do you know how I could check for the presence of duplicate reads? Is there a specific tool I could use in Galaxy to do this? Thanks.

Reply written 2.6 years ago by deamonkinge

Hello,

Galaxy offers several tools, such as MarkDuplicates and RmDup, that can identify and remove duplicate reads, which is useful if you want to use your data for things like variant calling.
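
As a rough illustration of what such tools look for, the Python sketch below counts single-end reads that share the same reference, strand, and 5' position. This is a simplified stand-in, not the algorithm used by MarkDuplicates or RmDup, which take further details into account (for example soft-clipped bases and mate positions). The alignment tuples and the function name are made up for the example.

```python
from collections import Counter


def count_duplicates(alignments):
    """Group alignments by (reference, strand, 5' position) and count the extras.

    Each alignment is a (reference, strand, start, end) tuple with a
    half-open [start, end) interval; strand is "+" or "-".
    """
    groups = Counter()
    for ref, strand, start, end in alignments:
        # The 5' end is the leftmost base on "+" and the rightmost base on "-".
        five_prime = start if strand == "+" else end - 1
        groups[(ref, strand, five_prime)] += 1
    total = sum(groups.values())
    duplicates = total - len(groups)  # every read beyond the first in a group
    return duplicates, total


# Hypothetical alignments on a single reference named "chr".
alignments = [
    ("chr", "+", 100, 175),
    ("chr", "+", 100, 175),
    ("chr", "+", 100, 170),  # shorter read, same 5' end
    ("chr", "+", 130, 205),
    ("chr", "-", 300, 375),
    ("chr", "-", 310, 375),  # same 5' end on the reverse strand
]
dup, total = count_duplicates(alignments)
print(f"{dup} of {total} reads ({dup / total:.0%}) share a 5' position with another read")
```

A high fraction of such reads concentrated in a few positions would be consistent with the large blocks seen in the coverage plot, although it does not by itself show whether they come from amplification or from genuinely abundant transcripts.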

Reply written 2.6 years ago by Diploid Progenitor