the number of reads that have mapped to the each gene
1
0
Entering edit mode
8.6 years ago
zizigolu ★ 4.3k

Hey friends,

I have two accepted_hits.bam files one for RNA-seq and another for ribo-seq produced by tophat2. Today my adviser asked me how I can count the number of reads that have mapped to the each gene. I searched in biostars posts but I could not figure out yet. Can you help me please?

Thank you

bam tophat2 RNA-Seq ribo-seq • 3.8k views
ADD COMMENT
0
Entering edit mode

Thank you Pierre for your complete guidance!

ADD REPLY
1
Entering edit mode

You are going to have plenty of tools to do that (like this answer). The real question is, what and how are you counting exactly? You can count only uniquely mapped reads, reads that fall entirely into exons of your genes or just overlap them etc... You may also want a way to normalize your values (say, using FPKM).

This gets messy: ask your PI what exact method you should use. Ideally, you might find an article on the subject you are researching whose Materials and Methods RNASeq section can be recycled - you will be able to make more rigorous comparisons of your finds and those of the articles.

At minimum, you will need an annotation file with the position of the features you are counting.

ADD REPLY
0
Entering edit mode

thank you cyril-cros

ADD REPLY
2
Entering edit mode
8.6 years ago

Via BEDOPS gff2bed (or gtf2bed, depending on the annotation format), bam2bed, and bedmap:

$ gff2bed < genes.gff > genes.bed
$ bam2bed < reads.bam | bedmap --echo --count genes.bed - > answer.bed
ADD COMMENT
0
Entering edit mode

Sorry Alex,

I downloaded x86-64 (64-bit) binaries for fedora 64 bit, then with tar xvjf bedops_linux_x86_64-v2.4.14.tar.bz2, unzipped that, but there is no unzipped file...am I right for unzipping?

Sorry again,

I recognized right now... I file named bin have been created there

Thank you

ADD REPLY
0
Entering edit mode

Sorry Alex,

I have a genes.gtf file with

[izadi@lbox161 ~]$ bash
[izadi@lbox161 ~]$ cd /usr/data/nfs6/izadi/angel/ouput/
[izadi@lbox161 ouput]$ export BED=/usr/data/nfs6/izadi/gij/bin/
[izadi@lbox161 ouput]$ echo $BED
/usr/data/nfs6/izadi/gij/bin/
[izadi@lbox161 ouput]$ $BED/gtf2bed genes.gtf > genes.bed
/usr/data/nfs6/izadi/gij/bin//gtf2bed: line 122: convert2bed: command not found
[izadi@lbox161 ouput]$

What is the reason please?

ADD REPLY
3
Entering edit mode

Please, learn from your previous experience: no happen when runing tophat in fedora

ADD REPLY
0
Entering edit mode

Oh Pierre please,

Last night I tried too much

ADD REPLY
1
Entering edit mode
sudo cp $BED/* /usr/local/bin

Or similar.

ADD REPLY
0
Entering edit mode

Sorry Alex, what does this mean please? I should copy what?

ADD REPLY
0
Entering edit mode

Alex,

I tried

[izadi@lbox161 ouput]$ $BED/bam2bed accepted_hits.bam | bedmap --echo --count hgTables.bed - > answer.bed
/usr/data/nfs6/izadi/gij/bin//bam2bed: line 148: convert2bed: command not found
bash: bedmap: command not found...
[izadi@lbox161 ouput]$ sudo cp $BED/* /usr/local/bin

We trust you have received the usual lecture from the local System
Administrator. It usually boils down to these three things:

    #1) Respect the privacy of others.
    #2) Think before you type.
    #3) With great power comes great responsibility.

[sudo] password for izadi: 
[izadi@lbox161 ouput]$ $BED/gtf2bed genes.gtf > genes.bed
[izadi@lbox161 ouput]$ $BED/bam2bed accepted_hits.bam | bedmap --echo --count hgTables.bed - > answer.bed
/usr/data/nfs6/izadi/gij/bin//bam2bed: line 148: convert2bed: command not found
bash: bedmap: command not found...
[izadi@lbox161 ouput]$ $BED/bam2bed accepted_hits.bam | bedmap --echo --count hgTables.bed > answer.bed
/usr/data/nfs6/izadi/gij/bin//bam2bed: line 148: convert2bed: command not found
bash: bedmap: command not found...
[izadi@lbox161 ouput]$

What can I do please?

ADD REPLY
1
Entering edit mode

After copying binaries and scripts to a folder that is already in your PATH (like /usr/local/bin), then run:

$ gff2bed < genes.gff > genes.bed
$ bam2bed < reads.bam | bedmap --echo --count genes.bed - > answer.bed

You do not need the prefix $BED/ so long as binaries and scripts are copied to a folder already in your PATH (which usually includes /usr/local/bin).

The PATH variable helps your command-line session find and run binaries that are in other folders. This is an important part of UNIX to learn, so it is worth spending a couple minutes to read about.

Replace genes.gff with your source of genes, if you are converting GFF-formatted annotations to BED.

If you don't need to convert GFF-formatted annotations to BED, then you can skip this first step and just use your BED file (whatever it is) directly.

For instance, if your BED file of genes is hgTables.bed, and your BAM file of reads is accepted_hits.bam, then you can run:

$ bam2bed < accepted_hits.bam | bedmap --echo --count hgTables.bed - > answer.bed

If you need other statistics than a read count, read the bedmap documentation for available options.

ADD REPLY
0
Entering edit mode

Thank you so much Alex, I copied accepted_hits.bam and gtf.genes in binary file. Then I typed

[izadi@lbox161 ~]$ bash
[izadi@lbox161 ~]$ cd /usr/data/nfs6/izadi/gij/bin/
[izadi@lbox161 bin]$ gtf2bed genes.gtf > genes.bed
Error: No input is specified; please redirect or pipe in formatted data
convert2bed
  version: 2.4.13
  author:  Alex Reynolds
ADD REPLY
1
Entering edit mode
$ gtf2bed < foo.gtf > foo.bed

Note the use of the UNIX input and output redirection operators < and >.

ADD REPLY
0
Entering edit mode

Thank you Alex

ADD REPLY
0
Entering edit mode

Thank you Alex, you are really clever do develop such a tool

I could get the result

This is few rows but all of columns of my bed file

May you please tell me where is the the number of reads per gene?

I    334    337    "YAL069W    .    +    protein_coding    start_codon    0    exon_number "1"; gene_id "YAL069W"; gene_name "YAL069W"; p_id "P3633"; transcript_id "YAL069W"; transcript_name "YAL069W"; tss_id "TSS1128";|0
I    334    646    "YAL069W    .    +    protein_coding    CDS    0    exon_number "1"; gene_id "YAL069W"; gene_name "YAL069W"; p_id "P3633"; protein_id "YAL069W"; transcript_id "YAL069W"; transcript_name "YAL069W";

Thank you

ADD REPLY
1
Entering edit mode

The count value is at the end of the string. In your first line, for example, you see "...;|0" That means there were zero reads over the gene annotation (exon). In other situations, you might see "...;|1", "...;|2" etc. for one, two or more reads over the annotation.

It may be easier to add --delim '\t' to the bedmap command:

$ bam2bed < reads.bam | bedmap --echo --count --delim '\t' genes.bed - > answer.bed

The count value is then in its own column at the end of the line.

ADD REPLY
0
Entering edit mode

Tnxxxxxxxxxxxxxxxxxx a lot briliant guy

ADD REPLY

Login before adding your answer.

Traffic: 2286 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6