Question: the number of times each gene is repeated in my file equals the number of reads mapped to that gene
F (Iran) wrote, 3.1 years ago:

Sorry friends,

Below we can see a few rows of a BED file from bowtie2. Column one is the gene name, and columns two and three are the start and end positions where the read mapped. As you can see, gene YAL001C is repeated: the number of times it appears equals the number of reads that mapped to different places on this gene.

YAL001C    0    31    SRR1944914.13670510    42    +
YAL001C    0    31    SRR1944914.14245831    42    +
YAL001C    0    31    SRR1944914.14846638    42    +
YAL001C    21    49    SRR1944914.16464709    42    +
YAL001C    34    64    SRR1944914.16452509    42    +
YAL001C    39    68    SRR1944914.9573160    42    +
YAL001C    41    72    SRR1944914.10936494    42    +
YAL001C    47    78    SRR1944914.3091079    42    +
YAL001C    51    81    SRR1944914.14101000    42    +
YAL001C    63    94    SRR1944914.6961904    42    +
YAL001C    64    94    SRR1944914.1613580    42    +
YAL001C    81    112    SRR1944914.6321368    42    +
YAL001C    87    117    SRR1944914.15157073    42    +
YAL001C    102    133    SRR1944914.6375363    42    +
YAL001C    110    142    SRR1944914.3776687    42    +
YAL001C    110    140    SRR1944914.8299121    42    +
YAL001C    110    140    SRR1944914.10247842    42    +
YAL001C    123    153    SRR1944914.17267226    42    +
YAL001C    153    184    SRR1944914.11895906    42    +
YAL001C    162    191    SRR1944914.8661898    42    +
YAL001C    162    193    SRR1944914.15558858    42    +
YAL001C    183    214    SRR1944914.1191651    42    +
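
For reference, the relationship described above (one row per mapped read, so a gene's row count is its read count) can be checked with a short awk tally. This is only a sketch: `count_rows_per_gene` is a made-up helper name, and `reads_on_genes.bed` in the usage line is a placeholder for a file laid out like the rows above.

```shell
# count_rows_per_gene FILE
# Hypothetical helper: tally how many rows (one row = one mapped read) each
# gene name in column 1 has, and print "gene<TAB>count" sorted by gene name.
count_rows_per_gene() {
  awk 'NF { n[$1]++ } END { for (g in n) print g "\t" n[g] }' "$1" | sort
}
```

Usage would be `count_rows_per_gene reads_on_genes.bed`. For the 22 rows pasted above it would report YAL001C with a count of 22.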

Anyway, I am working with yeast and I used tophat2 with an Ensembl GTF file, and so on. For many days I have been trying to produce such a BED file from the tophat2 BAM file, but I have not managed it yet: a file in which column one is the gene name, repeated once for every read mapped to that gene, and columns two and three are the start and end of each read's mapping.

Do you have any idea how to produce such a file?

Thank you

Tags: reads count tophat2 bed
andrew.j.skelton73 (London) wrote, 3.1 years ago:

"in the below we can see few rows of a bed file from bowtie2"

Does bowtie2 produce a BED file? As far as I'm aware, it produces a SAM file. Please be specific about where you got your BED file from.

Did you look at the fourth column? That appears to be the difference between the three repeated entries that share a start/stop combination.

"anyway i am with yeast and i used tophat2 using ensemble gtf file and so on"

Be specific about which commands you've already run; reproducibility is key.

bamtobed will produce a BED file from a BAM file, with chromosome, start, stop, read IDs, etc. I guess your follow-up question would be "but what about gene annotation?" For that, look at the biomaRt package.
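
To make the bamtobed suggestion concrete, here is a rough awk sketch of what such a conversion emits per alignment, working on SAM text. It is an illustration only, not a replacement for `bedtools bamtobed` or BEDOPS `bam2bed`; `sam_to_bed` is a hypothetical helper name. Note that this sketch reports a spliced read as a single interval that spans the intron.

```shell
# sam_to_bed: read SAM text on stdin, write BED6 on stdout.
# Hypothetical helper for illustration; not part of any toolkit.
sam_to_bed() {
  awk 'BEGIN { FS = OFS = "\t" }
  /^@/ { next }                       # skip SAM header lines
  int($2 / 4) % 2 == 1 { next }       # FLAG bit 0x4 set: read is unmapped
  {
    cigar = $6; reflen = 0
    # Sum the CIGAR operations that consume reference bases (M, D, N, =, X).
    while (match(cigar, /^[0-9]+[MIDNSHP=X]/)) {
      n  = substr(cigar, 1, RLENGTH - 1) + 0
      op = substr(cigar, RLENGTH, 1)
      if (op ~ /[MDN=X]/) reflen += n
      cigar = substr(cigar, RLENGTH + 1)
    }
    start  = $4 - 1                   # SAM POS is 1-based, BED start is 0-based
    strand = (int($2 / 16) % 2 == 1) ? "-" : "+"   # FLAG bit 0x10: reverse strand
    print $3, start, start + reflen, $1, $5, strand
  }'
}
```

Assuming samtools is installed, usage would look like `samtools view accepted_hits.bam | sam_to_bed > reads.bed`, where `reads.bed` is just a placeholder output name.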


Oh Andrew, come on... I mean I produced a SAM, then a BAM, then a BED file.

Reply written 3.1 years ago by F

I mean that with the tophat2 command "tophat2 -p 8 -G genes.gtf genome file.fastq" (the GTF is, I think, the annotation file, and genome is the whole-genome FASTA) I produced a file named accepted_hits.bam. Then, using the command

bam2bed < accepted_hits.bam | bedmap --echo --count genes.bed - > answer4.bed

I got a BED file. But my advisor is asking me for a file like the one I pasted above.

But what I have looks like this:

I    334    337    "YAL069W    .    +    protein_coding    start_codon    0    exon_number "1"; gene_id "YAL069W"; gene_name "YAL069W"; p_id "P3633"; transcript_id "YAL069W"; transcript_name "YAL069W"; tss_id "TSS1128";    0
I    334    646    "YAL069W    .    +    protein_coding    CDS    0    exon_number "1"; gene_id "YAL069W"; gene_name "YAL069W"; p_id "P3633"; protein_id "YAL069W"; transcript_id "YAL069W"; transcript_name "YAL069W"; tss_id "TSS1128";    2
I    334    649    "YAL069W    .    +    protein_coding    exon    .    exon_number "1"; gene_id "YAL069W"; gene_name "YAL069W"; p_id "P3633"; seqedit "false"; transcript_id "YAL069W"; transcript_name "YAL069W"; tss_id "TSS1128";    2
I    537    540    "YAL068W-A    .    +    protein_coding    start_codon    0    exon_number "1"; gene_id "YAL068W-A"; gene_name "YAL068W-A"; p_id "P5377"; transcript_id "YAL068W-A"; transcript_name "YAL068W-A"; tss_id "TSS5439";    0
I    537    789    "YAL068W-A    .    +    protein_coding    CDS    0    exon_number "1"; gene_id "YAL068W-A"; gene_name "YAL068W-A"; p_id "P5377"; protein_id "YAL068W-A"; transcript_id "YAL068W-A"; transcript_name

Anyway, thank you.

Reply written 3.1 years ago by F
Istvan Albert (University Park, USA) wrote, 3.1 years ago:

This has nothing to do with tophat. What you have there is most likely the result of an intersect operation between a feature file and an alignment file, probably produced by bedtools.

By default each overlap will be reported - hence you have the same gene reported each time it overlaps with a read. Consult the bedtools documentation on how to format the results of an intersect.
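
As a sketch of what "each overlap will be reported" means in practice, the awk function below does a naive intersect of a gene BED file with a read BED file and prints one row per gene/read overlap, with the gene name in column one and the read's start and end measured from the gene's start. Assumptions: `reads_per_gene`, `genes.bed` and `reads.bed` are illustrative names; the gene file carries the gene name in column four; offsets are counted from the gene's leftmost genomic coordinate and ignore strand, so for a reverse-strand gene they are not transcript-relative; the nested loop is fine for a yeast-sized annotation but slow on anything large. With bedtools itself the overlap step would be something along the lines of `bedtools intersect -a genes.bed -b reads.bed -wa -wb`.

```shell
# reads_per_gene GENES.bed READS.bed
# Hypothetical helper: one output row per (gene, read) overlap, as
# gene name, start and end relative to the gene, read ID, score, strand.
reads_per_gene() {
  awk 'BEGIN { FS = OFS = "\t" }
  NR == FNR {                         # first file: remember gene intervals
    n++
    gchr[n] = $1; gstart[n] = $2 + 0; gend[n] = $3 + 0; gname[n] = $4
    next
  }
  {                                   # second file: test each read against every gene
    for (i = 1; i <= n; i++) {
      if ($1 != gchr[i] || $2 + 0 >= gend[i] || $3 + 0 <= gstart[i]) continue
      s = $2 - gstart[i]; if (s < 0) s = 0               # clip to gene start
      e = $3 - gstart[i]
      len = gend[i] - gstart[i]; if (e > len) e = len    # clip to gene end
      print gname[i], s, e, $4, $5, $6
    }
  }' "$1" "$2"
}
```

Something like `reads_per_gene genes.bed reads.bed | sort -k1,1 -k2,2n` would then give rows grouped by gene and ordered by start, which is the layout of the file in the question.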


Thanks Istvan,

What I pasted above is the result of the BEDOPS tools: first I converted genes.gtf to genes.bed, then, using my accepted_hits.bam as input, I got that result. But I need a file whose first column contains the gene name, repeated as many times as the number of reads mapped to that gene.

Reply written 3.1 years ago by F