Question: How to find genes overlapped by paired-end reads
3.2 years ago, Franck8413 wrote:

Hi everybody,

I am currently working on a small R package that aims to predict operons from paired-end RNA-seq data. I am looking for an R package, or a tool like Bedtools, that can do the following:

I have two annotation files, one in GTF and the other in GFF3 format (with the gene names, their coordinates, etc.), and a BED file containing the paired-end reads, which I can sort by position or by read pair. I want to find all the genes that are overlapped by a single fragment (read 1 + read 2 reassembled), based on their genomic coordinates. Better yet, I would like to find pairs of different genes that are overlapped by the same fragment.

I have read about many functions that seem to do this, but I am not sure about the results, and I don't know which is best: Bedtools intersect, annotate, etc., or findOverlaps from GenomicRanges in R.

If anyone has a suggestion, I'll take it.

Thanks!


It's probably just me, but could you provide a small sample of what your BED file looks like, and also show what you want your outcome to look like? I'm not really understanding what you're trying to do.

written 3.2 years ago by Sinji

For example, my BED file looks like this:

Chromosome  4036592 4036623 SRR191812.5.1/1 59  -
Chromosome  4036463 4036535 SRR191812.5.2/2 59  +
Chromosome  226143  226174  SRR191812.8.1/1 59  -
Chromosome  226059  226135  SRR191812.8.2/2 59  +

....

The fourth column holds the name of each read. Within each pair, the first row is read 1 and the next row is read 2.
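To make the pairing concrete, here is a minimal Python sketch of how the two mates of a pair could be merged into one fragment interval. It assumes mate names differ only by a trailing `.1/1` or `.2/2` suffix, as in the four rows above, and that both mates lie on the same chromosome. `pair_key` and `fragments_from_bed` are hypothetical helper names, not part of Bedtools or any R package.

```python
import re
from collections import defaultdict

# The four BED rows shown above (whitespace-separated).
BED_LINES = [
    "Chromosome  4036592 4036623 SRR191812.5.1/1 59  -",
    "Chromosome  4036463 4036535 SRR191812.5.2/2 59  +",
    "Chromosome  226143  226174  SRR191812.8.1/1 59  -",
    "Chromosome  226059  226135  SRR191812.8.2/2 59  +",
]


def pair_key(read_name):
    """Drop the mate suffix so both mates share one key (assumed naming scheme)."""
    return re.sub(r"\.[12]/[12]$", "", read_name)


def fragments_from_bed(lines):
    """Return {pair name: (chrom, start, end)} spanning both mates of each pair."""
    mates = defaultdict(list)
    for line in lines:
        if not line.strip():
            continue
        chrom, start, end, name = line.split()[:4]
        mates[(chrom, pair_key(name))].append((int(start), int(end)))

    fragments = {}
    for (chrom, key), spans in mates.items():
        if len(spans) != 2:
            # Skip reads whose mate is missing or that map to several places.
            continue
        frag_start = min(s for s, _ in spans)
        frag_end = max(e for _, e in spans)
        fragments[key] = (chrom, frag_start, frag_end)
    return fragments


fragments = fragments_from_bed(BED_LINES)
for name, (chrom, start, end) in fragments.items():
    print(name, chrom, start, end)
```

The fragment simply runs from the leftmost start to the rightmost end of the two mates, so the unsequenced gap between them is counted as covered.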

And I have an annotation file in GTF or GFF3. Here it is a GTF, which looks like this:

Chromosome  protein_coding  start_codon 3041168 3041170 .   -   0    gene_id "b2899"; transcript_id "AAC75937"; exon_number "1"; gene_name "yqfA"; transcript_name "yqfA-1";

For each pair of reads, I want to know which genes are overlapped by the pair. For example, is the start of R1 included in one gene and the end of R2 included in a different gene?

modified 3.2 years ago • written 3.2 years ago by Franck8413
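One way to picture the overlap step being asked about is the following Python sketch. It is not how Bedtools or GenomicRanges work internally, only a plain linear scan under stated assumptions: the three GTF records (`geneA`, `geneB`, `geneC`) are invented for illustration, the fragment is read pair SRR191812.8 merged into BED coordinates 226059 to 226174, and all helper names are hypothetical. GTF coordinates are 1-based and inclusive while BED is 0-based and half-open, so the GTF start is shifted by one before comparing.

```python
import re

# Invented annotation records for illustration only (not real genes).
GTF_LINES = [
    'Chromosome\tprotein_coding\texon\t225900\t226100\t.\t+\t.\tgene_id "geneA"; gene_name "geneA";',
    'Chromosome\tprotein_coding\texon\t226150\t226500\t.\t+\t.\tgene_id "geneB"; gene_name "geneB";',
    'Chromosome\tprotein_coding\texon\t300000\t301000\t.\t-\t.\tgene_id "geneC"; gene_name "geneC";',
]


def gene_extents(gtf_lines):
    """Collapse all records of a gene_id into one 0-based half-open interval."""
    genes = {}
    for line in gtf_lines:
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.rstrip("\n").split(None, 8)  # 9th field keeps its spaces
        chrom, start, end, attrs = fields[0], int(fields[3]), int(fields[4]), fields[8]
        match = re.search(r'gene_id "([^"]+)"', attrs)
        if match is None:
            continue
        gene_id = match.group(1)
        start0 = start - 1  # GTF 1-based inclusive -> BED-style 0-based start
        if gene_id in genes:
            _, prev_start, prev_end = genes[gene_id]
            start0, end = min(prev_start, start0), max(prev_end, end)
        genes[gene_id] = (chrom, start0, end)
    return genes


def genes_overlapping(fragment, genes):
    """Genes sharing at least one base with the fragment."""
    chrom, fstart, fend = fragment
    return sorted(g for g, (c, s, e) in genes.items()
                  if c == chrom and s < fend and fstart < e)


def genes_at(chrom, pos, genes):
    """Genes containing a single 0-based position."""
    return sorted(g for g, (c, s, e) in genes.items()
                  if c == chrom and s <= pos < e)


fragment = ("Chromosome", 226059, 226174)  # pair SRR191812.8, mates merged
genes = gene_extents(GTF_LINES)

overlapped = genes_overlapping(fragment, genes)
left = genes_at(fragment[0], fragment[1], genes)        # gene(s) at fragment start
right = genes_at(fragment[0], fragment[2] - 1, genes)   # gene(s) at last fragment base
spans_two_genes = any(a != b for a in left for b in right)

print(overlapped, left, right, spans_two_genes)
```

A scan like this checks every gene for every fragment, which is fine for a handful of records but slow for a full RNA-seq run. For real data, an interval-based tool such as the ones named in the question would be the usual choice.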