How to find genes overlapped by paired-end reads
Asked by Franck8413, 8.0 years ago

Hi everybody,

I am currently implementing a small R package that predicts operons from paired-end RNA-seq data. I'm now looking for an R package, or a tool such as BEDTools, to do the following:

I have two annotation files, one in GTF and the other in GFF3 format (with gene names, coordinates, etc.), and a BED file containing the paired-end reads, which I can sort either by position or by read pair. Based on genomic coordinates, I want to find all genes overlapped by a single fragment (Read 1 + Read 2 joined into one interval). Better still, I want the pairs of different genes that are overlapped by the same fragment.
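A minimal pure-Python sketch of that overlap logic, assuming fragments and genes are already loaded as (chrom, start, end, name) tuples in BED-style 0-based, half-open coordinates. The function names are hypothetical, and the linear scan is only meant to show the test itself; on a real genome, bedtools intersect on a fragment BED, or findOverlaps on GRanges objects, would be the practical choice.

```python
from itertools import combinations

# Hypothetical record layout: (chrom, start, end, name), 0-based half-open like BED.
Interval = tuple[str, int, int, str]


def overlaps(a_start: int, a_end: int, b_start: int, b_end: int) -> bool:
    """True if two half-open intervals [start, end) share at least one base."""
    return a_start < b_end and b_start < a_end


def genes_hit_by_fragment(fragment: Interval, genes: list[Interval]) -> list[str]:
    """Names of genes overlapped by one fragment (brute-force scan, sketch only)."""
    chrom, f_start, f_end, _ = fragment
    return [g_name for g_chrom, g_start, g_end, g_name in genes
            if g_chrom == chrom and overlaps(f_start, f_end, g_start, g_end)]


def gene_pairs(fragments: list[Interval],
               genes: list[Interval]) -> dict[tuple[str, str], int]:
    """Count, for each unordered pair of distinct genes, the fragments overlapping both."""
    counts: dict[tuple[str, str], int] = {}
    for frag in fragments:
        hit = sorted(set(genes_hit_by_fragment(frag, genes)))
        # combinations() of a sorted list yields each pair once, in sorted order
        for pair in combinations(hit, 2):
            counts[pair] = counts.get(pair, 0) + 1
    return counts
```

A fragment that touches three genes contributes to all three pairs, which may or may not be the desired behaviour for operon calling.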

I have read about many functions that seem to do this, but I'm not confident in their results and don't know which is best: bedtools intersect, annotate, etc., or GenomicRanges' findOverlaps in R, etc.

If anyone has a suggestion, I'll take it.

Thanks !

Tags: RNA-Seq, paired-end, reads, genes, overlapping

It's probably just me, but could you provide a small sample of what your BED file looks like? And could you also show what you want the output to look like? I don't quite understand what you're trying to do.


For example, my BED file looks like this:

Chromosome  4036592 4036623 SRR191812.5.1/1 59  -
Chromosome  4036463 4036535 SRR191812.5.2/2 59  +
Chromosome  226143  226174  SRR191812.8.1/1 59  -
Chromosome  226059  226135  SRR191812.8.2/2 59  +

....

The fourth column holds the name of each read; the first line of each pair is Read 1 and the next line is Read 2.
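One way to turn those lines into fragments, sketched in Python under two assumptions: the BED is sorted by read name so that each Read 1 line is immediately followed by its Read 2 line, and both mates map to the same chromosome. The fragment is taken as the leftmost mate start to the rightmost mate end, regardless of strand. Both function names are hypothetical, and fragment_id relies on the `<run>.<spot>.<mate>/<mate>` naming seen in the sample.

```python
def fragment_id(read_name: str) -> str:
    """Strip the mate suffix: 'SRR191812.5.1/1' -> 'SRR191812.5'.

    Assumes the '<run>.<spot>.<mate>/<mate>' naming seen in the sample.
    """
    return read_name.split("/")[0].rsplit(".", 1)[0]


def bed_to_fragments(lines: list[str]) -> list[tuple[str, int, int, str]]:
    """Merge each Read 1 / Read 2 pair of BED lines into one fragment interval.

    Assumes mates sit on adjacent lines (file sorted by read name) and share a
    chromosome. The fragment runs from the leftmost mate start to the
    rightmost mate end, whatever the strands.
    """
    records = [line.split() for line in lines if line.strip()]
    if len(records) % 2:
        raise ValueError("odd number of BED records; a mate is missing")
    fragments = []
    for r1, r2 in zip(records[0::2], records[1::2]):
        frag = fragment_id(r1[3])
        if frag != fragment_id(r2[3]) or r1[0] != r2[0]:
            raise ValueError(f"unexpected mate pairing: {r1[3]} / {r2[3]}")
        start = min(int(r1[1]), int(r2[1]))
        end = max(int(r1[2]), int(r2[2]))
        fragments.append((r1[0], start, end, frag))
    return fragments
```

Each resulting tuple can be written out as a tab-separated line to give a one-interval-per-fragment BED file, which can then be passed to bedtools intersect against the gene annotation.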

I also have an annotation in GTF or GFF3 format; here it is a GTF, with lines like this:

Chromosome  protein_coding  start_codon 3041168 3041170 .   -   0    gene_id "b2899"; transcript_id "AAC75937"; exon_number "1"; gene_name "yqfA"; transcript_name "yqfA-1";
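GTF coordinates are 1-based and inclusive, while BED is 0-based and half-open, so the annotation has to be converted before comparing it with the reads. The sample row is a start_codon feature rather than a whole gene, so the sketch below (hypothetical helper names) collapses every feature sharing a gene_id into one span; if the file has rows whose feature column is "gene", filtering on those would be simpler. It assumes tab-separated columns, as the GTF format requires, and GTF-style `key "value";` attributes; GFF3 uses `key=value` attributes and would need a different parser.

```python
import re

# Matches GTF attribute pairs such as: gene_name "yqfA"
GTF_ATTR = re.compile(r'(\S+) "([^"]*)"')


def parse_gtf_line(line: str) -> tuple[str, int, int, str, str, str]:
    """Split one GTF line into (chrom, start, end, strand, gene_id, gene_name).

    GTF is 1-based and closed, so start is shifted by -1 to match BED's
    0-based half-open convention; end stays the same.
    """
    f = line.rstrip("\n").split("\t")
    attrs = dict(GTF_ATTR.findall(f[8]))
    gene_id = attrs["gene_id"]
    return f[0], int(f[3]) - 1, int(f[4]), f[6], gene_id, attrs.get("gene_name", gene_id)


def gene_spans(lines: list[str]) -> dict[str, tuple[str, int, int, str, str]]:
    """Collapse all features of each gene_id into (chrom, start, end, strand, name)."""
    spans: dict[str, tuple[str, int, int, str, str]] = {}
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue
        chrom, start, end, strand, gene_id, name = parse_gtf_line(line)
        if gene_id in spans:
            _, old_start, old_end, _, _ = spans[gene_id]
            start, end = min(start, old_start), max(end, old_end)
        spans[gene_id] = (chrom, start, end, strand, name)
    return spans
```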

For each read pair, I want to know which genes it overlaps. Does the start of R1 fall within one gene and the end of R2 within a different gene?
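Which end counts as the "start" of R1 depends on strand (in the sample, Read 1 maps to the minus strand and lies to the right of Read 2), so a strand-agnostic sketch is to look up the genes overlapped by each whole mate and report pairs of distinct genes, one from each mate. The helper names are hypothetical; genes are assumed to be (chrom, start, end, name) tuples in 0-based, half-open coordinates, and mates to be (chrom, start, end) tuples taken from the BED columns.

```python
Gene = tuple[str, int, int, str]
Mate = tuple[str, int, int]


def genes_at(chrom: str, start: int, end: int, genes: list[Gene]) -> set[str]:
    """Names of genes sharing at least one base with [start, end) on chrom."""
    return {name for g_chrom, g_start, g_end, name in genes
            if g_chrom == chrom and start < g_end and g_start < end}


def linked_gene_pairs(r1: Mate, r2: Mate, genes: list[Gene]) -> set[tuple[str, str]]:
    """Sorted pairs (A, B), A != B, where one mate overlaps A and the other overlaps B.

    The result is empty when at least one mate overlaps no gene, or when both
    mates overlap only the same single gene.
    """
    in_r1 = genes_at(*r1, genes)
    in_r2 = genes_at(*r2, genes)
    return {tuple(sorted((a, b))) for a in in_r1 for b in in_r2 if a != b}
```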
