How can I find genes located in the same region (overlapping) of the chromosome ?
1
0
Entering edit mode
2.2 years ago

I take the BAM file as input and perform RNA-Seq. The program prints out a list of genes to which the reads match. Some of the genes in the list overlapping in the same region on the chromosome. I want to add code to the program that reports overlapping genes. How can I do that ?

For example in the picture below, the blast result of 7 overlapping genes in the gene list.

For example, the blast result of 7 overlapping genes in the gene list.

Thank you for the help.

paralogues ambiguous overlapping genes multi-mapped reads RNA-Seq • 1.7k views
ADD COMMENT
0
Entering edit mode

While you are performing RNAseq/BAM the image you show above is from a blast search. Just to clarify, you are referring to "genes" from different hits(organisms) when you refer to the image above, correct?

Is your question about the RNAseq or are you using magicblast to create a BAM file? What is in your RNAseq reference? It would be unusual to have multiple species/transcriptomes at the same time.

ADD REPLY
0
Entering edit mode

I've included the blast image just to explain the overlapping genes. I have a program that takes the genome aligned BAM file as input and assigns the reads to the genes. All of the genes belong to a single organism (homo sapiens). The program simply takes a bam file as input and scores the reads from the samples to the genes. I want to add the following feature to the program; finding which of the reported genes overlap and reporting them in a separate tsv etc. format. (My question is about RNA-Seq and for genes from the same organism.)

ADD REPLY
0
Entering edit mode

...a program that takes the genome aligned BAM file as input and assigns the reads to the genes.

You also provide it with a list of gene descriptions, such as a GTF or GFF file - correct? (if not, how would your program know the gene locations?). There are many ways to deal with overlapping entries in GFF files, and various ways for a program to count reads on entries originating in GFF files. What program are you using? What is the source of your annotation file (e.g. GFF file)?

ADD REPLY
0
Entering edit mode

I just give the program a bam file and an optional bai index file. It counts the reads and assignment to the genes done with the pysam module.

ADD REPLY
0
Entering edit mode

You're still leaving out information which is important for answering your question - is this a program that you wrote? If you didn't write the program, is it a known program? Publicly available? Whatever program you're using must be reaching for gene descriptions in some format, as these are not contained within a BAM file. Without knowing more details, I don't see how this question can be answered.

ADD REPLY
0
Entering edit mode

I'm tiring you out because I'm a beginner, sorry for that. I want to explain it to you in a more understandable way. My code do this: Quantifies scRNA-seq BAM files. Takes alignment and single cell barcode annotated BAM files as input, and writes the quantification results to barcodes.tsv, genes.tsv and matrix.mtx format files, and produces MatrixMarket coordinate format barcode-by-gene read tables. (Note: I am reading the BAM file with the pysam.AlignmentFile function of the pysam module.) I want to add a new function to the code. I will add a flag that finds out which of the reported genes (written in genes.tsv file) are overlapping in the same region of the chromosome. I want to share my code with you for better understanding. Can you give me your e-mail address?

ADD REPLY
0
Entering edit mode
2.2 years ago
jv ★ 1.8k

one option is intersectBed from bedtools https://bedtools.readthedocs.io/en/latest/content/tools/intersect.html

ADD COMMENT
0
Entering edit mode

Thank you. it says it reports overlap in at least two BAM files. But I need which of the genes assigned reads overlap in a single BAM file.

ADD REPLY
0
Entering edit mode

You're confusing file types, and the information they contain. intersectBed takes a variety of file types, one of which can be BAM format. However, BAM files do not contain information about genes, they contain alignment coordinates. The information about genes would be in some other file, such as BED, or GFF. For instance, if you had a BED file containing genomic locations of genes with an overlap, you could use that to identify reads from your BAM file overlapping more than one gene. And if you had a file of gene descriptions you could use that to identify which genes suffer from an overlap problem.

It's still not quite clear what you want to achieve. It sounds like you want to quantify reads on genes, but you want to take some action if a read can be mapped to more than one gene.

ADD REPLY

Login before adding your answer.

Traffic: 2285 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6