Hello, I am having trouble with a process that I expected to be very simple. I ran a DiffBind analysis on my ChIP-seq datasets. The output gives me the chromosome locations of the peaks, but now I would like to know the corresponding gene names.
I downloaded a reference annotation from GENCODE using the following commands:
wget http://ftp.ebi.ac.uk/pub/databases/gencode/Gencode_human/release_48/gencode.v48.annotation.gtf.gz
gunzip gencode.v48.annotation.gtf.gz
I then tried the intersect command in bedtools (I know bedtools can handle the GTF format as long as it is tab-separated):
bedtools intersect -a my_file_of_interest.bed -b gencode.v48.annotation.gtf > output_with_gene_names.bed
But the output BED file still lists only the chromosome locations and no gene names. Can anyone provide some guidance? I have also tried reformatting the reference dataset so that the first column is the chromosome, the second is the start, and the third is the end.
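A likely explanation: by default, bedtools intersect writes only the shared interval for each overlapping -a feature, so none of the -b columns (where the gene names live) appear in the output. Adding -wa -wb makes it print the full -a record followed by the overlapping -b record. The awk sketch below only illustrates what that paired output looks like; it is a naive all-against-all comparison for tiny inputs, not a replacement for bedtools, and the genes.bed contents (GENE_A, GENE_B) are hypothetical.

```shell
#!/usr/bin/env bash
# Illustration of "-wa -wb"-style output: each peak line followed by
# every gene line it overlaps. Quadratic; for tiny example files only.
set -euo pipefail
cd "$(mktemp -d)"

# One peak taken from the post.
printf 'chr1\t31413259\t31413659\n' > peaks.bed
# Hypothetical gene intervals in BED6 (chrom, start, end, name, score, strand).
printf 'chr1\t31400000\t31420000\tGENE_A\t0\t+\nchr2\t100\t200\tGENE_B\t0\t-\n' > genes.bed

awk 'BEGIN { FS = OFS = "\t" }
     NR == FNR { g[++n] = $0; next }      # first file: remember every gene line
     {
         for (i = 1; i <= n; i++) {
             split(g[i], f, "\t")
             # Half-open BED intervals overlap when each starts before the other ends.
             if (f[1] == $1 && $2 + 0 < f[3] + 0 && $3 + 0 > f[2] + 0)
                 print $0, g[i]           # peak columns, then gene columns
         }
     }' genes.bed peaks.bed > joined.bed
```

With real data, the equivalent would be bedtools intersect -a my_file_of_interest.bed -b <gene file> -wa -wb, where the gene file must carry the name in one of its columns.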
My file of interest has the following format:
chr16 4936776 4937176
chr12 52147884 52148284
chr21 41507488 41507888
chr1 31413259 31413659
chr13 34348350 34348750
chr1 94875031 94875431
chr2 113157454 113157854
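bedtools expects tab-delimited BED files. The lines above appear space-separated, which may just be how the forum rendered them; if the file really is space-delimited, a minimal awk sketch like this one (file names are hypothetical) rewrites the first three columns with tabs:

```shell
#!/usr/bin/env bash
# Normalise whitespace-delimited peak coordinates to a tab-delimited BED3 file.
set -euo pipefail
cd "$(mktemp -d)"

# Two lines from the post, written with spaces as they appear above.
printf 'chr16 4936776 4937176\nchr12 52147884 52148284\n' > peaks_spaces.txt

# awk's default field splitting accepts runs of spaces or tabs;
# OFS makes the output tab-separated.
awk 'BEGIN { OFS = "\t" } NF >= 3 { print $1, $2, $3 }' peaks_spaces.txt > peaks.bed
```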
The reference file looks like:
##description: evidence-based annotation of the human genome (GRCh38), version 48 (Ensembl 114)
##provider: GENCODE
##contact: gencode-help@ebi.ac.uk
##format: gtf
##date: 2025-01-19
chr1 HAVANA gene 11121 24894 . + . gene_id "ENSG00000290825.2"; gene_type "lncRNA"; gene_name "DDX11L16"; level 2; tag "overlaps_pseudogene";
chr1 HAVANA transcript 11121 14413 . + . gene_id "ENSG00000290825.2"; transcript_id "ENST00000832824.1"; gene_type "lncRNA"; gene_name "DDX11L16"; transcript_type "lncRNA"; transcript_name "DDX11L16-260"; level 2; tag "TAGENE";
chr1 HAVANA exon 11121 11211 . + . gene_id "ENSG00000290825.2"; transcript_id "ENST00000832824.1"; gene_type "lncRNA"; gene_name "DDX11L16"; transcript_type "lncRNA"; transcript_name "DDX11L16-260"; exon_number 1; exon_id "ENSE00004248723.1"; level 2; tag "TAGENE";
chr1 HAVANA exon 12010 12227 . + . gene_id "ENSG00000290825.2"; transcript_id "ENST00000832824.1"; gene_type "lncRNA"; gene_name "DDX11L16"; transcript_type "lncRNA"; transcript_name "DDX11L16-260"; exon_number 2; exon_id "ENSE00004248735.1"; level 2; tag "TAGENE";
chr1 HAVANA exon 12613 12721 . + . gene_id "ENSG00000290825.2"; transcript_id "ENST00000832824.1"; gene_type "lncRNA"; gene_name "DDX11L16"; transcript_type "lncRNA"; transcript_name "DDX11L16-260"; exon_number 3; exon_id "ENSE00003582793.1"; level 2; tag "TAGENE";
chr1 HAVANA exon 13453 14413 . + . gene_id "ENSG00000290825.2"; transcript_id "ENST00000832824.1"; gene_type "lncRNA"; gene_name "DDX11L16"; transcript_type "lncRNA"; transcript_name "DDX11L16-260"; exon_number 4; exon_id "ENSE00004248730.1"; level 2; tag "TAGENE";
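The gene names sit inside the ninth (attribute) column of the GTF, and the file mixes gene, transcript, and exon rows, so a peak inside a gene would otherwise be reported once per overlapping feature. One way to get a cleaner reference, sketched here under the assumption that the GTF is tab-delimited, is to keep only the "gene" rows and write a BED6 file whose name column is gene_name. The sample.gtf and genes.bed names are hypothetical; the records are copied from the excerpt above.

```shell
#!/usr/bin/env bash
# Sketch: convert GENCODE "gene" records to BED6 with gene_name as the name.
set -euo pipefail
cd "$(mktemp -d)"

# A tiny stand-in for gencode.v48.annotation.gtf: one header, one gene, one exon.
printf '##format: gtf\n' > sample.gtf
printf 'chr1\tHAVANA\tgene\t11121\t24894\t.\t+\t.\tgene_id "ENSG00000290825.2"; gene_type "lncRNA"; gene_name "DDX11L16"; level 2;\n' >> sample.gtf
printf 'chr1\tHAVANA\texon\t11121\t11211\t.\t+\t.\tgene_id "ENSG00000290825.2"; gene_name "DDX11L16"; exon_number 1;\n' >> sample.gtf

awk 'BEGIN { FS = OFS = "\t" }
     /^#/ { next }                               # skip "##" header lines
     $3 == "gene" {
         name = "NA"
         if (match($9, /gene_name "[^"]+"/))
             # drop the leading `gene_name "` (11 chars) and the closing quote
             name = substr($9, RSTART + 11, RLENGTH - 12)
         # GTF is 1-based inclusive; BED is 0-based half-open, so subtract 1 from start.
         print $1, $4 - 1, $5, name, 0, $7
     }' sample.gtf > genes.bed
```

On the real annotation, the same awk program would read gencode.v48.annotation.gtf, and the resulting gene file could be passed to bedtools intersect with -wa -wb so that each peak line is followed by the overlapping gene's columns, including its name. Peaks that overlap no gene are dropped by that command; the -loj option keeps them, padded with placeholder values.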