Do EGA RNAseq .bam files come with a gtf/gff file corresponding to the genome assembly used?
10 weeks ago
rva_jango ▴ 10

I have downloaded the .BAM files of an RNAseq dataset from EGA with the pyega3 tool.

I have also downloaded the .tar file that lists the experiments, runs, etc., but I do not see a .gtf or .gff file that I could use to run something like featureCounts on the BAM files.

Any input appreciated, feel like there may be an easy answer here.

gtf gff RNAseq bam EGA
10 weeks ago
GenoMax 104k

You can look in the headers of the BAM files to see which genome build was used for the alignments. Most aligners also record the command line used for the alignment in the header (in the @PG lines). You can then choose a GFF/GTF file that matches the source and version of that genome build.
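As a rough sketch of that header check, the shell function below pulls out the fields that usually identify the reference: the AS (assembly) and SP (species) tags on @SQ lines, the name and length of the first contig, and any CL (command line) tags on @PG lines. The function name summarise_header and the file name sample.bam are made up for illustration, and the usage line assumes samtools is installed (sambamba view -H works the same way). Not every BAM carries all of these tags.

```shell
# Summarise the parts of a SAM/BAM header that identify the reference.
# Reads tab-separated header text on stdin. Hypothetical usage:
#   samtools view -H sample.bam | summarise_header
summarise_header() {
  awk -F'\t' '
    $1 == "@SQ" {
      nseq++
      for (i = 2; i <= NF; i++) {
        tag = substr($i, 1, 3); val = substr($i, 4)
        # Report name and length of the first contig only
        if (nseq == 1 && tag == "SN:") print "first_contig\t" val
        if (nseq == 1 && tag == "LN:") print "first_length\t" val
        # Report each distinct assembly / species value once
        if (tag == "AS:" && !seen[$i]++) print "assembly\t" val
        if (tag == "SP:" && !seen[$i]++) print "species\t" val
      }
    }
    $1 == "@PG" {
      # Command line recorded by the aligner or other tools, if present
      for (i = 2; i <= NF; i++)
        if (substr($i, 1, 3) == "CL:") print "command\t" substr($i, 4)
    }
    END { print "contigs\t" (nseq + 0) }
  '
}
```

When AS is missing or ambiguous, the contig lengths are a useful cross-check: a chromosome 1 length of 249250621 corresponds to GRCh37 (hg19).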

If these happen to be unaligned BAM files (yes you can create these from raw fastq data) then you can convert the BAM files back to fastq reads and then use an aligner, genome and annotation combination of your choice.


Thanks for your directions.

module load sambamba/0.6.8
sambamba view -H "$in" | head

sambamba 0.6.8 by Artem Tarasov and Pjotr Prins (C) 2012-2018 LDC 1.10.0 / DMD v2.080.1 / LLVM6.0.1 / bootstrap LDC - the LLVM D compiler (0.17.4)

@HD VN:1.0 SO:coordinate
@SQ SN:1 LN:249250621 AS:assembly19 SP:Homo_sapiens

I then downloaded the .GTF file from Ensembl. Thanks.
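One thing worth checking before counting: featureCounts matches reads to features by contig name, so the names in the BAM header have to agree with the first column of the GTF. Ensembl names chromosomes 1, 2, MT, while UCSC-style files use chr1, chr2, chrM. The function below is a minimal sketch of that comparison; its name and the file names header.txt and annotation.gtf are placeholders, and it assumes an uncompressed GTF.

```shell
# List contigs named in a BAM header that never appear in a GTF.
# $1: file with header text, e.g. from: samtools view -H sample.bam > header.txt
# $2: uncompressed GTF file (placeholder name: annotation.gtf)
contigs_missing_from_gtf() {
  awk -F'\t' '
    # First file on the command line is the GTF: remember its contig names
    NR == FNR { if ($0 !~ /^#/) gtf[$1] = 1; next }
    # Second file is the header: print any @SQ name the GTF lacks
    $1 == "@SQ" {
      for (i = 2; i <= NF; i++)
        if (substr($i, 1, 3) == "SN:" && !(substr($i, 4) in gtf))
          print substr($i, 4)
    }
  ' "$2" "$1"
}
```

Unplaced scaffolds and decoy contigs may legitimately be missing from the annotation. The main chromosomes should not be.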

As for reverting to fastq: I did not do that, but it was recommended to me because several pipelines start from fastq files. The bedtools usage is:

bedtools bamtofastq [OPTIONS] -i <BAM> -fq <FASTQ>
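For paired-end data, that synopsis might expand along the following lines. This is only a sketch: the helper names run and bam_to_fastq, the DRY_RUN switch and the output file names are invented for illustration, and it assumes samtools and bedtools are on the PATH. bedtools bamtofastq expects mates to be adjacent when writing two files, hence the name sort first.

```shell
# Run a command, or just print it when DRY_RUN is set (handy for checking).
run() {
  if [ -n "${DRY_RUN:-}" ]; then
    echo "$*"
  else
    "$@"
  fi
}

# Convert a paired-end BAM back to two FASTQ files.
# $1: input BAM, $2: prefix for the output files (both placeholders)
bam_to_fastq() {
  bam=$1
  prefix=$2
  # Sort by read name so that the two mates of each pair sit next to each other
  run samtools sort -n -o "${prefix}.namesorted.bam" "$bam"
  # -fq receives the first mates, -fq2 the second mates
  run bedtools bamtofastq -i "${prefix}.namesorted.bam" \
      -fq "${prefix}_R1.fastq" -fq2 "${prefix}_R2.fastq"
}
```

Calling bam_to_fastq sample.bam sample with DRY_RUN=1 set prints the two commands without touching any files. For single-end data the sort and the -fq2 argument can be dropped.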
