6 months ago
kerrybear20
Hello,
A previous student sent RNA extractions to a company for sequencing and analysis, and the files that came back are confusing me, so I was wondering if anyone could help. The company sent the raw RNA-seq data, the TopHat analysis and the Cufflinks analysis for 3 conditions. The only file formats in the folder are BAM, BAI and FASTQ. Which files correspond to which analysis program? I would have expected some GTF files as well.
Thanks in advance.
1) What analysis do you want to perform (i.e. what's your biological question/hypothesis)? Differential gene (or transcript) expression between conditions? Isoform or variant discovery? Single-sample gene set enrichment analysis? Cell type fraction estimation?
2) You should not be using TopHat + Cufflinks. Those are outdated programs that really aren't maintained anymore. I recommend a program like kallisto (or another equally suitable alternative).
3) If doing differential expression, start your analysis with the FASTQ files -- download a reference transcriptome (for whatever organism you're working with) from GENCODE or Ensembl and map your reads against it to get gene/transcript abundance estimates.
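As a rough sketch of point 3, the two kallisto steps (build an index from the transcriptome, then quantify each sample) could be assembled like this. All file names are hypothetical placeholders, and the single-end fragment length and standard deviation are made-up defaults that you would need to replace with values for your own library. The script only builds and prints the commands; it does not run kallisto.

```python
import shlex


def kallisto_index_cmd(transcriptome_fasta, index_path):
    """Command that builds a kallisto index from a transcriptome FASTA."""
    return ["kallisto", "index", "-i", str(index_path), str(transcriptome_fasta)]


def kallisto_quant_cmd(index_path, out_dir, fastqs, single_end=False,
                       frag_len=200, frag_sd=20, bootstraps=100):
    """Command that quantifies one sample against an existing index.

    frag_len / frag_sd are only used for single-end reads and are
    placeholder values (an assumption), not recommendations.
    """
    cmd = ["kallisto", "quant", "-i", str(index_path),
           "-o", str(out_dir), "-b", str(bootstraps)]
    if single_end:
        # Single-end data needs the fragment length distribution supplied by hand.
        cmd += ["--single", "-l", str(frag_len), "-s", str(frag_sd)]
    cmd += [str(f) for f in fastqs]
    return cmd


if __name__ == "__main__":
    # Hypothetical file names, for illustration only.
    index_cmd = kallisto_index_cmd("transcripts.fa.gz", "transcripts.idx")
    quant_cmd = kallisto_quant_cmd(
        "transcripts.idx", "quant/condA_rep1",
        ["condA_rep1_R1.fastq.gz", "condA_rep1_R2.fastq.gz"],
    )
    print(shlex.join(index_cmd))
    print(shlex.join(quant_cmd))
```

You would run the quant command once per sample, then feed the per-sample abundance tables into whatever differential expression tool you choose.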
I'm looking at differential expression between the 3 conditions. The data was sequenced back in 2017 by another student; I'm just finishing up the project. The company performed all the data analysis for her and sent over the files, but I'm unsure which files are the raw data and which are the analysed data, as they are all BAM, BAI and FASTQ files and look identical. Thanks.
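The raw data are the FASTQ files, nothing more. The BAM files are alignments (most likely the TopHat output), and each BAI file is just the index for its BAM file.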
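If you want to check which program wrote a given BAM file, aligners usually record themselves in the @PG lines of the header. `samtools view -H file.bam` is the usual way to see it. If samtools isn't installed, here is a minimal standard-library sketch that reads the header directly. It relies on BAM being BGZF-compressed, which Python's gzip module can decompress. The function names are my own, and whether your particular files carry a @PG entry is an assumption you would have to check.

```python
import gzip
import struct


def read_bam_header_text(path):
    """Return the plain-text SAM header stored at the start of a BAM file."""
    # BGZF is a series of gzip members, so gzip.open can read it.
    with gzip.open(path, "rb") as handle:
        if handle.read(4) != b"BAM\x01":
            raise ValueError(f"{path} does not look like a BAM file")
        # Next comes a little-endian int32 giving the header text length.
        (text_length,) = struct.unpack("<i", handle.read(4))
        text = handle.read(text_length)
    return text.decode("utf-8", errors="replace").rstrip("\x00")


def programs_in_header(header_text):
    """Collect the ID, PN and VN fields of every @PG header line."""
    programs = []
    for line in header_text.splitlines():
        if not line.startswith("@PG"):
            continue
        fields = dict(f.split(":", 1) for f in line.split("\t")[1:] if ":" in f)
        programs.append({key: fields.get(key) for key in ("ID", "PN", "VN")})
    return programs


if __name__ == "__main__":
    import sys

    # Usage: python bam_programs.py sample1.bam sample2.bam ...
    for bam_path in sys.argv[1:]:
        print(bam_path, programs_in_header(read_bam_header_text(bam_path)))
```

A BAM whose header lists TopHat is aligned reads, not raw data; files with no header at all that start with `@` read names are the FASTQ files.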