Hello!
I am assembling with a reference genome using the BAM files obtained with STAR from 2 samples against the reference genome "MG3_CGAv1.fa". For this I have used the Cufflinks program following the pipeline below.
The assembly I get (transcripts_consensus.fa) seems to be good, with a Completeness (BUSCO analysis) of 95% and a large number of CDS detected. However, when I map the reads against the consensus transcriptome obtained with Cufflinks using the RSEM program, less than 10% map (evidently there is some problem). If I map my reads against that generated transcriptome with STAR, again, almost none of them map.
I thought that I could be indicating the strand specificity wrong, but the libraries were built following the "TruSeq Stranded mRNA Library Prep Kit (dUTP protocol)" so I specified that the type of library is fr-firststrand in Cufflinks and in RSEM I map as stranded reverse (even indicating non-stranded the result does not change).
Do you see something in the script that could generate this later problem in the RSEM quantification step?
Thanks!
module load cufflinks
module load samtools
# Create a list file with the names of the sorted BAM files
ls *.sorted.bam > sorted_bam_list.txt
# Define the variable bam_list
bam_list="sorted_bam_list.txt"
# Iterate over each file in the list
while IFS= read -r bam_file; do
# Extract the base filename of the file (without the extension)
sample_name=$(basename "$bam_file" .sorted.bam)
# Create a unique output directory for each sample
output_dir="cufflinks_output_${sample_name}"
# Create the output list file for Cuffmerge
echo "$output_dir/transcripts.gtf" >> gtf_list.txt
#Run Cufflinks
cufflinks -o "$output_dir" --library-type fr-firststrand -G MG3_CGAv.gtf -b MG3_CGAv1.fasta "$bam_file"
# Rename the resulting GTF file to a sample specific name
mv "$output_dir/transcripts.gtf" "$output_dir/transcripts_${sample_name}.gtf"
done < "$bam_list"
# Create a list file with the GTF files produced by Cufflinks
ls -d cufflinks_output*/transcripts*.gtf > gtf_list.txt
# Run Cuffmerge to combine the results
cuffmerge -o cuffmerge_output -g MG3_CGAv.gtf -s MG3_CGAv1.fasta -p 4 gtf_list.txt
# Get the output GTF file from Cuffmerge
merged_gtf=cuffmerge_output/merged.gtf
# Use gffread to convert the GTF file to FASTA
gffread -w transcripts_consensus.fa -g MG3_CGAv1.fasta $merged_gtf