Low mapping after Cufflinks assembly
0
0
Entering edit mode
6 months ago

Hello!

I am assembling with a reference genome using the BAM files obtained with STAR from 2 samples against the reference genome "MG3_CGAv1.fa". For this I have used the Cufflinks program following the pipeline below.

The assembly I get (transcripts_consensus.fa) seems to be good, with a Completeness (BUSCO analysis) of 95% and a large number of CDS detected. However, when I map the reads against the consensus transcriptome obtained with Cufflinks using the RSEM program, less than 10% map (evidently there is some problem). If I map my reads against that generated transcriptome with STAR, again, almost none of them map.

I thought that I could be indicating the strand specificity wrong, but the libraries were built following the "TruSeq Stranded mRNA Library Prep Kit (dUTP protocol)" so I specified that the type of library is fr-firststrand in Cufflinks and in RSEM I map as stranded reverse (even indicating non-stranded the result does not change).

Do you see something in the script that could generate this later problem in the RSEM quantification step?

Thanks!

module load cufflinks
module load samtools

# Create a list file with the names of the sorted BAM files
ls *.sorted.bam > sorted_bam_list.txt

# Define the variable bam_list
bam_list="sorted_bam_list.txt"

# Iterate over each file in the list
while IFS= read -r bam_file; do
    # Extract the base filename of the file (without the extension)
    sample_name=$(basename "$bam_file" .sorted.bam)

    # Create a unique output directory for each sample
    output_dir="cufflinks_output_${sample_name}"

    # Create the output list file for Cuffmerge
    echo "$output_dir/transcripts.gtf" >> gtf_list.txt

    #Run Cufflinks
    cufflinks -o "$output_dir" --library-type fr-firststrand -G MG3_CGAv.gtf -b MG3_CGAv1.fasta "$bam_file"

    # Rename the resulting GTF file to a sample specific name
    mv "$output_dir/transcripts.gtf" "$output_dir/transcripts_${sample_name}.gtf"

done < "$bam_list"

# Create a list file with the GTF files produced by Cufflinks
ls -d cufflinks_output*/transcripts*.gtf > gtf_list.txt

# Run Cuffmerge to combine the results
cuffmerge -o cuffmerge_output -g MG3_CGAv.gtf -s MG3_CGAv1.fasta -p 4 gtf_list.txt

# Get the output GTF file from Cuffmerge
merged_gtf=cuffmerge_output/merged.gtf

# Use gffread to convert the GTF file to FASTA
gffread -w transcripts_consensus.fa -g MG3_CGAv1.fasta $merged_gtf
assembly gffread cuffmerge Cufflinks RSEM • 242 views
ADD COMMENT

Login before adding your answer.

Traffic: 2368 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6