Output files

When the analysis finishes, all files marked with temp() are deleted, so you need to specify which result files should be copied to the results directory.
The output files in Poppy are defined in config/output_files.yaml, which can be adapted to your needs.

NB: If you want to make sure that all result files are kept, pass --notemp when launching Snakemake.

Current output_files.yaml:
directory: ./results
files:
  - name: Alignment BAM file
    input: alignment/samtools_merge_bam/{sample}_{type}.bam
    output: bam/{sample}_{type}.bam

  - name: Alignment BAM file index
    input: null
    output: bam/{sample}_{type}.bam.bai

  - name: MultiQC
    input: qc/multiqc/multiqc_DNA.html
    output: qc/multiqc_DNA.html

  - name: SNV ensemble soft filtered VCF file
    input: snv_indels/bcbio_variation_recall_ensemble/{sample}_{type}.ensembled.vep_annotated.artifact_annotated.background_annotated.filter.somatic_hard.filter.somatic.vcf.gz
    output: vcf/{sample}_{type}.filter.somatic.vcf.gz

  - name: Caller-specific VCF file
    input: snv_indels/{caller}/{sample}_{type}.merged.vcf.gz
    output: vcf/{sample}_{type}.{caller}.vcf.gz

  - name: Pindel VCF file
    input: cnv_sv/pindel_vcf/{sample}_{type}.no_tc.normalized.vep_annotated.artifact_annotated.filter.somatic_hard.filter.pindel.vcf.gz
    output: vcf/{sample}_{type}.pindel.vep_annotated.filter.pindel.vcf.gz

  - name: SVDB CNV VCF file
    input: cnv_sv/svdb_query/{sample}_{type}.pathology.svdb_query.vcf.gz
    output: cnv/{sample}/{sample}_{type}.pathology.svdb_query.vcf.gz

  - name: CNV HTML report, pathology TC
    input: reports/cnv_html_report/{sample}_{type}.pathology.cnv_report.html
    output: cnv/{sample}/{sample}_{type}.pathology.cnv_report.html

  - name: CNV HTML report, purecn TC
    input: reports/cnv_html_report/{sample}_{type}.purecn.cnv_report.html
    output: cnv/{sample}/{sample}_{type}.purecn.cnv_report.html
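
As a rough illustration of how the {sample}, {type} and {caller} placeholders in an entry resolve to concrete paths, the sketch below fills in two of the templates with Python's str.format. It only mimics the substitution and is not the pipeline's own code: the helper resolve_entry is hypothetical, and the sample name, type and caller values are made up for the example.

```python
from pathlib import PurePosixPath

# Two entries copied from output_files.yaml above, written as plain dicts
# so the example needs no YAML parser.
RESULTS_DIR = "./results"
ENTRIES = [
    {
        "name": "Alignment BAM file",
        "input": "alignment/samtools_merge_bam/{sample}_{type}.bam",
        "output": "bam/{sample}_{type}.bam",
    },
    {
        "name": "Caller-specific VCF file",
        "input": "snv_indels/{caller}/{sample}_{type}.merged.vcf.gz",
        "output": "vcf/{sample}_{type}.{caller}.vcf.gz",
    },
]


def resolve_entry(entry, directory, **wildcards):
    """Return (source, destination) for one entry with placeholders filled in."""
    # str.format ignores keyword arguments a template does not use,
    # so the same wildcard dict can be passed to every entry.
    source = entry["input"].format(**wildcards)
    destination = PurePosixPath(directory) / entry["output"].format(**wildcards)
    return source, str(destination)


# Hypothetical wildcard values, chosen only for illustration.
values = {"sample": "sample1", "type": "T", "caller": "vardict"}
resolved = [resolve_entry(entry, RESULTS_DIR, **values) for entry in ENTRIES]
for source, destination in resolved:
    print(f"{source} -> {destination}")
```

Each printed line pairs a file in the working directory with the location it would be copied to under the results directory.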

Default

The following files are located in the results/ folder:

| File | Format | Description |
|---|---|---|
| bam/{sample}_{type}.bam | bam | Deduplicated alignment file |
| bam/{sample}_{type}.bam.bai | bai | Index to the deduplicated alignment file |
| vcf/{sample}_{type}.filter.somatic.vcf.gz | vcf.gz | Called SNVs, decomposed, normalized, VEP-annotated and soft-filtered, in variant call format (bgzipped) |
| vcf/{sample}_{type}.{caller}.vcf.gz | vcf.gz | SNVs called by each caller (see snvs for more detail) |
| vcf/{sample}_{type}.pindel.filter.pindel.vcf.gz | vcf.gz | Small indels called by Pindel over limited regions defined in config[pindel_call][include_bed] |
| cnv/{sample}/{sample}_{type}.pathology.svdb_query.vcf.gz | vcf.gz | CNV calls from CNVkit and GATK in variant call format |
| cnv/{sample}/{sample}_{type}.pathology.cnv_report.html | html | HTML report with CNV calls using the tumour content defined in samples.tsv |
| cnv/{sample}/{sample}_{type}.purecn.cnv_report.html* | html | HTML report with CNV calls using the tumour content estimated by PureCN |
| qc/multiqc_DNA.html | html | Aggregated QC results (see below) |
* PureCN is currently failing silently. When that happens, tumour content is not estimated. CNVkit and GATK still run and produce output, and I think a tumour content of 0.8 is assumed for these samples instead.
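
To check quickly whether the files in the table above were produced for a given sample, something along these lines could be used. It is a sketch and not part of Poppy: the function name missing_results and the sample values are hypothetical, and only a subset of the templates from the table is listed.

```python
from pathlib import Path

# Per-sample output templates taken from the table above (a subset).
EXPECTED = [
    "bam/{sample}_{type}.bam",
    "bam/{sample}_{type}.bam.bai",
    "vcf/{sample}_{type}.filter.somatic.vcf.gz",
    "cnv/{sample}/{sample}_{type}.pathology.cnv_report.html",
]


def missing_results(results_dir, sample, sample_type):
    """Return the expected result paths that do not exist under results_dir."""
    root = Path(results_dir)
    missing = []
    for template in EXPECTED:
        relative = template.format(sample=sample, type=sample_type)
        if not (root / relative).is_file():
            missing.append(relative)
    return missing


if __name__ == "__main__":
    # Hypothetical sample name and type.
    for path in missing_results("results", "sample1", "T"):
        print(f"missing: {path}")
```

An empty return value means every listed file is present for that sample.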

MultiQC report

Poppy produces a MultiQC report for the entire sequencing run to make QC tracking easier. The lab can use it to decide whether a sample needs to be resequenced.
The report starts with a general statistics table showing the most important QC values, followed by additional QC data and diagrams. The entire MultiQC HTML file is interactive: you can filter, highlight, hide or export data using the toolbox at the right edge of the report.

Output files for the references pipeline

The output files in the Poppy references pipeline are defined in config/output_files_references.yaml.

Current output_files_references.yaml:
directory: ./reference_files
files:
  - name: CNVkit panel of normals
    input: references/cnvkit_build_normal_reference/cnvkit.PoN.cnn
    output: cnvkit.PoN.cnn

  - name: GATK interval list
    input: references/preprocess_intervals/design.preprocessed.interval_list
    output: design.preprocessed.interval_list

  - name: GATK panel of normals
    input: references/create_read_count_panel_of_normals/gatk_cnv_panel_of_normal.hdf5
    output: gatk.PoN.hdf5

  - name: SVDB database
    input: references/svdb_export/svdb_cnv.vcf
    output: svdb_cnv.vcf

  - name: purecn normaldb
    input: references/purecn_normal_db/output/normalDB.rds
    output: purecn_normal_db.rds

  - name: purecn mapping bias
    input: references/purecn_normal_db/output/mapping_bias.rds
    output: purecn_mapping_bias.rds

  - name: purecn intervals
    input: references/purecn_interval_file/targets_intervals.txt
    output: purecn_targets_intervals.txt

  - name: artifacts tsv-file
    input: references/create_artifact_file/artifact_panel.tsv
    output: artifact_panel.tsv

  - name: artifacts pindel tsv-file
    input: references/create_artifact_file_pindel/artifact_panel.tsv
    output: artifact_panel_pindel.tsv

  - name: background tsv-file
    input: "references/create_background_file/background_panel.tsv"
    output: background_panel.tsv
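
The mapping above amounts to copying each input to directory/output. Below is a minimal sketch of that step, assuming the entries have already been loaded into a list of dicts (only two are reproduced here) and that the pipeline's working directory is passed in. The helper copy_outputs is hypothetical and is not Poppy's implementation.

```python
import shutil
from pathlib import Path

# Two entries reproduced from output_files_references.yaml above.
REFERENCE_DIR = "./reference_files"
ENTRIES = [
    {
        "name": "CNVkit panel of normals",
        "input": "references/cnvkit_build_normal_reference/cnvkit.PoN.cnn",
        "output": "cnvkit.PoN.cnn",
    },
    {
        "name": "GATK panel of normals",
        "input": "references/create_read_count_panel_of_normals/gatk_cnv_panel_of_normal.hdf5",
        "output": "gatk.PoN.hdf5",
    },
]


def copy_outputs(entries, workdir, directory):
    """Copy every existing input to directory/output; return the copied targets."""
    workdir = Path(workdir)
    copied = []
    for entry in entries:
        source = workdir / entry["input"]
        if not source.is_file():
            continue  # skip files that this run did not produce
        target = workdir / directory / entry["output"]
        target.parent.mkdir(parents=True, exist_ok=True)
        shutil.copy2(source, target)
        copied.append(target)
    return copied
```

Calling copy_outputs(ENTRIES, ".", REFERENCE_DIR) from the working directory would place any files that exist under ./reference_files, using the shorter names from the output fields.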