Output files
Snakemake deletes every file marked with `temp()` as soon as no remaining rule needs it. The user therefore needs to specify which result files should be copied to the results directory.
In Poppy, the output files are defined in `config/output_files.yaml`, which can be adapted to your needs.
NB: To keep all result files, including those marked with `temp()`, add `--notemp` when launching Snakemake.
Current `config/output_files.yaml`:

```yaml
directory: ./results
files:
  - name: Alignment BAM file
    input: alignment/samtools_merge_bam/{sample}_{type}.bam
    output: bam/{sample}_{type}.bam
  - name: Alignment BAM file index
    input: null
    output: bam/{sample}_{type}.bam.bai
  - name: MultiQC
    input: qc/multiqc/multiqc_DNA.html
    output: qc/multiqc_DNA.html
  - name: SNV ensemble soft filtered VCF file
    input: snv_indels/bcbio_variation_recall_ensemble/{sample}_{type}.ensembled.vep_annotated.artifact_annotated.background_annotated.filter.somatic_hard.filter.somatic.vcf.gz
    output: vcf/{sample}_{type}.filter.somatic.vcf.gz
  - name: Caller-specific VCF file
    input: snv_indels/{caller}/{sample}_{type}.merged.vcf.gz
    output: vcf/{sample}_{type}.{caller}.vcf.gz
  - name: Pindel VCF file
    input: cnv_sv/pindel_vcf/{sample}_{type}.no_tc.normalized.vep_annotated.artifact_annotated.filter.somatic_hard.filter.pindel.vcf.gz
    output: vcf/{sample}_{type}.pindel.vep_annotated.filter.pindel.vcf.gz
  - name: SVDB CNV VCF file
    input: cnv_sv/svdb_query/{sample}_{type}.pathology.svdb_query.vcf.gz
    output: cnv/{sample}/{sample}_{type}.pathology.svdb_query.vcf.gz
  - name: CNV HTML report, pathology TC
    input: reports/cnv_html_report/{sample}_{type}.pathology.cnv_report.html
    output: cnv/{sample}/{sample}_{type}.pathology.cnv_report.html
  - name: CNV HTML report, purecn TC
    input: reports/cnv_html_report/{sample}_{type}.purecn.cnv_report.html
    output: cnv/{sample}/{sample}_{type}.purecn.cnv_report.html
```
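Each entry pairs an `input` path produced by the workflow with an `output` path under `directory`. The `{sample}`, `{type}` and `{caller}` placeholders are filled in with concrete values. The sketch below expands a few entries into (source, destination) pairs to show that mapping. It is not Poppy's own copy rule, and it rests on several assumptions:

- The entries are written as Python dicts rather than parsed from the YAML file.
- Entries with `input: null` are simply skipped, since how Poppy produces them is not covered here.
- The sample, type and caller values in the usage example are hypothetical.

```python
from itertools import product
from string import Formatter

# A subset of config/output_files.yaml, written as Python data for illustration.
OUTPUT_FILES = {
    "directory": "./results",
    "files": [
        {"name": "Alignment BAM file",
         "input": "alignment/samtools_merge_bam/{sample}_{type}.bam",
         "output": "bam/{sample}_{type}.bam"},
        {"name": "Alignment BAM file index",
         "input": None,
         "output": "bam/{sample}_{type}.bam.bai"},
        {"name": "Caller-specific VCF file",
         "input": "snv_indels/{caller}/{sample}_{type}.merged.vcf.gz",
         "output": "vcf/{sample}_{type}.{caller}.vcf.gz"},
    ],
}


def placeholders(template):
    """Return the set of {wildcard} names used in a path template."""
    return {field for _, field, _, _ in Formatter().parse(template) if field}


def expand_copy_jobs(config, wildcard_values):
    """Expand each entry into concrete (source, destination) path pairs.

    wildcard_values maps every wildcard name to the list of values it takes.
    Entries whose input is None are skipped (assumption: they are produced
    by some other rule rather than copied).
    """
    jobs = []
    for entry in config["files"]:
        if entry["input"] is None:
            continue
        names = sorted(placeholders(entry["input"]) | placeholders(entry["output"]))
        # One job per combination of wildcard values used by this entry.
        for combo in product(*(wildcard_values[name] for name in names)):
            values = dict(zip(names, combo))
            src = entry["input"].format(**values)
            dst = f'{config["directory"]}/{entry["output"].format(**values)}'
            jobs.append((src, dst))
    return jobs


if __name__ == "__main__":
    # Hypothetical wildcard values, for illustration only.
    wildcards = {"sample": ["S1"], "type": ["T"], "caller": ["gatk_mutect2", "vardict"]}
    for src, dst in expand_copy_jobs(OUTPUT_FILES, wildcards):
        print(f"{src} -> {dst}")
```

An entry that uses `{caller}` yields one copy per caller. This is why a single caller-specific line in the YAML produces several VCF files in `results/vcf/`.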
Default
The following files are located in the `results/` folder:

| File | Format | Description |
|---|---|---|
| `bam/{sample}_{type}.bam` | bam | Deduplicated alignment file |
| `bam/{sample}_{type}.bam.bai` | bai | Index of the deduplicated alignment file |
| `vcf/{sample}_{type}.filter.somatic.vcf.gz` | vcf.gz | Called SNVs, decomposed, normalized, VEP-annotated and soft-filtered, in variant call format (bgzipped) |
| `vcf/{sample}_{type}.{caller}.vcf.gz` | vcf.gz | SNVs called by each caller (see the SNVs section for more detail) |
| `vcf/{sample}_{type}.pindel.vep_annotated.filter.pindel.vcf.gz` | vcf.gz | Small indels called by Pindel over the limited regions defined in `config[pindel_call][include_bed]` |
| `cnv/{sample}/{sample}_{type}.pathology.svdb_query.vcf.gz` | vcf.gz | CNV calls from CNVkit and GATK in variant call format |
| `cnv/{sample}/{sample}_{type}.pathology.cnv_report.html` | html | HTML report with CNV calls using the tumour content defined in `samples.tsv` |
| `cnv/{sample}/{sample}_{type}.purecn.cnv_report.html`\* | html | HTML report with CNV calls using the tumour content estimated by PureCN |
| `qc/multiqc_DNA.html` | html | Aggregated QC results (see below) |

\* PureCN currently fails silently, so tumour content is not estimated. CNVkit and GATK still run, presumably assuming a tumour content of 0.8 instead.
MultiQC report
Poppy produces a MultiQC report for the entire sequencing run to simplify QC tracking. The lab can use it to decide whether a sample needs to be resequenced.
The report starts with a general statistics table showing the most important QC values, followed by additional QC data and plots. The whole MultiQC HTML file is interactive. You can filter, highlight, hide or export data using the ToolBox on the right edge of the report.
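As a hedged sketch of how exported general statistics could support the resequencing decision, the function below reads tab-separated text and returns the samples whose value in one column falls below a threshold. It makes the following assumptions:

- The text has a header row and one row per sample, with the sample name in the first column.
- The column names `median_coverage` and `pct_duplication` are hypothetical.
- The threshold of 500 is hypothetical. Neither the columns nor the threshold is a Poppy or MultiQC default.

```python
import csv
import io


def flag_for_resequencing(tsv_text, column, minimum):
    """Return sample names whose value in `column` is below `minimum`.

    Assumed layout: header row, one row per sample, sample name in the
    first column. Rows where the metric is missing are skipped.
    """
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    sample_col = reader.fieldnames[0]
    flagged = []
    for row in reader:
        value = row.get(column)
        if not value:
            continue  # metric absent for this sample; nothing to compare
        if float(value) < minimum:
            flagged.append(row[sample_col])
    return flagged


# Hypothetical export with made-up values, for illustration only.
example = (
    "Sample\tmedian_coverage\tpct_duplication\n"
    "S1_T\t850\t12.3\n"
    "S2_T\t310\t40.1\n"
)
print(flag_for_resequencing(example, "median_coverage", 500))  # ['S2_T']
```

Which metrics and limits actually justify resequencing is a lab decision. The sketch only shows the mechanics of applying such a rule to the table.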
Output files reference pipeline
The output files of the Poppy references pipeline are defined in `config/output_files_references.yaml`.
Current `config/output_files_references.yaml`:

```yaml
directory: ./reference_files
files:
  - name: CNVkit panel of normals
    input: references/cnvkit_build_normal_reference/cnvkit.PoN.cnn
    output: cnvkit.PoN.cnn
  - name: GATK interval list
    input: references/preprocess_intervals/design.preprocessed.interval_list
    output: design.preprocessed.interval_list
  - name: GATK panel of normals
    input: references/create_read_count_panel_of_normals/gatk_cnv_panel_of_normal.hdf5
    output: gatk.PoN.hdf5
  - name: SVDB database
    input: references/svdb_export/svdb_cnv.vcf
    output: svdb_cnv.vcf
  - name: purecn normaldb
    input: references/purecn_normal_db/output/normalDB.rds
    output: purecn_normal_db.rds
  - name: purecn mapping bias
    input: references/purecn_normal_db/output/mapping_bias.rds
    output: purecn_mapping_bias.rds
  - name: purecn intervals
    input: references/purecn_interval_file/targets_intervals.txt
    output: purecn_targets_intervals.txt
  - name: artifacts tsv-file
    input: references/create_artifact_file/artifact_panel.tsv
    output: artifact_panel.tsv
  - name: artifacts pindel tsv-file
    input: references/create_artifact_file_pindel/artifact_panel.tsv
    output: artifact_panel_pindel.tsv
  - name: background tsv-file
    input: "references/create_background_file/background_panel.tsv"
    output: background_panel.tsv
```
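Before using the generated reference files, it can help to confirm that every file listed above actually ended up in the references directory. The minimal sketch below has two limitations worth stating:

- It hard-codes the `output` names from the YAML rather than parsing the file.
- It checks only that each file exists, not that its contents are valid.

```python
from pathlib import Path

# Output names copied from config/output_files_references.yaml.
REFERENCE_OUTPUTS = (
    "cnvkit.PoN.cnn",
    "design.preprocessed.interval_list",
    "gatk.PoN.hdf5",
    "svdb_cnv.vcf",
    "purecn_normal_db.rds",
    "purecn_mapping_bias.rds",
    "purecn_targets_intervals.txt",
    "artifact_panel.tsv",
    "artifact_panel_pindel.tsv",
    "background_panel.tsv",
)


def missing_reference_files(directory="./reference_files", names=REFERENCE_OUTPUTS):
    """Return the expected reference files that are not present in `directory`."""
    base = Path(directory)
    return [name for name in names if not (base / name).is_file()]


if __name__ == "__main__":
    missing = missing_reference_files()
    if missing:
        print("Missing reference files:", ", ".join(missing))
    else:
        print("All reference files are present.")
```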