Rules specific to Poppy that are not defined in Hydra Genetics
pindel_processing.smk
These are custom rules created for Poppy to process the output from Pindel so that it can be processed by VEP.
Pindel creates an older type of VCF, which therefore has to be processed slightly differently from more modern VCFs. Here we add the AF and DP fields to the VCF INFO column, annotate the calls using VEP, and add artifact annotation based on an artifact panel created with the reference pipeline.
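The AF and DP step can be pictured with the minimal sketch below. It assumes a single-sample Pindel VCF whose FORMAT column is GT:AD, with AD holding the reference and alternate supporting read counts. The function name add_af_dp and the example record are hypothetical, and the actual logic in scripts/pindel_processing_fix_af.py may differ.

```python
def add_af_dp(vcf_line: str) -> str:
    """Append AF and DP to the INFO column of one single-sample VCF record."""
    fields = vcf_line.rstrip("\n").split("\t")
    format_keys = fields[8].split(":")
    sample_values = fields[9].split(":")
    # Assumption: AD is "ref_reads,alt_reads" in the sample column.
    ad = sample_values[format_keys.index("AD")]
    ref_reads, alt_reads = (int(x) for x in ad.split(","))
    depth = ref_reads + alt_reads
    # Guard against division by zero for records without supporting reads.
    af = alt_reads / depth if depth > 0 else 0.0
    fields[7] = f"{fields[7]};AF={af:.4f};DP={depth}"
    return "\t".join(fields)


# Made-up record, used only for illustration.
record = "chr13\t28034105\t.\tA\tATTT\t.\tPASS\tEND=28034105;SVTYPE=INS\tGT:AD\t0/1:90,10"
print(add_af_dp(record))
```

A complete implementation would also need to add matching ##INFO header lines for AF and DP so that downstream tools accept the new fields.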
Rule
rule pindel_processing_annotation_vep:
    input:
        cache=config.get("vep", {}).get("vep_cache", ""),
        fasta=config["reference"]["fasta"],
        tabix="cnv_sv/pindel_vcf/{sample}_{type}.no_tc.normalized.vcf.gz.tbi",
        vcf="cnv_sv/pindel_vcf/{sample}_{type}.no_tc.normalized.vcf.gz",
    output:
        vcf=temp("cnv_sv/pindel_vcf/{sample}_{type}.no_tc.normalized.vep_annotated.vcf"),
    params:
        extra=config.get("vep", {}).get("extra", "--pick"),
        mode=config.get("vep", {}).get("mode", "--offline --cache --merged "),
    log:
        "cnv_sv/pindel_vcf/{sample}_{type}.no_tc.normalized.vep_annotated.vcf.log",
    benchmark:
        repeat(
            "cnv_sv/pindel_vcf/{sample}_{type}.no_tc.normalized.vep_annotated.vcf.benchmark.tsv",
            config.get("vep", {}).get("benchmark_repeats", 1),
        )
    threads: config.get("vep", {}).get("threads", config["default_resources"]["threads"])
    resources:
        mem_mb=config.get("vep", {}).get("mem_mb", config["default_resources"]["mem_mb"]),
        mem_per_cpu=config.get("vep", {}).get("mem_per_cpu", config["default_resources"]["mem_per_cpu"]),
        partition=config.get("vep", {}).get("partition", config["default_resources"]["partition"]),
        threads=config.get("vep", {}).get("threads", config["default_resources"]["threads"]),
        time=config.get("vep", {}).get("time", config["default_resources"]["time"]),
    container:
        config.get("vep", {}).get("container", config["default_container"])
    message:
        "{rule}: vep annotate {input.vcf}"
    script:
        "../scripts/pindel_processing_annotation_vep.sh"
| Rule parameters | Key | Value | Description |
| --- | --- | --- | --- |
| input | cache | config.get("vep", {}).get("vep_cache", "") | path to vep cache directory from config["vep"]["vep_cache"] |
| | fasta | config["reference"]["fasta"] | path to fasta reference genome |
| | tabix | "cnv_sv/pindel_vcf/{sample}_{type}.no_tc.normalized.vcf.gz.tbi" | vcf index file |
| | vcf | "cnv_sv/pindel_vcf/{sample}_{type}.no_tc.normalized.vcf.gz" | gzipped vcf file to be annotated |
| output | vcf | "cnv_sv/pindel_vcf/{sample}_{type}.no_tc.normalized.vep_annotated.vcf" | annotated vcf file (or, if the input is empty, just a copy of it) |
Configuration
Software settings (config.yaml)
| Key | Type | Description |
| --- | --- | --- |
| container | string | From config["vep"]: Name or path to container containing the vep executable |
| vep_cache | string | From config["vep"]: Path to offline VEP cache |
| mode | string | From config["vep"]: VEP arguments for run mode |
| extra | string | From config["vep"]: Additional command line arguments for VEP |
| benchmark_repeats | integer | From config["vep"]: set number of times benchmark should be repeated |
Resources settings (resources.yaml)
| Key | Type | Description |
| --- | --- | --- |
| mem_mb | integer | max memory in MB to be available |
| mem_per_cpu | integer | memory in MB used per cpu |
| partition | string | partition to use on cluster |
| threads | integer | number of threads to be available |
| time | string | max execution time |
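The rule reads each setting with a nested config.get call, so a key that is missing from the vep section falls back either to a hard-coded default or to the pipeline-wide default. The sketch below shows that lookup pattern on a hypothetical config dictionary; the paths, container name and resource values are made up for illustration.

```python
# Hypothetical config, standing in for the parsed config.yaml and resources.yaml.
config = {
    "default_container": "docker://example/default:latest",
    "default_resources": {"threads": 1, "mem_mb": 6144, "time": "4:00:00"},
    "vep": {"vep_cache": "/data/vep_cache", "threads": 4},
}

vep = config.get("vep", {})

# Keys absent from config["vep"] fall back to the rule's hard-coded defaults.
extra = vep.get("extra", "--pick")
mode = vep.get("mode", "--offline --cache --merged ")

# Resource keys fall back to the pipeline-wide defaults instead.
threads = vep.get("threads", config["default_resources"]["threads"])
mem_mb = vep.get("mem_mb", config["default_resources"]["mem_mb"])
container = vep.get("container", config["default_container"])

print(extra, mode, threads, mem_mb, container)
```

Here threads comes from the vep section, while extra, mode, mem_mb and container all resolve to their fallbacks.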
Rule
There are instances where the VEP annotation is not added to a variant. This rule adds missing CSQ annotations back to the VCF file.
rule pindel_processing_add_missing_csq:
    input:
        vcf="cnv_sv/pindel_vcf/{sample}_{type}.no_tc.normalized.vep_annotated.vcf.gz",
        tbi="cnv_sv/pindel_vcf/{sample}_{type}.no_tc.normalized.vep_annotated.vcf.gz.tbi",
    output:
        vcf="cnv_sv/pindel_vcf/{sample}_{type}.no_tc.normalized.vep_annotated.csq_corrected.vcf",
    params:
        field="CSQ",
        extra=config.get("pindel_processing_add_missing_csq", {}).get("extra", ""),
    log:
        "cnv_sv/pindel_vcf/{sample}_{type}.no_tc.normalized.vep_annotated.csq_corrected.vcf.log",
    benchmark:
        repeat(
            "cnv_sv/pindel_vcf/{sample}_{type}.no_tc.normalized.vep_annotated.csq_corrected.vcf.benchmark.tsv",
            config.get("pindel_processing_add_missing_csq", {}).get("benchmark_repeats", 1),
        )
    threads: config.get("pindel_processing_add_missing_csq", {}).get("threads", config["default_resources"]["threads"])
    resources:
        mem_mb=config.get("pindel_processing_add_missing_csq", {}).get("mem_mb", config["default_resources"]["mem_mb"]),
        mem_per_cpu=config.get("pindel_processing_add_missing_csq", {}).get(
            "mem_per_cpu", config["default_resources"]["mem_per_cpu"]
        ),
        partition=config.get("pindel_processing_add_missing_csq", {}).get("partition", config["default_resources"]["partition"]),
        threads=config.get("pindel_processing_add_missing_csq", {}).get("threads", config["default_resources"]["threads"]),
        time=config.get("pindel_processing_add_missing_csq", {}).get("time", config["default_resources"]["time"]),
    container:
        config.get("pindel_processing_add_missing_csq", {}).get("container", config["default_container"])
    message:
        "{rule}: if need be, add missing CSQ annotation to variants in {input.vcf}"
    script:
        "../scripts/pindel_processing_add_missing_csq.py"
| Rule parameters | Key | Value | Description |
| --- | --- | --- | --- |
| input | vcf | "cnv_sv/pindel_vcf/{sample}_{type}.no_tc.normalized.vep_annotated.vcf.gz" | gzipped vcf to be corrected for missing CSQ |
| | tbi | "cnv_sv/pindel_vcf/{sample}_{type}.no_tc.normalized.vep_annotated.vcf.gz.tbi" | tbi index to input.vcf |
| output | vcf | "cnv_sv/pindel_vcf/{sample}_{type}.no_tc.normalized.vep_annotated.csq_corrected.vcf" | annotated vcf file with blank CSQ if needed |
Configuration
Software settings (config.yaml)
| Key | Type | Description |
| --- | --- | --- |
| benchmark_repeats | integer | set number of times benchmark should be repeated |
| container | string | name or path to docker/singularity container |
| extra | string | parameters that should be forwarded |
Resources settings (resources.yaml)
| Key | Type | Description |
| --- | --- | --- |
| mem_mb | integer | max memory in MB to be available |
| mem_per_cpu | integer | memory in MB used per cpu |
| partition | string | partition to use on cluster |
| threads | integer | number of threads to be available |
| time | string | max execution time |
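The idea behind pindel_processing_add_missing_csq can be sketched as follows. This is an illustration under stated assumptions, not the content of scripts/pindel_processing_add_missing_csq.py: it assumes that a "blank CSQ" is a value with one empty slot per subfield listed in the VEP CSQ header line, and the function names blank_csq and add_missing_csq as well as the shortened header are hypothetical.

```python
def blank_csq(csq_header: str) -> str:
    """Build an empty CSQ value with one slot per field named in the header."""
    # VEP lists the subfields after "Format: ", separated by "|".
    field_names = csq_header.split("Format: ")[1].rstrip('">').split("|")
    return "|".join("" for _ in field_names)


def add_missing_csq(info: str, empty_value: str, field: str = "CSQ") -> str:
    """Return INFO unchanged if it has the field, else append a blank one."""
    keys = [entry.split("=", 1)[0] for entry in info.split(";")]
    if field in keys:
        return info
    return f"{info};{field}={empty_value}"


# Shortened, made-up CSQ header with only three subfields.
header = (
    '##INFO=<ID=CSQ,Number=.,Type=String,Description="Consequence annotations '
    'from Ensembl VEP. Format: Allele|Consequence|SYMBOL">'
)
empty = blank_csq(header)
print(add_missing_csq("SVTYPE=DEL;AF=0.05", empty))
print(add_missing_csq("SVTYPE=DEL;CSQ=-|frameshift_variant|FLT3", empty))
```

The first record gains an empty CSQ entry, while the second already carries one and is returned unchanged. The field argument mirrors the rule's field="CSQ" parameter.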
Rule
rule pindel_processing_fix_af:
    input:
        vcf="cnv_sv/pindel_vcf/{sample}_{type}.no_tc.vcf",
    output:
        vcf="cnv_sv/pindel_vcf/{sample}_{type}.no_tc.fix_af.vcf",
    params:
        extra=config.get("pindel_processing_fix_af", {}).get("extra", ""),
    log:
        "cnv_sv/pindel_vcf/{sample}_{type}.no_tc.fix_af.vcf.log",
    benchmark:
        repeat(
            "cnv_sv/pindel_vcf/{sample}_{type}.no_tc.fix_af.vcf.benchmark.tsv",
            config.get("pindel_processing_fix_af", {}).get("benchmark_repeats", 1),
        )
    threads: config.get("pindel_processing_fix_af", {}).get("threads", config["default_resources"]["threads"])
    resources:
        mem_mb=config.get("pindel_processing_fix_af", {}).get("mem_mb", config["default_resources"]["mem_mb"]),
        mem_per_cpu=config.get("pindel_processing_fix_af", {}).get("mem_per_cpu", config["default_resources"]["mem_per_cpu"]),
        partition=config.get("pindel_processing_fix_af", {}).get("partition", config["default_resources"]["partition"]),
        threads=config.get("pindel_processing_fix_af", {}).get("threads", config["default_resources"]["threads"]),
        time=config.get("pindel_processing_fix_af", {}).get("time", config["default_resources"]["time"]),
    container:
        config.get("pindel_processing_fix_af", {}).get("container", config["default_container"])
    message:
        "{rule}: add af and dp to info field in {input.vcf}"
    script:
        "../scripts/pindel_processing_fix_af.py"
| Rule parameters | Key | Value | Description |
| --- | --- | --- | --- |
| input | vcf | "cnv_sv/pindel_vcf/{sample}_{type}.no_tc.vcf" | vcf where AF and DP are needed in the INFO field |
| output | vcf | "cnv_sv/pindel_vcf/{sample}_{type}.no_tc.fix_af.vcf" | vcf with AF and DP added to the INFO field |
Configuration
Software settings (config.yaml)
| Key | Type | Description |
| --- | --- | --- |
| benchmark_repeats | integer | set number of times benchmark should be repeated |
| container | string | name or path to docker/singularity container |
| extra | string | parameters that should be forwarded |
Resources settings (resources.yaml)
| Key | Type | Description |
| --- | --- | --- |
| mem_mb | integer | max memory in MB to be available |
| mem_per_cpu | integer | memory in MB used per cpu |
| partition | string | partition to use on cluster |
| threads | integer | number of threads to be available |
| time | string | max execution time |
Rule
rule pindel_processing_artifact_annotation:
    input:
        vcf="cnv_sv/pindel_vcf/{sample}_{type}.no_tc.normalized.vep_annotated.csq_corrected.vcf.gz",
        tbi="cnv_sv/pindel_vcf/{sample}_{type}.no_tc.normalized.vep_annotated.csq_corrected.vcf.gz.tbi",
        artifacts=config["reference"]["artifacts_pindel"],
    output:
        vcf="cnv_sv/pindel_vcf/{sample}_{type}.no_tc.normalized.vep_annotated.artifact_annotated.vcf",
    params:
        extra=config.get("pindel_processing_artifact_annotation", {}).get("extra", ""),
    log:
        "cnv_sv/pindel_vcf/{sample}_{type}.no_tc.normalized.vep_annotated.artifact_annotated.vcf.log",
    benchmark:
        repeat(
            "cnv_sv/pindel_vcf/{sample}_{type}.no_tc.normalized.vep_annotated.artifact_annotated.vcf.benchmark.tsv",
            config.get("pindel_processing_artifact_annotation", {}).get("benchmark_repeats", 1),
        )
    threads: config.get("pindel_processing_artifact_annotation", {}).get("threads", config["default_resources"]["threads"])
    resources:
        mem_mb=config.get("pindel_processing_artifact_annotation", {}).get("mem_mb", config["default_resources"]["mem_mb"]),
        mem_per_cpu=config.get("pindel_processing_artifact_annotation", {}).get(
            "mem_per_cpu", config["default_resources"]["mem_per_cpu"]
        ),
        partition=config.get("pindel_processing_artifact_annotation", {}).get(
            "partition", config["default_resources"]["partition"]
        ),
        threads=config.get("pindel_processing_artifact_annotation", {}).get("threads", config["default_resources"]["threads"]),
        time=config.get("pindel_processing_artifact_annotation", {}).get("time", config["default_resources"]["time"]),
    container:
        config.get("pindel_processing_artifact_annotation", {}).get("container", config["default_container"])
    message:
        "{rule}: add artifact annotation on {input.vcf}, based on arifact_panel_pindel.tsv "
    script:
        "../scripts/pindel_processing_artifact_annotation.py"
| Rule parameters | Key | Value | Description |
| --- | --- | --- | --- |
| input | vcf | "cnv_sv/pindel_vcf/{sample}_{type}.no_tc.normalized.vep_annotated.csq_corrected.vcf.gz" | gzipped vcf to be artifact annotated |
| | tbi | "cnv_sv/pindel_vcf/{sample}_{type}.no_tc.normalized.vep_annotated.csq_corrected.vcf.gz.tbi" | tbi index to input.vcf |
| | artifacts | config["reference"]["artifacts_pindel"] | tsv file with artifact pindel calls, created in reference pipeline |
| output | vcf | "cnv_sv/pindel_vcf/{sample}_{type}.no_tc.normalized.vep_annotated.artifact_annotated.vcf" | vcf with artifact annotation |
Configuration
Software settings (config.yaml)
| Key | Type | Description |
| --- | --- | --- |
| benchmark_repeats | integer | set number of times benchmark should be repeated |
| container | string | name or path to docker/singularity container |
| extra | string | parameters that should be forwarded |
Resources settings (resources.yaml)
| Key | Type | Description |
| --- | --- | --- |
| mem_mb | integer | max memory in MB to be available |
| mem_per_cpu | integer | memory in MB used per cpu |
| partition | string | partition to use on cluster |
| threads | integer | number of threads to be available |
| time | string | max execution time |
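As a rough illustration of artifact annotation, the sketch below looks up each call in an artifact panel and appends the panel statistics to the INFO column. It assumes that the panel is a header-less tsv with the columns chr, pos, svtype, median, sd and num_obs, and that calls are matched on chromosome, position and SV type. The INFO keys Artifact, ArtifactMedian and ArtifactSD, the function names and the panel values are all hypothetical; scripts/pindel_processing_artifact_annotation.py may work differently.

```python
import csv
import io


def load_panel(handle):
    """Map (chrom, pos, svtype) to its (median, sd, num_obs) strings."""
    panel = {}
    for chrom, pos, svtype, median, sd, num_obs in csv.reader(handle, delimiter="\t"):
        panel[(chrom, int(pos), svtype)] = (median, sd, num_obs)
    return panel


def annotate_info(chrom, pos, svtype, info, panel):
    """Append panel statistics to INFO; unseen variants get zero observations."""
    median, sd, num_obs = panel.get((chrom, pos, svtype), ("0", "0", "0"))
    # Hypothetical INFO keys, chosen only for this illustration.
    return f"{info};Artifact={num_obs};ArtifactMedian={median};ArtifactSD={sd}"


# Made-up panel content: chr, pos, svtype, median, sd, num_obs (no header row assumed).
panel = load_panel(io.StringIO("chr13\t28034105\tINS\t0.02\t0.01\t12\n"))
print(annotate_info("chr13", 28034105, "INS", "SVTYPE=INS;AF=0.1", panel))
print(annotate_info("chr7", 55174772, "DEL", "SVTYPE=DEL;AF=0.3", panel))
```

The first call is found in the panel and receives its statistics, while the second is not in the panel and is annotated with zero observations.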
When svdb --merge is run with the priority flag set, svdb cuts off the FORMAT column for cnvkit variants (see the git issue). Poppy therefore uses a non-Hydra Genetics rule that runs the svdb --merge command without the priority flag.
Rule
rule svdb_merge_wo_priority:
    input:
        vcfs=get_vcfs_for_svdb_merge,
    output:
        vcf=temp("cnv_sv/svdb_merge/{sample}_{type}.{tc_method}.merged.vcf"),
    params:
        extra=config.get("svdb_merge", {}).get("extra", ""),
        overlap=config.get("svdb_merge", {}).get("overlap", 0.6),
        bnd_distance=config.get("svdb_merge", {}).get("bnd_distance", 10000),
    log:
        "cnv_sv/svdb_merge/{sample}_{type}.{tc_method}.merged.vcf.log",
    benchmark:
        repeat(
            "cnv_sv/svdb_merge/{sample}_{type}.{tc_method}.merged.benchmark.tsv",
            config.get("svdb_merge", {}).get("benchmark_repeats", 1),
        )
    threads: config.get("svdb_merge", {}).get("threads", config["default_resources"]["threads"])
    resources:
        mem_mb=config.get("svdb_merge", {}).get("mem_mb", config["default_resources"]["mem_mb"]),
        mem_per_cpu=config.get("svdb_merge", {}).get("mem_per_cpu", config["default_resources"]["mem_per_cpu"]),
        partition=config.get("svdb_merge", {}).get("partition", config["default_resources"]["partition"]),
        threads=config.get("svdb_merge", {}).get("threads", config["default_resources"]["threads"]),
        time=config.get("svdb_merge", {}).get("time", config["default_resources"]["time"]),
    container:
        config.get("svdb_merge", {}).get("container", config["default_container"])
    message:
        "{rule}: merges vcf files from different cnv callers into {output.vcf}"
    shell:
        "(svdb --merge "
        "--vcf {input.vcfs} "
        "--bnd_distance {params.bnd_distance} "
        "--overlap {params.overlap} "
        "{params.extra} "
        "> {output.vcf}) 2> {log}"
| Rule parameters | Key | Value | Description |
| --- | --- | --- | --- |
| input | vcfs | get_vcfs_for_svdb_merge | the function get_vcfs_for_svdb_merge (common.smk) lists all files (e.g. from different callers) that should be merged into an SVDB 'vcf' |
| output | vcf | "cnv_sv/svdb_merge/{sample}_{type}.{tc_method}.merged.vcf" | a 'vcf' file containing the merged SV calls |
Configuration
Software settings (config.yaml)
| Key | Type | Description |
| --- | --- | --- |
| container | string | Name or path to container containing the svdb executable |
| tc_method | array | Tumor cell content estimation methods |
| overlap | number | Minimum overlap between regions for merging |
| extra | string | Additional arguments to pass to svdb |
Resources settings (resources.yaml)
| Key | Type | Description |
| --- | --- | --- |
| mem_mb | integer | max memory in MB to be available |
| mem_per_cpu | integer | memory in MB used per cpu |
| partition | string | partition to use on cluster |
| threads | integer | number of threads to be available |
| time | string | max execution time |
reference_rules.smk
Software used specifically to create the reference files for Poppy.
Rule
rule reference_rules_create_artifact_file_pindel:
    input:
        vcfs=set([f"cnv_sv/pindel_vcf/{t.sample}_{t.type}.no_tc.normalized.vep_annotated.vcf.gz" for t in units.itertuples()]),
        tbis=set([f"cnv_sv/pindel_vcf/{t.sample}_{t.type}.no_tc.normalized.vep_annotated.vcf.gz.tbi" for t in units.itertuples()]),
    output:
        artifact_panel=temp("references/create_artifact_file_pindel/artifact_panel.tsv"),
    params:
        extra=config.get("create_artifact_file_pindel", {}).get("extra", ""),
    log:
        "references/create_artifact_file_pindel/artifact_panel.tsv.log",
    benchmark:
        repeat(
            "references/create_artifact_file_pindel/artifact_panel.tsv.benchmark.tsv",
            config.get("create_artifact_file_pindel", {}).get("benchmark_repeats", 1),
        )
    threads: config.get("create_artifact_file_pindel", {}).get("threads", config["default_resources"]["threads"])
    resources:
        mem_mb=config.get("create_artifact_file_pindel", {}).get("mem_mb", config["default_resources"]["mem_mb"]),
        mem_per_cpu=config.get("create_artifact_file_pindel", {}).get("mem_per_cpu", config["default_resources"]["mem_per_cpu"]),
        partition=config.get("create_artifact_file_pindel", {}).get("partition", config["default_resources"]["partition"]),
        threads=config.get("create_artifact_file_pindel", {}).get("threads", config["default_resources"]["threads"]),
        time=config.get("create_artifact_file_pindel", {}).get("time", config["default_resources"]["time"]),
    container:
        config.get("create_artifact_file_pindel", {}).get("container", config["default_container"])
    message:
        "{rule}: create artifact PoN for pindel"
    script:
        "../scripts/create_artifact_file_pindel.py"
| Rule parameters | Key | Value | Description |
| --- | --- | --- | --- |
| input | vcfs | set([f"cnv_sv/pindel_vcf/{t.sample}_{t.type}.no_tc.normalized.vep_annotated.vcf.gz" for t in units.itertuples()]) | all (gzipped) vcfs to be used for artifact panel |
| | tbis | set([f"cnv_sv/pindel_vcf/{t.sample}_{t.type}.no_tc.normalized.vep_annotated.vcf.gz.tbi" for t in units.itertuples()]) | tbi index to all input vcfs |
| output | artifact_panel | "references/create_artifact_file_pindel/artifact_panel.tsv" | tsv file with chr, pos, svtype, median, sd, num_obs of detected variants |
Configuration
Software settings (config.yaml)
| Key | Type | Description |
| --- | --- | --- |
| benchmark_repeats | integer | set number of times benchmark should be repeated |
| container | string | name or path to docker/singularity container |
| extra | string | parameters that should be forwarded |
Resources settings (resources.yaml)
| Key | Type | Description |
| --- | --- | --- |
| mem_mb | integer | max memory in MB to be available |
| mem_per_cpu | integer | memory in MB used per cpu |
| partition | string | partition to use on cluster |
| threads | integer | number of threads to be available |
| time | string | max execution time |
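How an artifact panel with the columns chr, pos, svtype, median, sd and num_obs could be assembled is sketched below. The sketch assumes that median and sd are computed over the allele frequencies of each call across the reference samples, which the column names suggest but do not confirm. The function build_panel and the input tuples are hypothetical, and scripts/create_artifact_file_pindel.py may compute the statistics differently.

```python
from collections import defaultdict
from statistics import median, stdev


def build_panel(calls):
    """Summarise calls as (chr, pos, svtype, median, sd, num_obs) rows."""
    grouped = defaultdict(list)
    for chrom, pos, svtype, af in calls:
        grouped[(chrom, pos, svtype)].append(af)
    rows = []
    for (chrom, pos, svtype), afs in sorted(grouped.items()):
        # stdev needs at least two observations; report 0.0 for a single one.
        sd = stdev(afs) if len(afs) > 1 else 0.0
        rows.append((chrom, pos, svtype, median(afs), sd, len(afs)))
    return rows


# Made-up calls as (chr, pos, svtype, allele frequency), one tuple per sample and variant.
calls = [
    ("chr13", 100, "INS", 0.02),
    ("chr13", 100, "INS", 0.04),
    ("chr7", 55, "DEL", 0.10),
]
for row in build_panel(calls):
    print("\t".join(str(value) for value in row))
```

Grouping on chromosome, position and SV type means that the same call seen in several reference samples collapses into one panel row, with num_obs recording in how many samples it was observed.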