There are two main ways to use this template: using it directly for your own analysis, or continuing development and extending it (both are described below). In either case, you do not have to use all provided packages and patterns; you can improve reproducibility gradually. This section walks you through the core aspects of this work.
To be reproducible and traceable, you should ideally version control documentation, code, data, and the execution environment.
The setup chunk sets the knitr root directory to the project root so that relative paths resolve consistently:

knitr::opts_knit$set(root.dir = rprojroot::find_rstudio_root_file())
How to use this package:
Your project should use R; while it may be possible to adapt the approach to Python as well, the template is native to R.
Your project should use Git.
Your project MUST have a README file.
To start your reproducible project, follow these steps to get the template running:
Create an RStudio project in the structure of a package; the template already establishes this structure.
Define a license for your project. By default the template uses GPL-3; usethis::use_gpl3_license() is the command that sets up the license file.
Use an environment management tool (renv). The project template comes with an renv.lock file and its accompanying files (see the renv sketch after this list).
Lint your code to follow good style guides. The project template recommends running lintr:::addin_lint_package(); if your project is in package structure, this lints all project files.
Decide where to host your data. The project template uses Git for version control and may be extended with Git LFS. Use the data or data-raw folders.
Use version control for everything - including code, data, the environment, and even the packaged dependencies.
Publish your work in a repository. The template is hosted at https://github.com/g4challenge/repro-fair-neuro-ds-template; when using GitHub, change the path and name to those of your own template.
Use continuous integration. The template uses the .github/workflows folder to build the documentation with pkgdown, report coverage with codecov, and run R CMD check on the execution in order to trace the results (a scaffolding sketch follows after this list).
Use code tests. The template suggests using testthat and documentation-based tests via roxytest.
Use software engineering practices such as docstrings. The template uses the vignettes folder, roxygen2, and roxytest for documentation and good practice.
Use workflow tools like drake or targets. The vignettes/drake_spec.Rmd file is one example of how to use the drake package; a minimal plan sketch appears in the Scenario 1 walkthrough below. A workflow captures the inputs and outputs of executions in order to trace results. The template even version controls the executions within the .drake folder; this must be curated for good release practice.
Use PROV tools (rdtLite), which create a PROV-based representation of the workflow executions.
Package your results. This work packages everything on Zenodo.org according to the RO-Crate specification.
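For the environment step above, this is a minimal sketch of the typical renv workflow, assuming renv is installed:

renv::restore()  # restore the package library pinned in renv.lock
renv::snapshot() # after installing or updating packages, record the new state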
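For the continuous-integration step, usethis can scaffold comparable GitHub Actions workflows. This is a hedged sketch; the template already ships its own workflow files in .github/workflows:

usethis::use_github_action("check-standard") # R CMD check across platforms
usethis::use_github_action("test-coverage")  # coverage reporting for codecov
usethis::use_github_action("pkgdown")        # build and publish the documentation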
Then you need to consider a few questions about how you want to start building your package.
Scenario 1 is a simple scenario involving several steps in a linear manner.
The file is vignettes/scenario_1.Rmd; it uses R/packages.R for package loading, R/functions.R for general functions, and R/functions_scenario_1.R for scenario-specific functions, while the plan is defined in R/plan.R.
Open the file scenario_1.Rmd, where the statement make(scenario_1) executes the plan and the drake workflow is visualized as a graph. This graph shows the sequence from the input question, to reading the dataset, to producing a report.
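A minimal sketch of this pattern (the real targets live in R/plan.R; the target and file names here are placeholders):

library(drake)

# hypothetical plan mirroring the linear scenario_1 structure
scenario_1 <- drake_plan(
  dataset = read.csv(file_in("data-raw/dataset.csv")),
  report = rmarkdown::render(
    knitr_in("report.Rmd"),
    output_file = file_out("report.html"),
    quiet = TRUE
  )
)

make(scenario_1)            # execute all outdated targets
vis_drake_graph(scenario_1) # visualize the workflow as a dependency graph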
testthat can test the local package of your analysis using test_local(); the following code chunk tests all targets and executions of this package's workflows.
testthat::test_local(".")
## Loading required package: tidyverse
## ── Attaching packages ─────────────────────────────────────── tidyverse 1.3.0 ──
## ✔ ggplot2 3.3.3 ✔ purrr 0.3.4
## ✔ tibble 3.1.1 ✔ dplyr 1.0.6
## ✔ tidyr 1.1.3 ✔ stringr 1.4.0
## ✔ readr 1.4.0 ✔ forcats 0.5.1
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ purrr::is_null() masks testthat::is_null()
## ✖ dplyr::lag() masks stats::lag()
## ✖ dplyr::matches() masks tidyr::matches(), testthat::matches()
## Loading required package: rmarkdown
## Loading required package: knitr
## Loading required package: magrittr
##
## Attaching package: 'magrittr'
## The following object is masked from 'package:purrr':
##
## set_names
## The following object is masked from 'package:tidyr':
##
## extract
## The following objects are masked from 'package:testthat':
##
## equals, is_less_than, not
## ✔ | OK F W S | Context
## ✔ |  4       | File R/functions_scenario_1.R: @tests [0.3 s]
## ✔ |  1       | File R/functions_scenario_2.R: @tests
## ✔ |  2       | File R/functions.R: @tests
## ✔ |  1       | File R/plan.R: @tests
##
## ══ Results ═════════════════════════════════════════════════════════════════════
## Duration: 0.3 s
##
## [ FAIL 0 | WARN 0 | SKIP 0 | PASS 8 ]
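The roxytest contexts in the output above are generated from @tests tags in the roxygen2 headers. A minimal sketch of such a documented and tested function (add_numbers is a hypothetical helper, not part of the template):

#' Add two numbers
#'
#' @param x,y Numeric values.
#' @return The sum of `x` and `y`.
#' @tests
#' expect_equal(add_numbers(1, 2), 3)
#' expect_error(add_numbers("a", 1))
#' @export
add_numbers <- function(x, y) { # hypothetical helper, for illustration only
  stopifnot(is.numeric(x), is.numeric(y)) # fail early on non-numeric input
  x + y
}

Running roxygen2 with the roxytest::testthat_roclet then writes these expectations into tests/testthat/ as the roxytest-tests-* files seen above.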
If you want to curate your results as a release, it may be useful to curate the traces as well. The following commands help achieve this:
drake::clean()               # clean the cache to force a rebuild
drake::clean(destroy = TRUE) # destroy the cache, curating "one" clean run in the drake cache
The second command removes the .drake folder and allows version controlling only the curated final release.
To achieve a complete provenance track record within the template, rdtLite is used in conjunction with a make file: make.R is comparable to scenario_1.Rmd, except that its execution can be captured by rdtLite as a workflow run.
getwd() # validate that the working directory is the project root
rdtLite::prov.run(r.script.path = "make.R", prov.dir = ".prov/")
This freshly creates a .prov subdirectory, which contains provenance execution traces according to the rdtLite specification.
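To inspect these traces programmatically, the companion provParseR package can be used. A hedged sketch, assuming provParseR is installed and rdtLite's default prov_<script> directory layout:

# parse the PROV-JSON trace written by rdtLite for make.R
prov <- provParseR::prov.parse(".prov/prov_make/prov.json")
# inspect, for example, the captured execution environment
provParseR::get.environment(prov)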
You may want to continue development on this project and extend it further to your needs. Developing with the project template is comparable to using it. The following steps describe development, starting with Docker + RStudio right away.
git clone https://github.com/g4challenge/repro-fair-neuro-ds-template
# build the my_fair_project image first (see "Local Docker build" below)
docker run --rm -p 8787:8787 -e PASSWORD="1234" -v $(pwd):/home/rstudio my_fair_project
open your browser at http://localhost:8787
log in with user rstudio and password 1234
click open on the .Rproj file
Start using the project, adapt it to your needs, and change the git remote.
Local Docker build
docker build . -t my_fair_project
Local Docker run with the default user rstudio and PASSWORD="1234" (use a different password in practice):
docker run --rm -p 8787:8787 -e PASSWORD="1234" -v $(pwd):/home/rstudio g4challenge/repro-fair-neuro-ds-template
For a release, update the version in the DESCRIPTION file and the changes in the NEWS.md file, then create and push a signed tag:

git tag -s v1.5 -m 'my signed 1.5 tag'
git push origin main

Finally, use docker tag to label the matching Docker image.
Q: “Where should I start my documentation?”
A: Start with the README.md, covering the base context and efforts involved in your project. Then, if necessary, adapt the vignettes to your project's implementation.
Q: How do the various documentations relate to each other?
A: The README serves as the starting point and entry point, while the main documentation built by pkgdown is hosted as a GitHub page. There, all function documentation and vignettes are rendered and described.
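For example, the same site can be rendered locally before relying on CI (assuming pkgdown is installed):

pkgdown::build_site() # renders reference pages and vignettes into docs/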
Q: What adaptations do I have to make?
A: Ideally, you adapt/fork the template by clicking “Use this template” on GitHub, which creates a new traceable analysis project. You need to maintain the links to GitHub Pages, Zenodo, and DockerHub yourself.
Q: How can I prioritize these adaptations?
A: Start with the most pressing needs: getting the template working externally. Priority one is fixing the traces and validating them.
Q: What is the minimal set of adaptations that I need to make?
A: At minimum: the README, the links, and cleaning the vignettes folder down to one drake_spec.Rmd, one functions.R, one packages.R, and one plan.R, followed by one Docker build. Then your analysis is version controlled and mostly traceable with comparatively low effort.
Q: Can I also apply this to other contexts within medicine, or even outside of a medical context?
A: Yes. The main target is good clinical practice, STROBE, and DAQCORD, but it may suit other needs as well.