What Next?

class: center, middle, inverse, title-slide

.title[
# What Next?
]
.subtitle[
## more skills to improve your data analysis workflow
]
.author[
### John Thompson
]
.date[
### 29th March 2023
]

---

# Rmarkdown to Quarto 
 
- quarto extends rmarkdown to allow code chunks in R, Python, Julia and Javascript 
 
- Developed by **Posit** as part of their move to support other languages 
 
- At present rmarkdown files can be rendered with quarto without any adaptation 
 
- In future, development efforts will be directed at quarto

---
# Archiving Data

- archive scripts and the raw data and you do not need to archive data files or results files unless compute time is a issue

- create a local data repo with the R Packages such as `repo` and `archivist`

- small amounts of data can be saved to GitHub (50Mb limit)

- data repositories (see https://www.nature.com/sdata/policies/repositories) will host data for sharing, e.g. as part of a publication

- cloud computing providers will host your data. Amazon AWS, Microsoft Azure, Google cloud etc. (all offer a free service to get you started)

---
# Make Files

- for reproducibility, a project should be controlled by a make file (a master script that runs everything else).

- the R package `makepipe` creates a master script

- the R package `targets` creates a master script and caches intermediate results

- for maximum efficiency, `makepipe` and `targets` monitor changes to the code and only re-run code that is affected by the changes

---
# Package Versions

- packages are updated regularly, with bug fixes, new features etc. 
 
- code that works with one version of a package, may not work with another version 
 
- packages are only guaranteed to work together when their versions are in sync 
 
- code that works for you, might fail for someone else if they use a different version of a package 
 
- R sets up a single library of packages that is accessed by all of your projects 
 - updating might be good for one project and disastrous for another 
 
- `renv` allows separate libraries for every project and keeps track of the version numbers that you are using so that you can pass that information to a collaborator

---
# Your Own Packages

- a personal package is a convenient way to store your functions 
 
- functions in a package can be documented with `roxygen2` to provide help on the syntax 
 
- functions in a personal package can be used across all of your projects 
 
- you can upload your package to GitHub and share it with colleagues or even the whole world 
 
- do not go public without much thought. Maintaining a package is a big undertaking

---
# Functional programming

- Good Practice says that
 - modularisation of project code makes it easier to write and maintain 
 - using functions for repeated tasks makes scripts easier to write and maintain 
 - well-named functions act like comments 
 
- the `purrr` package provides ways of applying a function multiple times with different inputs, so called functional programming (FP) 
 - coding without loops 
 - it takes a while to adjust to full FP but eventually your code will be easier to follow and less likely to contain errors 
---
# Learn html and latex

- markdown only offers basic formatting. At some point, you will need a format that markdown does not provide 
 
- it is good to have a house-style for your documents. You could start with a markdown theme found on the internet, but eventually you will want to design your own 
 
- to have full control over an html document you need to know the html language and to control a pdf you need to know latex

---
# Learn Shiny

- shiny provides interactive visualisations 
 
- shiny dashboards are excellent for 
 - exploring large datasets 
 - communicating your results 
 
- visit https://shiny.rstudio.com/gallery/ to see what shiny can do

---
# Make Better Reports

- learn a tables package so that you can produce pretty tables 
 - `gt` and `flextable` are the best 
 - `kable` and `kableExtra` are simpler alternatives 
 
- improve your ggplot2 skills. e.g. use `patchwork` 
 
- learn how to produce maps in R 
 
- learn how to make diagrams in R, e.g. R package `diagrammeR`,
latex package **tikz** 
 
- handle BibTeX citation with an R package. see https://ropensci.org/blog/2020/05/07/rmd-citations/ 
---
# Use Machine Learning Methods 
 
- any fool can run fit an ML model, very few people understand those models 
 - understand boasting, bagging, regularization etc. 
 
- Machine Learning models often out-perform statistical models 
 - try tree-based methods, such as random forests or XGBoost 
 
- learn to analyse textual and image data 
 
- learn the basics of deep learning 
 - how do you fit a neural network? 
 - why are large neural networks so good at prediction? 
 - what are the problems of a model with a billion parameters? 
 - what is the difference between a transformer and a GAN?
---
# Integrate AI into Your Workflow

- deep learning will dominate data science, at least in the medium term 
 
- explore ways of using ChatGPT or GPT-4 in your daily work 
 - to explain a difficult concept 
 - to review a topic 
 - to write your next grant application 
 - to write R code 
 - to add comments to existing R code 
 - to find errors in your code 
 
- consider the ethics of using AI in research. 
 - What about the ethics of not using AI? 
 - Do you need a group policy? 
 
---
# Learn a Second Language

- many R packages use C++ in the background to provide speed 
 - learn C/C++ and the Rcpp package 
 
- learn python if you want to get deeper into machine learning 
 
- learn Julia or another of the modern languages 
 - nearly as fast a C but without the pain 
 - some good ML and Bayesian packages