ECAS-SFdS class on random forest

Practical information

The ECAS-SFdS class on “Random forests: basics, extensions and applications” will be held on October 8-13, 2023 in Fréjus, France. I will be teaching in this class. This page gathers information on the material produced for participants and practical recommendations for the computing sessions.

TL;DR:

  1. Clone the GitHub repository of the class on your computer

  2. Download the course material as described in the README files of the directories */class/

  3. Download the data as described in the README files of the directories */data/

  4. Install R, RStudio, and R packages ggplot2, reshape2, SISIR, GENIE3, igraph, PRROC, and rfPermute

  5. Install Python, Jupyter notebook, and the Python libraries matplotlib, numpy, pyts, session_info, and sklearn

Material

My class will cover 2-3 topics (work in progress…) including:

  • random forest for functional data analysis (mostly time series)
    • slides (theoretical part).
    • practical part on “Using random forest for functional data with time-series random forest and BOSS random forest”: Analysis of the GunPoint dataset shared through Google Colab. This file is a Jupyter notebook, also available in the directory fda/practical of the class GitHub repository. You are free to either: i) use it directly on Google Colab by creating a copy (File / Save a copy in Drive), ii) use it on your own computer (be sure to have the necessary Python libraries installed), or iii) use it in an RStudio Cloud account.
    • practical part on “Interval selection for random forest with functional data”: Truffle analysis with random forest. This file is the HTML output of a Quarto file (similar to R Markdown), and its source code is available in the directory fda/practical of the class GitHub repository. You are free to either: i) use this file directly and copy/paste the code into an R terminal, ii) use the Quarto file directly on your own computer (make sure to have downloaded the data and installed the packages), or iii) use the Quarto file in an RStudio Cloud account. Data for this practical session have to be downloaded as described in the README file of the directory fda/data of the class GitHub repository.
  • random forest for network inference (in biology)
    • slides (theoretical part).
    • practical part on “Network inference with random forest”: Analysis of some expression data for Bacillus subtilis. This file is the HTML output of a Quarto file (similar to R Markdown), and its source code is available in the directory network/practical of the class GitHub repository. You are free to either: i) use this file directly and copy/paste the code into an R terminal, ii) use the Quarto file directly on your own computer (make sure to have downloaded the data and installed the packages), or iii) use the Quarto file in an RStudio Cloud account. Data for this practical session are included in the directory network/data of the class GitHub repository.
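To give a feel for what the time-series random forest practical manipulates, here is a minimal sketch (standard library only) of the interval features that time-series forests are usually described as computing: the mean, standard deviation, and slope of the series over randomly drawn intervals. The function names (interval_features, random_intervals, featurize) and the min_width parameter are hypothetical and chosen for illustration; the notebook itself relies on pyts and may differ in its details.

```python
import random
import statistics


def interval_features(series, start, end):
    """Return (mean, std, slope) of series[start:end]."""
    window = series[start:end]
    n = len(window)
    mean = statistics.fmean(window)
    std = statistics.pstdev(window)  # population standard deviation
    # Least-squares slope of the values against their positions 0..n-1
    t_mean = (n - 1) / 2
    denom = sum((t - t_mean) ** 2 for t in range(n))
    if denom == 0:
        return mean, std, 0.0
    slope = sum((t - t_mean) * (x - mean) for t, x in enumerate(window)) / denom
    return mean, std, slope


def random_intervals(length, n_intervals, min_width=3, seed=0):
    """Draw n_intervals random (start, end) pairs within [0, length]."""
    rng = random.Random(seed)
    intervals = []
    for _ in range(n_intervals):
        width = rng.randint(min_width, length)
        start = rng.randint(0, length - width)
        intervals.append((start, start + width))
    return intervals


def featurize(series, intervals):
    """Concatenate the three features of every interval into one vector."""
    features = []
    for start, end in intervals:
        features.extend(interval_features(series, start, end))
    return features


if __name__ == "__main__":
    toy_series = [0.0, 0.1, 0.4, 0.9, 1.6, 2.5, 3.6, 4.9, 6.4, 8.1]
    intervals = random_intervals(len(toy_series), n_intervals=4)
    print(featurize(toy_series, intervals))
```

The resulting feature vectors (one per series, built from the same intervals) are what a standard random forest classifier would then be trained on.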

Technical information

I am using Ubuntu 22.04 LTS (xubuntu distribution).

  • Python configuration: On my computer, I am using Python 3.10.12 with Jupyter notebook (6.4.12). The following libraries are required for the notebooks (versions are given for the record; the Google Colab versions are not the same, and the notebook works fine there too):
    • matplotlib 3.6.2
    • numpy 1.23.5
    • pyts 0.13.0
    • session_info 1.0.0
    • sklearn 1.3.0
  • R configuration: I am using R 4.3.1 in RStudio (any recent version should work). The following packages are required for the Quarto files:
    • ggplot2 3.4.3
    • reshape2 1.4.4
    • SISIR 0.2.2
    • GENIE3 1.22.0
    • igraph 1.5.1
    • PRROC 1.3.1
    • rfPermute 2.5.2

For R, the renv configuration file is provided. If you want to use renv, running the R command renv::init() and choosing the “Restore” option should install all the packages required for the practicals.
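For Python, no equivalent lock file is mentioned on this page, but a short script can compare your installation with the versions listed above. The sketch below uses only the standard library; the function names (installed_versions, report) are hypothetical, and the distribution names are assumptions: in particular, sklearn is looked up under its distribution name scikit-learn. A version mismatch is not necessarily a problem, since the notebook also runs with the Google Colab versions.

```python
from importlib import metadata

# Versions listed on this page. Keys are distribution (pip) names,
# which are assumptions: sklearn is distributed as scikit-learn.
REFERENCE_VERSIONS = {
    "matplotlib": "3.6.2",
    "numpy": "1.23.5",
    "pyts": "0.13.0",
    "session_info": "1.0.0",
    "scikit-learn": "1.3.0",
}


def installed_versions(packages):
    """Map each package name to its installed version, or None if missing."""
    found = {}
    for name in packages:
        try:
            found[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            found[name] = None
    return found


def report(reference):
    """Build one human-readable status line per reference package."""
    lines = []
    for name, installed in installed_versions(reference).items():
        expected = reference[name]
        if installed is None:
            status = "NOT INSTALLED"
        elif installed == expected:
            status = "same version as on this page"
        else:
            status = f"installed {installed} (page lists {expected})"
        lines.append(f"{name}: {status}")
    return lines


if __name__ == "__main__":
    print("\n".join(report(REFERENCE_VERSIONS)))
```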