Welcome to the Reproducible Bioinformatics project

The aim of the Reproducible Bioinformatics project is to create easy-to-use bioinformatics workflows that fulfill the following rules (Sandve et al. PLoS Comp Biol. 2013):

  1. For Every Result, Keep Track of How It Was Produced
  2. Avoid Manual Data Manipulation Steps
  3. Archive the Exact Versions of All External Programs Used
  4. Version Control All Custom Scripts
  5. Record All Intermediate Results, When Possible in Standardized Formats
  6. For Analyses That Include Randomness, Note Underlying Random Seeds
  7. Always Store Raw Data behind Plots
  8. Generate Hierarchical Analysis Output, Allowing Layers of Increasing Detail to Be Inspected
  9. Connect Textual Statements to Underlying Results
  10. Provide Public Access to Scripts, Runs, and Results
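
As an illustration of rules 1, 3 and 6, here is a minimal Python sketch of recording how a result was produced. It is not part of the project's own code: the function names, the `align_reads` command and the version string are hypothetical placeholders chosen for the example.

```python
import json
import platform
import random
from datetime import datetime, timezone


def make_run_record(command, seed, tool_versions):
    """Collect the provenance details that rules 1, 3 and 6 ask for."""
    return {
        "command": list(command),               # rule 1: how the result was produced
        "tool_versions": dict(tool_versions),   # rule 3: versions of external programs
        "random_seed": seed,                    # rule 6: seed behind any randomness
        "python_version": platform.python_version(),
        "recorded_at": datetime.now(timezone.utc).isoformat(),
    }


def subsample(items, k, seed):
    """Draw a random subsample that is repeatable because the seed is explicit."""
    rng = random.Random(seed)
    return rng.sample(list(items), k)


record = make_run_record(
    command=["align_reads", "--input", "sample.fastq"],  # hypothetical command
    seed=42,
    tool_versions={"aligner": "1.0.0"},                  # hypothetical version
)
print(json.dumps(record, indent=2))
```

Storing such a record next to each output file is one simple way to keep results traceable; calling `subsample` twice with the same seed returns the same selection.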


The paper on the Reproducible Bioinformatics project is published in BMC Bioinformatics (Kulkarni et al. 2018).

The paper on rCASC (reproducible classification analysis of single-cell sequencing data) is published in GigaScience (Alessandri et al. 2019).



The Reproducible Bioinformatics project

Reproducible Bioinformatics is a non-profit and open-source project.

We are a group of bioinformaticians interested in simplifying the use of bioinformatics tools for biologists with or without scripting ability. At the same time, we are interested in providing robust and reproducible workflows.

For this reason we have developed the docker4seq package.

At the present time a total of four workflows are available:

Under development are:

  • PDX workflow: variant calling in patient-derived xenografts (PDX) from RNAseq and EXOMEseq data
  • Metagenomics workflow

All workflows are controlled by a set of R functions that are part of the docker4seq package, and the algorithms used are all encapsulated in Docker images stored in the docker.io/repbioinfo repository.
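
The pattern of a thin function driving a tool inside a Docker image can be sketched as follows. This is an illustration in Python, not the docker4seq R code: `build_docker_command` and the folder paths are hypothetical, and the image name is the base image cited in the contribution steps on this page. The sketch only builds the command line and does not run Docker.

```python
import shlex


def build_docker_command(image, host_dir, container_dir, tool_args):
    """Return the argument list for running a tool inside a Docker container."""
    return [
        "docker", "run",
        "--rm",                               # remove the container when it exits
        "-v", f"{host_dir}:{container_dir}",  # share the data folder with the container
        image,
        *tool_args,
    ]


cmd = build_docker_command(
    image="docker.io/repbioinfo/ubuntu",
    host_dir="/data/project",       # hypothetical host folder
    container_dir="/scratch",       # hypothetical mount point inside the container
    tool_args=["ls", "/scratch"],   # placeholder for the actual analysis command
)
print(shlex.join(cmd))
```

Passing the resulting list to `subprocess.run(cmd, check=True)` would execute it on a machine where Docker is installed. Because the image is referenced by name, the same software environment is used on every machine that runs the command.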

More info on docker4seq here


4SeqGUI is the GUI that can be used to control docker4seq functionalities.

Video tutorials for 4SeqGUI:

HowTo run a full RNAseq analysis

HowTo run a full miRNAseq analysis

HowTo run a full ChIPseq analysis



How to be part of the Reproducible Bioinformatics project

Any bioinformatician interested in embedding specific applications in the available workflows, or in developing a new workflow, is asked to embed the application(s) in a Docker image, save it in a public repository, and configure one or more R functions that can be used to interact with the Docker image.

Steps required to submit a new application/workflow:

  • Edit the skeleton.R function and the ubuntu Docker image (docker.io/repbioinfo/ubuntu) to create the new application.
  • Create a public Docker repository for the Docker image, e.g. at docker.com.
  • Create a workflow.Rmd vignette using RStudio and publish it via RStudio. For an example, see the docker4seq vignette.
  • Once the Docker image, the function(s) and the vignette are ready, please fill in this submission form.
    • We will test the code and incorporate it into the docker4seq package.
    • Maintainers will be responsible for the maintenance of their application(s).

If you are interested in participating in the project or need more information, please contact info@reproducible-bioinformatics.org

The SeqBox Project

Short-read sequencing technology has been in use for more than a decade. However, the analysis of RNAseq and ChIPseq data is still computationally demanding, and access to raw data alone does not guarantee that results are reproducible between laboratories. To address these two issues, we developed SeqBox, a cheap, efficient and reproducible RNAseq/ChIPseq hardware/software solution based on the NUC6I7KYK mini-PC (an Intel consumer gaming computer with a fast processor and a high-performance SSD) and the Docker container platform. In SeqBox, the analysis of RNAseq and ChIPseq data is supported by a user-friendly GUI, which gives scientists with or without scripting experience access to fast and reproducible analyses.

More information on SeqBox characteristics and cost is available at www.seqbox.com


rCASC (reproducible Classification Analysis of Single Cell Sequencing Data) is part of this project and provides single-cell analysis functionalities that follow the reproducibility rules described by Sandve et al. PLoS Comp Biol. 2013. rCASC is designed to provide a complete workflow for cell-subpopulation discovery.

More info on rCASC can be found here

Reproducible Bioinformatics

Bx2M
Via Nizza 52
10126 c/o B&Gu@MBC Torino
Tel: +39 0116706454
info@reproducibile-bioinformatics.org