Getting started¶
This section shows all the necessary steps to setup and run the default GRAPE pipeline on the provided test dataset.
Installation¶
Execute the following commands to install GRAPE:
$ mkdir grape2
$ cd grape2
# create an isolated python environment for GRAPE
grape2 $ virtualenv --no-site-packages .
# activate the virtual environment
grape2 $ source bin/activate
# install GRAPE (since GRAPE 2.0 development is in beta stage you will need the --pre option)
grape2 $ pip install distribute==0.6.36 zc.buildout==2.1.0
grape2 $ pip install grape-pipeline --pre
Activate the virtualenv¶
With the installation above you will need to activate the virtual environment each time you want to use GRAPE:
$ cd grape2
grape2 $ source bin/activate
Pipeline buildout¶
Once GRAPE is succesfully installed you need to run grape-buildout to setup the pipeline home folder with the required configuration files and modules.
The pipeline home folder location must be defined with the GRAPE_HOME environment variable:
grape2 $ mkdir pipeline
grape2 $ export GRAPE_HOME=$PWD/pipeline
grape2 $ grape-buildout
If everything goes well you should get the following output:
Creating directory '<grape_home>/bin'.
Creating directory '<grape_home>/parts'.
Creating directory '<grape_home>/eggs'.
Creating directory '<grape_home>/develop-eggs'.
Getting distribution for 'hexagonit.recipe.download'.
Got hexagonit.recipe.download 1.7.
Installing gem.
Downloading http://barnaserver.com/gemtools/releases/GEMTools-static-i3-1.6.2.tar.gz
grape.install_module: Extracting module package to <grape_home>/modules/gemtools/1.6.2
Installing flux.
Downloading http://sammeth.net/artifactory/barna/barna/barna.capacitor/1.2.4/flux-capacitor-1.2.4.tgz
grape.install_module: Extracting module package to <grape_home>/modules/flux/1.2.4
Installing samtools.
Downloading http://genome.crg.es/~epalumbo/grape/modules/samtools-0.1.19.tgz
grape.install_module: Extracting module package to <grape_home>/modules/samtools/0.1.19
Installing crgtools.
Downloading http://genome.crg.es/~epalumbo/grape/modules/crgtools-0.1.tgz
grape.install_module: Extracting module package to <grape_home>/modules/crgtools/0.1
Installing testdata.
Downloading http://genome.crg.es/~epalumbo/grape/testdata.tgz
testdata: Extracting package to <grape_home>
Removing directory '<grape_home>/bin'.
Removing directory '<grape_home>/develop-eggs'.
Removing directory '<grape_home>/eggs'.
Removing directory '<grape_home>/parts'.
Project¶
To run the pipeline you will need to create a folder for the project and initalize it with the grape init command:
grape2 $ mkdir project
grape2 $ cd project
project $ grape init
Initializing project ... Done
A project has been created and initialized with an empty configuration. For further information about GRAPE projects please see Projects
Reference files¶
The reference genome and annotation files for the project must be set with the grape config command:
project $ grape config --set genome $GRAPE_HOME/testdata/genome/H.sapiens.genome.hg19.test.fa
project $ grape config --set annotation $GRAPE_HOME/testdata/annotation/H.sapiens.EnsEMBL.55.test.gtf
project $ grape config
Project: 'Default project'
========== =========================================
genome genomes/H.sapiens.genome.hg19.test.fa
annotation annotations/H.sapiens.EnsEMBL.55.test.gtf
========== =========================================
Fastq files¶
To import the test RNA-seq data into the project you have to run the grape scan command:
grape2 $ grape scan $GRAPE_HOME/testdata/reads
Scanning <grape_home>/testdata/reads folder ... 4 fastq files found
Checking known data ... 4 new files found
Adding 'testB': data/testB_1.fastq.gz
Adding 'testB': data/testB_2.fastq.gz
Adding 'testA': data/testA_1.fastq.gz
Adding 'testA': data/testA_2.fastq.gz
You can check that the files were correctly imported with the grape list command:
grape2 $ grape list
Project: 'Default project'
2 datasets registered in project
===== ====================== =====
id path type
===== ====================== =====
testA reads/testA_2.fastq.gz fastq
testA reads/testA_1.fastq.gz fastq
testB reads/testB_1.fastq.gz fastq
testB reads/testB_2.fastq.gz fastq
===== ====================== =====
Running the pipeline¶
You can run the pipeline for all the test files from within the project folder with the grape run command. Before actually running, you can perform a dry run:
project $ grape run --dry
This command will show you the pipeline graph and commands for all the samples. For one sample (e.g. testA) you can do:
project $ grape run testA --dry
To submit the pipeline to a HPC cluster environment replace the run command with the submit command. A dry run will also show you information about the jobs that will be submitted such as threads, memory, queue, etc..
For more information about running GRAPE please see Pipeline Execution.