
This function performs all the steps of the ATAC-seq and ChIP-seq processing pipeline, from FastQ files to peak calls.

Usage

process_epigenome(
  fastq_files,
  out_name = NULL,
  seq_type = c("ATAC", "CHIP", "CT"),
  type = "SE",
  cores = 8,
  path_fastqc = "FastQC/",
  path_bam = "BAM/",
  path_peaks = "Peaks/",
  path_logs = "Logs/",
  run_fastqc = TRUE,
  index = "/vault/refs/indexes/hg38",
  extra_bowtie2 = "",
  remove = c("chrM", "chrUn", "_random", "_hap", "_gl", "EBVls"),
  blacklist = "/vault/refs/hg38-blacklist.v2.bed",
  type_peak = c("narrow", "broad"),
  shift = c(TRUE, FALSE),
  chunk = 1e+07,
  gen_sizes = "/vault/refs/hg38.chromSizes.txt"
)

Arguments

fastq_files

Character vector of FastQ file names, one per sample (single-end), or a list of character vectors of length 2, one per sample (paired-end). See Details.

out_name

Character vector, with the same length as fastq_files, indicating the output filenames.

seq_type

Experiment type, one of "ATAC" (default), "CHIP" or "CT".

type

Sequence type, one of "SE" (single end) or "PE" (paired end).

cores

Number of threads to use for the analysis.

path_fastqc

Character indicating the output directory for the FastQC reports.

path_bam

Character indicating the output directory for the BAM files.

path_peaks

Character indicating the output directory for the peak files.

path_logs

Character indicating the output directory for the logs.

run_fastqc

Logical indicating whether to run FastQC (TRUE) or not (FALSE). Default: TRUE.

index

Character indicating the location and basename for the Bowtie2 index.

extra_bowtie2

Character string with additional arguments to be passed to the bowtie2 alignment call.
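To make the roles of index, cores and extra_bowtie2 concrete, the sketch below assembles a bowtie2 command line from them. build_bowtie2_cmd() and its out_sam argument are hypothetical; the exact command process_epigenome() runs is not documented here, so this only illustrates where each argument could end up. The bowtie2 options used (-p, -x, -U, -1, -2, -S) are standard ones.

```r
# Hypothetical helper: shows how index, cores and extra_bowtie2 could map onto
# bowtie2 options. This is NOT the command process_epigenome() actually runs.
build_bowtie2_cmd <- function(fastq, index = "/vault/refs/indexes/hg38",
                              cores = 8, extra_bowtie2 = "",
                              out_sam = "sample.sam") {
  q <- function(x) shQuote(x, type = "sh")   # POSIX-shell quoting on every OS
  # One file -> unpaired reads (-U); two files -> mates R1/R2 (-1/-2)
  reads <- if (length(fastq) == 2) {
    c("-1", q(fastq[1]), "-2", q(fastq[2]))
  } else {
    c("-U", q(fastq[1]))
  }
  args <- c("-p", cores,        # number of threads
            "-x", q(index),     # index basename, not a single file
            reads,
            extra_bowtie2,      # passed through verbatim
            "-S", q(out_sam))   # SAM output
  # Drop empty strings (e.g. extra_bowtie2 = "") before joining
  paste(c("bowtie2", args[nzchar(args)]), collapse = " ")
}

build_bowtie2_cmd("sample1.fastq.gz", extra_bowtie2 = "--very-sensitive")
```

Because extra_bowtie2 is inserted as one string, several options can be given at once (for example "--very-sensitive -X 2000").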

remove

Character vector of patterns identifying chromosomes to filter out. Any chromosome whose name contains one of these strings is removed.
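The substring rule described for remove can be sketched as follows. filter_chroms() is a hypothetical helper, and literal (fixed-string) matching is an assumption; the package may use regular expressions instead.

```r
# Hypothetical helper: drop every chromosome whose name contains any pattern.
filter_chroms <- function(chroms,
                          remove = c("chrM", "chrUn", "_random", "_hap", "_gl", "EBVls")) {
  drop <- rep(FALSE, length(chroms))
  for (p in remove) {
    drop <- drop | grepl(p, chroms, fixed = TRUE)  # literal substring match
  }
  chroms[!drop]
}

filter_chroms(c("chr1", "chrM", "chr1_KI270706v1_random", "chrUn_gl000220", "chrX"))
# returns c("chr1", "chrX")
```

Since matching is by substring, patterns should be specific: a pattern such as "chr1" would also remove chr10 to chr19.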

blacklist

Character indicating the file containing blacklist regions in BED format. Any reads overlapping these regions are discarded.
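The overlap rule for blacklisted regions is sketched below on small in-memory tables. overlaps_blacklist() is a hypothetical helper; the pipeline itself works on BAM files, and the column names chr, start and end are assumptions. Coordinates follow the BED convention (0-based, half-open), so a read that starts exactly where a blacklist interval ends does not overlap it.

```r
# Hypothetical sketch: TRUE for each read that overlaps any blacklist interval.
# Intervals are 0-based, half-open [start, end), as in BED files.
overlaps_blacklist <- function(reads, blacklist) {
  vapply(seq_len(nrow(reads)), function(i) {
    any(blacklist$chr == reads$chr[i] &
          blacklist$start < reads$end[i] &   # interval begins before read ends
          blacklist$end > reads$start[i])    # and ends after read begins
  }, logical(1))
}

blacklist <- data.frame(chr = "chr1", start = 1000, end = 2000,
                        stringsAsFactors = FALSE)
reads <- data.frame(chr   = c("chr1", "chr1", "chr2"),
                    start = c(950, 2000, 1500),
                    end   = c(1036, 2036, 1536),
                    stringsAsFactors = FALSE)
reads[!overlaps_blacklist(reads, blacklist), ]  # keeps the 2nd and 3rd reads
```

This pairwise scan is quadratic and only meant to show the rule; genome-scale data needs an indexed interval lookup.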

type_peak

Character indicating the type of peak to be called with MACS2, either "narrow" or "broad".
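As an illustration of type_peak, the sketch below builds a MACS2 callpeak command in which "broad" adds the --broad flag and "narrow" keeps the default narrow-peak mode. build_macs2_cmd(), the paired switch (BAMPE input for paired-end data) and genome_size = "hs" are assumptions for this example, not documented behaviour of process_epigenome().

```r
# Hypothetical helper: how type_peak could map onto a MACS2 callpeak command.
build_macs2_cmd <- function(bam, name, type_peak = c("narrow", "broad"),
                            paired = FALSE, outdir = "Peaks/",
                            genome_size = "hs") {
  type_peak <- match.arg(type_peak)          # defaults to "narrow"
  q <- function(x) shQuote(x, type = "sh")
  args <- c("callpeak",
            "-t", q(bam),                          # treatment alignments
            "-f", if (paired) "BAMPE" else "BAM",  # BAMPE uses read pairs as fragments
            "-g", genome_size,                     # "hs" = human effective genome size
            "-n", q(name),                         # prefix for output files
            "--outdir", q(outdir))
  if (type_peak == "broad") args <- c(args, "--broad")
  paste(c("macs2", args), collapse = " ")
}

build_macs2_cmd("BAM/sample1.bam", "sample1", type_peak = "broad")
```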

shift

Logical indicating whether reads should be shifted by -100 bp and extended to 200 bp (TRUE) or not (FALSE, default).
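The arithmetic behind "shift -100 bp, extend to 200 bp" is sketched below for single reads, following the MACS2 convention in which shift and extension are measured from the read's 5' end (MACS2 exposes this as --nomodel --shift -100 --extsize 200; whether process_epigenome() passes exactly these options is an assumption). shift_extend() is a hypothetical helper.

```r
# Sketch: shift each read's 5' end by `shift`, then extend `ext` bp toward 3'.
# Coordinates are 0-based, half-open; strand is "+" or "-".
shift_extend <- function(start, end, strand, shift = -100L, ext = 200L) {
  plus <- strand == "+"
  # 5' end is the left edge for "+" reads and the right edge for "-" reads;
  # a negative shift moves it upstream, i.e. right for "-" reads.
  new_start <- ifelse(plus, start + shift, end - shift - ext)
  data.frame(start = new_start, end = new_start + ext)
}

shift_extend(start = c(1000L, 5000L), end = c(1050L, 5050L), strand = c("+", "-"))
# start: 900, 4950; end: 1100, 5150
```

With these values every read becomes a 200 bp window roughly centred on its 5' end, which for ATAC-seq is the Tn5 insertion site. Windows that fall off the chromosome start are not clipped here.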

chunk

Size of the chunk loaded into memory during the ATAC-seq read offset correction. Only needed when type="SE".
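The idea of chunking is that only chunk reads are transformed at a time, which bounds memory use. The sketch below applies an ATAC-seq offset to a read table piece by piece. The +4 bp (plus strand) and -5 bp (minus strand) shifts are the commonly used Tn5 convention and an assumption here, as are tn5_offset(), offset_in_chunks() and the column names. The package presumably streams chunks from file; the table is in memory here only to show the loop.

```r
# Hypothetical sketch: Tn5 offset (+4 bp for "+", -5 bp for "-"; assumed values)
tn5_offset <- function(reads) {
  delta <- ifelse(reads$strand == "+", 4L, -5L)
  reads$start <- reads$start + delta
  reads$end <- reads$end + delta
  reads
}

# Process at most `chunk` rows at a time and bind the results back together
offset_in_chunks <- function(reads, chunk = 1e7) {
  n <- nrow(reads)
  if (n == 0) return(reads)
  pieces <- lapply(seq(1, n, by = chunk), function(s) {
    rows <- s:min(s + chunk - 1, n)
    tn5_offset(reads[rows, , drop = FALSE])
  })
  out <- do.call(rbind, pieces)
  rownames(out) <- NULL
  out
}

reads <- data.frame(start = c(100L, 200L, 300L), end = c(150L, 250L, 350L),
                    strand = c("+", "-", "+"), stringsAsFactors = FALSE)
offset_in_chunks(reads, chunk = 2)
```

The result does not depend on the chunk size; only peak memory does.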

gen_sizes

Character string with the path to a file listing chromosome names and sizes. Only needed when type="SE".
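A chromosome sizes file usually has two tab-separated columns, name and length (as in UCSC chrom.sizes files). One plausible use after shifting reads is keeping coordinates inside chromosome bounds; how process_epigenome() actually uses gen_sizes is not stated, and read_chrom_sizes() and clamp_to_chrom() are hypothetical helpers.

```r
# Hypothetical helpers: read "chromosome<TAB>size" and clamp intervals to
# [0, chromosome size] (0-based, half-open coordinates).
read_chrom_sizes <- function(path) {
  tab <- read.table(path, sep = "\t", header = FALSE,
                    col.names = c("chr", "size"),
                    colClasses = c("character", "numeric"))
  setNames(tab$size, tab$chr)
}

clamp_to_chrom <- function(chr, start, end, sizes) {
  lim <- unname(sizes[chr])                # length of each read's chromosome
  data.frame(chr = chr,
             start = pmax(start, 0),       # no negative starts
             end = pmin(end, lim),         # no ends past the chromosome end
             stringsAsFactors = FALSE)
}

tf <- tempfile(fileext = ".txt")
writeLines(c("chr1\t1000", "chr2\t500"), tf)
sizes <- read_chrom_sizes(tf)
clamp_to_chrom(c("chr1", "chr2"), c(-50, 400), c(150, 600), sizes)
```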

Value

Creates the folders path_fastqc, path_bam, path_peaks and path_logs (by default in the working directory) containing the output files of the different analyses.

Details

This function processes ATAC-seq or ChIP-seq data from FastQ files using the following pipeline:

  1. Quality Control (FastQC).

  2. Alignment to reference genome (Bowtie2).

  3. Post-processing (Samtools), including removing duplicates, blacklisted regions and non-reference chromosomes.

  4. (only for ATAC-seq) Offset correction (Samtools).

  5. Peak calling (MACS2).

This function can process paired and single end FastQ files:

  • Single end files. The argument fastq_files should be a character vector with the name of each file.

  • Paired end files. The argument fastq_files should be a list with one element per sample, where each element is a character vector of length 2: the first entry is R1 and the second is R2.
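A paired-end fastq_files list of this shape can be built from matching R1/R2 file names. The "_R1"/"_R2" naming pattern and the file paths are assumptions for illustration; adapt them to your own files.

```r
# One c(R1, R2) vector per sample; the _R1/_R2 naming is assumed.
r1 <- c("path/to/sampleA_R1.fastq.gz", "path/to/sampleB_R1.fastq.gz")
r2 <- sub("_R1", "_R2", r1, fixed = TRUE)
fastq_pe <- mapply(c, r1, r2, SIMPLIFY = FALSE, USE.NAMES = FALSE)

str(fastq_pe)

if (FALSE) {
process_epigenome(fastq_files = fastq_pe,
                  out_name = c("sampleA", "sampleB"),
                  seq_type = "ATAC",
                  type = "PE",
                  cores = 8)
}
```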

Examples

if (FALSE) {
process_epigenome(fastq_files=c("path/to/file.fastq.gz", "path/to/file2.fastq.gz"),
                  seq_type="ATAC",
                  out_name=c("sample1", "sample2"),
                  type="SE",
                  cores=8)
}