Automated processing of ChIP-seq and ATAC-seq samples
process_epigenome.Rd
This function performs all necessary steps in the ChIP-seq and ATAC-seq processing pipeline.
Usage
process_epigenome(
fastq_files,
out_name = NULL,
seq_type = c("ATAC", "CHIP", "CT"),
type = "SE",
cores = 8,
path_fastqc = "FastQC/",
path_bam = "BAM/",
path_peaks = "Peaks/",
path_logs = "Logs/",
run_fastqc = TRUE,
index = "/vault/refs/indexes/hg38",
extra_bowtie2 = "",
remove = c("chrM", "chrUn", "_random", "_hap", "_gl", "EBVls"),
blacklist = "/vault/refs/hg38-blacklist.v2.bed",
type_peak = c("narrow", "broad"),
shift = c(TRUE, FALSE),
chunk = 1e+07,
gen_sizes = "/vault/refs/hg38.chromSizes.txt"
)
Arguments
- fastq_files
Character string (single-end) or character vector of length 2 (paired-end) with the file names of the samples to be analysed.
- out_name
Character vector, with the same length as fastq_files, indicating the output filenames.
- seq_type
Experiment type, either "ATAC" (default) or "CHIP". The usage signature also lists "CT" as a third option.
- type
Sequence type, one of "SE" (single end) or "PE" (paired end).
- cores
Number of threads to use for the analysis.
- path_fastqc
Character indicating the output directory for the FastQC reports.
- path_bam
Character indicating the output directory for the bam files.
- path_peaks
Character indicating the output directory for the peak files.
- path_logs
Character indicating the output directory for the logs.
- run_fastqc
Logical indicating whether to run (TRUE) or not (FALSE) FastQC. Default: TRUE.
- index
Character indicating the location and basename for the Bowtie2 index.
- extra_bowtie2
Character containing additional arguments to be passed to the Bowtie2 alignment call.
- remove
Character vector of patterns identifying the chromosomes to filter out. Any chromosome whose name contains a match for one of these patterns will be removed.
- blacklist
Character indicating the file containing blacklist regions in bed format. Any reads overlapping these regions will be discarded.
- type_peak
Character indicating the type of peak to be called with MACS2, either "narrow" or "broad".
- shift
Logical indicating whether the reads should be shifted -100bp and extended to 200bp (TRUE) or not (FALSE, default).
- chunk
Size of the chunk to load into memory for ATAC-seq read offset. This argument is necessary only when type="SE".
- gen_sizes
Character string indicating the path where the file with chromosome name and sizes can be found. This argument is necessary only when type="SE".
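The matching rule described for the remove argument can be illustrated with a minimal sketch. It assumes each entry is treated as a fixed substring of the chromosome name; the chromosome names below are hypothetical and the function's actual matching may differ.

```r
# Hypothetical chromosome names, as they might appear in a BAM header.
chr_names <- c("chr1", "chr2", "chrX", "chrM", "chrUn_KI270302v1",
               "chr1_KI270706v1_random")

# Default value of the `remove` argument.
remove <- c("chrM", "chrUn", "_random", "_hap", "_gl", "EBVls")

# Assumption: each entry is a fixed substring, so a chromosome is dropped
# when its name contains any of them.
drop <- Reduce(`|`, lapply(remove, grepl, x = chr_names, fixed = TRUE))
kept <- chr_names[!drop]
print(kept)
```

Under this assumption only the canonical chromosomes (chr1, chr2, chrX) are kept.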
Value
Creates the folders path_fastqc, path_bam, path_peaks and path_logs, by default in your working directory, containing the output files from the different analyses.
Details
This function processes ATAC-seq or ChIP-seq data from FastQ files using the following pipeline:
Quality Control (FastQC).
Alignment to reference genome (Bowtie2).
Post-processing (Samtools), including removing duplicates, blacklisted regions and non-reference chromosomes.
(only for ATAC-seq) Offset correction (Samtools).
Peak calling (MACS2).
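As an illustration of the peak calling step, the sketch below shows how the type_peak and shift arguments might translate into a MACS2 command line. The helper macs2_args() is hypothetical and not part of this package. The flags are standard `macs2 callpeak` options, but whether process_epigenome() uses exactly this combination is an assumption.

```r
# Hypothetical helper: builds the argument vector for `macs2 callpeak`.
# It is not part of the package; it only illustrates the options involved.
macs2_args <- function(bam, name, outdir = "Peaks/",
                       type_peak = c("narrow", "broad"), shift = FALSE) {
  type_peak <- match.arg(type_peak)
  args <- c("callpeak", "-t", bam, "-f", "BAM", "-g", "hs",
            "-n", name, "--outdir", outdir)
  if (shift) {
    # Shift reads by -100 bp and extend them to 200 bp.
    args <- c(args, "--nomodel", "--shift", "-100", "--extsize", "200")
  }
  if (type_peak == "broad") {
    args <- c(args, "--broad")
  }
  args
}

args <- macs2_args("BAM/sample.bam", "sample", type_peak = "broad", shift = TRUE)
cat("macs2", args, "\n")
```

If MACS2 is installed, such a vector could be passed to system2("macs2", args).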
This function can process paired and single end FastQ files:
Single end files. The argument fastq_files should be a character vector with the name of each file.
Paired end files. The argument fastq_files should be a list in which each element is a character vector of size 2: the first entry is the R1 file and the second is the R2 file.
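A minimal sketch of preparing paired end input follows. The file names and the "_R1"/"_R2" naming scheme are assumptions made for illustration, and the call is guarded so that it only runs when process_epigenome() is available in the session.

```r
# Hypothetical FastQ file names; replace with real paths.
fastq <- c("s1_R1.fastq.gz", "s1_R2.fastq.gz",
           "s2_R1.fastq.gz", "s2_R2.fastq.gz")

# Sort so that R1 precedes R2 within each sample, then group by sample prefix
# (assumes an "_R1"/"_R2" suffix before the ".fastq.gz" extension).
fastq <- sort(fastq)
sample_id <- sub("_R[12]\\.fastq\\.gz$", "", fastq)
pairs <- split(fastq, sample_id)

pe_files <- unname(pairs)  # list of c(R1, R2) vectors
out_name <- names(pairs)   # one output name per sample

# Only runs when the package providing process_epigenome() is loaded.
if (exists("process_epigenome")) {
  process_epigenome(
    fastq_files = pe_files,
    out_name = out_name,
    seq_type = "ATAC",
    type = "PE",
    cores = 4
  )
}
```

For single end data, the same file names could be passed directly as a character vector with type = "SE".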