[R] ChIP-seq Analysis

2020. 1. 5. 15:51


BioinformaticsAndMe

ChIP-seq

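: ChIP-seq (ChIP-sequencing) is short for Chromatin ImmunoPrecipitation sequencing, a method for analyzing proteins that bind to DNA

: DNA bound to a specific protein is isolated by immunoprecipitation, and its sequence is then determined

*Sequence determined by microarray → ChIP-chip

*Sequence determined by NGS → ChIP-seq

: ChIP-seq is used to study the mechanisms by which transcription factors or chromatin-associated proteins affect the phenotype (mainly gene expression)

: 'ChIPseeker' is a widely cited R package that makes it simple to annotate, compare, and visualize ChIP-seq peaks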

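1. Installing and loading packages

#Install ChIPseeker and the packages used below through BiocManager
if (!requireNamespace("BiocManager", quietly=TRUE)) install.packages("BiocManager")
BiocManager::install(c("ChIPseeker", "clusterProfiler"))
#Annotation packages needed for the hg19 examples
BiocManager::install(c("TxDb.Hsapiens.UCSC.hg19.knownGene", "org.Hs.eg.db"))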

library(ChIPseeker)
library(clusterProfiler)
library(TxDb.Hsapiens.UCSC.hg19.knownGene)
txdb <- TxDb.Hsapiens.UCSC.hg19.knownGene

2. ChIP profiling

1) Check the example samples used in the analysis
files <- getSampleFiles()
peak <- readPeakFile(files[[4]])
peak
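
The weightCol="V5" argument used in the coverage plot refers to a column of the peak file, so it helps to see how those columns are laid out. The sketch below uses base R only and a made-up three-line BED file (the coordinates, names, and scores are invented for illustration). It assumes the common BED convention in which the fifth column holds a score; read.table names unnamed columns V1, V2, ... by default, and the output of readPeakFile appears to follow the same V4/V5 naming for the extra columns.

```r
# Hypothetical BED records: chromosome, start, end, name, score
bed_lines <- c(
  "chr1\t100\t200\tpeak_1\t50",
  "chr1\t500\t650\tpeak_2\t120",
  "chr2\t300\t420\tpeak_3\t80"
)

# Write the toy records to a temporary file so they can be read like a real peak file
bed_path <- tempfile(fileext = ".bed")
writeLines(bed_lines, bed_path)

# Without a header, read.table names the columns V1..V5
bed <- read.table(bed_path, sep = "\t", header = FALSE, stringsAsFactors = FALSE)
print(colnames(bed))

# BED intervals are 0-based and half-open, so the width is end minus start
bed$width <- bed$V3 - bed$V2
print(bed)
```

Here V5 is the score column, which is the kind of value a coverage plot can use as a weight.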

2) ChIP peaks coverage plot
covplot(peak, weightCol="V5")
covplot(peak, weightCol="V5", chrs=c("chr17", "chr18"), xlim=c(4.5e7, 5e7))

3) Show the ChIP peaks around the Transcription Start Site (TSS) as a heatmap
promoter <- getPromoters(TxDb=txdb, upstream=3000, downstream=3000)
tagMatrix <- getTagMatrix(peak, windows=promoter)
tagHeatmap(tagMatrix, xlim=c(-3000, 3000), color="red")
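
The idea behind the tag matrix can be pictured with a small base R sketch. This is a deliberately simplified illustration, not the ChIPseeker implementation: it assumes a single chromosome, ignores strand, and uses invented TSS positions and peak coordinates. Each row is one TSS window, each column is a position relative to the TSS, and each cell counts how many peaks cover that position.

```r
# Hypothetical TSS positions and peak intervals on one chromosome
tss   <- c(1000, 5000, 9000)
peaks <- data.frame(start = c(900, 4800, 5100),
                    end   = c(1100, 5050, 5300))
flank <- 300                      # window of +/- 300 bp around each TSS
offsets <- -flank:flank           # positions relative to the TSS

# Build one row per TSS: coverage depth at every position in the window
tag_matrix <- t(sapply(tss, function(pos) {
  coords <- pos + offsets
  sapply(coords, function(x) sum(peaks$start <= x & peaks$end >= x))
}))

dim(tag_matrix)                   # rows = TSS windows, columns = positions

# Averaging over rows gives a profile of binding around the TSS
avg_profile <- colMeans(tag_matrix)
plot(offsets, avg_profile, type = "l",
     xlab = "Position relative to TSS (bp)", ylab = "Mean coverage")
```

A heatmap of such a matrix shows, window by window, where the peaks fall relative to the TSS.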

3. Peak annotation

1) Annotate the ChIP-seq peaks (pie chart)
peakAnno <- annotatePeak(files[[4]], tssRegion=c(-3000, 3000), TxDb=txdb, annoDb="org.Hs.eg.db")
plotAnnoPie(peakAnno)

2) Plot the position of Transcription Factor (TF) binding sites relative to the TSS
plotDistToTSS(peakAnno)
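
To make the distance-to-TSS summary more concrete, here is a minimal base R sketch of the underlying idea. It is an assumption-laden simplification: the TSS positions and peak midpoints are invented, everything sits on one chromosome, strand is ignored, and the peak midpoint stands in for the peak. The distance bins are chosen to resemble the ones shown in the plot.

```r
# Hypothetical TSS positions and peak midpoints on one chromosome
tss      <- c(10000, 50000, 200000)
peak_mid <- c(10400, 47500, 120000, 350000)

# For each peak, find the nearest TSS and the signed distance to it
nearest     <- sapply(peak_mid, function(p) tss[which.min(abs(tss - p))])
dist_to_tss <- peak_mid - nearest

# Group the absolute distances into bins
breaks <- c(0, 1000, 3000, 5000, 10000, 100000, Inf)
labels <- c("0-1kb", "1-3kb", "3-5kb", "5-10kb", "10-100kb", ">100kb")
bins   <- cut(abs(dist_to_tss), breaks = breaks, labels = labels,
              include.lowest = TRUE)

# Share of peaks in each distance bin
print(prop.table(table(bins)))
```

The sign of dist_to_tss separates peaks on either side of the TSS, while the bins summarize how close the binding sites are.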

4. Functional enrichment analysis

1) Visualize the biological pathway analysis of the genes annotated to the peaks
#Install ReactomePA through BiocManager
BiocManager::install("ReactomePA")
library(ReactomePA)

pathway1 <- enrichPathway(as.data.frame(peakAnno)$geneId)
dotplot(pathway1)
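
Pathway enrichment of this kind is commonly based on an over-representation (hypergeometric) test. The base R sketch below shows that calculation for a single pathway. All four counts are invented for illustration, and the sketch does not reproduce how ReactomePA chooses its background gene set.

```r
# Hypothetical counts for one pathway
N <- 10000   # genes in the background
M <- 200     # background genes that belong to the pathway
n <- 500     # genes annotated to peaks
k <- 25      # peak genes that belong to the pathway

# Number of pathway genes expected among the peak genes by chance
expected <- n * M / N

# P(X >= k): probability of seeing at least k pathway genes by chance
p_value <- phyper(k - 1, M, N - M, n, lower.tail = FALSE)

print(expected)
print(p_value)
```

A pathway is reported as enriched when this probability, after multiple-testing adjustment, falls below the chosen cutoff.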

5. ChIP peak data set comparison

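1) Compare and visualize the ChIP peak results of multiple data sets together
promoter <- getPromoters(TxDb=txdb, upstream=3000, downstream=3000)
tagMatrixList <- lapply(files, getTagMatrix, windows=promoter)
#Draw the average peak profile around the TSS for every data set
plotAvgProf(tagMatrixList, xlim=c(-3000, 3000))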

2) Compare and visualize the ChIP peak annotation results of multiple data sets together
peakAnnoList <- lapply(files, annotatePeak, TxDb=txdb, tssRegion=c(-3000, 3000), verbose=FALSE)
#Collect the annotated gene IDs of each data set
genes <- lapply(peakAnnoList, function(i) as.data.frame(i)$geneId)
names(genes) <- sub("_", "\n", names(genes))
compKEGG <- compareCluster(geneCluster=genes, fun="enrichKEGG", pvalueCutoff=0.05, pAdjustMethod="BH")
#Draw the comparison as a dot plot
dotplot(compKEGG, showCategory=15, title="KEGG Pathway Enrichment Analysis")
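
The pAdjustMethod="BH" argument asks for Benjamini-Hochberg correction of the pathway p-values. A minimal base R sketch of what that correction does is shown below; the four raw p-values are invented for illustration, and p.adjust is the standard base R function for this adjustment.

```r
# Hypothetical raw p-values for four pathways
raw_p <- c(0.001, 0.01, 0.04, 0.5)

# Benjamini-Hochberg adjustment controls the false discovery rate
adj_p <- p.adjust(raw_p, method = "BH")

# Pathways that pass a 0.05 cutoff after adjustment
passed <- adj_p < 0.05

print(data.frame(raw_p, adj_p, passed))
```

Note that the third pathway passes the 0.05 cutoff on its raw p-value but not after adjustment, which is the kind of filtering the cutoff applies when many pathways are tested at once.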

6. Data Mining with ChIP seq data

1) Collect information on the ChIP data in GEO (Gene Expression Omnibus)
getGEOspecies()
getGEOgenomeVersion()
hg19 <- getGEOInfo(genome="hg19", simplify=TRUE)

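2) Download the GEO ChIP data
#Downloads all hg19 BED files, which can take a long time
downloadGEObedFiles(genome="hg19", destDir="hg19")
#Alternatively, download only 10 randomly chosen samples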
gsm <- hg19$gsm[sample(nrow(hg19), 10)]
downloadGSMbedFiles(gsm, destDir="hg19")

#Reference

1) https://en.wikipedia.org/wiki/ChIP-sequencing

2) https://www.insilicogen.com/blog/tag/CHIP-Seq

3) https://www.ncbi.nlm.nih.gov/pubmed/25765347

4) http://bioconductor.org/packages/devel/bioc/vignettes/ChIPseeker/inst/doc/ChIPseeker.html
