r

Drawing heatmaps with pheatmap 2018.09.11
Merging dataframes in R 2018.09.05
Installing the libraries needed to build R on CentOS 6 2018.07.25
Arguments in R 2018.07.25

Drawing heatmaps with pheatmap

바닐라스카이 2018. 9. 11. 13:41


pheatmap is short for "pretty heatmaps": an R package built to make drawing heatmaps prettier and easier. Here I go through a few of the examples I use most often in more detail.

The first step, of course, is to load pheatmap.

library("pheatmap")

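The second step is setting up the annotation legend.

When a data frame contains many samples from combinations of cell lines and drug treatments, they are hard to tell apart at a glance. Pick easily distinguishable colors in advance so the groups separate clearly.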

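cell_line = c("cell_1", "cell_2", "cell_3", "cell_4", "cell_5")
drug = c("drug_1", "drug_2", "drug_3")

## one entry per sample: 5 cell lines x 3 drugs = 15 samples
anno_cell_line = rep(cell_line, each = length(drug))
anno_drug = rep(drug, times = length(cell_line))
anno_samples = paste(anno_cell_line, anno_drug, sep = "_")

## annotation dataframe
anno.df = data.frame(cell_line = anno_cell_line, drug = anno_drug)
rownames(anno.df) = anno_samples

## annotation color: the names must match the values in anno.df exactly
ann_colors = list(
  cell_line = c(cell_1 = "#235725", cell_2 = "#1FD7F1", cell_3 = "#E4AF30", cell_4 = "#CADF7C", cell_5 = "#FC0D7D"),
  drug = c(drug_1 = "#BDBE32", drug_2 = "#FE9089", drug_3 = "#82B7FC")
)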

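The example above distinguishes 5 cell lines and 3 drugs and builds the annotation data frame from them.

anno_samples is the vector of sample names made by joining cell_line and drug with _. Because it must later match the column names of the data, you may need to build it differently for your own data.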
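Because the annotation colors are looked up by name, a misspelling such as durg_3 instead of drug_3 can leave that group without its intended color. The sketch below is a hypothetical helper, missing_colors (not part of pheatmap), that lists the annotation values with no matching name in ann_colors; the small anno.df here is made-up test data.

```r
# Hypothetical helper (not a pheatmap function): for each annotation column
# that has a color vector, return the values that have no named color.
missing_colors <- function(anno, colors) {
  out <- list()
  for (col in names(colors)) {
    values <- unique(as.character(anno[[col]]))
    absent <- setdiff(values, names(colors[[col]]))
    if (length(absent) > 0) out[[col]] <- absent
  }
  out
}

# Made-up annotation with a deliberate typo ("durg_3") in the drug column.
anno.df <- data.frame(cell_line = rep(c("cell_1", "cell_2"), each = 3),
                      drug = rep(c("drug_1", "drug_2", "durg_3"), times = 2))
ann_colors <- list(
  cell_line = c(cell_1 = "#235725", cell_2 = "#1FD7F1"),
  drug = c(drug_1 = "#BDBE32", drug_2 = "#FE9089", drug_3 = "#82B7FC")
)

missing_colors(anno.df, ann_colors)
# $drug
# [1] "durg_3"
```

An empty list means every annotation value has a color.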

If the data frame was built correctly, it holds the annotation information in the following structure:

              cell_line   drug
cell_1_drug_1    cell_1 drug_1
cell_1_drug_2    cell_1 drug_2
cell_1_drug_3    cell_1 drug_3
cell_2_drug_1    cell_2 drug_1
cell_2_drug_2    cell_2 drug_2
cell_2_drug_3    cell_2 drug_3
cell_3_drug_1    cell_3 drug_1
cell_3_drug_2    cell_3 drug_2
cell_3_drug_3    cell_3 drug_3
cell_4_drug_1    cell_4 drug_1
cell_4_drug_2    cell_4 drug_2
cell_4_drug_3    cell_4 drug_3
cell_5_drug_1    cell_5 drug_1
cell_5_drug_2    cell_5 drug_2
cell_5_drug_3    cell_5 drug_3

Next, set the column names of the data frame holding the values to match anno_samples, then draw the plot.

colnames(df) = anno_samples

pdf("test.pdf")

# the last legend break is the maximum of the plotted data
pheatmap(df, cluster_rows = FALSE, cluster_cols = FALSE, show_rownames = TRUE,
         annotation_col = anno.df, annotation_colors = ann_colors,
         main = "pheatmap_test", fontsize_row = 6,
         legend_breaks = c(-0.5, 0, 0.5, max(df)),
         legend_labels = c("-0.5", "0", "0.5", "1-(q-value)\n"), legend = TRUE)

dev.off()

* The row names of the annotation data frame must match the column names of the data frame holding the values:

rownames(anno.df) = anno_samples

colnames(df) = anno_samples
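A small check can catch a mismatch before plotting. The helper below, align_annotation (a hypothetical name, not a pheatmap function), stops with a message when a data column has no annotation row and otherwise returns the annotation reordered to the data's column order. The toy matrix and sample names are made up.

```r
# Hypothetical helper: make sure every data column has an annotation row,
# then return the annotation in the same order as the data columns.
align_annotation <- function(data, anno) {
  missing_in_anno <- setdiff(colnames(data), rownames(anno))
  if (length(missing_in_anno) > 0) {
    stop("samples without annotation: ",
         paste(missing_in_anno, collapse = ", "), call. = FALSE)
  }
  anno[colnames(data), , drop = FALSE]
}

# Made-up annotation and data; the data columns are in a different order.
anno.df <- data.frame(cell_line = c("cell_1", "cell_1", "cell_2"),
                      drug = c("drug_1", "drug_2", "drug_1"),
                      row.names = c("cell_1_drug_1", "cell_1_drug_2", "cell_2_drug_1"))
mat <- matrix(1:6, nrow = 2,
              dimnames = list(c("gene_1", "gene_2"),
                              c("cell_2_drug_1", "cell_1_drug_1", "cell_1_drug_2")))

aligned <- align_annotation(mat, anno.df)
rownames(aligned)
# [1] "cell_2_drug_1" "cell_1_drug_1" "cell_1_drug_2"
```

The aligned result can then be passed as annotation_col.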

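Options such as whether to cluster rows and columns, or whether to show row and column names, are faster to look up in the manual listed below; most of them are self-explanatory.

legend_breaks and legend_labels control the tick positions and labels of the color legend; leaving them at their defaults seems to work fine in most cases.

Drawing the plot as in the example gives a heatmap with annotation bars for cell line and drug above the columns. Adjust the colors as you like.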
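For a quick end-to-end test without real data, the sketch below fills a matrix with random values (hypothetical data; the gene_1 ... gene_20 names are made up), builds the annotation the same way as above, and lets pheatmap write the PDF through its filename argument instead of pdf()/dev.off(). It only calls pheatmap if the package is installed.

```r
set.seed(1)

# Sample names: 5 cell lines x 3 drugs, joined with "_".
cell_line <- paste0("cell_", 1:5)
drug <- paste0("drug_", 1:3)
anno_cell_line <- rep(cell_line, each = length(drug))
anno_drug <- rep(drug, times = length(cell_line))
anno_samples <- paste(anno_cell_line, anno_drug, sep = "_")

# Annotation: row names must equal the data's column names.
anno.df <- data.frame(cell_line = anno_cell_line, drug = anno_drug)
rownames(anno.df) <- anno_samples

ann_colors <- list(
  cell_line = c(cell_1 = "#235725", cell_2 = "#1FD7F1", cell_3 = "#E4AF30",
                cell_4 = "#CADF7C", cell_5 = "#FC0D7D"),
  drug = c(drug_1 = "#BDBE32", drug_2 = "#FE9089", drug_3 = "#82B7FC")
)

# Hypothetical data: 20 rows of random values, one column per sample.
mat <- matrix(rnorm(20 * length(anno_samples)), nrow = 20,
              dimnames = list(paste0("gene_", 1:20), anno_samples))

out_file <- file.path(tempdir(), "pheatmap_test.pdf")
if (requireNamespace("pheatmap", quietly = TRUE)) {
  # With filename set, pheatmap opens and closes the PDF device itself.
  pheatmap::pheatmap(mat, cluster_rows = FALSE, cluster_cols = FALSE,
                     annotation_col = anno.df, annotation_colors = ann_colors,
                     main = "pheatmap_test", fontsize_row = 6,
                     filename = out_file)
}
```

Swapping mat for your own matrix only requires that its column names follow the same cell_line_drug pattern.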

2018/08/31 - [etc.] - How to find hexadecimal RGB color codes

Reference -

https://cran.r-project.org/web/packages/pheatmap/pheatmap.pdf

Creative Commons: Attribution, NonCommercial, NoDerivatives


Merging dataframes in R

바닐라스카이 2018. 9. 5. 17:48


Merge two data frames on a shared key column (here named main):

df <- merge(df1, df2, by="main")

If the key column has a different name in each data frame, specify each one:

df <- merge(df1, df2, by.x="xmain", by.y="ymain")

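This joins rows where the value in column xmain of df1 equals the value in column ymain of df2.

Values that are not filled in are left as empty strings, and if you ignore them, values can shift out of place later on.

# note: this writes the literal string "NA"; use NA (unquoted) for a real missing value
df$xmain <- ifelse(df$xmain == "", "NA", df$xmain)

If xmain in df is empty, it is filled with "NA"; otherwise the existing value is kept.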
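The difference between the default inner join and all = TRUE is easiest to see on a toy example. The two tables below are hypothetical; only merge() and its by.x, by.y and all arguments come from base R.

```r
# Hypothetical tables whose key columns have different names.
df1 <- data.frame(xmain = c("a", "b", "c"), v1 = 1:3, stringsAsFactors = FALSE)
df2 <- data.frame(ymain = c("b", "c", "d"), v2 = c(20, 30, 40),
                  stringsAsFactors = FALSE)

# Default (inner join): only keys present in both tables are kept.
inner <- merge(df1, df2, by.x = "xmain", by.y = "ymain")
nrow(inner)
# [1] 2

# all = TRUE (full outer join): unmatched keys are kept and padded with NA.
outer <- merge(df1, df2, by.x = "xmain", by.y = "ymain", all = TRUE)
nrow(outer)
# [1] 4
```

The key column in the result keeps the by.x name (xmain). all.x = TRUE or all.y = TRUE keep unmatched rows from only one side.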
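The ifelse() line writes the literal string "NA", which is.na() does not treat as missing. If a real missing value is what you want, a sketch like the following converts empty or whitespace-only strings in every character column to NA; blank_to_na is a hypothetical helper name and the data is made up.

```r
# Hypothetical helper: replace "" (or whitespace-only strings) with NA
# in every character column, leaving other columns untouched.
blank_to_na <- function(d) {
  d[] <- lapply(d, function(col) {
    if (is.character(col)) {
      col[!is.na(col) & trimws(col) == ""] <- NA
    }
    col
  })
  d
}

df <- data.frame(xmain = c("g1", "", "g3"), symbol = c("TP53", "MYC", " "),
                 stringsAsFactors = FALSE)
clean <- blank_to_na(df)
is.na(clean$xmain)
# [1] FALSE  TRUE FALSE
```

If the blanks come from a file, read.csv(..., na.strings = c("", "NA")) performs the same conversion at read time.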


Installing the libraries needed to build R on CentOS 6

바닐라스카이 2018. 7. 25. 12:43


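With root access you could install the dependencies with yum or apt-get as you go, but a local (non-root) install means building many of them into a local library, so I'm writing the steps down here.

First, create a folder for the local libraries. A simple location is fine.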

mkdir ~/library

zlib

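The first problem was that the system zlib was older than 1.2.5, so download zlib. If you get version 1.2.10 or later, the check compares version strings lexicographically and decides that 1.2.10 is lower than 1.2.5, so play it safe with 1.2.9. (A lexicographic comparison treats the digits as characters: comparing 5 and 15, it compares 5 with 1 first and concludes that 5 is greater than 15.)

See https://stackoverflow.com/questions/42076936/zlib-bz2-library-and-headers-are-requried-for-compiling-r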
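To see why 1.2.10 can look older than 1.2.5, compare the two strings in R. This only illustrates string ordering versus version ordering, under the post's premise that the check compares versions as plain strings; it does not reproduce R's actual configure test.

```r
# Character-by-character comparison: at the first differing character,
# "1" sorts before "5", so "1.2.10" is treated as the smaller string.
"1.2.10" < "1.2.5"
# [1] TRUE

# Component-wise version comparison treats 10 and 5 as numbers.
utils::compareVersion("1.2.10", "1.2.5")   # 1 means the first is newer
# [1] 1
numeric_version("1.2.10") > numeric_version("1.2.5")
# [1] TRUE
```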

wget -O zlib-1.2.9.tar.gz "https://osdn.net/frs/g_redir.php?m=kent&f=libpng%2Fzlib%2F1.2.9%2Fzlib-1.2.9.tar.gz"

tar -zxf zlib-1.2.9.tar.gz

cd zlib-1.2.9

./configure --prefix=$HOME/library

make && make install

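Following the usual steps installs zlib. When it finishes, lib and include folders are created inside ~/library; the remaining libraries go there as well, and the R build is pointed at this path.

bzip2

bzip2 installs a little differently: there is no configure script. Run make directly, and note that PREFIX must be uppercase.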

wget http://www.bzip.org/1.0.6/bzip2-1.0.6.tar.gz

tar -zxf bzip2-1.0.6.tar.gz

cd bzip2-1.0.6

make && make install PREFIX=$HOME/library

liblzma

For liblzma you don't install lzma itself; installing XZ takes care of it. XZ is available from https://tukaani.org/xz/.

wget https://tukaani.org/xz/xz-5.2.4.tar.gz

tar -zxf xz-5.2.4.tar.gz

cd xz-5.2.4

./configure --prefix=$HOME/library

make && make install

pcre

pcre needs some extra setup. Download the source from https://ftp.pcre.org/pub/pcre/ and get pcre 8.42 or a similar 8.x release, not pcre2.

wget https://ftp.pcre.org/pub/pcre/pcre-8.42.tar.gz

tar -zxf pcre-8.42.tar.gz

cd pcre-8.42

./configure --prefix=$HOME/library --enable-utf8

make && make install

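cd $HOME/library/include

mkdir pcre

ln -s $HOME/library/include/pcre*.h pcre/

After make install, the header pcre.h sits in ~/library/include, and the job is only done once it is also symlinked into the include/pcre folder. I'm not sure why, but the R build looks for the header inside a pcre folder.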

curl

Finally, curl follows the usual installation steps: https://curl.haxx.se/download.html

wget https://curl.haxx.se/download/curl-7.61.0.tar.gz

tar -zxf curl-7.61.0.tar.gz

cd curl-7.61.0

./configure --prefix=$HOME/library

make && make install

Once all the dependencies are installed, run R's configure from the R source directory:

./configure --prefix=/PATH/TO/INSTALL/R --enable-R-shlib LDFLAGS="-L$HOME/library/lib" CPPFLAGS="-I$HOME/library/include"

make && make install

Some warnings appeared along the way, but I ignored them for now and confirmed that R runs.
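One way to confirm which of the locally built libraries the new R actually picked up is to ask R itself. extSoftVersion(), capabilities() and libcurlVersion() are base R functions available in recent R versions; the version strings depend on your build, so no output is shown here.

```r
# Run inside the newly built R.
# Versions of the external compression and regex libraries R was linked against.
extSoftVersion()[c("zlib", "bzlib", "xz", "PCRE")]

# Whether R was built with libcurl support, and which libcurl it uses.
capabilities("libcurl")
libcurlVersion()
```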

Reference -

https://unix.stackexchange.com/questions/343452/how-to-install-r-3-3-1-in-my-own-directory


Arguments in R

바닐라스카이 2018. 7. 25. 09:44


How to take input arguments when running a script with Rscript.

args = commandArgs(trailingOnly = TRUE)

species <- args[1]

inputfile <- args[2]

Run it from the shell:

Rscript test.R human hg19.fasta

For more complex arguments you can use the optparse library, but for simple cases the code above is enough.

To print a short usage hint when no arguments are given, add the following:

if (length(args) == 0) {
  stop("All arguments must be supplied, e.g. human hg19.fasta", call. = FALSE)
}

If no arguments are given, the script prints the message after "Error:" and stops.
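If you want the check and the assignments in one place, a sketch like the following works. parse_cli_args is a hypothetical helper name, and it requires two arguments rather than just one, since the script reads both args[1] and args[2]. Passing the vector in as a parameter makes it easy to try interactively.

```r
# Hypothetical helper: validate the argument vector and return named values.
parse_cli_args <- function(args) {
  if (length(args) < 2) {
    stop("usage: Rscript test.R <species> <inputfile>, e.g. human hg19.fasta",
         call. = FALSE)
  }
  list(species = args[1], inputfile = args[2])
}

# In a real script: opts <- parse_cli_args(commandArgs(trailingOnly = TRUE))
opts <- parse_cli_args(c("human", "hg19.fasta"))
opts$species
# [1] "human"
```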



Be great
