Installing and Running CAFE v4.0

바닐라스카이 2017. 9. 11. 15:28

CAFE v4.0 Software for Computational Analysis of gene Family Evolution

I decided to install CAFE to estimate gene gain and loss.

homepage - https://hahnlab.github.io/CAFE/index.html

To download it, open the Download tab on the homepage, follow the GitHub link, and get cafe-version.tar.gz.

tar xf cafe*tar.gz

./configure

make

This produces the executable CAFE/release/cafe.

Now let's follow along with tutorial.pdf.

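1. Download the protein sequences of each species from Ensembl BioMart.

Choose Ensembl Genes and a species. Under Attributes select Sequences -> Peptide, then under Header Information remove the transcript ID and add CDS length.

Click Results and either have the file emailed to you or download it directly.

Once everything is downloaded, put the files in one folder and decompress them.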

2. Keep only the longest isoform of each gene.

The script for this is in cafe/manual/tutorial files. (https://iu.box.com/v/cafetutorial-files)

The full set of tutorial files is over 5 GB, so it is better to download only the parts you need.

python python_scripts/cafetutorial_longest_iso.py -d twelve_spp_proteins/

cat twelve_spp_proteins/longest*.fa > makeblastdb_input.fa

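The -d option takes the name of the folder that holds the sequences. The script reads the .fa files in that folder and writes out only the longest isoforms as longest* files.

Each species gets its own longest file, so they have to be concatenated into a single file, which is what the cat command does.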
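
To make the idea of this step concrete, here is a minimal Python sketch of longest-isoform selection. It is not the tutorial's cafetutorial_longest_iso.py. It assumes, hypothetically, that every FASTA header starts with the gene ID followed by `|`, and it compares isoforms by peptide length rather than by the CDS length stored in the header. The function name longest_isoforms is made up for this example.

```python
def longest_isoforms(fasta_text):
    """Return {gene_id: (header, sequence)} keeping the longest sequence per gene.

    Assumption: the gene ID is the first '|'-separated field of the header.
    """
    best = {}
    header, chunks = None, []

    def flush():
        # Store the record read so far if it is the longest seen for its gene.
        if header is None:
            return
        seq = "".join(chunks)
        gene_id = header.split("|")[0]
        if gene_id not in best or len(seq) > len(best[gene_id][1]):
            best[gene_id] = (header, seq)

    for line in fasta_text.splitlines():
        line = line.strip()
        if line.startswith(">"):
            flush()
            header, chunks = line[1:], []
        elif line:
            chunks.append(line)
    flush()  # do not forget the last record
    return best


example = ">G1|300\nMKT\n>G1|600\nMKTAYI\nAK\n>G2|150\nMA\n"
result = longest_isoforms(example)
```

In this toy input, gene G1 has two isoforms and only the second, longer one is kept, while G2 is kept as is.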

3. Align with BLAST.

makeblastdb -in makeblastdb_input.fa -dbtype prot -out blastdb

blastp -num_threads 4 -db blastdb -query makeblastdb_input.fa -outfmt 7 -seg yes > blast_output.txt

This takes quite a long time. The tutorial also provides the finished alignment results in the tutorial files linked above.

4. Clustering

mcl must be installed.

wget https://www.micans.org/mcl/src/mcl-latest.tar.gz

tar -zxf mcl-latest.tar.gz

./configure --prefix=/PATH/TO/INSTALL/MCL

make && make install

export PATH=/PATH/TO/INSTALL/MCL/bin:$PATH

After that, use mcl to cluster the sequences at the gene family level.

grep -v "#" blast_output.txt | cut -f 1,2,11 > blast_output.abc

mcxload -abc blast_output.abc --stream-mirror --stream-neg-log10 -stream-tf 'ceil(200)' -o blast_output.mci -write-tab blast_output.tab

mcl blast_output.mci -I 3

mcxdump -icl out.blast_output.mci.I30 -tabr blast_output.tab -o dump.blast_output.mci.I30
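
The grep/cut command and the mcxload transform options above can be sketched in Python as follows. This is only an illustration under stated assumptions: columns 1, 2 and 11 of the BLAST tabular output are taken to be query ID, subject ID and E-value, and the two stream options are read as "take -log10 of the E-value, then cap it at 200". The function names blast_to_abc and edge_weight are hypothetical, and the sketch does not reproduce everything mcxload does (for example mirroring edges).

```python
import math


def blast_to_abc(blast_text):
    """Yield (query, subject, evalue) triples from BLAST tabular output.

    Like `grep -v "#"`, any line containing '#' is dropped.
    """
    for line in blast_text.splitlines():
        if not line or "#" in line:
            continue
        fields = line.split("\t")
        # Same columns as `cut -f 1,2,11` (1-based), i.e. indexes 0, 1, 10.
        yield fields[0], fields[1], float(fields[10])


def edge_weight(evalue, cap=200.0):
    """Return -log10(evalue), capped at `cap`. An E-value of 0 maps to the cap."""
    if evalue <= 0.0:
        return cap
    return min(-math.log10(evalue), cap)


blast_example = (
    "# BLASTP 2.x\n"
    "q1\ts1\t98.5\t100\t1\t0\t1\t100\t1\t100\t1e-50\t200\n"
    "q1\ts2\t40.0\t90\t50\t2\t1\t90\t5\t95\t0.0\t700\n"
)
edges = [(q, s, edge_weight(e)) for q, s, e in blast_to_abc(blast_example)]
```

The cap matters because very small E-values, including those reported as 0, would otherwise produce extremely large or infinite edge weights.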

Then use the Python scripts from the tutorial to produce the result.

cafetutorial_mcl2rawcafe.py -i dump.blast_output.mci.I30 -o unfiltered_cafe_input.txt -sp "ENSAPLG ENSFALG ENSGALG ENSTGUG ENSMGAG"

cafetutorial_clade_and_size_filter.py -i unfiltered_cafe_input.txt -o filtered_cafe_input.txt -s
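
Conceptually, the first script turns each MCL cluster into a row of per-species gene counts. The sketch below shows that counting step only. It is not the tutorial's cafetutorial_mcl2rawcafe.py, and it assumes that each line of the dump file is one cluster of whitespace-separated gene IDs and that a gene's species can be recognized from its ID prefix, as in the -sp argument above. The function name family_counts is hypothetical, and the exact column layout that CAFE expects is not reproduced here.

```python
from collections import Counter


def family_counts(dump_text, species_prefixes):
    """Return [(family_id, [count per species])] for each cluster line.

    Assumption: one cluster per line; genes are matched to species by ID prefix.
    """
    rows = []
    for family_id, line in enumerate(dump_text.splitlines(), start=1):
        counts = Counter()
        for gene in line.split():
            for prefix in species_prefixes:
                if gene.startswith(prefix):
                    counts[prefix] += 1
                    break
        rows.append((family_id, [counts[p] for p in species_prefixes]))
    return rows


prefixes = ["ENSAPLG", "ENSFALG", "ENSGALG", "ENSTGUG", "ENSMGAG"]
dump_example = "ENSAPLG001\tENSAPLG002\tENSGALG001\nENSTGUG001\tENSMGAG001\n"
rows = family_counts(dump_example, prefixes)
```

Each row then corresponds to one gene family, with one count per species in the order given by the prefix list.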

5. Computing the tree

(((cow:0.09289,(cat:0.07151,horse:0.05727):0.00398):0.02355,((((orang:0.01034,(chimp:0.00440,human:0.00396):0.00587):0.00184,gibbon:0.01331):0.00573,(macaque:0.00443,baboon:0.00422):0.01431):0.01097,marmoset:0.03886):0.04239):0.03383,(rat:0.04110,mouse:0.03854):0.10918);

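A file like the one above is required: a tree in NEWICK format with the distances between species already computed.

The tutorial says that if such a tree has already been computed you can simply use it, but I doubt that is often the case.

To compute a new ultrametric tree, the tutorial introduces a program called r8s.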

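A phylogenetic tree can be represented in three ways:

Cladogram - branch lengths carry no meaning. It only shows roughly how the species are grouped.

Phylogram - branch lengths represent the amount of genetic change. Even among species in the same group, the lengths show which ones are more closely related.

Ultrametric - similar to a phylogram, but branch lengths are scaled to time rather than genetic change, so every tip is the same distance from the root.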
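
The difference between a phylogram and an ultrametric tree can be checked numerically: in an ultrametric tree, the sum of branch lengths from the root to every tip is the same. Below is a minimal Python sketch of such a check. It assumes a simple NEWICK string with branch lengths and no quoted labels or comments, and the function names tip_depths and is_ultrametric are made up for this example.

```python
def tip_depths(newick):
    """Return {tip_name: root-to-tip distance} for a simple Newick string."""
    text = "".join(newick.split()).rstrip(";")
    pos = 0

    def read_label_and_length():
        # Read "label:length" (either part may be empty) up to the next , ( or ).
        nonlocal pos
        start = pos
        while pos < len(text) and text[pos] not in ",()":
            pos += 1
        label, _, length = text[start:pos].partition(":")
        return label, float(length) if length else 0.0

    def parse_subtree():
        # Returns tip distances measured from the top of this subtree's branch.
        nonlocal pos
        tips = {}
        if text[pos] == "(":
            pos += 1  # consume "("
            while True:
                tips.update(parse_subtree())
                if text[pos] == ",":
                    pos += 1
                    continue
                pos += 1  # consume ")"
                break
            _, length = read_label_and_length()
        else:
            label, length = read_label_and_length()
            tips[label] = 0.0
        return {name: depth + length for name, depth in tips.items()}

    return parse_subtree()


def is_ultrametric(newick, tol=1e-6):
    """True if all root-to-tip distances agree within `tol`."""
    depths = tip_depths(newick).values()
    return max(depths) - min(depths) <= tol


balanced = is_ultrametric("((A:1,B:1):1,C:2);")
unbalanced = is_ultrametric("((A:1,B:2):1,C:2);")
```

In the first toy tree every tip is 2 units from the root, so it counts as ultrametric. In the second, tip B is 3 units away, so it does not.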

r8s is a program that can convert a tree computed as a phylogram into an ultrametric one, but all of its links were broken, so I could not get it.

Instead I used BEAST, a program that produces ultrametric trees.

2017/09/27 - [bioinformatics] - Installing and Running BEAST

License: Attribution, NonCommercial, NoDerivatives
