Installing GCC

바닐라스카이 2017. 9. 18. 15:33

GCC homepage - https://gcc.gnu.org/

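GCC stands for the GNU Compiler Collection (originally the GNU C Compiler) and is needed to build many programs from source.

As of 9/19/2017 the latest version is 7.2. The major version number has climbed quickly over the last year or two, because since GCC 5 each yearly release gets a new major number.

Installation is not difficult, but there are dependencies.

Building gcc requires gmp 4.2+, mpfr 2.3.1+, and mpc 0.8.0+.

With root privileges you can simply install libgmp-dev, libmpc-dev, and libmpfr-dev system-wide. For a local install you have to build each library yourself and then point the build at its path, which is tedious.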

Ubuntu

sudo apt-get install libgmp-dev libmpfr-dev libmpc-dev

Red Hat and Fedora

sudo yum install gmp gmp-devel mpfr mpfr-devel libmpc libmpc-devel

Manual install

The posts below describe how to install gmp, mpfr, and mpc. They must be installed in that order: gmp, then mpfr, then mpc.

2017/09/19 - [linux] - Installing GMP

2017/09/19 - [linux] - Installing MPFR

2017/09/19 - [linux] - Installing MPC

To install gcc, go to a nearby mirror site, here one in Japan (http://ftp.tsukuba.wide.ad.jp/software/gcc/releases/), and download the version you want.

I used 7.2, the latest version at the time.

wget http://ftp.tsukuba.wide.ad.jp/software/gcc/releases/gcc-7.2.0/gcc-7.2.0.tar.gz

tar -zxf gcc-7.2.0.tar.gz

cd gcc-7.2.0

./configure --prefix=/PATH/TO/INSTALL/GCC --with-gmp=/PATH/TO/INSTALL/GMP --with-mpfr=/PATH/TO/INSTALL/MPFR --with-mpc=/PATH/TO/INSTALL/MPC

make && make install

If gmp, mpfr, and mpc were not installed manually, the --with-gmp, --with-mpfr, and --with-mpc paths can be left out of configure.

make stopped with the following error:

error: 'GATHER_STATISTICS' was not declared in this scope

Unsetting the variables below resolved it.

unset LIBRARY_PATH CPATH C_INCLUDE_PATH PKG_CONFIG_PATH CPLUS_INCLUDE_PATH INCLUDE

Source: https://stackoverflow.com/questions/29981492/gcc-4-9-2-installation-failed-on-linux

libmpc.so.3 could not be found, so I added the library directories to LD_LIBRARY_PATH.

export LD_LIBRARY_PATH=/PATH/TO/INSTALL/GMP/lib:/PATH/TO/INSTALL/MPFR/lib:/PATH/TO/INSTALL/MPC/lib:$LD_LIBRARY_PATH
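
The two environment fixes above can also be applied to the build process alone instead of the whole shell session. The following is a minimal sketch in Python, assuming the same placeholder install paths as above; `build_env` is a hypothetical helper name and not part of GCC.

```python
import os

# Variables that the post unsets before running make.
CONFLICTING_VARS = [
    "LIBRARY_PATH", "CPATH", "C_INCLUDE_PATH",
    "PKG_CONFIG_PATH", "CPLUS_INCLUDE_PATH", "INCLUDE",
]


def build_env(lib_dirs, base_env=None):
    """Return a copy of the environment prepared for the GCC build.

    lib_dirs: library directories to prepend to LD_LIBRARY_PATH.
    base_env: environment to start from (defaults to os.environ).
    """
    env = dict(os.environ if base_env is None else base_env)
    # Equivalent of `unset ...` for the variables that broke the build.
    for name in CONFLICTING_VARS:
        env.pop(name, None)
    # Equivalent of `export LD_LIBRARY_PATH=...:$LD_LIBRARY_PATH`.
    old = env.get("LD_LIBRARY_PATH", "")
    parts = list(lib_dirs) + ([old] if old else [])
    env["LD_LIBRARY_PATH"] = ":".join(parts)
    return env
```

The result could then be passed to something like `subprocess.run(["make"], env=build_env([...]), check=True)`, which leaves the calling shell's variables untouched.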

configure also stopped with the following error:

configure: error: I suspect your system does not have 32-bit development libraries (libc and headers). If you have them, rerun configure with --enable-multilib. If you do not have them, and want to build a 64-bit-only compiler, rerun configure with --disable-multilib.

I did not want a 64-bit-only compiler, so I added the --enable-multilib option. As the message itself says, this assumes the 32-bit development libraries are installed.

Installing and Running Phylip

바닐라스카이 2017. 9. 18. 09:53

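Homepage - http://evolution.genetics.washington.edu/phylip.html

A simple program developed at the University of Washington for drawing phylogenetic trees.

It is one of the few programs that can build a phylogenetic tree from binary data, where only the presence or absence of each gene is recorded as true or false. Many programs work from an MSA instead, Clustal Omega among them, so refer to those for sequence-based trees. The module that takes binary input is clique.

One caveat: set an FPKM cutoff for expressed genes and keep only the genes expressed at or above it.

If FPKM values that come from mis-aligned reads are not taken into account, the results will drift in unintended directions.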
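
The cutoff step described above can be sketched as follows. This is a minimal illustration and not part of Phylip: the function name, the nested-dict input layout, and the default cutoff of 1.0 FPKM are all assumptions chosen for the example.

```python
def presence_absence(fpkm_by_sample, cutoff=1.0):
    """Turn FPKM tables into binary presence/absence strings.

    fpkm_by_sample: {sample_name: {gene_id: fpkm_value}}
    cutoff: hypothetical threshold; genes at or above it count as present.
    Returns (sorted gene list, {sample_name: "0101..."}).
    """
    # Use one shared, sorted gene order so every string has the same columns.
    genes = sorted({g for table in fpkm_by_sample.values() for g in table})
    binary = {}
    for sample, table in fpkm_by_sample.items():
        # A gene missing from a sample's table is treated as not expressed.
        binary[sample] = "".join(
            "1" if table.get(g, 0.0) >= cutoff else "0" for g in genes
        )
    return genes, binary
```

Low, noisy FPKM values then become 0 instead of counting as evidence that the gene is present.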

On Windows no installation is needed; just unzip the downloaded file.

On Linux, download the tar.gz file and run:

tar -zxf phylip-3.696.tar.gz

cd phylip-3.696/src/

make -f Makefile.unx install

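Afterwards the executables appear in the phylip-3.696/exe folder.

When launched, an executable immediately reports that it cannot find infile and asks for the name of the input file. On both Windows and Linux the input file has to be in the directory the program is run from.

The input file format can be checked in the exe/testdata folder of the Windows version. The program is old, so columns are not separated by tabs; they are defined by the number of spaces.

If you look at the testdata and match the number of spaces exactly, it runs without errors.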
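
The spacing rule can be handled in code rather than by hand. The sketch below assumes the classic PHYLIP layout: a first line with the number of species and the number of characters, then one line per species whose name occupies exactly the first 10 columns. `format_phylip_binary` is a hypothetical helper; compare its output with the files in testdata before relying on it.

```python
def format_phylip_binary(binary):
    """Format {species_name: "0101..."} as lines of a PHYLIP-style infile."""
    lengths = {len(chars) for chars in binary.values()}
    if len(lengths) != 1:
        raise ValueError("all character strings must have the same length")
    n_chars = lengths.pop()
    # Header line: number of species, then number of characters.
    lines = [f"{len(binary):>5}{n_chars:>5}"]
    for name, chars in binary.items():
        # Names are truncated or space-padded to exactly 10 columns.
        lines.append(name[:10].ljust(10) + chars)
    return lines
```

Writing `"\n".join(format_phylip_binary(data)) + "\n"` to a file named infile in the working directory gives the program an input it can find without prompting.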

Installing and Running CAFE v4.0

바닐라스카이 2017. 9. 11. 15:28

CAFE v4.0 Software for Computational Analysis of gene Family Evolution

I decided to install CAFE to calculate gene gain and loss.

Homepage - https://hahnlab.github.io/CAFE/index.html

To download, follow the GitHub link on the homepage's download tab and get cafe-version.tar.gz.

tar xf cafe*tar.gz

./configure

make

This produces the CAFE/release/cafe executable.

Now follow along with tutorial.pdf.

1. Download the protein sequences of each species from Ensembl BioMart.

Choose Ensembl Genes and the species, select sequence -> peptide under Attributes, and under Header Information remove the transcript ID and add the CDS length.

Click Results, then have the file emailed to you or download it directly.

Put all the downloads in one folder and decompress them.

2. Keep only the longest isoform of each gene.

The script is in the tutorial files under cafe/manual. (https://iu.box.com/v/cafetutorial-files)

The complete set of tutorial files is over 5 GB, so it is better to download only the parts you need.

python python_scripts/cafetutorial_longest_iso.py -d twelve_spp_proteins/

cat twelve_spp_proteins/longest*.fa > makeblastdb_input.fa

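-d takes the name of the folder containing the sequences; the script reads the .fa files in that folder and writes a longest-isoform file for each.

Each species gets its own longest file, so they have to be concatenated into a single file.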
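
To make the idea behind the longest-isoform step concrete, here is a minimal sketch. It is not the tutorial's script: it assumes each FASTA header starts with the gene ID followed by `|` (for example `>GENE1|1203`), which depends on the header attributes chosen in BioMart, and `longest_isoforms` is a hypothetical name.

```python
def longest_isoforms(fasta_text):
    """Return {gene_id: longest peptide sequence} from FASTA text.

    Assumes headers look like ">GENE_ID|other fields" (an assumption).
    """
    records = []  # (gene_id, sequence) pairs in file order
    gene, chunks = None, []
    for line in fasta_text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new header closes the previous record.
            if gene is not None:
                records.append((gene, "".join(chunks)))
            gene = line[1:].split("|")[0]
            chunks = []
        else:
            chunks.append(line)
    if gene is not None:
        records.append((gene, "".join(chunks)))

    best = {}
    for gene_id, seq in records:
        # Keep the first sequence seen when lengths tie.
        if len(seq) > len(best.get(gene_id, "")):
            best[gene_id] = seq
    return best
```

Keeping one protein per gene matters here because several isoforms of the same gene would otherwise be counted as separate family members.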

3. Align with BLAST.

makeblastdb -in makeblastdb_input.fa -dbtype prot -out blastdb

blastp -num_threads 4 -db blastdb -query makeblastdb_input.fa -outfmt 7 -seg yes > blast_output.txt

This takes a very long time. The tutorial also provides the finished alignment results in the tutorial files above.

4. Clustering

mcl must be installed.

wget https://www.micans.org/mcl/src/mcl-latest.tar.gz

tar -zxf mcl-latest.tar.gz

./configure --prefix=/PATH/TO/INSTALL/MCL

make && make install

export PATH=/PATH/TO/INSTALL/MCL/bin:$PATH

Then use mcl to cluster the sequences into gene families.

grep -v "#" blast_output.txt | cut -f 1,2,11 > blast_output.abc

mcxload -abc blast_output.abc --stream-mirror --stream-neg-log10 -stream-tf 'ceil(200)' -o blast_output.mci -write-tab blast_output.tab

mcl blast_output.mci -I 3

mcxdump -icl out.blast_output.mci.I30 -tabr blast_output.tab -o dump.blast_output.mci.I30
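
What the grep/cut line and the two score options of mcxload do to the BLAST table can be sketched in Python. This is only an illustration of the transformation as I understand it: column 11 of the tabular output is the e-value, `--stream-neg-log10` replaces it with its negative log10, and `ceil(200)` caps the result at 200. The mirroring done by `--stream-mirror` is not reproduced, and `blast_to_abc` is a hypothetical name.

```python
import math


def blast_to_abc(blast_lines, cap=200.0):
    """Convert tabular BLAST lines to (query, subject, weight) edges."""
    edges = []
    for line in blast_lines:
        # Like `grep -v "#"`: drop every line that contains a '#'.
        if "#" in line or not line.strip():
            continue
        fields = line.rstrip("\n").split("\t")
        # Like `cut -f 1,2,11`: query, subject, e-value.
        query, subject, evalue = fields[0], fields[1], float(fields[10])
        # An e-value of 0 has no finite log, so it goes straight to the cap.
        weight = cap if evalue <= 0.0 else min(cap, -math.log10(evalue))
        edges.append((query, subject, weight))
    return edges
```

Smaller e-values therefore become larger edge weights, and the cap keeps near-zero e-values from dominating the graph.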

Use the python scripts from the tutorial to produce the final input.

cafetutorial_mcl2rawcafe.py -i dump.blast_output.mci.I30 -o unfiltered_cafe_input.txt -sp "ENSAPLG ENSFALG ENSGALG ENSTGUG ENSMGAG"

cafetutorial_clade_and_size_filter.py -i unfiltered_cafe_input.txt -o filtered_cafe_input.txt -s
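
For orientation, the conversion from the mcl dump to a count table can be sketched as below. This is not the tutorial script: it assumes each dump line is one gene family with whitespace-separated gene IDs, that species are recognised by ID prefix (as in the -sp argument above), and that the table uses "Desc" and "Family ID" as its first two columns. None of the prefixes may be a prefix of another, or genes would be counted twice.

```python
def clusters_to_counts(dump_lines, prefixes):
    """Count genes per species prefix for each cluster line.

    Returns tab-separated rows: a header, then one row per gene family.
    """
    rows = ["\t".join(["Desc", "Family ID"] + list(prefixes))]
    for family_id, line in enumerate(dump_lines, start=1):
        genes = line.split()
        if not genes:
            continue
        # Number of genes in this family whose ID starts with each prefix.
        counts = [sum(g.startswith(p) for g in genes) for p in prefixes]
        rows.append("\t".join(["(null)", str(family_id)] + [str(c) for c in counts]))
    return rows
```

Each row then gives the family size in every species, which is the quantity CAFE models along the tree.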

5. Computing the tree

(((cow:0.09289,(cat:0.07151,horse:0.05727):0.00398):0.02355,((((orang:0.01034,(chimp:0.00440,human:0.00396):0.00587):0.00184,gibbon:0.01331):0.00573,(macaque:0.00443,baboon:0.00422):0.01431):0.01097,marmoset:0.03886):0.04239):0.03383,(rat:0.04110,mouse:0.03854):0.10918);

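A file like the one above is required: distances between species, already computed, in NEWICK format.

The tutorial says you can simply reuse a tree that has already been computed, but that will rarely be the case.

To compute a new ultrametric tree, it introduces a program called r8s.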

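A phylogenetic tree can be drawn in three ways:

Cladogram - branch lengths carry no meaning. It only shows roughly how the taxa group together.

Phylogram - branch lengths represent genetic distance. Even within the same group, the lengths show which members are more closely related.

Ultrametric - similar to a phylogram, but branch lengths are weighted toward time rather than genetic distance, so every tip is the same distance from the root.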
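
Whether a given NEWICK tree is ultrametric can be checked by comparing root-to-tip distances. The sketch below uses a small hand-written parser instead of a phylogenetics library; it assumes a well-formed tree with plain names (no quotes or comments), and the function names and tolerance are illustrative choices.

```python
def tip_depths(newick):
    """Return {leaf_name: root-to-tip distance} for a Newick string."""
    text = "".join(newick.split()).rstrip(";")  # drop whitespace and ';'
    pos = 0

    def read_token(stop_chars):
        nonlocal pos
        start = pos
        while pos < len(text) and text[pos] not in stop_chars:
            pos += 1
        return text[start:pos]

    def parse_node():
        nonlocal pos
        leaves = {}
        if text[pos] == "(":
            pos += 1  # consume "("
            while True:
                leaves.update(parse_node())
                if text[pos] == ",":
                    pos += 1  # another child follows
                    continue
                pos += 1  # consume ")"
                break
            read_token(":,()")  # skip an optional internal node label
        else:
            leaves[read_token(":,()")] = 0.0
        length = 0.0
        if pos < len(text) and text[pos] == ":":
            pos += 1
            length = float(read_token(",()"))
        # This node's branch length applies to every leaf below it.
        return {name: depth + length for name, depth in leaves.items()}

    return parse_node()


def is_ultrametric(newick, tol=1e-6):
    """True if all tips are equally far from the root, within tol."""
    depths = list(tip_depths(newick).values())
    return max(depths) - min(depths) <= tol
```

For example, `is_ultrametric("((A:1,B:1):2,C:3);")` is true, while the mammal tree shown in step 5 fails the check, since sibling tips such as cat (0.07151) and horse (0.05727) already differ.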

r8s converts a tree computed as a phylogram into an ultrametric one, but all of its links were broken and I could not reach the site.

I used BEAST, which produces ultrametric trees, instead.

2017/09/27 - [bioinformatics] - Installing and Running BEAST

De novo assembly of the Aedes aegypti genome using Hi-C yields chromosome-length scaffolds

바닐라스카이 2017. 9. 6. 10:33

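The paper presents a cost-effective way to produce chromosome-length scaffolds. Until now, raising a draft genome built from short reads to chromosome level required long reads or optical mapping data. Earlier Hi-C scaffolding approaches were hampered by errors such as chromosome-scale inversions and misjoins. The paper introduces a new algorithm (split, anchor, order, and orient; Figure 1) and reports that scaffolding with it reduces these errors. As a demonstration, a human genome assembled from short Illumina reads alone (67X) was scaffolded with in situ Hi-C data (6.7X), and the 23 large chromosome-length scaffolds contained 99.5% of the total sequence. The genome of Aedes aegypti, the mosquito that carries Zika virus, was assembled the same way. Hi-C data were also generated for another mosquito, which showed that the two species diverged 150-200 million years ago and that rearrangements occur within specific chromosomes. The authors state that generating Hi-C data and applying this algorithm should make it possible to produce a mammalian genome for under 10,000 dollars.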

Reference -

Dudchenko O et al., De novo assembly of the Aedes aegypti genome using Hi-C yields chromosome-length scaffolds, Science, 2017

Using the ArgParse Module

바닐라스카이 2017. 8. 31. 10:42

if Pkg.installed("ArgParse") == nothing
    println("Package 'ArgParse' will be installed...")
    Pkg.add("ArgParse")
end

using ArgParse

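The ArgParse module lets a script define and handle command-line options.

Using ArgParse when writing a program makes arguments much easier to manage. Adding the snippet above to the top of a script installs ArgParse if it is missing and then loads it.

The official documentation is at http://carlobaldassi.github.io/ArgParse.jl/stable/.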

using ArgParse

function parse_commandline()
    s = ArgParseSettings()

    @add_arg_table s begin
        "--opt1"
            help = "an option with an argument"
        "--opt2", "-o"
            help = "another option with an argument"
            arg_type = Int
            default = 0
        "--flag1"
            help = "an option without argument, i.e. a flag"
            action = :store_true
        "arg1"
            help = "a positional argument"
            required = true
    end

    return parse_args(s)
end

function main()
    parsed_args = parse_commandline()
    println("Parsed args:")
    for (arg,val) in parsed_args
        println(" $arg => $val")
    end
end

main()

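Three kinds of argument can be defined:

1. Option

Takes a value on the command line, as --opt1 and --opt2 do above.

2. Flag

Declared with

action = :store_true

and stored only as true or false.

3. Required

Declared with

required = true

and a value must always be supplied.

The value passed as arg1 can be read back with parsed_args["arg1"].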
