All posts

Installing and running GLOOME

바닐라스카이 2016. 12. 19. 17:55


GLOOME is a program introduced in a paper published in Bioinformatics in 2010 (https://doi.org/10.1093/bioinformatics/btq549). It takes gene presence/absence information as input and draws the result as a tree.

Ofir Cohen et al, GLOOME: gain loss mapping engine, Bioinformatics, 2010

The web server at http://gloome.tau.ac.il/ takes its input as a FASTA file.

The FASTA input looks like this:

>speciesA

10001100110

>speciesB

11001100110

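Each 1 or 0 encodes the presence or absence of a gene in binary, and the same position refers to the same gene in every species.

In the example above, the gene at position 2 is absent in A and present in B. At every other position both species have 0 or both have 1, meaning they either both lack the gene or both have it.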
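To check an input file before uploading it, the comparison described above can be done with a few lines of Python. This is only a minimal sketch: the helper names `parse_fasta` and `differing_positions` are made up for this example and are not part of GLOOME, and the parser assumes a plain FASTA file whose sequence lines contain only 0 and 1.

```python
def parse_fasta(text):
    """Parse FASTA text into {name: profile}, skipping blank lines."""
    records = {}
    name = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            name = line[1:]
            records[name] = ""
        elif name is not None:
            records[name] += line
    return records


def differing_positions(a, b):
    """Return the 1-based positions where two equal-length profiles differ."""
    if len(a) != len(b):
        raise ValueError("profiles must have the same length")
    return [i for i, (x, y) in enumerate(zip(a, b), start=1) if x != y]


example = """>speciesA
10001100110
>speciesB
11001100110
"""

profiles = parse_fasta(example)
diff = differing_positions(profiles["speciesA"], profiles["speciesB"])
print(diff)  # → [2]
```

The length check matters because every species must describe the same set of genes in the same order.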

The server can draw the tree itself, but that step failed with a Java error, so I only received the .ph file by email.

I then converted the .ph file into tree.pdf on the website http://iubio.bio.indiana.edu/treeapp/treeprint-form.html.

License: Attribution, NonCommercial, NoDerivatives


Listing files with a specific extension

바닐라스카이 2016. 12. 18. 16:24


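In Julia, to pick a folder and then list only its subfolders, or only the files with a specific extension,

first take the folder as input and then use filter to keep only the entries that match the condition.

readdir() reads a directory and returns the names of all entries in it,

isdir() checks whether the given path is a directory,

and endswith() checks whether the end of a file name matches the given suffix.

The three functions can be combined as follows.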

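inputdir = ARGS[1]  # directory passed as the first script argument

# joinpath inserts the path separator, so inputdir need not end with "/"
dirlist = filter(x -> isdir(joinpath(inputdir, x)), readdir(inputdir))

# keep only the entries whose names end with ".zip"
zipfilelist = filter(x -> endswith(x, ".zip"), readdir(inputdir))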

For the directory passed as ARGS[1] when the script is run,

the subfolders are stored in dirlist as a list,

and the files ending in .zip are stored in zipfilelist as a list.
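For comparison, here is the same filter pattern sketched in Python. It is not part of the Julia script. The function name `list_subdirs_and_zips` is made up for this example, and the temporary directory only exists so the sketch can run anywhere. The directory and the entry name are joined with a path-join function rather than glued together as strings, so the result does not depend on whether the input path ends with a separator.

```python
import os
import tempfile


def list_subdirs_and_zips(inputdir):
    """Return (subdirectory names, names ending in .zip) for inputdir."""
    entries = sorted(os.listdir(inputdir))
    dirlist = [e for e in entries if os.path.isdir(os.path.join(inputdir, e))]
    zipfilelist = [e for e in entries if e.endswith(".zip")]
    return dirlist, zipfilelist


# Build a throwaway directory with one subfolder, one .zip and one other file.
with tempfile.TemporaryDirectory() as tmp:
    os.mkdir(os.path.join(tmp, "sub"))
    for fname in ("a.zip", "b.txt"):
        open(os.path.join(tmp, fname), "w").close()
    dirlist, zipfilelist = list_subdirs_and_zips(tmp)
    print(dirlist, zipfilelist)
```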


Allowing mismatches in bowtie2

바닐라스카이 2016. 9. 13. 15:29


bowtie2 differs from bowtie in more than its optimization for read length and its support for gaps.

bowtie allows at most 3 mismatches, whereas bowtie2 lets you set separate penalty scores for mismatches and indels, so mismatches can be allowed up to a certain fraction of the read length.

For the full list of bowtie2 options, see http://bowtie-bio.sourceforge.net/bowtie2/manual.shtml.

This post covers only mismatches.

For an alignment to be considered "valid" (i.e. "good enough") by Bowtie 2, it must have an alignment score no less than the minimum score threshold. The threshold is configurable and is expressed as a function of the read length. In end-to-end alignment mode, the default minimum score threshold is -0.6 + -0.6 * L, where L is the read length. In local alignment mode, the default minimum score threshold is 20 + 8.0 * ln(L), where L is the read length. This can be configured with the --score-min option. For details on how to set options like --score-min that correspond to functions, see the section on setting function options.

Scoring options

`--ma <int>`	Sets the match bonus. In `--local` mode `<int>` is added to the alignment score for each position where a read character aligns to a reference character and the characters match. Not used in `--end-to-end` mode. Default: 2.
`--mp MX,MN`	Sets the maximum (`MX`) and minimum (`MN`) mismatch penalties, both integers. A number less than or equal to `MX` and greater than or equal to `MN` is subtracted from the alignment score for each position where a read character aligns to a reference character, the characters do not match, and neither is an `N`. If `--ignore-quals` is specified, the number subtracted equals `MX`. Otherwise, the number subtracted is `MN + floor( (MX-MN)(MIN(Q, 40.0)/40.0) )` where Q is the Phred quality value. Default: `MX` = 6, `MN` = 2.
`--np <int>`	Sets penalty for positions where the read, reference, or both, contain an ambiguous character such as `N`. Default: 1.
`--rdg <int1>,<int2>`	Sets the read gap open (`<int1>`) and extend (`<int2>`) penalties. A read gap of length N gets a penalty of `<int1>` + N * `<int2>`. Default: 5, 3.
`--rfg <int1>,<int2>`	Sets the reference gap open (`<int1>`) and extend (`<int2>`) penalties. A reference gap of length N gets a penalty of `<int1>` + N * `<int2>`. Default: 5, 3.
`--score-min <func>`	Sets a function governing the minimum alignment score needed for an alignment to be considered "valid" (i.e. good enough to report). This is a function of read length. For instance, specifying `L,0,-0.6` sets the minimum-score function `f` to `f(x) = 0 + -0.6 * x`, where `x` is the read length. See also: setting function options. The default in `--end-to-end` mode is `L,-0.6,-0.6` and the default in `--local` mode is `G,20,8`.

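The text above is the description of score calculation taken from the bowtie2 manual page.

First, the end-to-end score threshold is given above as -0.6 + -0.6 * read length.

Setting the read length to 100 makes it easy to think in percentages. In that case the threshold is -60.6, which means a read is not reported as aligned if its alignment score is lower than -60.6.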
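The threshold arithmetic can be written out as a small Python sketch. The function name `min_score` is made up for this example. The two formulas are the defaults quoted from the manual above, and nothing here calls bowtie2 itself.

```python
import math


def min_score(read_length, mode="end-to-end"):
    """Default minimum alignment score as quoted from the bowtie2 manual."""
    if mode == "end-to-end":
        # L,-0.6,-0.6: linear in the read length
        return -0.6 + -0.6 * read_length
    if mode == "local":
        # G,20,8: natural log of the read length
        return 20 + 8.0 * math.log(read_length)
    raise ValueError("mode must be 'end-to-end' or 'local'")


threshold = min_score(100)
print(round(threshold, 1))  # → -60.6
```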

Looking at the score assigned when a base is a match, a mismatch, or a gap: in end-to-end mode the mismatch penalty depends on the base quality of the read (there is a maximum and a minimum mismatch penalty).

Low-quality bases are often trimmed during preprocessing, and current sequencing mostly yields high-quality reads, so for now I assume good quality and use the corresponding penalty of 6.

(A low-quality base gets a smaller penalty. For example, if a read base is called A but its quality score is low, the true base may actually be T, so what looks like a mismatch could really be a match.)
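How the penalty moves between the minimum and the maximum can be seen by evaluating the formula from the `--mp` description. This is a minimal sketch of that formula as written in the manual text quoted above. The function name `mismatch_penalty` is made up for this example, and I have not checked whether every bowtie2 version computes the penalty exactly this way.

```python
import math


def mismatch_penalty(q, mx=6, mn=2, ignore_quals=False):
    """Penalty for one mismatch at Phred quality q, per the quoted --mp formula."""
    if ignore_quals:
        return mx
    return mn + math.floor((mx - mn) * (min(q, 40.0) / 40.0))


for q in (2, 20, 30, 40):
    print(q, mismatch_penalty(q))
# 2 2
# 20 4
# 30 5
# 40 6
```

With the default MX = 6 and MN = 2, only bases at Q40 or above reach the full penalty of 6 under this formula, so using 6 everywhere is a conservative (worst-case) assumption.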

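With the default settings and a read length of 100:

- 10 mismatches cost -60 points, so mismatches are allowed in about 10% of the read length.

- For gaps, the open and extend penalties are applied separately, so if there is only a single gap it can be up to 18 positions long (5 + 3 * 18 = 59).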
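Both numbers in the list above can be reproduced with a short calculation. This is only a sketch under the same assumptions as the text: every mismatch costs the maximum penalty of 6, the read contains either mismatches only or one single gap only, and the default `--score-min L,-0.6,-0.6` and `--rdg 5,3` apply. The function names are made up for this example.

```python
import math


def max_mismatches(read_length, penalty=6, intercept=-0.6, slope=-0.6):
    """Largest number of mismatches that still meets the score threshold."""
    budget = -(intercept + slope * read_length)  # total penalty allowed
    return math.floor(budget / penalty)


def max_single_gap_length(read_length, gap_open=5, gap_extend=3,
                          intercept=-0.6, slope=-0.6):
    """Longest single gap (open + N * extend) that still meets the threshold."""
    budget = -(intercept + slope * read_length)
    return max(0, math.floor((budget - gap_open) / gap_extend))


print(max_mismatches(100))         # → 10
print(max_single_gap_length(100))  # → 18
```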

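In practice I recommend leaving default values such as --mp and --rdg untouched where possible and changing --score-min instead, for example to L,-0.6,-0.3 (which allows 5% mismatches). The appropriate mismatch allowance still depends on the characteristics of the data.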
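Going the other way, a target mismatch fraction can be turned into a `--score-min` value. The sketch below assumes every mismatch costs the maximum penalty of 6 and keeps the default intercept of -0.6. The function names, the index name `my_index`, and the file names are placeholders made up for this example. The command is only assembled as a list, not executed.

```python
def score_min_for_mismatch_fraction(fraction, max_penalty=6, intercept=-0.6):
    """Build a linear --score-min string allowing about `fraction` mismatches."""
    slope = -fraction * max_penalty
    return f"L,{intercept:g},{slope:g}"


def bowtie2_command(index, reads, out_sam, score_min):
    """Assemble an end-to-end bowtie2 call for unpaired reads (not run here)."""
    return ["bowtie2", "--end-to-end", "--score-min", score_min,
            "-x", index, "-U", reads, "-S", out_sam]


score_min = score_min_for_mismatch_fraction(0.05)
cmd = bowtie2_command("my_index", "reads.fq", "out.sam", score_min)
print(" ".join(cmd))
```

Passing the list to `subprocess.run` would run the alignment, provided bowtie2 is installed and the index exists.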


The "cannot mkdir R_TempDir" error

바닐라스카이 2016. 9. 7. 10:23


While using the rpy module in Python, I ran into the following:

Python 2.6.9 (unknown, Feb 26 2015, 10:49:14)

[GCC 4.4.7 20120313 (Red Hat 4.4.7-4)] on linux2

Type "help", "copyright", "credits" or "license" for more information.

>>> import rpy

Fatal error: cannot mkdir R_TempDir

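The import failed with the error message cannot mkdir R_TempDir.

The thing to check is the /tmp/ directory.

If you do not have permission on it, or the space allotted to tmp has filled up, R cannot proceed.

This still needs to be verified.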
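Both suspected causes can be checked from Python before importing rpy. This is a minimal sketch, assuming Python 3 (`shutil.disk_usage` is not available in the Python 2.6 session shown above) and assuming R uses the same temporary location that `tempfile.gettempdir()` reports, which is not guaranteed. The function name `check_tmp_dir` is made up for this example.

```python
import os
import shutil
import tempfile


def check_tmp_dir(path=None):
    """Report whether a temp directory exists, is writable, and has free space."""
    path = path or tempfile.gettempdir()
    report = {"path": path, "exists": os.path.isdir(path),
              "writable": False, "free_bytes": None, "can_mkdir": False}
    if not report["exists"]:
        return report
    report["writable"] = os.access(path, os.W_OK | os.X_OK)
    report["free_bytes"] = shutil.disk_usage(path).free
    try:
        # Creating a subdirectory is the operation that failed in the error.
        probe = tempfile.mkdtemp(dir=path)
        os.rmdir(probe)
        report["can_mkdir"] = True
    except OSError:
        pass
    return report


print(check_tmp_dir())
```

If `can_mkdir` is False while `free_bytes` is large, permissions are the more likely cause. If `free_bytes` is near zero, the directory is probably full.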


miRNA naming rules

바닐라스카이 2016. 9. 1. 16:21


miRNA names follow a set of rules.

miRBase describes these rules, and this post summarizes them.

The original text is at http://www.mirbase.org/help/nomenclature.shtml.

In summary:

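1. Given a miRNA called hsa-mir-121, the name has the form species-mir-number, where the species prefix abbreviates the scientific name (hsa for Homo sapiens).

2. The number reflects the order of discovery: if the most recent miRNA is 121, the next miRNA discovered starts from 122.

3. If precursor miRNAs at different loci in the genome yield the same mature miRNA, they are named hsa-mir-121-1 and hsa-mir-121-2.

4. If different loci in the genome yield similar mature miRNAs, they are named hsa-mir-121a and hsa-mir-121b.

5. A mature miRNA takes -5p or -3p after the precursor name, depending on which arm of the precursor it comes from, and miRBase writes the mature form with a capital R (miR). ex) hsa-miR-121-5p, hsa-miR-121-3p

6. These rules do not always apply, and there can be exceptions.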
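The rules above can be expressed as a regular expression that splits a name into its parts. This is only a sketch of the summarized rules, not an official miRBase parser. The pattern, the field names, and the assumption that species prefixes are three or four lowercase letters are my own choices for this example. Names that break the rules (rule 6), such as the let-7 family, simply do not match.

```python
import re

MIRNA_NAME = re.compile(
    r"^(?P<species>[a-z]{3,4})-"   # species prefix, e.g. hsa
    r"(?P<kind>mir|miR)-"          # mir = precursor, miR = mature
    r"(?P<number>\d+)"             # order of discovery
    r"(?P<letter>[a-z])?"          # similar mature sequences: a, b, ...
    r"(?:-(?P<copy>\d+))?"         # same mature sequence, different locus
    r"(?:-(?P<arm>[35]p))?$"       # arm of the precursor: 5p or 3p
)


def parse_mirna_name(name):
    """Return the name parts as a dict, or None if the name breaks the rules."""
    match = MIRNA_NAME.match(name)
    return match.groupdict() if match else None


for name in ("hsa-mir-121-1", "hsa-mir-121a", "hsa-miR-121-5p", "hsa-let-7a"):
    print(name, parse_mirna_name(name))
```

Even when a name parses cleanly, the parts say very little about the miRNA itself, so the database entry is still the place to look.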

Finally, the page closes by noting that a name carries only a small part of the information, so to learn the exact details of a miRNA you should search the database rather than rely on the name.

