seaborn clustermap color label

바닐라스카이 2022. 5. 24. 13:14

Adding color labels to a seaborn clustermap.

import pandas as pd
import seaborn as sns
from matplotlib.patches import Patch

# df_data: one row per sample; condition, tnm and age are annotation columns,
# and the columns from index 3 onward hold the values to cluster.

# For each annotation, build a label -> color lookup table (lut) and one color per sample.
condition_labels = df_data.condition
condition_uniq = sorted(condition_labels.unique())
condition_pal = sns.color_palette('husl', len(condition_uniq))
condition_lut = dict(zip(condition_uniq, condition_pal))
condition_colors = condition_labels.map(condition_lut)

tnm_labels = df_data.tnm
tnm_uniq = sorted(tnm_labels.unique())
tnm_pal = sns.color_palette('Paired', len(tnm_uniq))
tnm_lut = dict(zip(tnm_uniq, tnm_pal))
tnm_lut['NA'] = (1, 1, 1)  # show 'NA' in white
tnm_colors = tnm_labels.map(tnm_lut)

age_labels = df_data.age
age_uniq = sorted(age_labels.unique())
age_pal = sns.color_palette('flare', len(age_uniq))
age_lut = dict(zip(age_uniq, age_pal))
age_lut['NA'] = (1, 1, 1)  # show 'NA' in white
age_colors = age_labels.map(age_lut)

# One color bar per annotation is drawn above the heatmap columns.
condition_node_colors = pd.concat([condition_colors, tnm_colors, age_colors], axis=1)
# clustermap creates its own figure, so a preceding plt.figure(figsize=...) has no effect;
# pass figsize=(width, height) to clustermap to change the size.
g = sns.clustermap(df_data.iloc[:, 3:].T, cmap="vlag", col_colors=condition_node_colors)

# Add one figure-level legend per annotation. fig.legend() can be called repeatedly,
# whereas a second plt.legend() on the same axes replaces the first.
fig = g.ax_col_dendrogram.figure
legend_specs = [
    ('tnm', tnm_uniq, tnm_lut, 2, (0.65, 1.05)),
    ('condition', condition_uniq, condition_lut, 2, (0.35, 1.05)),
    ('age', age_uniq, age_lut, 5, (0.5, 1.15)),
]
for title, labels, lut, ncol, anchor in legend_specs:
    handles = [Patch(color=lut[label], label=str(label)) for label in labels]
    fig.legend(handles=handles, title=title, loc='center', ncol=ncol,
               bbox_to_anchor=anchor, bbox_transform=fig.transFigure)
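
The lookup-table step can be tried on its own without seaborn. The following is a minimal sketch, assuming evenly spaced hues from the standard-library colorsys module as a stand-in for sns.color_palette; build_lut and the sample labels are hypothetical names used only for illustration.

```python
import colorsys

def build_lut(labels, na_label="NA"):
    """Map each unique label to an RGB tuple; na_label is forced to white."""
    uniq = sorted(set(labels))
    # Evenly spaced hues stand in for a seaborn palette (assumption).
    palette = [colorsys.hls_to_rgb(i / len(uniq), 0.6, 0.65) for i in range(len(uniq))]
    lut = dict(zip(uniq, palette))
    if na_label in lut:
        lut[na_label] = (1.0, 1.0, 1.0)
    return lut

labels = ["tumor", "normal", "NA", "tumor"]  # hypothetical per-sample labels
lut = build_lut(labels)
colors = [lut[label] for label in labels]    # one color per sample, in sample order
```

Samples that share a label receive the same color, which is what makes the color bar readable next to the dendrogram.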

License: Attribution-NonCommercial-NoDerivs

flask_sqlalchemy

바닐라스카이 2022. 5. 23. 16:00

app.py

from database import db_session
from models import employees
from flask import Flask, Response, request
import pandas as pd

app = Flask(__name__)

@app.teardown_appcontext
def shutdown_session(exception=None):
    # release the scoped session at the end of each request
    db_session.remove()

@app.route('/employees/select', methods=['GET'])
def select():
    # fetch the first five rows and return them as JSON records
    queryset = employees.query.limit(5)
    print(queryset)
    df = pd.read_sql(queryset.statement, queryset.session.bind)
    print(df)
    return Response(df.to_json(orient="records"), mimetype='application/json')

@app.route('/employees/insert', methods=['POST'])
def insert():
    # the values are read from the query string of the POST request
    emp_no = request.args.get('emp_no', default=1, type=int)
    birth_date = request.args.get('birth_date', default='9999-01-01')
    first_name = request.args.get('first_name', default='Gil-Dong', type=str)
    last_name = request.args.get('last_name', default='Hong', type=str)
    gender = request.args.get('gender', default='M')
    a = employees(emp_no, birth_date, first_name, last_name, gender)
    db_session.merge(a)  # insert, or update the row that has the same primary key
    db_session.commit()
    return 'done\n'

if __name__ == '__main__':
    app.run(debug=True)

models.py

from sqlalchemy import Column, Integer, String, DateTime
from database import Base
import datetime

class employees(Base):
    __tablename__ = 'employees'
    emp_no = Column(Integer, primary_key=True)
    birth_date = Column(DateTime)
    # MySQL needs String(length) if create_all() has to create this table
    first_name = Column(String)
    last_name = Column(String)
    gender = Column(String)
    hire_date = Column(DateTime)

    def __init__(self, emp_no, birth_date, first_name, last_name, gender):
        self.emp_no = emp_no
        self.birth_date = birth_date
        self.first_name = first_name
        self.last_name = last_name
        self.gender = gender
        # four-digit year (%Y) so the date string is unambiguous
        self.hire_date = datetime.date.today().strftime("%Y-%m-%d")

    def __repr__(self):
        return f'{self.emp_no} : {self.first_name}'

database.py

from sqlalchemy import create_engine
from sqlalchemy.orm import scoped_session, sessionmaker
# SQLAlchemy 1.4+ also exposes declarative_base as sqlalchemy.orm.declarative_base
from sqlalchemy.ext.declarative import declarative_base

# convert_unicode has been deprecated since SQLAlchemy 1.3 and is rejected by 2.0, so it is dropped
engine = create_engine('mysql+mysqlconnector://root:0000@localhost/employees?charset=utf8')
db_session = scoped_session(sessionmaker(autocommit=False,
                                         autoflush=False,
                                         bind=engine))
Base = declarative_base()
Base.query = db_session.query_property()  # enables employees.query

def init_db():
    # import all modules here that might define models so that
    # they will be registered properly on the metadata.  Otherwise
    # you will have to import them first before calling init_db()
    import models
    Base.metadata.create_all(bind=engine)

Installing slurm on CentOS 8

바닐라스카이 2022. 5. 11. 12:32

# Rewritten on 2022-05-27

This assumes a single-node setup (the controller and the compute node are the same machine).

Install the slurm and munge daemons.

sudo dnf install slurm slurm-slurmd slurm-slurmctld munge

Create the slurm group and user.

sudo groupadd -g 900 slurm
sudo useradd -m -c "SLURM workload manager" -d /var/lib/slurm -u 900 -g slurm -s /sbin/nologin slurm

sudo chown slurm:slurm -R /var/spool/slurm/

Generate the munge key.

sudo /usr/sbin/create-munge-key -r

Edit the configuration file as needed; it is located at /etc/slurm/slurm.conf.

SlurmUser=slurm
SlurmctldLogFile=/var/log/slurmctld.log
SlurmdLogFile=/var/log/slurmd.log

SlurmUser must have write permission on these log files.
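
One way to see which files need that permission is to read the log paths out of the configuration. This is a minimal sketch, assuming only the simple Key=Value lines shown above; log_files and the sample text are hypothetical, and a real slurm.conf may contain constructs this does not handle.

```python
def log_files(conf_text):
    """Return {key: path} for every Key=Value line whose key ends with 'LogFile'."""
    found = {}
    for raw_line in conf_text.splitlines():
        line = raw_line.split("#", 1)[0].strip()  # drop comments and surrounding whitespace
        if "=" not in line:
            continue
        key, value = line.split("=", 1)
        if key.strip().endswith("LogFile"):
            found[key.strip()] = value.strip()
    return found

# Sample text copied from the settings above.
sample = """\
SlurmUser=slurm
SlurmctldLogFile=/var/log/slurmctld.log
SlurmdLogFile=/var/log/slurmd.log
"""
paths = log_files(sample)
```

Each path in the result is a file whose ownership or mode can then be checked for the slurm user.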

Run the slurmctld daemon in the foreground to check whether it reports any errors.

sudo slurmctld -D

Then start the service and confirm that it runs normally.

sudo systemctl start slurmctld
sudo systemctl enable slurmctld
sudo systemctl status slurmctld

Connecting Google Sheets to a database

바닐라스카이 2022. 4. 25. 09:05

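Linking a database to a Google Sheets spreadsheet, which several people can edit at the same time, makes the data more accessible.

The drawback is that Apps Script limits the run time of each function execution to a maximum of 30 minutes, so with a large amount of data the script may be terminated by a timeout.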

var connectionName = 'database_address:3306'; // IP address and port of the MySQL server (default port: 3306)
var user = 'userID'; // MySQL user ID
var userPwd = 'password'; // MySQL user password
var db = 'db_name'; // name of the MySQL database
var instanceUrl = 'jdbc:mysql://' + connectionName;
var dbUrl = instanceUrl + '/' + db + '?characterEncoding=UTF-8';

function exportDatabase() {
  var query = 'select * from table';
  var start = new Date();
  var conn = Jdbc.getConnection(dbUrl, user, userPwd); // connect to the DB
  var stmt = conn.createStatement();
  stmt.setMaxRows(5000);
  var results = stmt.executeQuery(query); // run the query
  var metaData = results.getMetaData();
  var numCols = metaData.getColumnCount();
  var spreadsheet = SpreadsheetApp.getActive();
  var sheet = spreadsheet.getSheetByName('sheet1');
  sheet.clearContents();

  var arr = [];
  for (var col = 0; col < numCols; col++) {
    arr.push(metaData.getColumnName(col + 1)); // JDBC columns are 1-based
  }
  sheet.appendRow(arr); // write the header

  while (results.next()) {
    arr = [];
    for (var col = 0; col < numCols; col++) {
      arr.push(results.getString(col + 1));
    }
    sheet.appendRow(arr); // write the data line by line
  }

  var end = new Date();
  Logger.log("Time spent : " + ((end - start) / (1000 * 60) % 60).toFixed(3) + " min");
  //sheet.autoResizeColumns(1, numCols+1);

  results.close();
  stmt.close();
  conn.close();
}

Installing Python

바닐라스카이 2022. 4. 6. 16:13

./configure --enable-loadable-sqlite-extensions

After configure and make you could go straight on to the install step, but first check the message below to see which modules could not be built.

Python build finished successfully!
The necessary bits to build these optional modules were not found:
_tkinter                                                       
To find the necessary bits, look in setup.py in detect_modules() for the module's name.

Even after the environment variables were set, _sqlite3 kept showing up in that list. On inspection, setup.py itself had to be edited directly.

sqlite_incdir = sqlite_libdir = None
sqlite_inc_paths = [ '/usr/include']
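
After rebuilding, it may be worth confirming that the module actually imports. A minimal sketch using only the standard library; it assumes it is run with the newly built interpreter.

```python
import sqlite3

# The import above succeeds only if the _sqlite3 extension was built.
print("SQLite library version:", sqlite3.sqlite_version)

# Round-trip one value through an in-memory database as a quick smoke test.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (x INTEGER)")
conn.execute("INSERT INTO t VALUES (42)")
value = conn.execute("SELECT x FROM t").fetchone()[0]
conn.close()
```

If the extension is missing, the first line raises an error that names _sqlite3.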

tqdm, a progress bar module

바닐라스카이 2022. 3. 7. 14:19

tqdm displays how far a task has progressed.

The display differs depending on whether the total amount of work is known.

When the total is unknown

from tqdm import tqdm

# filename: path of the input file
with open(filename) as f:
    for index, line in enumerate(tqdm(f, unit='reads', unit_scale=True, mininterval=1)):
        continue

Output:

58.9Mreads [00:24, 2.38Mreads/s]

Average elapsed time over 3 runs: 21.7 s

When the total is known

from tqdm import tqdm

# filename: path of the input file
with open(filename) as f:
    lines = f.readlines()  # loads the whole file into memory
    for index, line in enumerate(tqdm(lines, total=len(lines), unit='reads', unit_scale=True, mininterval=1)):
        continue

Output:

100%|██████████████████████████████████████████████████████████| 58.9M/58.9M [00:15<00:00, 3.88Mreads/s]

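Average elapsed time over 3 runs: 30.6 s

When the whole file is first read with f.readlines() to get its size, the lines are already in memory, so the lines-per-second rate is higher. The time spent reading the file, however, makes the total run time longer. The advantage is that the overall progress percentage is shown.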
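
A possible middle ground is to count the lines in a first streaming pass, so that the total is known without holding the whole file in memory. This is a minimal sketch; count_lines is a hypothetical helper, and the extra pass has its own read cost, which was not measured here.

```python
import os
import tempfile

def count_lines(path, chunk_size=1 << 20):
    """Count newline characters by reading the file in binary chunks.

    A final line without a trailing newline is not counted.
    """
    total = 0
    with open(path, "rb") as f:
        while True:
            chunk = f.read(chunk_size)
            if not chunk:
                break
            total += chunk.count(b"\n")
    return total

# Small self-contained demonstration with a temporary file.
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as tmp:
    tmp.write("read1\nread2\nread3\n")
n_lines = count_lines(tmp.name)
os.remove(tmp.name)
```

The result could then be passed as total=n_lines to tqdm while iterating over the open file line by line.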

Adding conda channels

바닐라스카이 2022. 2. 22. 11:07

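After installing conda, add channels so that up-to-date versions of programs can be installed and updated. Each --add puts the channel at the top of the list, so after the three commands below conda-forge has the highest priority, followed by bioconda and then defaults.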

conda config --add channels defaults
conda config --add channels bioconda
conda config --add channels conda-forge

Using pandas

바닐라스카이 2022. 2. 18. 10:43

import pandas as pd

df = pd.DataFrame()

# make a one-row DataFrame from a dictionary (foo_dic), indexed by sample_id
tmp_df = pd.DataFrame([foo_dic], index=[sample_id])

# sum the values across the row, then divide by the total to get fractions
total_count = tmp_df.sum(axis=1).iloc[0]
tmp_df = tmp_df.div(total_count)

# concatenate multiple DataFrames side by side (column-wise)
concat_df = pd.concat([df1, df2], axis=1).fillna(0)

# keep only the columns whose name contains 'test'
df = df.loc[:, df.columns.str.contains('test', regex=True)]

# merge DataFrames on their index
merge_df = pd.merge(df1, df2, left_index=True, right_index=True)
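
The snippets above use undefined placeholders, so here is a small runnable sketch that chains the same operations. The count dictionaries, sample IDs, and the to_fraction_row helper are hypothetical and exist only for illustration.

```python
import pandas as pd

# Hypothetical count dictionaries for two samples.
counts_a = {"test_x": 2, "test_y": 6, "ctrl": 2}
counts_b = {"test_x": 1, "ctrl": 3}

def to_fraction_row(counts, sample_id):
    """One-row DataFrame of counts divided by the row total."""
    row = pd.DataFrame([counts], index=[sample_id])
    return row.div(row.sum(axis=1).iloc[0])

frac_a = to_fraction_row(counts_a, "A")
frac_b = to_fraction_row(counts_b, "B")

# Side-by-side concatenation; features missing from one sample become 0.
wide = pd.concat([frac_a.T, frac_b.T], axis=1).fillna(0)

# Keep only the columns whose name contains 'test'.
test_only = frac_a.loc[:, frac_a.columns.str.contains("test")]

# Index-based merge keeps only the features present in both samples.
merged = pd.merge(frac_a.T, frac_b.T, left_index=True, right_index=True)
```

Note the difference between the two combinations: concat with fillna(0) keeps every feature, while the default merge is an inner join and drops test_y because sample B lacks it.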

Using the logging module

바닐라스카이 2022. 2. 17. 12:43

logging is the module for writing log files.

import logging

logger = logging.getLogger()  # root logger
logger.setLevel(logging.DEBUG)
formatter = logging.Formatter('[%(levelname)s %(asctime)s] %(message)s', "%Y-%m-%d %H:%M:%S")

# console output: INFO and above
stream_handler = logging.StreamHandler()
stream_handler.setFormatter(formatter)
stream_handler.setLevel(logging.INFO)
logger.addHandler(stream_handler)

# my.log: DEBUG and above
file_handler = logging.FileHandler('my.log')
file_handler.setFormatter(formatter)
file_handler.setLevel(logging.DEBUG)
logger.addHandler(file_handler)

logger.info('Read Database File')
logger.debug('Read Database File')

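Several handlers are created, one writing to the console (StreamHandler uses stderr by default) and another to the file my.log, and each handler's level controls what it outputs.

Of the levels DEBUG, INFO, WARNING, ERROR and CRITICAL, the example above is set up so that DEBUG messages are written to the file only.

During development, set stream_handler to DEBUG. Once development is finished, change it to INFO so that only the desired messages are printed.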
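
The per-handler filtering can be checked without touching the console or the disk. This is a minimal sketch that substitutes in-memory streams for the console and for my.log; the logger name is hypothetical.

```python
import io
import logging

# In-memory streams stand in for the console and for my.log (assumption for this demo).
console = io.StringIO()
logfile = io.StringIO()

logger = logging.getLogger("handler_level_demo")  # hypothetical logger name
logger.setLevel(logging.DEBUG)   # the logger itself lets every level through
logger.propagate = False         # keep the demo isolated from the root logger

console_handler = logging.StreamHandler(console)
console_handler.setLevel(logging.INFO)    # INFO and above only
logger.addHandler(console_handler)

file_handler = logging.StreamHandler(logfile)
file_handler.setLevel(logging.DEBUG)      # everything, including DEBUG
logger.addHandler(file_handler)

logger.debug("debug detail")
logger.info("info message")

console_text = console.getvalue()
file_text = logfile.getvalue()
```

Here console_text holds only the INFO line, while file_text holds both lines.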

import subprocess

# cmd: the shell command string to run
a = subprocess.Popen(cmd, shell=True, stdout=subprocess.PIPE, stderr=subprocess.STDOUT).communicate()[0].decode('UTF-8')
logging.debug(f'{cmd}\n{a}')

Combined with subprocess, a command and the output it produces can be sent to the debug log in one step.
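
The same idea can be wrapped in a small function. This is a minimal sketch using subprocess.run instead of Popen; run_and_log is a hypothetical helper, and the current Python interpreter stands in for a real command so the example runs anywhere.

```python
import logging
import subprocess
import sys

logging.basicConfig(level=logging.DEBUG, format="[%(levelname)s] %(message)s")

def run_and_log(cmd):
    """Run cmd (a list of arguments), log its combined output at DEBUG level, and return it."""
    completed = subprocess.run(
        cmd,
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,  # merge stderr into stdout
        text=True,                 # decode the output to str
    )
    logging.debug("%s (exit code %d)\n%s", " ".join(cmd), completed.returncode, completed.stdout)
    return completed.stdout

# The current Python interpreter is a portable stand-in for a real command.
output = run_and_log([sys.executable, "-c", "print('hello')"])
```

Passing the command as a list avoids shell=True, which matters when part of the command comes from user input.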

Using regex with f-strings

바닐라스카이 2022. 2. 15. 10:00

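This code finds the BESTMATCH of a primer within a read sequence while allowing at most one error, and reports the start position, the end position, and the matched sequence. An error means a mismatch, an insertion, or a deletion.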

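import regex

# primer_seq and read_seq are assumed to be defined already (sequence strings).

# {{ and }} are literal braces in an f-string, so the pattern becomes (?:<primer>){e<=1}.
# The group makes the error limit apply to the whole primer, not just its last base.
regex_primer_seq = fr'(?:{primer_seq}){{e<=1}}'
match_object = regex.search(regex_primer_seq, read_seq, regex.BESTMATCH)

if match_object is not None:  # None means no match within the error limit
    match_start, match_end = match_object.span()
    match_seq = match_object.group()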
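
The brace escaping is the part that is easy to get wrong, and it can be inspected with plain string formatting. This is a minimal sketch; fuzzy_pattern and the primer sequence are hypothetical, and the non-capturing group is added so that the error limit covers the whole primer.

```python
def fuzzy_pattern(primer_seq, max_errors=1):
    """Build a pattern string in the regex module's fuzzy-matching syntax."""
    # {{ and }} produce literal braces; {primer_seq} and {max_errors} are substituted.
    return fr'(?:{primer_seq}){{e<={max_errors}}}'

pattern = fuzzy_pattern("ACGTTGCA")  # hypothetical primer sequence
print(pattern)
```

The resulting string can be passed to regex.search together with the read sequence and regex.BESTMATCH.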
