[TensorFlow1.0] Cancer classification using gene expression

2019. 11. 20. 14:42


BioinformaticsAndMe

[TensorFlow] Cancer classification using gene expression

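: A walkthrough of gene-expression-based cancer classification, run in TensorFlow 1.0

*Build a model that distinguishes Cancer from Normal samples based on gene expression

: The Kaggle page below hosts a variety of TensorFlow models that predict leukemia (AML/ALL)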

*https://www.kaggle.com/varimp/gene-expression-classification

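[TensorFlow] Example cancer classification model based on gene expression

: Bladder~Stomach → expression value for the corresponding tissue

: Cancer_1 → 1 (cancer patient), 0 (healthy individual)

: The example file below has had its first row (Annotation) filtered out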

Example_expression.csv

Bladder	Breast	Brain	Colon	Kidney	Liver	Lung	Pancreas	Ovary	Stomach	Cancer_1
6.0588	2.8661	3.0927	6.0077	2.8955	5.5982	2.8897	6.1144	5.3427	8.1293	0
4.4945	2.2051	0.9343	5.2312	1.787	7.3889	2.5828	3.4843	5.5047	6.8844	0
6.2015	3.66	2.9109	6.6927	1.5316	6.6405	3.6972	4.4913	5.4835	8.8552	0
6.2664	1.5854	4.9801	5.3313	2.8055	7.7239	4.2165	3.8075	5.8058	8.6699	1
5.904	2.1047	4.3618	5.9225	1.3454	6.388	3.1604	5.6396	5.1506	8.3964	1

# '__future__': lets Python 2 use Python 3 syntax and semantics
from __future__ import absolute_import, division, print_function

# Import the TensorFlow and NumPy libraries
import tensorflow as tf
import numpy as np

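# Load the gene expression data and inspect the arrays
# (all columns but the last are features; the last column is the label)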
xy = np.loadtxt('Example_expression.csv', delimiter=',', dtype=np.float32)
x_data = xy[:, 0:-1]
y_data = xy[:, [-1]]

print(x_data, y_data)
print(x_data.shape, y_data.shape)
[[6.0588 2.8661 3.0927 ... 6.1144 5.3427 8.1293]
 [4.4945 2.2051 0.9343 ... 3.4843 5.5047 6.8844]
 [6.2015 3.66   2.9109 ... 4.4913 5.4835 8.8552]
 ...
 [6.6107 0.7146 2.6828 ... 4.8744 4.8949 6.7165]
 [6.9365 2.5683 5.0457 ... 4.89   5.7063 7.8359]
 [5.9141 2.676  2.1013 ... 4.3999 6.5358 8.5825]] [[0.]
 [0.]
 [0.]
 ...
 [1.]
 [1.]
 [1.]]
(5000, 10) (5000, 1)
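
The column split above can be checked on a few rows without TensorFlow or NumPy. This is a minimal sketch using only the standard library. It assumes the real Example_expression.csv is comma-separated with no header row, which is what the `np.loadtxt` call implies. The names `sample`, `x_rows` and `y_rows` are made up for illustration.

```python
import csv
import io

# Three data rows from the table above, written comma-separated
# (assumption: the real file uses commas and contains no header row).
sample = """6.0588,2.8661,3.0927,6.0077,2.8955,5.5982,2.8897,6.1144,5.3427,8.1293,0
4.4945,2.2051,0.9343,5.2312,1.787,7.3889,2.5828,3.4843,5.5047,6.8844,0
6.2664,1.5854,4.9801,5.3313,2.8055,7.7239,4.2165,3.8075,5.8058,8.6699,1
"""

rows = [[float(v) for v in line] for line in csv.reader(io.StringIO(sample))]

# Same split as xy[:, 0:-1] and xy[:, [-1]]: every column but the last is a feature,
# and the last column is kept as a one-element list so the labels stay two-dimensional.
x_rows = [r[:-1] for r in rows]
y_rows = [[r[-1]] for r in rows]

print(len(x_rows), len(x_rows[0]))  # number of rows, number of feature columns
print(len(y_rows), len(y_rows[0]))  # number of rows, number of label columns
```

Keeping the labels two-dimensional matters because the Y placeholder in the script expects shape [None, 1].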

# Use placeholders so that the input values are supplied when the graph is run
X = tf.placeholder(tf.float32, shape=[None, 10])
Y = tf.placeholder(tf.float32, shape=[None, 1])

# Define W (weight) and b (bias), the values TensorFlow will learn, as Variable nodes
# There is no prior information about W and b, so they are initialized randomly
W = tf.Variable(tf.random_normal([10,1]), name='weight')
b = tf.Variable(tf.random_normal([1]), name='bias')

# Define the hypothesis using the sigmoid function
hypothesis = tf.sigmoid(tf.matmul(X, W) + b)

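# Define the cost function: binary cross-entropy, averaged with reduce_mean.
# The leading minus sign is required so that minimizing the cost improves the fit;
# the training log below was recorded without it, which is why the printed cost is negative.
cost = -tf.reduce_mean(Y*tf.log(hypothesis + 0.001) + (1-Y)*tf.log(1-hypothesis+0.001))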
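
The cross-entropy cost can be worked through by hand in plain Python. The sketch below is illustrative only: `binary_cross_entropy` is a hypothetical helper, the probabilities are made up, and `eps` mirrors the 0.001 offset used in the cost line. It shows that the negated mean log-likelihood is small when the probabilities agree with the labels and large when they contradict them.

```python
import math

def binary_cross_entropy(labels, probs, eps=0.001):
    """Mean binary cross-entropy; eps mirrors the 0.001 offset in the cost line."""
    total = 0.0
    for y, p in zip(labels, probs):
        total += y * math.log(p + eps) + (1 - y) * math.log(1 - p + eps)
    # Negate the mean log-likelihood so that lower values mean a better fit.
    return -total / len(labels)

labels = [0, 0, 1, 1]
good = [0.1, 0.2, 0.8, 0.9]  # made-up probabilities that agree with the labels
bad = [0.9, 0.8, 0.2, 0.1]   # made-up probabilities that contradict the labels

loss_good = binary_cross_entropy(labels, good)
loss_bad = binary_cross_entropy(labels, bad)
print(loss_good, loss_bad)
```

If the sum is not negated, a minimizer is rewarded for making the predictions worse, so the sign is worth checking whenever a logged cost is negative.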

# Define gradient descent as the optimizer
train = tf.train.GradientDescentOptimizer(learning_rate=0.0001).minimize(cost)
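
What the optimizer does on each training step can be sketched in plain Python for a single feature. This is not the TensorFlow implementation. It is a minimal hand-written gradient descent on a made-up toy data set, with a larger learning rate than the script (0.1 instead of 0.0001) so that it converges in a few thousand steps. All names and values are assumptions chosen for illustration.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

# Toy one-feature data set (made up): larger x means label 1.
xs = [0.5, 1.0, 1.5, 3.0, 3.5, 4.0]
ys = [0, 0, 0, 1, 1, 1]

def mean_loss(w, b):
    """Mean binary cross-entropy of the model sigmoid(w * x + b)."""
    total = 0.0
    for x, y in zip(xs, ys):
        h = sigmoid(w * x + b)
        total += y * math.log(h) + (1 - y) * math.log(1 - h)
    return -total / len(xs)

w, b = 0.0, 0.0
learning_rate = 0.1
initial_loss = mean_loss(w, b)

for step in range(2000):
    # Gradients of the mean cross-entropy: mean((h - y) * x) for w, mean(h - y) for b.
    grad_w = sum((sigmoid(w * x + b) - y) * x for x, y in zip(xs, ys)) / len(xs)
    grad_b = sum((sigmoid(w * x + b) - y) for x, y in zip(xs, ys)) / len(xs)
    # Step against the gradient to reduce the loss.
    w -= learning_rate * grad_w
    b -= learning_rate * grad_b

final_loss = mean_loss(w, b)
print(initial_loss, final_loss)
```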

# Predict 'true' (1.0) when the hypothesis value is greater than 0.5
predicted = tf.cast(hypothesis > 0.5, dtype=tf.float32)

# Compute accuracy: the fraction of predictions that match the labels
accuracy = tf.reduce_mean(tf.cast(tf.equal(predicted, Y), dtype=tf.float32))
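
The thresholding and accuracy steps reduce to a comparison and an average. A plain-Python sketch with made-up probabilities and labels follows. `predict` and `accuracy_of` are hypothetical helper names. Note that a probability of exactly 0.5 is classified as 0 because the comparison is strict.

```python
def predict(probs, threshold=0.5):
    # Strict comparison, matching `hypothesis > 0.5` in the script.
    return [1.0 if p > threshold else 0.0 for p in probs]

def accuracy_of(preds, labels):
    # Fraction of positions where prediction and label are equal.
    matches = sum(1 for p, y in zip(preds, labels) if p == y)
    return matches / len(labels)

probs = [0.2, 0.7, 0.5, 0.9]   # made-up hypothesis values
labels = [0.0, 1.0, 1.0, 0.0]  # made-up true labels

preds = predict(probs)
acc = accuracy_of(preds, labels)
print(preds, acc)
```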

# Create a session and run the graph
with tf.Session() as sess:
  sess.run(tf.global_variables_initializer())
  for step in range(10001):
    cost_val, _ = sess.run([cost, train], feed_dict={X:x_data, Y:y_data})
    if step%1000==0:
      print(step, cost_val)

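  # Evaluate and print the results; note that `c` holds the thresholded
  # predictions, so the "Correct (Y)" label shows predictions, not true labels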
  h, c, a = sess.run([hypothesis, predicted, accuracy], feed_dict={X:x_data, Y:y_data})

  print("\nHypothesis: ", h, "\nCorrect (Y): ", c, "\nAccuracy: ", a)
0 -3.9441097
1000 -3.9540856
2000 -3.9601758
3000 -3.96437
4000 -3.9674852
5000 -3.9699223
6000 -3.9718964
7000 -3.973543
8000 -3.9749436
9000 -3.9761567
10000 -3.9772208

Hypothesis:  [[1.5199184e-06]
 [3.5762787e-07]
 [4.1723251e-07]
 ...
 [2.3841858e-07]
 [2.9802322e-07]
 [0.0000000e+00]] 
Correct (Y):  [[0.]
 [0.]
 [0.]
 ...
 [0.]
 [0.]
 [0.]] 
Accuracy:  0.4216
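
The logged numbers look like those of a model whose output has collapsed to about 0 for every sample. The arithmetic check below rests on an inference from the printed output, not on anything stated in the post: it assumes every prediction is 0, so that the share of label-0 samples equals the reported accuracy of 0.4216. Under that assumption it evaluates the mean of Y*log(h + 0.001) + (1 - Y)*log(1 - h + 0.001) without a leading minus sign, which is the quantity the logged values track.

```python
import math

frac_normal = 0.4216           # reported accuracy; assumed to equal the share of label-0 samples
frac_cancer = 1 - frac_normal  # remaining share, assumed to be label-1 samples
eps = 0.001                    # offset used inside both log terms of the cost
h = 0.0                        # hypothesis value once the sigmoid has saturated at 0

# Label-1 samples contribute log(h + eps); label-0 samples contribute log(1 - h + eps).
limit = frac_cancer * math.log(h + eps) + frac_normal * math.log(1 - h + eps)
print(round(limit, 3))
```

The result is about -3.995, and the logged cost moves toward it (-3.9441 at step 0, -3.9772 at step 10000). That pattern fits a run that pushes every probability toward 0 instead of fitting the labels, and it fits an accuracy equal to the share of normal samples.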

#Reference

1) https://github.com/aymericdamien/TensorFlow-Examples/tree/master/tensorflow_v2

2) https://medium.com/@manjabogicevic/multiple-linear-regression-using-python-b99754591ac0

3) http://contents.kocw.net/document/ch5_6.pdf

4) https://www.kaggle.com/varimp/gene-expression-classification

5) 2019 professional training course in genome analysis for working professionals and researchers
