- Language
- Korean
- Conflict of Interest
- In relation to this article, we declare that there is no conflict of interest.
- Publication history
- Received July 28, 2010; accepted September 1, 2010
- This is an Open-Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (http://creativecommons.org/licenses/by-nc/3.0), which permits unrestricted non-commercial use, distribution, and reproduction in any medium, provided the original work is properly cited.
Copyright © KIChE. All rights reserved.
Comparison of Partial Least Squares and Support Vector Machine for the Flash Point Prediction of Organic Compounds
Department of Chemical and Biological Engineering, Seoul National University, San 56-1, Shilim-dong, Gwanak-gu, Seoul 151-742, Korea
¹Department of Chemical Engineering, Kwangwoon University, 447-1 Wolgye-dong, Nowon-gu, Seoul 139-701, Korea
²Department of Chemical and Biological Engineering, Chungju National University, 50 Daehak-ro, Chungju-si, Chungbuk 380-702, Korea
glee@cjnu.ac.kr
Korean Chemical Engineering Research, December 2010, 48(6), 717-724(8), Epub 11 January 2011
Abstract
The flash point is one of the most important physical properties for assessing the fire and explosion hazards of flammable liquids. Although experimental flash point data are needed for the design and construction of chemical plants, they are often unavailable. This study built and compared partial least squares (PLS) and support vector machine (SVM) models for predicting flash points, using experimental flash point data for 893 organic compounds taken from DIPPR 801. The independent variables were 65 functional groups, chosen according to the group contribution method, which assumes that each fragment of a molecule contributes a fixed amount to the value of a physical property, together with the logarithm of the molecular weight. The parameters of both models were determined from prediction errors calculated by cross-validation. Because the SVM model has three parameters to tune, an optimization technique was required, and particle swarm optimization, a heuristic optimization method, was adopted. Because the selection of training data can affect prediction performance, 100 randomly generated data sets were tested. Over the whole data set, the average absolute error ranged from 13.86 K to 14.55 K for PLS and from 7.44 K to 10.26 K for SVM, indicating that the SVM predicts flash points considerably more accurately than PLS.
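The parameter search described in the abstract can be illustrated with a minimal particle swarm optimization sketch. This is not the authors' implementation: the abstract does not name the three SVM parameters, so the sketch assumes they are the base-10 logarithms of C, gamma, and epsilon for an RBF-kernel regression model. The function `toy_cv_error` is a hypothetical stand-in for the cross-validated prediction error, and the swarm size, iteration count, and coefficients `w`, `c1`, `c2` are illustrative choices rather than values from the paper.

```python
import random


def toy_cv_error(log_params):
    """Hypothetical stand-in for a cross-validation error surface.

    A real objective would train the SVM with these parameters and
    return the cross-validated error; here it is a quadratic bowl
    with an arbitrary minimum so the example runs on its own.
    """
    target = (1.0, -2.0, -1.0)  # assumed optimum, for illustration only
    return sum((x - t) ** 2 for x, t in zip(log_params, target))


def pso(objective, bounds, n_particles=20, n_iter=100,
        w=0.7, c1=1.5, c2=1.5, seed=0):
    """Minimize `objective` inside box `bounds` with a basic global-best PSO."""
    rng = random.Random(seed)
    dim = len(bounds)
    # Random initial positions inside the box, zero initial velocities.
    pos = [[rng.uniform(lo, hi) for lo, hi in bounds]
           for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    # Personal bests and the global best found so far.
    pbest = [p[:] for p in pos]
    pbest_val = [objective(p) for p in pos]
    g = min(range(n_particles), key=lambda i: pbest_val[i])
    gbest, gbest_val = pbest[g][:], pbest_val[g]

    for _ in range(n_iter):
        for i in range(n_particles):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                # Inertia + pull toward personal best + pull toward global best.
                vel[i][d] = (w * vel[i][d]
                             + c1 * r1 * (pbest[i][d] - pos[i][d])
                             + c2 * r2 * (gbest[d] - pos[i][d]))
                lo, hi = bounds[d]
                # Move the particle and clamp it to the search box.
                pos[i][d] = min(max(pos[i][d] + vel[i][d], lo), hi)
            val = objective(pos[i])
            if val < pbest_val[i]:
                pbest[i], pbest_val[i] = pos[i][:], val
                if val < gbest_val:
                    gbest, gbest_val = pos[i][:], val
    return gbest, gbest_val


# Assumed search box for (log10 C, log10 gamma, log10 epsilon).
search_bounds = [(-2.0, 4.0), (-5.0, 1.0), (-3.0, 1.0)]
best, best_err = pso(toy_cv_error, search_bounds)
print("best log10 parameters:", [round(x, 3) for x in best])
print("objective at best point:", best_err)
```

Searching in log space is a common choice for SVM parameters because useful values often span several orders of magnitude. In practice the toy objective would be replaced by a function that runs cross-validation on the training data.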