Volume 3, Issue 3, May 2018, Page: 67-76
Research on Feature Selection in Power User Identification
Qiu Yanhao, College of Engineering, Virginia Polytechnic Institute and State University, Virginia, The United States
Song Xiaoyu, School of Electronic and Information Engineering, Lanzhou Jiaotong University, Lanzhou, China
Sun Xiangyang, School of Electronic and Information Engineering, Lanzhou Jiaotong University, Lanzhou, China
Zhao Yang, School of Electronic and Information Engineering, Lanzhou Jiaotong University, Lanzhou, China
Received: Apr. 18, 2018; Accepted: May 7, 2018; Published: Jun. 1, 2018
DOI: 10.11648/j.mcs.20180303.11
Abstract
Previous studies of power user identification have mostly focused on improving the recognition algorithm. In this paper, we instead use big data techniques to extract electricity consumption features from several perspectives and study how different features affect recognition. First, the raw data are cleaned. To capture the key information for identifying power theft users, features are then extracted from three aspects: basic attribute features, statistical features at different time scales, and similarity features at different time scales. We then run experiments with different combinations of feature sets under the KNN, random forest (RF), and XGBoost models. The results show that, for all three classifiers, the BF+SF+PF feature set performs clearly better than the other two feature sets. We therefore conclude that the choice of features has a marked effect on the recognition results.
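The comparison described in the abstract can be illustrated with a small sketch. It is only an illustration under stated assumptions, not the authors' method: the data are synthetic, the helper names (`statistical_features`, `similarity_features`, `synthetic_user`) are hypothetical, the statistical and similarity features are simple stand-ins computed at a single weekly time scale, and only a hand-written KNN is used so that the example needs nothing beyond the Python standard library.

```python
import math
import random
import statistics

def windows(series, size):
    """Split a series into consecutive non-overlapping windows of `size` points."""
    return [series[i:i + size] for i in range(0, len(series) - size + 1, size)]

def statistical_features(series, size):
    """Mean and population std of per-window totals at one time scale (assumed definition)."""
    totals = [sum(w) for w in windows(series, size)]
    return [statistics.fmean(totals), statistics.pstdev(totals)]

def pearson(a, b):
    """Pearson correlation; returns 0.0 when either input is constant."""
    ma, mb = statistics.fmean(a), statistics.fmean(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / math.sqrt(va * vb) if va > 0 and vb > 0 else 0.0

def similarity_features(series, size):
    """Average correlation between consecutive windows (assumed definition)."""
    ws = windows(series, size)
    sims = [pearson(ws[i], ws[i + 1]) for i in range(len(ws) - 1)]
    return [statistics.fmean(sims)] if sims else [0.0]

def knn_predict(train_x, train_y, x, k=3):
    """Majority vote among the k nearest neighbours (Euclidean distance)."""
    nearest = sorted(zip(train_x, train_y), key=lambda p: math.dist(p[0], x))[:k]
    votes = [label for _, label in nearest]
    return max(set(votes), key=votes.count)

def loo_accuracy(xs, ys, k=3):
    """Leave-one-out accuracy of the KNN classifier on feature vectors xs."""
    hits = 0
    for i in range(len(xs)):
        pred = knn_predict(xs[:i] + xs[i + 1:], ys[:i] + ys[i + 1:], xs[i], k)
        hits += pred == ys[i]
    return hits / len(xs)

def synthetic_user(rng, suspicious, days=56):
    """Synthetic weekly-periodic usage; suspicious users report less in the second half."""
    series = []
    for d in range(days):
        value = 10 + 3 * math.sin(2 * math.pi * d / 7) + rng.gauss(0, 0.5)
        if suspicious and d >= days // 2:
            value *= rng.uniform(0.2, 0.6)
        series.append(value)
    return series

rng = random.Random(0)
labels = [i % 2 for i in range(40)]  # 1 = suspicious (synthetic label)
users = [synthetic_user(rng, bool(y)) for y in labels]

# Illustrative feature-set combinations, all at a 7-day time scale.
feature_sets = {
    "statistical": lambda s: statistical_features(s, 7),
    "similarity": lambda s: similarity_features(s, 7),
    "statistical+similarity": lambda s: statistical_features(s, 7) + similarity_features(s, 7),
}

results = {
    name: loo_accuracy([extract(s) for s in users], labels)
    for name, extract in feature_sets.items()
}
for name, acc in results.items():
    print(f"{name}: leave-one-out accuracy = {acc:.2f}")
```

The features are left unscaled here, so the larger-valued statistical features dominate the Euclidean distance in the combined set; a real comparison would normally standardize each feature first and repeat the experiment for every classifier under study.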
Keywords
Feature Selection, Power User Identification, KNN, Random Forest, XGBoost
To cite this article
Qiu Yanhao, Song Xiaoyu, Sun Xiangyang, Zhao Yang, Research on Feature Selection in Power User Identification, Mathematics and Computer Science. Vol. 3, No. 3, 2018, pp. 67-76. doi: 10.11648/j.mcs.20180303.11
Copyright
Copyright © 2018 Authors retain the copyright of this article.
This article is an open access article distributed under the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0/) which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.