After P2P lending companies went through a cold spell last year, many said they would reinvent themselves as fashionable fintech companies, using big data, machine learning and other technologies to serve finance. Implementation is not simple, however: having machines replace people in processing data and making judgments has only just begun in the domestic financial sector.
CreditX is a machine learning startup that Lei Feng Network has been in contact with, and it has accumulated considerable hands-on experience and thinking in the financial field. Mr. Zhu Mingjie, the founder of CreditX, recently gave a speech at the Langdi China Summit on the difficulties of applying machine learning to finance and on how to improve model interpretability. The following is an edited and abridged version of the speech.
I have worked on machine learning for more than a decade, using machines instead of people to process data and make decisions and judgments. Over that decade, the successful applications of machine learning have been on the Internet: search, advertising, and recommendation. The Internet can be said to have been the first industry to enter the data age. In financial innovation, everyone is only just starting to work out how to achieve Internet-level machine learning and artificial intelligence. I would like to share some of CreditX's experience and thinking from practicing Internet-level machine learning in finance.
The pain points of financial risk control
I have always believed that "progress in science and technology is forced by business needs." In the Internet industry we were forced to rely on algorithms and machines. Why? Because the volume of data is too large: if you search for a phone case, it is impossible to have Alibaba's staff pick the most suitable one by hand from hundreds of millions of products. In a traditional financial scenario, a loan of 1 million yuan relies mainly on risk control personnel and relationships, and that is feasible. At a bank's credit card center, when applications pile up, the reviewers can simply be asked to work overtime every week.
Internet finance now serves a far more inclusive market, such as loans of a few hundred yuan made through a mobile phone, and relying on front-line staff for these is simply impossible. This is therefore not just a matter of improving operational efficiency: the work has to be handed over to machines, so that the machine learns people's risk control experience and itself becomes the risk control expert.
Difficulties in the application of machine learning and artificial intelligence in the financial sector
The first problem is too little data. Financial data is very sparse, and many kinds of financial products did not exist before, so there is no decade-long accumulation of data. In other words, training data is lacking; this is also known as the cold start problem. In addition, a bad debt takes anywhere from one month to several months to show up, so accumulating data means waiting a long time. This is quite different from Internet search, where click feedback arrives quickly. Missing data is therefore a huge obstacle to having machines learn human experience.
The second is too much data. This refers to the large number of feature dimensions, which exceeds what people can process. Traditional finance has only a dozen or so feature variables, which can be tuned by hand. Now there are far more dimensions of data; everyone has grand visions and talks about a great deal of data that could be used. So why is it not used? The question is whether we have a sufficiently expressive way to represent these raw features, which can also be called weak variables, to connect them to outcomes, and to present the result intuitively enough for risk control experts to understand it and give feedback.
A financial scenario is not like Internet machine learning, where the model is a black box: a pile of data is thrown in and results come out. In finance, special emphasis is placed on model interpretability, so that people's risk control experience and intuition can be linked to what the data shows. Only on that basis can human experience take part in data-driven machine learning modeling. Features must be traceable, and because financial outcomes take a long time to arrive, people need to be able to step in and give feedback quickly.
How to solve the cold start problem of financial risk control
Too little data
Too little data that is also produced too slowly is a very typical cold start problem. We often faced a lack of data in the Internet industry as well, and we accumulated mature experience there, which is to build the human factor into the machine learning process. When we worked on search advertising, we asked people to annotate data, and the annotation experts' labels then guided the algorithm engineers in optimizing the algorithm and improving the ranking results. In the financial scenario, there is a great deal of ready-made experience and many seasoned risk control personnel with strong risk control knowledge.
In theory, if we had hundreds of risk control experts and did not have to pay them, we could still run mobile credit by hand. In reality, we must rely on machines that learn from people's risk control experience. We therefore use semi-supervised learning to combine the judgment of business risk control experts with actual credit outcomes in online learning. In this process, risk control personnel can intervene in real time, continually making adjustments based on the model's output and feeding them back into the iterative training of the model.
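The loop described above can be sketched roughly as follows. This is a minimal illustration and not CreditX's actual system: it assumes a plain online logistic regression, treats an expert's judgment as a provisional label with a lower weight, and gives full weight to real repayment outcomes once they arrive. It is one simple way to mix the two signals rather than a complete semi-supervised algorithm, and the class `OnlineRiskModel`, the two weights, and the toy features are all hypothetical.

```python
import math


class OnlineRiskModel:
    """Online logistic regression updated one sample at a time (hypothetical sketch)."""

    def __init__(self, n_features, learning_rate=0.1):
        self.weights = [0.0] * n_features
        self.bias = 0.0
        self.learning_rate = learning_rate

    def predict_proba(self, x):
        """Return the estimated probability that the applicant defaults."""
        z = self.bias + sum(w * v for w, v in zip(self.weights, x))
        return 1.0 / (1.0 + math.exp(-z))

    def update(self, x, label, sample_weight=1.0):
        """Take one stochastic gradient step on the weighted log-loss."""
        error = self.predict_proba(x) - label
        step = self.learning_rate * sample_weight * error
        self.weights = [w - step * v for w, v in zip(self.weights, x)]
        self.bias -= step


# Assumed weights: expert judgments are trusted less than observed outcomes.
EXPERT_WEIGHT = 0.3
OUTCOME_WEIGHT = 1.0

model = OnlineRiskModel(n_features=2)

# Toy features: (debt ratio, late-payment count scaled to 0..1).
# Expert labels arrive immediately; 1 means "looks risky".
expert_feedback = [((0.9, 0.8), 1), ((0.1, 0.0), 0), ((0.8, 0.6), 1), ((0.2, 0.1), 0)]
# Real outcomes arrive months later; 1 means the loan defaulted.
observed_outcomes = [((0.85, 0.9), 1), ((0.15, 0.05), 0)]

for _ in range(200):
    for x, label in expert_feedback:
        model.update(x, label, sample_weight=EXPERT_WEIGHT)
    for x, label in observed_outcomes:
        model.update(x, label, sample_weight=OUTCOME_WEIGHT)

print(round(model.predict_proba((0.9, 0.9)), 3))  # risky-looking applicant
print(round(model.predict_proba((0.1, 0.0)), 3))  # safe-looking applicant
```

Because each `update` call is a single cheap step, an expert's correction can be applied as soon as it is made, without waiting for months of repayment data.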
This means that we attach special importance to the human factor. Everyone is talking about artificial intelligence now. What is its essence? In my understanding, it is letting the machine learn people's experience. We used to depend on a few experienced risk control personnel; now we can let the machine learn that human experience and make decisions automatically.
Business outcomes and samples in finance are very precious. For example, I may have accumulated samples from a mortgage business and then move to a new consumer credit business, or switch from one consumer credit business to another. These precious samples must not be thrown away, but how should they be used? We can reuse existing experience and knowledge as far as possible by separating a generic core risk model from domain knowledge, and then, on that basis, combining it with the knowledge and priors specific to the business scenario to learn the new scenario. This puts knowledge accumulated across domains and scenarios to use.
Using deep learning to solve the difficulties of feature engineering
Too much data
Next we look at "too much data". I divide this question into two parts.
The first is that the data has many feature dimensions. Our concern is how to link big data to financial risk control problems. This requires very powerful feature processing and representation capabilities, which the traditional linear regression style of statistical modeling struggles to provide. We have many approaches, including deep learning, which is now very popular. The essence of deep learning is to learn, through the processing of data features, how people process knowledge and data. To deal with too much data and let people make sense of the vast raw data, we tried different deep feature encoding methods at the front end of the model, using unsupervised learning to preprocess the raw data and reduce the feature dimension, so that the vast raw data can be connected to the final result.
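As a rough illustration of unsupervised preprocessing that reduces the feature dimension, the sketch below projects raw features onto their first principal component using power iteration. PCA is a simple linear stand-in chosen for brevity; the speech mentions deep feature encoding but does not say which encoders were used. The function names and the toy data are hypothetical.

```python
import math


def center(rows):
    """Subtract the per-column mean from every row."""
    n = len(rows)
    means = [sum(col) / n for col in zip(*rows)]
    return [[v - m for v, m in zip(row, means)] for row in rows]


def covariance(rows):
    """Sample covariance matrix of already-centered rows."""
    n, d = len(rows), len(rows[0])
    return [[sum(r[i] * r[j] for r in rows) / (n - 1) for j in range(d)]
            for i in range(d)]


def top_component(cov, iterations=200):
    """Leading eigenvector of a covariance matrix via power iteration."""
    d = len(cov)
    vec = [1.0 / math.sqrt(d)] * d
    for _ in range(iterations):
        nxt = [sum(cov[i][j] * vec[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(v * v for v in nxt))
        vec = [v / norm for v in nxt]
    return vec


def encode(rows):
    """Compress each row to one number: its projection on the top component."""
    centered = center(rows)
    direction = top_component(covariance(centered))
    codes = [sum(v * w for v, w in zip(row, direction)) for row in centered]
    return codes, direction


# Toy raw features: three correlated "weak variables" per applicant.
raw = [
    [1.0, 2.1, 0.9],
    [2.0, 3.9, 2.1],
    [3.0, 6.2, 2.9],
    [4.0, 8.0, 4.2],
    [5.0, 9.8, 5.0],
]
codes, direction = encode(raw)
print([round(c, 2) for c in codes])
```

The one-dimensional `codes` could then be fed to a downstream risk model in place of the three raw columns; a deep encoder would play the same role with a nonlinear mapping.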
Interpretability of the model
The second is the interpretability of the model. Financial experts are particularly concerned with model interpretability. There are two aspects here:
If a credit applicant is given a score that cannot be explained, it is hard to communicate the result to the applicant;
In addition, we face a very complicated environment. If the risk control result is a black box inside a black box, the risk is hard to control and estimate.
If the model fails, the resulting risk exposure is something we cannot afford. With Internet finance growing as fast as it is, a failure could well mean the company's business cannot continue. The black-box-in, black-box-out approach used on the Internet therefore does not apply to financial scenarios, which need an interpretable local model. Our practical experience is to use LIME to capture the key variables behind a result, or behind a local set of results, so that risk control experts can quickly grasp which features drive changes in the outcome.
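To make the idea of an interpretable local model concrete, here is a minimal sketch in the spirit of LIME, written in plain Python rather than with the `lime` package. It perturbs one applicant's features, weights the perturbed samples by their closeness to the original, and fits a weighted linear model whose coefficients indicate which features drive the score locally. The black-box scorer, the feature names, the kernel width and the sample count are all illustrative assumptions, and real LIME includes feature selection and other details omitted here.

```python
import math
import random


def solve_linear_system(matrix, rhs):
    """Solve matrix @ x = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    aug = [row[:] + [b] for row, b in zip(matrix, rhs)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for row in range(col + 1, n):
            factor = aug[row][col] / aug[col][col]
            for k in range(col, n + 1):
                aug[row][k] -= factor * aug[col][k]
    solution = [0.0] * n
    for row in range(n - 1, -1, -1):
        known = sum(aug[row][k] * solution[k] for k in range(row + 1, n))
        solution[row] = (aug[row][n] - known) / aug[row][row]
    return solution


def explain_locally(score_fn, instance, n_samples=500, noise=0.1,
                    kernel_width=0.25, seed=0):
    """Fit a proximity-weighted linear surrogate around `instance`.

    Returns one coefficient per feature (the intercept is dropped).
    """
    rng = random.Random(seed)
    d = len(instance)
    # Accumulate the weighted normal equations (X^T W X) beta = X^T W y.
    xtwx = [[0.0] * (d + 1) for _ in range(d + 1)]
    xtwy = [0.0] * (d + 1)
    for _ in range(n_samples):
        sample = [v + rng.gauss(0.0, noise) for v in instance]
        dist_sq = sum((s - v) ** 2 for s, v in zip(sample, instance))
        weight = math.exp(-dist_sq / kernel_width ** 2)
        row = [1.0] + sample  # the leading 1.0 is the intercept column
        target = score_fn(sample)
        for i in range(d + 1):
            xtwy[i] += weight * row[i] * target
            for j in range(d + 1):
                xtwx[i][j] += weight * row[i] * row[j]
    beta = solve_linear_system(xtwx, xtwy)
    return beta[1:]


def black_box_score(features):
    """Hypothetical opaque risk score in (0, 1); stands in for a DNN."""
    debt_ratio, late_payments, account_age = features
    z = 4.0 * debt_ratio * late_payments - 0.5 * account_age
    return 1.0 / (1.0 + math.exp(-z))


applicant = [0.8, 0.6, 0.3]
names = ["debt_ratio", "late_payments", "account_age"]
coefficients = explain_locally(black_box_score, applicant)
for name, coef in sorted(zip(names, coefficients), key=lambda p: -abs(p[1])):
    print(f"{name}: {coef:+.3f}")
```

The printed ranking is the kind of output a risk control expert could read directly: the features with the largest absolute coefficients are the ones moving this applicant's score the most.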
Achieved results
We have made some hard-won attempts at applying the Internet's technical experience in the financial context and gained practical experience along the way, from initial data acquisition and processing, to involving people, to introducing complex models. Together these form our practice.
In terms of efficiency, one of our partners has achieved very good results. They set up a financial credit scenario and deployed it on the system and models of Xunxin. They needed only 3-4 business risk control and operations staff; most of the risk control work was left to the machine.
In terms of effectiveness, with a DNN model the KS value rose from the 0.19 of the traditional LR model to 0.43. For those of us who build models, numbers and results are the most direct answer; talking about concepts adds nothing.
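For readers unfamiliar with the KS value quoted above: in credit scoring it is usually the maximum gap between the cumulative score distributions of bad and good borrowers, so a larger value means better separation. The sketch below computes it from scratch on made-up scores. The data and the function name are hypothetical and unrelated to the 0.19 and 0.43 figures.

```python
def ks_statistic(scores, labels):
    """Maximum gap between the cumulative distributions of bad and good scores.

    scores: predicted risk scores; labels: 1 for bad (default), 0 for good.
    """
    n_bad = sum(labels)
    n_good = len(labels) - n_bad
    if n_bad == 0 or n_good == 0:
        raise ValueError("need at least one good and one bad sample")

    pairs = sorted(zip(scores, labels))
    bad_seen = good_seen = 0
    best_gap = 0.0
    i = 0
    while i < len(pairs):
        # Consume every sample that shares this score before measuring the gap,
        # so that ties do not inflate the statistic.
        current = pairs[i][0]
        while i < len(pairs) and pairs[i][0] == current:
            if pairs[i][1] == 1:
                bad_seen += 1
            else:
                good_seen += 1
            i += 1
        gap = abs(bad_seen / n_bad - good_seen / n_good)
        best_gap = max(best_gap, gap)
    return best_gap


# Made-up scores from two hypothetical models on the same six borrowers.
labels = [0, 0, 0, 1, 1, 1]
weak_model = [0.2, 0.6, 0.4, 0.5, 0.3, 0.7]
strong_model = [0.1, 0.2, 0.4, 0.6, 0.8, 0.9]

print(ks_statistic(weak_model, labels))
print(ks_statistic(strong_model, labels))
```

On this toy data the second model ranks every bad borrower above every good one, so its KS is the maximum possible, while the first model interleaves them and scores much lower.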
Everyone had high expectations of big data and was repeatedly disappointed. Now is actually a good time for data technology, because people genuinely need the ability to use data and machines to solve financial problems. This is the opportunity and the tailwind of our time, and also a new beginning.