Credit Scorecards (with code, recorded by the blog author)
There are only two ways to live your life. One is as though nothing is a miracle. The other is as though everything is a miracle.
– Albert Einstein
A Commentary on Curiosity
I think the best way to appreciate and enjoy the trivial is to travel. By trivial I mean doorknobs, posters, letterboxes, graffiti and everything else we never bother to turn our heads for in our own city. I experienced this last week while traveling with my wife across Florence and Tuscany. I think one’s level of awareness and curiosity goes up many-fold while traveling. In Florence, we stayed at a lovely bed-and-breakfast named Fiorenza. The breakfast was good and the people even better. There we met an amicable family from the UK with a one-year-old baby named Owen and his 7-year-old sister Kyra. Owen and Kyra were playing hide-and-seek over breakfast. Kyra hid behind the same chair repeatedly and jumped out to reveal herself to her younger brother. Owen was pleasantly surprised every time. All humans are born curious. However, they lose that curiosity as they grow older and become familiar with things. This could be the reason why we never turn our heads for the trivial in our own city.
Curiosity and Data Science Career
Being curious and aware requires constant energy and effort. Perhaps humans have a natural tendency to slip into a low-energy state. This is particularly dangerous for analysts, since their job requires finding meaning in something that seems mundane to others. In my opinion, the biggest challenge for analytics is not the sophistication of statistical algorithms or the enhancement of computing power, but for its practitioners to stay curious and constantly ask questions. Zen Buddhists try to achieve cosmic awareness by living in the moment. If that is too difficult, I would recommend that you treat your job like a wonderful travel destination and be a good tourist: curious and aware.
OK, so that was a bit of a detour from our original discussion on scorecards. However, there are a couple of reasons for telling you the above. Primarily, it explains why I was late in posting this part of the series. Secondly, I would like us to have a discussion on the importance and challenges of being curious at work and in life in general. I already have a few examples in mind, namely Louis Pasteur and Edward Lorenz, but that is for later.
Now, let’s continue with the topic for this part: model evaluation.
Model Validation & Evaluation
When I was in high school, I joined a cricket academy during the summer vacation. Cricket is a game quite similar to baseball, and I shall give the baseball terminology in parentheses for everyone to understand. The training camp was designed as about a month of training followed by a full game against kids at the same skill level from another club. There was a tall, lean kid with us in the camp; he was the star bowler (pitcher) throughout the training sessions. He used to bowl (pitch) some of the best yorkers (curve balls). We were quite sure he would outperform everyone in the game. We asked him to open the bowling. His first ball went for a six (home run), followed by several more. Maybe it was a mix of match pressure, expectations, and the crowd, but his performance was an absolute disaster. Later the coach told us that what happened was not unusual and that he had seen it several times before. At higher levels, the game is played not on the ground but in the space between the ears. Clearly, he was referring to the players’ presence of mind and temperament.
Sampling Strategy for Model Validation
As the famous saying goes, the proof of the pudding is in the eating. One could be a star on the training field but a complete flop in a match situation. The same is true for an analytical model. A model, after going through a round of training (Part 5 of the series), goes through several rounds of testing.
1. Out-of-sample test (train vs. test): remember Part 2, where we divided our sample into a training sample and a test sample. The first level of testing happens on the holdout, or test, sample. The model needs to perform as well on the test sample as it does on the training sample. Let us come back to this in the next section, where I will discuss the performance measures and the ROC curve.
2. Out-of-time (OOT) sample test: since the model was built on a sample of the portfolio with a reasonable vintage (see Part 2), the analyst would like to test its performance on a more recent portfolio. The number of bad borrowers (90+ days past due, DPD) in this out-of-time sample will certainly be smaller, since recent loans have had less time to become delinquent, but the overall trend of the good/bad ratio against scores will still be a good indicator of model performance. Additionally, the analyst could relax the condition for bad loans and consider 30+ DPD as bad. Again, the overall trend should match the scorecard estimates.
3. On-field test: this is where the proof of the pudding is. The analyst needs to be completely aware of any credit policy changes the bank has gone through since the scorecard was developed and, more importantly, of the impact those changes will have on the scorecard. Always remember that not every policy change will influence the scorecard; a good business understanding and a bit of common sense really help here. Regular monitoring, with recalibration of the scorecard when needed, is a good way to keep it up to date.
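The first two tests above can be sketched in a few lines of Python. This is only a minimal illustration under stated assumptions, not the procedure used in this series: the `Loan` record, its field names, the `oot_start` cutoff month and the score band edges are all hypothetical. Loans originated before the cutoff are split at random into training and test samples; later loans form the out-of-time sample, on which we look at the good/bad odds per score band.

```python
import random
from collections import namedtuple

# Hypothetical record: origination month "YYYY-MM", scorecard score, bad flag (1 = bad).
Loan = namedtuple("Loan", ["month", "score", "is_bad"])


def split_samples(loans, oot_start, test_fraction=0.3, seed=42):
    """Return (train, test, out_of_time) lists of loans.

    Loans originated on or after `oot_start` form the out-of-time sample;
    the rest are shuffled and split into training and holdout test samples.
    """
    development = [loan for loan in loans if loan.month < oot_start]
    out_of_time = [loan for loan in loans if loan.month >= oot_start]
    shuffled = development[:]
    random.Random(seed).shuffle(shuffled)
    n_test = int(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test], out_of_time


def good_bad_odds_by_band(loans, band_edges):
    """Return goods/bads for each score band [low, high).

    A band with no bad loans yields None, because the odds are undefined.
    """
    odds = []
    for low, high in zip(band_edges[:-1], band_edges[1:]):
        flags = [loan.is_bad for loan in loans if low <= loan.score < high]
        bads = sum(flags)
        goods = len(flags) - bads
        odds.append(goods / bads if bads else None)
    return odds
```

If the scorecard still holds on the recent portfolio, the odds returned by `good_bad_odds_by_band` should rise from the lowest score band to the highest, even when the absolute number of bad loans is small.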
Performance Tests for Model Validation
There are several ways to test the performance of a scorecard, such as the confusion matrix, the KS statistic, the Gini coefficient and the area under the ROC curve (AUROC). The KS statistic is a widely used metric in scorecard development. However, I personally prefer the AUROC to the others. I must add that the Gini coefficient is a variant of the AUROC (Gini = 2 × AUROC − 1). The reason for my liking of the AUROC could be my formal training in physics and engineering. I think it is a more holistic measure, and it lets the analyst analyze the model performance visually. I prefer graphs and visual statistics to raw numbers any day.
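To make these measures concrete, here is a small pure-Python sketch. It is an illustration rather than a production implementation, and it rests on assumptions that are not in the text: the function names are hypothetical, `risk` is taken to be the model's predicted probability of a borrower being bad (higher means riskier), and `is_bad` holds the observed 0/1 outcome. The AUROC is computed as the share of (bad, good) pairs in which the bad borrower received the higher risk, with ties counted as half.

```python
def confusion_matrix(risk, is_bad, threshold):
    """Count outcomes when every loan with risk >= threshold is predicted bad."""
    counts = {"tp": 0, "fp": 0, "tn": 0, "fn": 0}
    for r, b in zip(risk, is_bad):
        predicted_bad = r >= threshold
        if predicted_bad and b == 1:
            counts["tp"] += 1
        elif predicted_bad and b == 0:
            counts["fp"] += 1
        elif b == 1:
            counts["fn"] += 1
        else:
            counts["tn"] += 1
    return counts


def auroc(risk, is_bad):
    """Probability that a random bad loan is ranked riskier than a random good one."""
    bads = [r for r, b in zip(risk, is_bad) if b == 1]
    goods = [r for r, b in zip(risk, is_bad) if b == 0]
    wins = sum((rb > rg) + 0.5 * (rb == rg) for rb in bads for rg in goods)
    return wins / (len(bads) * len(goods))


def gini(risk, is_bad):
    """Gini coefficient, a rescaling of the AUROC to the range -1..1."""
    return 2 * auroc(risk, is_bad) - 1


def ks_statistic(risk, is_bad):
    """Largest gap between the cumulative distributions of goods and bads."""
    n_bad = sum(is_bad)
    n_good = len(is_bad) - n_bad
    largest_gap = 0.0
    for t in sorted(set(risk)):
        cum_bad = sum(1 for r, b in zip(risk, is_bad) if b == 1 and r <= t) / n_bad
        cum_good = sum(1 for r, b in zip(risk, is_bad) if b == 0 and r <= t) / n_good
        largest_gap = max(largest_gap, abs(cum_good - cum_bad))
    return largest_gap
```

The pairwise AUROC loop is quadratic in the sample size, which is fine for a demonstration; on a real portfolio one would normally use a library routine instead.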
The adjacent graph shows a ROC curve. Its two axes are the true positive rate and the false positive rate. The plot shows how well the model separates the two classes. A perfect model will perfectly segregate good and bad cases. Hence, you will get 100% true positives right at the beginning (i.e. absolute lift), as shown by the green curve in the graph. However, as with anything in life, perfection does not exist. As they say, if it is too good to be true, it probably is. At the other extreme is a worthless model, the curve marked in red. Anything close to or below the red curve is no better than tossing a coin, so why bother with the effort of building a model? Finally, a typical scorecard ROC will look like the blue curve. The AUROC for a usual credit-scoring model is between 70% and 85%, the higher the better. However, for some fraud and insurance models, an AUROC slightly above 60% is acceptable. Again, analysts should be sure about the business benefits of the scorecard before settling on a target AUROC. A simple cost-benefit analysis helps significantly before finalizing the model and reporting it to top management.
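The three curves described above can be reproduced numerically. The sketch below is an illustration under assumptions of my own: `roc_points` and `area_under` are hypothetical helper names, and `risk` is again assumed to be a predicted probability of being bad. It sweeps the decision threshold from the highest risk downwards, records one (false positive rate, true positive rate) point per threshold, and integrates the curve with the trapezoidal rule.

```python
def roc_points(risk, is_bad):
    """Return (false positive rate, true positive rate) pairs, starting at (0, 0).

    One point is produced per distinct risk value, treating every loan with
    risk >= threshold as a predicted bad.
    """
    n_bad = sum(is_bad)
    n_good = len(is_bad) - n_bad
    points = [(0.0, 0.0)]
    for t in sorted(set(risk), reverse=True):
        tp = sum(1 for r, b in zip(risk, is_bad) if r >= t and b == 1)
        fp = sum(1 for r, b in zip(risk, is_bad) if r >= t and b == 0)
        points.append((fp / n_good, tp / n_bad))
    return points


def area_under(points):
    """Trapezoidal area under a curve given as (x, y) points ordered by x."""
    return sum((x1 - x0) * (y0 + y1) / 2
               for (x0, y0), (x1, y1) in zip(points[:-1], points[1:]))
```

With outcomes `[1, 1, 1, 0, 0, 0]`, a model that ranks every bad loan above every good one traces the perfect curve with an area of 1.0, while a model that gives every loan the same risk collapses to the diagonal with an area of 0.5, the coin toss. A typical scorecard lands between the two.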
Sign-off Note
I hope after reading this, you will pick up your camera and visit that unexplored nook at the corner of the street – and be ready for some wonderful surprises!