L5/3 Loss Functions - YouTube - Dive into Deep Learning, UC Berkeley Open Course - P1 (Probability) - Cupoy


Dive into Deep Learning, UC Berkeley, STAT 157, Slides
Original video link: https://www.youtube.com/watch?v=oqeZRCpG15Q&list=PLZSO_6-bSqHQHBCoGaObUljoXAyyqhpFW&index=22

Transcript:

Loss functions. Yeah, so, you know, what can we optimize over? Well, this is a very simple loss function. All right, that's exactly the one that I just described, so that's just a Gaussian, and mind you, these plots are all generated directly in Gluon. So let me explain to you what you're seeing there. The blue function is the loss function that I described; it's one-half (y minus y') squared. The green function is e to the minus that loss function, and of course in this case, as you can see, I picked y' to be zero, and all I did, because I'm lazy, is normalize this green function to integrate to one over the range from minus five to five and ignore everything that's outside. But short of that, that's a normal distribution. Okay, the red line is the derivative of the blue line, and that's automatically generated, so as part of your homework you're going to use autograd to automatically do this.

Okay, well, that sounds pretty boring, right? I mean, hey, we can all just write out y minus y'. But there are other losses where this is actually really quite an advantage. Okay, let's take something like this; let's just look at what happens with the optimization. So what we saw before is that if I minimize the L2 loss, I get exactly the mean out of it, right? And so I've just, you know, plotted with those red arrows the magnitude of the gradient if I had some observations, and that will give me the mean of them.

Okay, so let's take the L1 loss, and again I did exactly the same thing: I took the absolute value function |y minus y'| and I plotted this blue line, then I exponentiated it and I get the green line, again normalized to, you know, integrate out to one on the interval between minus five and five, and then the orange line is the derivative, again automatically generated. Now, this loss function has a rather fun property, namely its gradients are either minus one or one. So if I end up optimizing by, you know, trying to find the point where the gradients balance out, I need the same number of points to the left and to the right; statisticians call that the median. If you had an odd number of points, I would, you know, pick that one exactly; otherwise I can pick anything in the interval between the two middle points, since there the gradients don't change. That's the median.
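The plots described above (the loss, its automatically generated derivative, and the exp-normalized density) are easy to reproduce. Below is a minimal sketch, assuming MXNet's NDArray and autograd (the toolkit behind Gluon, which the lecture refers to); the variable names and the grid resolution are illustrative choices, not the lecture's code.

```python
# Minimal sketch, assuming MXNet / autograd; names like `y_hat` are illustrative.
from mxnet import nd, autograd

y = 0.0                             # observed target, chosen as zero in the lecture
y_hat = nd.arange(-5, 5, 0.01)      # candidate predictions on [-5, 5)
y_hat.attach_grad()

# L2 loss: 1/2 (y_hat - y)^2, the blue curve of the first plot
with autograd.record():
    l2 = 0.5 * (y_hat - y) ** 2
l2.backward()
dl2 = y_hat.grad.copy()             # its derivative (the red curve): y_hat - y

# L1 loss: |y_hat - y|, whose gradients are only -1 or +1
with autograd.record():
    l1 = nd.abs(y_hat - y)
l1.backward()
dl1 = y_hat.grad.copy()             # the orange curve: sign(y_hat - y)

# exp(-loss), normalized to integrate to one over [-5, 5] (the green densities)
dx = 0.01
def density(loss):
    p = nd.exp(-loss)
    return p / (p.sum() * dx)

p_l2 = density(l2)                  # a truncated Gaussian
p_l1 = density(l1)                  # a truncated Laplace distribution
```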
Okay, so now let's pick something a little bit weird. This is called Huber's robust loss, and Huber's robust loss is weird insofar as it looks like the absolute value function on the branches, so the blue line on the outside is a straight line, and on the inside it's just a parabola. So what it is, it's just a parabola which is then extended continuously with a straight line, and you need to squint really hard to figure out exactly where it crosses over. Now, if you plot the derivative, so that's the orange curve, you can see it very easily, and again that's automatically generated with Gluon; this way I don't have to do anything particularly fancy. And the green line, again, is the corresponding density.

So what is special about the robust loss? Well, actually a lot, because this one ensures, if you look at it, that you essentially perform trimming: you throw the largest and the smallest terms away and then you compute the mean within the rest. For the largest and the smallest terms the gradients cancel out; for anything in the middle, well, we have your standard, you know, Gaussian loss, and everything just averages out there.

Yes? Okay, okay, that's a very good question, namely what's the relation to outliers. Now, a trimmed-mean estimator effectively performs robust estimation. The little trick that I didn't mention here, something that a good stats class would cover, is that you don't necessarily pick the thresholds of one and minus one for where you cross over as a hard constraint; you adapt them dynamically to the estimation problem, so that you're just trimming away the smallest and the largest terms, and that effectively performs outlier removal. Except that when you're doing a regression it doesn't exactly remove the outlier; you're just bounding the influence, and with that the gradient, that any single observation can have. So this way it's not that you're ignoring points that are at the extremes; you're just making sure that they cannot push things too hard.

Now, this is actually a very common technique that's being used in deep learning training; it's called gradient clipping. Now, gradient clipping sounds infinitely cooler than Huber's robust loss, but that's really what it does, right? So I covered it here such that later on, when we do things like gradient clipping, you understand what's really going on. There are some reasons why you'll need to do that, because if your gradients are too large the optimization can diverge, but it also simply means that you shouldn't be giving individual observations too much weight. So this renders a lot of optimization procedures a lot more stable, and that's the statistical reason for it. Any other questions?
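To make the clipping behavior explicit, here is a small sketch of Huber's robust loss with the crossover threshold fixed at one (the value on the slide) and its derivative written out by hand; as the lecture points out, the derivative is just the residual clipped to the interval [-1, 1]. The helper names, and the use of MXNet, are assumptions for illustration.

```python
# Sketch of Huber's robust loss (threshold delta = 1) and its derivative;
# helper names are illustrative, assuming MXNet NDArray.
from mxnet import nd

def huber_loss(y_hat, y, delta=1.0):
    """Parabola for |y_hat - y| <= delta, continued linearly outside."""
    r = nd.abs(y_hat - y)
    quadratic = 0.5 * r ** 2
    linear = delta * (r - 0.5 * delta)   # matches the parabola at |r| = delta
    return nd.where(r <= delta, quadratic, linear)

def huber_grad(y_hat, y, delta=1.0):
    """Derivative of the Huber loss: the residual, clipped to [-delta, delta]."""
    return nd.clip(y_hat - y, a_min=-delta, a_max=delta)
```

Letting delta adapt to the data, rather than hard-coding it to one, gives the dynamic trimming the lecturer alludes to.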
Yes, so, we just looked at different loss functions, and that's because, well, besides the least mean squares loss you might end up adding lots of different other loss functions to the optimization problem to perform different types of estimation. And what we're doing here is just covering the really, really simple losses first, namely for regression, because here I can draw nice pictures; once you go to, you know, structured multi-class losses and so on, it's a lot harder to visualize what's going on. The other reason is I wanted to cover the connection between gradient clipping and what you do otherwise, namely that you do constrain the upper bound on, you know, the magnitude of the gradient: if you have a, you know, large gradient vector, you basically just make sure that the two-norm of that vector doesn't exceed a certain number. And since this is an undergrad class, we're doing things the slightly quick and dirty way, by giving you more intuition than necessarily all the math. There's a tension between how much we can cover and how deep we go, and that's where we give you the intuition, but we can't really dive into all the details to quite the extent that a graduate-level class would. Okay, so this is pretty much all that we have for regression losses.
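As a follow-up to the remark about bounding the two-norm of the gradient vector, here is a minimal sketch of that form of gradient clipping, again assuming MXNet; the function name and the parameter `theta` are illustrative, not the lecture's code.

```python
# Sketch of gradient clipping by the 2-norm: if the gradient vector is longer
# than `theta`, rescale it so its norm is exactly `theta`. Names are illustrative.
from mxnet import nd

def clip_gradient(grad, theta):
    norm = nd.norm(grad).asscalar()
    if norm > theta:
        grad = grad * (theta / norm)
    return grad

g = nd.array([3.0, 4.0])              # a gradient with norm 5
print(clip_gradient(g, theta=1.0))    # rescaled to [0.6, 0.8], norm 1
```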