Difference between the binary_crossentropy and categorical_crossentropy loss functions

2019/07/10 10:30 AM

Machine Learning Co-learning Discussion Board

Views: 1036

Answers: 3

Bookmarks: 0

keras

loss

binary_crossentropy

categorical_crossentropy

ml100-2-d71

ml100-2

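While compiling the model for this assignment, I tried both losses and found that binary_crossentropy gives a much better accuracy than categorical_crossentropy:

- Using categorical_crossentropy

- Using binary_crossentropy

I looked into the difference between these two losses:

1. In terms of input format, both seem to take one-hot encoded (OHE) labels, so there is no difference there.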

### Keras binary_crossentropy vs categorical_crossentropy performance? https://stackoverflow.com/questions/42081257/keras-binary-crossentropy-vs-categorical-crossentropy-performance

I'm trying to train a CNN to categorize text by topic. When I use binary_crossentropy I get ~80% acc, with categorical_crossentropy I get ~50% acc. I don't understand why this is. It's a multiclass problem, does that mean I have to use categorical and the binary results are meaningless?

> If it is a multiclass problem, you have to use categorical_crossentropy. Also, labels need to be converted into the categorical format.

> See [to_categorical](https://keras.io/utils/) to do this. Also see definitions of categorical and binary crossentropies [here](http://deeplearning.net/software/theano/library/tensor/nnet/nnet.html#theano.tensor.nnet.nnet.binary_crossentropy)

    # Consider an array of 5 labels out of a set of 3 classes {0, 1, 2}:
    > labels
    array([0, 2, 1, 2, 0])
    # `to_categorical` converts this into a matrix with as many
    # columns as there are classes. The number of rows
    # stays the same.
    > to_categorical(labels)
    array([[ 1.,  0.,  0.],
           [ 0.,  0.,  1.],
           [ 0.,  1.,  0.],
           [ 0.,  0.,  1.],
           [ 1.,  0.,  0.]], dtype=float32)

2. From the following two replies, I learned that the activation functions used are different

(1) https://www.cupoy.com/qa/kwassist/ai_tw/0000016A0CE5806C000000306375706F795F72656C656173655155455354

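The main difference between binary cross-entropy and categorical cross-entropy is the activation function used in the output layer: sigmoid for the former, softmax for the latter.

Binary cross-entropy with sigmoid can also be used for multi-class problems; the idea is to treat the multi-class problem as many binary problems. The key difference is that solving a multi-class problem with softmax assumes each sample belongs to exactly one class, so the computation is stricter. Sigmoid makes no such assumption, so the computation is looser.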
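
To make the "one class only" assumption concrete, here is a minimal sketch in plain Python (no Keras needed). The logits are made-up values for illustration, and the helper names `softmax` and `sigmoid` are defined here rather than taken from any library.

```python
import math

def softmax(logits):
    """Normalize logits into probabilities that sum to 1 (classes compete)."""
    peak = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - peak) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def sigmoid(logits):
    """Squash each logit independently into (0, 1); outputs need not sum to 1."""
    return [1.0 / (1.0 + math.exp(-z)) for z in logits]

logits = [2.0, 1.0, 0.5]  # hypothetical output-layer values for 3 classes
softmax_probs = softmax(logits)
sigmoid_probs = sigmoid(logits)

print(sum(softmax_probs))  # 1.0 up to floating-point rounding
print(sum(sigmoid_probs))  # greater than 1 here: each class is scored on its own
```

With softmax, raising one class's probability necessarily lowers the others; with sigmoid, every class can score high at the same time, which is what a multi-label task needs.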

(2) https://stackoverflow.com/questions/42081257/keras-binary-crossentropy-vs-categorical-crossentropy-performance

I came across an "inverted" issue — I was getting good results with categorical_crossentropy (with 2 classes) and poor with binary_crossentropy. It seems that problem was with wrong activation function. The correct settings were:

- for binary_crossentropy: sigmoid activation, scalar target

- for categorical_crossentropy: softmax activation, one-hot encoded target
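
For the two-class case described in this reply, the two settings are mathematically the same loss when paired with the right activation and target format. The sketch below checks this in plain Python; the function names mirror the Keras loss names only for readability and are hand-written here, and the logit value is an arbitrary assumption.

```python
import math

def binary_crossentropy(target, logit):
    """Loss for one sigmoid output unit and a scalar 0/1 target."""
    p = 1.0 / (1.0 + math.exp(-logit))
    return -(target * math.log(p) + (1 - target) * math.log(1.0 - p))

def categorical_crossentropy(one_hot, logits):
    """Loss for a softmax output layer and a one-hot target."""
    peak = max(logits)
    log_norm = peak + math.log(sum(math.exp(z - peak) for z in logits))
    return -sum(t * (z - log_norm) for t, z in zip(one_hot, logits))

logit = 1.3  # hypothetical raw score for the positive class

# Sigmoid on one logit z equals softmax over the pair [0, z].
bce_pos = binary_crossentropy(1, logit)
cce_pos = categorical_crossentropy([0, 1], [0.0, logit])
bce_neg = binary_crossentropy(0, logit)
cce_neg = categorical_crossentropy([1, 0], [0.0, logit])

print(bce_pos, cce_pos)  # the two values agree
print(bce_neg, cce_neg)  # the two values agree
```

Mixing the pairs (for example softmax on a single output unit, or a scalar target with a two-unit softmax) breaks this equivalence, which is consistent with the "inverted" results the reply describes.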

The results with MSE are as follows:

--------------

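My questions:

1. From the replies above, categorical_crossentropy seems to be the stricter computation. But on a dataset like cifar10 (which should be multi-class classification), why does binary_crossentropy give so much better results than categorical_crossentropy? Is it related to the activation function, the characteristics of the dataset, or some other factor?

2. From the MSE results, the validation loss has already reached about 0.06, far smaller than with the other two losses, yet the accuracy is still very poor.

(1) Generally speaking, doesn't a small loss mean the model is good?

(2) This dataset should be a classification problem, so why does MSE (a regression metric) give such a small loss?

Thanks!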

Answers

2019/07/11 12:49 PM

Jeffrey

Upvotes: 1

Downvotes: 0

Comments: 0

2019/07/11 06:13 PM

Jimmy

Upvotes: 2

Downvotes: 0

Comments: 0

Hi Js!

Great observation! Becoming a good data scientist means running experiments to observe what happens and thinking about where the problem lies.

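Question 1: You already understand the difference between binary_crossentropy and categorical_crossentropy: one treats the problem as multi-label, the other as multi-class. These are completely different tasks, and the handwriting dataset is of course a multi-class task (each image has exactly one answer). So multi-label should not really come out ahead. I suggest looking directly at the model's predictions and computing the accuracy yourself with sklearn's accuracy score; that will show you where the problem actually is!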
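
As a rough illustration of why checking the predictions yourself matters: a commonly reported cause of this gap (discussed in the Stack Overflow thread linked in the question) is that Keras, given `metrics=['accuracy']` together with binary_crossentropy, reports an element-wise binary accuracy over all output units instead of a per-image accuracy. Whether that is what happened in this particular run is an assumption. The sketch below uses made-up data and hand-written helper functions to show how far the two numbers can drift apart.

```python
def binary_accuracy(y_true, y_pred, threshold=0.5):
    """Fraction of individual output units on the correct side of the threshold."""
    hits = total = 0
    for true_row, pred_row in zip(y_true, y_pred):
        for t, p in zip(true_row, pred_row):
            hits += int((p > threshold) == (t == 1))
            total += 1
    return hits / total

def categorical_accuracy(y_true, y_pred):
    """Fraction of samples whose highest-scoring class is the true class."""
    hits = 0
    for true_row, pred_row in zip(y_true, y_pred):
        hits += int(pred_row.index(max(pred_row)) == true_row.index(max(true_row)))
    return hits / len(y_true)

num_classes = 10
# Hypothetical data: 10 samples, one per class, one-hot encoded.
y_true = [[1 if j == i else 0 for j in range(num_classes)] for i in range(num_classes)]
# A useless model that outputs the same low score for every class.
y_pred = [[0.1] * num_classes for _ in range(num_classes)]

print(binary_accuracy(y_true, y_pred))       # 0.9
print(categorical_accuracy(y_true, y_pred))  # 0.1
```

A model that has learned nothing still gets 9 of every 10 output units "right" simply because most entries of a one-hot vector are 0. With real NumPy predictions, the per-image check could be done with `sklearn.metrics.accuracy_score(y_true.argmax(axis=1), y_pred.argmax(axis=1))`.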

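Question 2: This is a classification problem, so the largest ground-truth value is 1 and the smallest prediction value is 0. The squared error for one sample is therefore at most about 1, and you will never get a loss greater than 1, whereas cross-entropy has no upper bound. The two losses are therefore not really comparable! That said, MSE is indeed a usable loss function; perhaps the training time or the learning rate was off, which led to the poor result.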
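
The difference in scale can be checked with a few lines of plain Python. This is only a sketch under stated assumptions: a 10-class one-hot target, made-up predictions, and a squared error averaged over the output units (which is how Keras's `mean_squared_error` reduces over the last axis). The helper names are hand-written for this example.

```python
import math

def squared_error(y_true, y_pred):
    """Mean squared error over the output units of one sample."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)

def cross_entropy(y_true, y_pred):
    """Categorical cross-entropy of one sample with a one-hot target."""
    return -sum(t * math.log(p) for t, p in zip(y_true, y_pred) if t > 0)

y_true = [1] + [0] * 9  # one-hot target, 10 classes
uniform = [0.1] * 10    # a model that has learned nothing
# Almost no probability mass on the true class.
confident_wrong = [1e-6] + [(1 - 1e-6) / 9] * 9

mse_uniform = squared_error(y_true, uniform)
ce_uniform = cross_entropy(y_true, uniform)

print(mse_uniform)  # about 0.09
print(ce_uniform)   # about 2.30, i.e. ln(10)
print(squared_error(y_true, confident_wrong))  # still below 1
print(cross_entropy(y_true, confident_wrong))  # about 13.8, and grows without bound
```

An MSE of roughly 0.09 is what a model guessing uniformly over 10 classes would get, so a validation loss near 0.06 does not by itself indicate a good classifier.
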

2019/07/15 10:12 AM

張維元 (WeiYuan)

Upvotes: 1

Downvotes: 0

Comments: 0

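1. From the replies above, categorical_crossentropy seems to be the stricter computation. But on a dataset like cifar10 (which should be multi-class classification), why does binary_crossentropy give so much better results than categorical_crossentropy? Is it related to the activation function, the characteristics of the dataset, or some other factor?

=> As the previous experts explained, the two approaches differ in nature and in purpose. Here is a more intuitive way to think about it: which do you think is easier, sorting things into two piles (binary_crossentropy) or sorting them into many piles (categorical_crossentropy)?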