L3/1 Setting up a GPU instance on AWS (GPU computing on AWS) - YouTube - Dive into Deep Learning: UC Berkeley Open Course - P1 (Probability) - Cupoy
Dive into Deep Learning, UC Berkeley, STAT 157, Slides
Original video link: https://www.youtube.com/watch?v=50PMYG_l0Us&list=PLZSO_6-bSqHQHBCoGaObUljoXAyyqhpFW&index=14
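The first thing we do is go to aws.amazon.com.
Once everyone is logged in, we actually launch an instance,
making sure we launch the right one.
After that we can go deeper into running network training and so on.
For the system, we select the Ubuntu version of the Deep Learning Base AMI...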
Subtitles [these subtitles were machine-translated automatically]
------
Okay, so the first thing we do is go to aws.amazon.com, with everybody logged in, and let's actually launch an instance. This time I've taken care to make sure that I launch the right instance. So, deep learning, and I select, let's say, the Deep Learning Base AMI (Ubuntu) version 15.0.
Now, if I want to run things on a GPU, I need to select an instance that actually has a GPU, right? So don't select something like a t2 instance, because it does not have a GPU. As a matter of fact, your Raspberry Pi at home is probably faster than a t2 instance on average, or, well, okay, something like that. A t2.nano is about the smallest you can get anyway. So let's take a GPU instance, and the one that I'm going to pick is a p2.xlarge. Okay.
Now let's quickly illustrate the issue of the spot prices. Search for that, and let's look for on-demand instances, on-demand pricing. This is in Ohio, and if you were to go somewhere else it'll cost more. These are GPU instances: a p2.8xlarge has 8 GPUs, a p2.16xlarge has 16 GPUs. The one that we're interested in here is a p2.xlarge. That'll probably be good enough for most of what you are doing.
If you have projects that need a lot of compute, talk to us; we can help you with that. Under no circumstances should you have to pay, all right? We will help you, we'll get you credits, but we're not going to hand out blank checks to everybody, because at least we want to know what you're going to use it for, right? So no Bitcoin mining on AWS, right? I mean, it's a bad return right now anyway, but still.
In any case, if you look at the p2.xlarge, that costs around 90 cents an hour, and let's just look at that. So that's what we pick.
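As a quick reference for the instance choices just discussed, here is a small lookup sketch. The GPU counts are the ones named in the lecture for the P2 family; the table and the helper name `has_gpu` are made up for illustration and are not part of any AWS API.

```python
# GPU counts per instance type, as discussed in the lecture.
# t2 instances have no GPU; the table itself is an illustrative assumption.
GPU_COUNT = {
    "t2.nano": 0,
    "p2.xlarge": 1,
    "p2.8xlarge": 8,
    "p2.16xlarge": 16,
}


def has_gpu(instance_type: str) -> bool:
    """Return True if the lookup table lists at least one GPU for this type.

    Unknown instance types are treated as having no GPU.
    """
    return GPU_COUNT.get(instance_type, 0) > 0


if __name__ == "__main__":
    for name, count in GPU_COUNT.items():
        print(f"{name}: {count} GPU(s)")
```

A check like this is only a guard against picking a CPU-only type by accident; the authoritative list is the AWS instance type page.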
Now let's look at spot instances. Those spot instances are a lot cheaper, right? They're somewhere between 30 and 37 cents an hour. Now you might wonder why on earth people would pay different amounts for that, and where those spot instances come from in the first place.
Spot instances are essentially surplus instances that currently nobody's really using at full price. AWS tries very hard to make sure that when anybody comes and says, "Hey, I want to have a machine," there is one available, so you need to have excess capacity around. Therefore we have some spares, and rather than having the spares sit idle, Amazon auctions them off, essentially eBay-style, and you get quite a bargain.
But there's a catch. The catch is, if at some point somebody else comes along and is willing to pay either more on the auction or full price, well, you're out. The good thing is you don't pay for that last hour where you got kicked out.
In any case, we actually don't want to get kicked out, so we're going to bid a reckless 50 cents. That's still a bargain relative to the 90 cents that you'd be paying otherwise.
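To put the quoted prices side by side, here is a minimal arithmetic sketch. The numbers are the ones mentioned in the lecture (90 cents on demand, roughly 30 to 37 cents spot, a 50 cent maximum bid); the function names are hypothetical, and the sketch ignores billing granularity and any price changes over time.

```python
# Prices quoted in the lecture, in USD per hour for a p2.xlarge.
ON_DEMAND = 0.90
SPOT_LOW, SPOT_HIGH = 0.30, 0.37
MAX_BID = 0.50


def savings_fraction(price: float, baseline: float = ON_DEMAND) -> float:
    """Fraction saved by paying `price` instead of the on-demand baseline."""
    return 1.0 - price / baseline


def run_cost(hours: float, hourly_price: float) -> float:
    """Total cost of running for `hours` at a constant hourly price."""
    return hours * hourly_price


if __name__ == "__main__":
    for label, price in [("spot low", SPOT_LOW),
                         ("spot high", SPOT_HIGH),
                         ("max bid", MAX_BID)]:
        print(f"{label}: {savings_fraction(price):.0%} cheaper than on-demand")
    print(f"10 hours on-demand: ${run_cost(10, ON_DEMAND):.2f}")
```

Even at the maximum bid, the hourly price stays well below the on-demand rate, which is the point made above.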
Okay, now next, add storage. Here it's 50 gigabytes. Mind you, when that machine goes down, those 50 gigabytes are gone. So let's add a new volume, and, well, okay, let's just create a new one for now. I could snapshot this and so on, but in any case, let's do that. This one is going to be persistent, so I can reuse it. Think of it like a USB key that you're attaching to the machine, right?
Going to add some tags: no. Security group: fine, insecure, open to the world. Okay, this is telling me what it's going to cost. Now, I don't have the key pair on this machine, so: stat157, and I download this key pair.
Okay, this is now going to request the spot instance. Okay, the spot instance is now being created. That's good. We view spot requests and, lo and behold, this was actually fulfilled. Great. So now I can click on it and it will tell me that it's initializing and doing its thing. In the meantime, let's quickly go and do two things.
Just quickly create an SSH directory, so this is ssh-keygen, and then I would go and move this key into .ssh. It may not agree with that right now, but it should be fine; we'll find out. Okay, so here, if we look at the instance, we'll see that it's initializing, so it's doing its thing, and here's the public IP number.
Okay, so ssh -i stat157.pem ubuntu@ and the IP number. Okay. Since it's the first time I'm connecting to this machine, it'll ask me whether I really want to connect to this, and yes. And basically it complains that my SSH key is too insecure, so it'll force me to fix this. So I'm going to use chmod to make it read-only. Okay, so chmod 400. Yep, now only I can read it. Okay, good. So we're SSHed into the machine now.
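The permission fix above (`chmod 400` on the key file) can be sketched in Python as follows. This is only an illustration of what the mode means; the helper names are hypothetical, and the demo runs on a throwaway temporary file rather than a real key.

```python
import os
import stat
import tempfile


def make_owner_read_only(path: str) -> None:
    """Equivalent of `chmod 400 path`: the owner may read, nobody else has access."""
    os.chmod(path, stat.S_IRUSR)


def is_too_open(path: str) -> bool:
    """Return True if group or others have any permission bits set."""
    mode = stat.S_IMODE(os.stat(path).st_mode)
    return bool(mode & (stat.S_IRWXG | stat.S_IRWXO))


if __name__ == "__main__":
    # Demo on a temporary file standing in for the downloaded key pair.
    fd, demo_path = tempfile.mkstemp()
    os.close(fd)
    os.chmod(demo_path, 0o644)
    print("too open before:", is_too_open(demo_path))
    make_owner_read_only(demo_path)
    print("too open after:", is_too_open(demo_path))
    os.remove(demo_path)
```

SSH refuses private keys that other users can read, which is why the client complains until the mode is tightened.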
So now, at this point, we can go and follow the install instructions that are posted, for instance, on the forum. This is really just combining things from a couple of slides.
Now here's one important thing, and this is where a couple of people got stuck. This machine has different versions of CUDA installed. In /usr/local you'll see that it has CUDA 8, 9, 9.2, and CUDA 10 installed, and in particular it currently defaults to CUDA 9.0, right? That's not necessarily what we want, so we need to change it, let's say, to CUDA 10. Actually, the instructions have CUDA 9.2 pointing to it, but anyway, let's do this. So I just created a symlink. If you've never seen sudo before, that just means you're doing that as root. Okay, fine. So now we have CUDA working. The next thing is we need to get Conda.
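The symlink switch described above can be illustrated with a small sketch. It assumes the usual layout in which a `cuda` link points at a versioned directory such as `cuda-10.0`; those directory names and the helper `repoint_symlink` are assumptions for illustration. The demo works inside a temporary directory, so nothing under /usr/local is touched and no root access is needed.

```python
import os
import tempfile


def repoint_symlink(link_path: str, target: str) -> None:
    """Make `link_path` a symlink to `target`, replacing any existing link.

    A new link is created under a temporary name and then renamed over the
    old one, so the link never disappears in between.
    """
    tmp = link_path + ".tmp"
    if os.path.lexists(tmp):
        os.remove(tmp)
    os.symlink(target, tmp)
    os.replace(tmp, link_path)


if __name__ == "__main__":
    root = tempfile.mkdtemp()
    for version in ("cuda-9.0", "cuda-10.0"):
        os.mkdir(os.path.join(root, version))
    link = os.path.join(root, "cuda")
    os.symlink(os.path.join(root, "cuda-9.0"), link)   # the old default
    repoint_symlink(link, os.path.join(root, "cuda-10.0"))
    print(link, "->", os.readlink(link))
```

On the actual machine the same change would be made on the link in /usr/local, which is why the lecture uses sudo there.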
Okay, let's execute it. It will now ask me whether I accept the license terms. Yes. You can change all that with default flags, but this does it automatically. Okay, fine, it's set this all up. Yes. Okay. Now, the reason why I have to do this is because it's updated my path, and now if I want to find out whether conda is in my path, well, there we go. Good.
So now we need to set up the appropriate environment, and we're going to do this, and the only difference is that now we need to pick CUDA 10.0, so cu100. If you look at the YAML file for gluon, for the environment, you'll see that it needs Python, Jupyter, matplotlib, pandas, and then cu100. If you're on a CPU version, you do not want to install the CUDA version, right? On the other hand, if you need to use GPUs, then we need to do that.
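The "cu100" naming above follows a simple pattern, sketched here under the assumption that the environment installs MXNet from pip, where GPU builds are published under names like `mxnet-cu100` and the CPU build is plain `mxnet`. The helper name `mxnet_package` is hypothetical, and the sketch does not check that a given build actually exists.

```python
from typing import Optional


def mxnet_package(cuda_version: Optional[str] = None) -> str:
    """Map a CUDA version string such as "10.0" to a pip package name.

    With no CUDA version (CPU-only machine) the plain package is returned.
    This only builds the name; it does not verify that the package exists.
    """
    if cuda_version is None:
        return "mxnet"
    return "mxnet-cu" + cuda_version.replace(".", "")


if __name__ == "__main__":
    print(mxnet_package("10.0"))  # GPU machine with CUDA 10.0
    print(mxnet_package())        # CPU-only machine
```

The version chosen here should match the CUDA version that the machine actually points to.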
So nvidia-smi shows me that I have one GPU right now on this machine. It's a K80, and it also shows that it's currently using the CUDA 10.0 version. Okay, everybody comfortable with that so far?
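The same check can be done from a script. The sketch below shells out to `nvidia-smi` with its query flags and returns the GPU names; the function name `list_gpus` is hypothetical, and on a machine without the NVIDIA driver it simply returns an empty list.

```python
import shutil
import subprocess
from typing import List


def list_gpus() -> List[str]:
    """Return the GPU names reported by nvidia-smi, or [] if it is unavailable."""
    if shutil.which("nvidia-smi") is None:
        return []
    result = subprocess.run(
        ["nvidia-smi", "--query-gpu=name", "--format=csv,noheader"],
        capture_output=True,
        text=True,
        check=False,
    )
    if result.returncode != 0:
        return []
    return [line.strip() for line in result.stdout.splitlines() if line.strip()]


if __name__ == "__main__":
    gpus = list_gpus()
    print(f"{len(gpus)} GPU(s) found:", gpus)
```

On the p2.xlarge from the lecture this would be expected to report a single entry, matching what the interactive nvidia-smi output showed.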
Okay, good, so let's do this: conda env create, and then source activate. So it now does its thing and installs quite happily, as it would. It takes a little while, but it's doing that.
Actually, let's just open another shell and set it up with port forwarding such that we can do this automatically. If you don't know how to do port forwarding, let's go to d2l.ai, and in the appendix it actually will give you fairly detailed instructions on how to use Jupyter. So what we'll have to do is add this to my SSH command: ssh -L. Okay.
Okay, does anybody remember the IP number? No? 125, 41, 99. And now, okay, actually I'll have to do something a little bit differently here. The reason is that I already have a local version of Jupyter running, right, and that's already running on port 8888. If I set up port forwarding to the same port, then I basically have two services going to the same port, and that doesn't work very well. So let's just pick 8890. Okay, that should be fine. And I used the arguments in the wrong order, so let's log out and try this again. And now things are fine.
Before that I had the error message, let's just pull it up here: channel_setup_fwd_listener_tcpip: cannot listen to port: 8888. That's because there's a server running already, a local Jupyter server. Okay, so we're almost done.
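The port clash described above can be checked before connecting. The following sketch tests whether a local port is free and builds the `ssh` argument list with `-i` and `-L`; the key file name, the documentation-range IP address, and the helper names are placeholders, not values from the lecture's machine.

```python
import socket
from typing import List


def port_is_free(port: int, host: str = "127.0.0.1") -> bool:
    """Return True if nothing is currently bound to host:port."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        try:
            s.bind((host, port))
        except OSError:
            return False
    return True


def ssh_forward_command(key: str, host: str, local_port: int,
                        remote_port: int = 8888, user: str = "ubuntu") -> List[str]:
    """Build an ssh command forwarding local_port to remote_port on the server."""
    return ["ssh", "-i", key,
            "-L", f"{local_port}:localhost:{remote_port}",
            f"{user}@{host}"]


if __name__ == "__main__":
    # Use 8888 locally if it is free, otherwise fall back to 8890 as in the lecture.
    local = 8888 if port_is_free(8888) else 8890
    cmd = ssh_forward_command("stat157.pem", "203.0.113.10", local)
    print(" ".join(cmd))
```

When a different local port is used, the URL printed by Jupyter has to be edited to that port before pasting it into the local browser.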
Everything is already installed, so actually I can just do this here. Okay, so activate gluon, and now jupyter notebook. Okay, and of course it's supremely unhappy because there's no local browser on that machine. So now we need to take this line here, copy it into the browser, and make one minor edit, because we need to connect on that new port. And lo and behold, we're now connected to our machine in the cloud.
There are a couple of pitfalls to this, right? Remember, we launched a spot instance. A spot instance unfortunately means that it could go down any time, so this is not a nice situation to be in. But, well, fortunately we allocated an extra disk, right?
So we have this disk, xvdb. If you were to list the disks, you would see that xvda is our standard, basically hard drive number one, and hard drive number two is xvdb, and right now it's not set up yet. So sudo fdisk /dev/xvdb. Right now there's nothing there, so let's create a new partition. Remember, that was the 8 gigabyte partition. New partition, primary, partition number 1, make it all defaults. Okay, and it's already set up as a Linux partition, so everything's good, so I just write it back. Good.
Now I need to create a filesystem; yes, that's the next line that I'm going to type in. Right now it's just a partition that's tagged as a Linux partition. The next thing I need to do is set up ext4 for /dev/xvdb1, because that was primary partition 1, and of course it's as root because I'm formatting something. Now I need to do one more thing: I need to create a local directory, payload, let's say, and now I'm going to mount xvdb1. So now we have this thing mounted on /home/ubuntu/payload. This is our second disk. That disk will not die if the machine dies.
Okay, so now let's go to payload and, of course, this fails. The reason why it fails is because I mounted it as root. So the last thing that I need to do is chmod a+rwx. This is insecure, but fine; this basically means anybody can write on it. Now it works. Good. So now you can write on it. This creates a partition, and that way, if you have to do longer experiments and you want to make your AWS credits go a little bit longer, maybe a factor of two or three more, do that.
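The disk steps above, after the interactive fdisk session, can be collected into a small dry-run helper. This sketch only builds and prints the commands and does not execute anything; the use of `mkfs.ext4` is an assumption about which formatting tool was typed, and the function name is hypothetical.

```python
from typing import List


def disk_setup_commands(device: str = "/dev/xvdb",
                        mount_point: str = "/home/ubuntu/payload") -> List[List[str]]:
    """Return the commands that format, mount, and open up the extra disk.

    The partition is assumed to have been created already (the fdisk step),
    so the first partition on the device is used.
    """
    partition = device + "1"
    return [
        ["sudo", "mkfs.ext4", partition],           # format the new partition
        ["mkdir", "-p", mount_point],               # create the mount directory
        ["sudo", "mount", partition, mount_point],  # attach the disk there
        ["sudo", "chmod", "a+rwx", mount_point],    # insecure: writable by anyone
    ]


if __name__ == "__main__":
    # Dry run: print the commands instead of running them.
    for command in disk_setup_commands():
        print(" ".join(command))
```

Formatting is only needed the first time; when the same volume is attached to a later instance, running `mkfs` again would erase the data on it.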
Now, the last thing, about the costs and the pricing and the bids. You're not necessarily going to pay your maximum bid; it's just that you're never going to pay more than your maximum bid. It's just like eBay: let's say you want to buy that new laptop for up to a thousand dollars and the second-highest bid is $700; then you're going to pay $700 for this laptop. This is a second-price auction, and you can prove that it's incentive compatible. What happens on AWS with spot prices is a little bit more complicated, since there are multiple machines and so on, and so the math works out a little bit more interestingly. If you're interested in that, take an EC class, like an electronic commerce class, computational advertising, and other things. There's a lot of really nice stuff in there, way beyond the scope of this class.
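The laptop example above is a single-item second-price auction, which can be sketched in a few lines. This is only the textbook rule with hypothetical names; as noted in the lecture, actual spot pricing on AWS involves multiple machines and is more complicated.

```python
from typing import Dict, Tuple


def second_price_auction(bids: Dict[str, float]) -> Tuple[str, float]:
    """Return (winner, price) for a single-item second-price auction.

    The highest bidder wins but pays the second-highest bid, so the winner
    never pays more than their own maximum bid.
    """
    if len(bids) < 2:
        raise ValueError("need at least two bidders")
    ranked = sorted(bids.items(), key=lambda item: item[1], reverse=True)
    winner = ranked[0][0]
    price = ranked[1][1]
    return winner, price


if __name__ == "__main__":
    winner, price = second_price_auction({"you": 1000.0, "other bidder": 700.0})
    print(f"{winner} wins and pays ${price:.0f}")
```

Because the price paid does not depend on the winner's own bid, bidding one's true maximum is the sensible strategy, which is the incentive compatibility property mentioned above.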
Okay, and so with that we're going to do one thing, namely we're going to kill it all, and then I hand over to Mu. Shut down this notebook server. Okay, fine, logged out. And now I can always click on Instances and I can see this one's running, and then Actions, Instance State, Terminate. Terminate this instance. Now it's gone.
Okay, so that was the very quick overview of how to set up a machine. You can actually script all of this such that you don't need to do this on the command line every time, right? You basically build a script, and then you can launch spot instances from the command line, execute that script there, and then five minutes later, after you've come back from making yourself a cup of coffee, the machine is ready. I strongly recommend that you do that; that's fairly straightforward scripting. Okay, yes?
Well, yes. The only thing is you have to put that into, you know, some executable file. You'll probably need to put /usr/bin/bash or whatever in there. You probably want to combine this with boto. Those scripts are not available, but I'm fairly sure that there are tons of similar scripts available on the internet if you search for "set up a spot instance on AWS script with boto". Well, we can do that.
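As a rough idea of what such a script might contain, here is a minimal sketch using boto3, the current AWS SDK for Python, which is an assumption since the lecture only says "boto". The AMI ID is a placeholder, the key name and region mirror the walkthrough (stat157, Ohio), and the helper names are hypothetical. Building the parameters is kept separate from the network call so the first part can be inspected without AWS credentials.

```python
from typing import Any, Dict


def spot_request_params(image_id: str, key_name: str,
                        instance_type: str = "p2.xlarge",
                        max_price: str = "0.50") -> Dict[str, Any]:
    """Build the arguments for a one-time spot request (no network access)."""
    return {
        "SpotPrice": max_price,          # maximum bid in USD per hour, as a string
        "InstanceCount": 1,
        "Type": "one-time",
        "LaunchSpecification": {
            "ImageId": image_id,         # placeholder; look up the real AMI ID
            "InstanceType": instance_type,
            "KeyName": key_name,
        },
    }


def request_spot_instance(params: Dict[str, Any],
                          region: str = "us-east-2") -> Dict[str, Any]:
    """Submit the request. Needs boto3 installed and AWS credentials configured."""
    import boto3  # third-party package, imported lazily so the rest runs without it
    ec2 = boto3.client("ec2", region_name=region)
    return ec2.request_spot_instances(**params)


if __name__ == "__main__":
    params = spot_request_params("ami-XXXXXXXX", "stat157")
    print(params)  # inspect first; call request_spot_instance(params) to submit
```

A fuller script would also wait for the request to be fulfilled and then run the setup steps over SSH, which is beyond this sketch.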
Okay, and so the first link has something in there. Okay, this is a little bit more detailed; there's a whole bunch of other interesting things because you can pick instances and so on. But this was the let-me-google-that-for-you type of solution. There's tons of stuff out there; use one that works for you. Okay, good. So with that I hand over to the grand master himself, Mu, and he'll talk about autograd and related things.