User Guide: Pre-Requisites
Pre-requisites:
Note
Currently these MIG Profiles are created.
III. Check the availability of particular NVIDIA Docker images you want to
use (administrator needs to update)
$ docker images | grep nvcr.io
$ docker images | grep boot
Note
For this BOOTCAMP: use the image: iit_bootcamp/pyt:21.08
use this image: iit_bootcamp/pyt:21.09
This step is followed/updated by the administrator.
If a particular docker image is not available, request your administrator to pull that image.
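The availability check above can also be scripted. The following is a minimal sketch, assuming the image list is printed one `repository:tag` per line (for example with `docker images --format '{{.Repository}}:{{.Tag}}'`); `image_listed` is a hypothetical helper name and not part of Docker.

```shell
# image_listed is a hypothetical helper (not a Docker command).
# It reads one "repository:tag" per line on stdin and succeeds
# only if the requested image appears as an exact match.
image_listed() {
  grep -Fxq -- "$1"
}

# Intended use on the server (requires docker, so left as a comment):
#   docker images --format '{{.Repository}}:{{.Tag}}' \
#     | image_listed iit_bootcamp/pyt:21.09 \
#     || echo "image missing: request the administrator to pull it"
```

An exact match is used instead of a plain `grep` so that a different tag of the same repository is not mistaken for the one you need.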
STEPS TO CREATE KUBERNETES DEPLOYMENT/JOB:
a. launch-job
i. To launch a job that runs a Python script.
b. launch-jupyter
i. To launch a Jupyter notebook instance for utilising a complete 40GB GPU
profile.
With a complete GPU profile, no MIG devices are created and the GPU has only one
unique UUID.
c. launch-jupyter-mig
i. To launch a Jupyter notebook instance for utilising a given MIG profile.
d. launch-jupyter-mig-rapids
i. To launch a Jupyter notebook instance for utilising a given MIG profile with
the RAPIDS container.
5. Get the URL to access the Jupyter notebook
$ cat jupyter-login-<instance-name>.txt
Nvidia 40 GB    0    0
MIG 1g.5gb      1    1
MIG 2g.10gb     1    1
MIG 3g.20gb     0    0
MIG 4g.20gb     0    0
MIG 7g.40gb     0    0
Note
● When launching your Jupyter instance or jobs, choose a GPU/MIG profile based on
the resources assigned to you and the profiles currently available.
● If the error in the event section is: “0/1 nodes are available: 1 Insufficient
nvidia.com/mig-1g.5gb”, no MIG device of the requested profile is currently free
on the node.
● If the error in the event section is:
○ Error while pulling new docker image
■ From launch-jupyter script:
Error from server (BadRequest): container "test" in pod "test-58994b899c-6mgpt" is
waiting to start: ContainerCreating
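To find the event section referred to above, the output of `kubectl describe pod <pod_name>` can be filtered. This is a rough sketch only: `pod_events` and `insufficient_resource` are hypothetical helper names, and they assume the describe output contains a line starting with `Events:` and a scheduler message of the form `Insufficient <resource>`.

```shell
# Both helpers are hypothetical names; they only filter text on stdin.

# Print the "Events:" section (through the end of the output)
# of `kubectl describe pod`.
pod_events() {
  sed -n '/^Events:/,$p'
}

# Print the resource named in an "Insufficient <resource>" message,
# dropping a trailing sentence period if one is attached.
insufficient_resource() {
  sed -n 's/.*Insufficient \([^ ,]*\).*/\1/p' | sed 's/\.$//'
}

# Intended use on the server (requires kubectl, so left as a comment):
#   kubectl describe pod <pod_name> | pod_events
#   kubectl describe pod <pod_name> | pod_events | insufficient_resource
```

If the second command prints a MIG resource name, that profile is the one to compare against your quota and the available profiles.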
Go to browser:
1. Go to link: XXX:XX:X:X:30731
Where XXX:XX:X:X → Server IP address
30731 → POD Port
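As a small worked example of the link format above, the address can be assembled from the server IP and the POD port. This is only a sketch: `jupyter_url` is a hypothetical helper, the `http://` scheme is an assumption, and `192.0.2.10` is a placeholder documentation address, not the real server.

```shell
# jupyter_url is a hypothetical helper; the http:// scheme is an assumption.
# Arguments: server IP address, POD port.
jupyter_url() {
  printf 'http://%s:%s\n' "$1" "$2"
}

# 192.0.2.10 is a placeholder address; substitute your server IP.
jupyter_url 192.0.2.10 30731
```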
DOs
● Verify the docker image you want to use for your project is present or not.
● If a certain docker image is not present then ask the admin to pull those images.
● Also if the required image is not available on NGC, check the image in Docker Hub. Request your admin with details of that image.
● Have basic understanding of Docker images and containers, Kubernetes PODs.
DON’Ts
● Launching the script for creating the instance will not generate the link for connecting the jupyter-lab server.
● If the link is not generated follow the 8th and 9th step for Troubleshooting.
● Having a basic understanding of these concepts will help in debugging the issues.
FAQs
4. When we exit from the SSH session, will the Jupyter instance still exist or
not?
● Exiting an SSH session does not break the instance. Keep in mind, however, that
Jupyter should be used for quick prototyping. It is not meant to run large-scale
jobs.
● You may not have access all the time. GPUs are often allocated to other users,
so a Jupyter instance may not be ready when you need one. Running a batch job is
more appropriate: the job is placed in a queue and launched as soon as resources
are available.
kubectl describe pod <pod_name>        To check the details of Kubernetes PODs or deployment.
kubectl logs <pod_name/job_name>       To check the logs of the Kubernetes PODs or JOBs.
kubectl get quota                      To check the resources assigned for a particular user account.
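The three commands above are often run together when an instance does not come up. A minimal sketch of that sequence follows; `pod_report` is a hypothetical wrapper name, and the pod name is whatever your launch script created.

```shell
# pod_report is a hypothetical wrapper around the three kubectl
# commands listed above. Argument: the pod name.
pod_report() {
  pod="$1"
  echo "== quota =="
  kubectl get quota            # resources assigned to your account
  echo "== describe $pod =="
  kubectl describe pod "$pod"  # pod details, including the event section
  echo "== logs $pod =="
  kubectl logs "$pod"          # container logs
}

# Intended use on the server:
#   pod_report <pod_name>
```

Checking the quota first helps separate "no resources assigned or free" from problems inside the container itself.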