Data … as usual

All things about data by Laurent Leturgez

Category Archives: Machine Learning

Playing with Python and Machine Learning in the Cloud … for Free

If you are interested by the Python language and Machine learning programming (which is usually linked), you will probably think about configuring an environment on your laptop or instantiate an environment hosted by a Cloud provider, which is a bit expensive especially if you want doing some very basic tests.

If you want to try (and build) those very trendy neural networks, you will need a GPU to speed up your programs (and some related boring stuff like installing and configuring Cuda etc.). Same thing if you want to play with spark (and specifically with pyspark)

That can be a boring stuff to do. But do you know that it’s possible to quickly set up an environment in the cloud for free … yes for free. So let’s have a look to two solutions : Google colaboratory (named colab) and Kaggle.

But before we start, we need to know what a notebook is, because these platforms use python notebook as playground.

What is a notebook ?

A notebook is a file which embed code, markup language (HTML) and equations. Each notebook is divided by cells and each cell can be executed individually inside a kernel.

When the code is python, the file extension is usually ipynb. Please note, that notebooks can run other languages than python, each kernel run a specific language for example, Python or Scala.

If you want to know more about notebook, follow these links: https://jupyter.org/ or https://en.wikipedia.org/wiki/Notebook_interface.

 

Google Colaboratory (Colab notebooks)

Google colab is a free notebook environment hosted by Google. To access it, you only need a free google account. Once you created your notebook, you have the possibility to save it on a Google drive file (with ipynb extension) and, optionally, export it on github.

To create your first notebook, you have to go to https://colab.research.google.com/notebooks/welcome.ipynb and then click the “File” Menu and Select “new Python 3 notebook” (or New Python 2 Notebook, if you want to deal with python 2).

Colaboratory Notebook 1

This will create a new folder in your google drive home root directory named “Colab Notebooks” with the file you created in it.

 

Once in your notebook, you can add some cells and write your first python lines.

But, what you have to know is that you are in a remote environment with packages installed (by default you have many python packages already installed), and once instantiated, you can even modify your kernel by installing new softwares etc.

For example, let’s say … we want to set up a pyspark environment. We first need to install pyspark with pip and then run a bunch of pyspark code to test everything is ok.

The notebook is available at this URL (I saved it on github): https://github.com/lolo115/python/blob/master/notebooks/google_colab/blog/pyspark1.ipynb, and the result is represented below:

 

Colaboratory Notebook 2

You can even load files from your local disk to your runtime, and then run code on it. In the example given below (and integrated in the notebook linked above), I used the google API to do that:

Colaboratory notebook 3

Of course, this environment is for testing purpose only, you don’t have a lot of power behind but it’s useful if you want to start learning Python, or test a bunch of script without any local infrastructure and … it’s free.

Kaggle

The second platform to start playing python is more machine learning oriented. Indeed kaggle is a platform for data scientists who are allowed to share and find some data sets, build model, enter in datascience challenges etc.

Accessing to kaggle is free, you just have to subscribe at www.kaggle.com and then log in.

Once logged into the system, you have to go to www.kaggle.com/kernels and click on “New Kernel” and select your preferred style, and you will have access to a notebook with default packages loaded (like numpy and pandas) :

Kaggle

Basically, kaggle is not very different from Google Colaboratory … but kaggle is interesting because you can enable a GPU for your notebook.

To do that, you can go to the “settings” area (in the bottom right corner) and set “GPU” to “ON”.

Kaggle

This will enable a GPU (ok it’s not a farm 😉 ) but this can help you to work on small workload and why not on a small neural network.

Below, a small bunch of code that use tensorflow and gives you the information about GPU enablement.

 

Please note that you can easily upload your datasets (or use datasets provided by kaggle) by clicking on the “+ Add Dataset” Button

That’s it for today 😉

 

Advertisements