Oracle … as usual

Oracle by Laurent Leturgez


Install a Standalone Spark Environment on Oracle Linux 7

Spark is one of the most trendy projects in the Apache Foundation.

Until now, I have usually used it directly on Hadoop clusters, but each time I wanted to play with Spark without needing a complete Hadoop cluster, or just to test some basic pieces of code, it was hard to do, especially on my laptop! Running a 3-node CDH cluster on your laptop requires a lot of CPU and memory!

So in this post, I will describe how to set up a small Linux virtual machine and install the latest Spark version in standalone mode.

First of all, you need a fully operational Linux box. I chose an Oracle Linux 7.4 one with the 3.8.13-118 UEK kernel.

[spark@spark ~]$ sudo uname -r
3.8.13-118.19.4.el7uek.x86_64

Once the box is installed and configured, you need to install Java. In my case, I installed a JDK 8 SE:

[spark@spark ~]$ sudo yum localinstall /home/spark/jdk-8u121-linux-x64.rpm -y
[spark@spark ~]$ java -version
java version "1.8.0_121"
Java(TM) SE Runtime Environment (build 1.8.0_121-b13)
Java HotSpot(TM) 64-Bit Server VM (build 25.121-b13, mixed mode)

Then, create the directories required for the Spark installation and download the sources (if you need another version of Spark, you will find it at this URL: https://spark.apache.org/downloads.html):

[spark@spark ~]$ sudo mkdir /usr/local/share/spark
[spark@spark ~]$ sudo chown spark:spark /usr/local/share/spark
[spark@spark ~]$ curl -O https://d3kbcqa49mib13.cloudfront.net/spark-2.2.0.tgz
[spark@spark ~]$ tar -xvzf spark-2.2.0.tgz -C /usr/local/share/spark/
[spark@spark ~]$ cd /usr/local/share/spark/spark-2.2.0/

If you are behind a proxy server, you have to create a settings.xml file in the $HOME/.m2 directory (you will probably have to create the directory as well). You have to do this even if you have set the http_proxy variable in your environment, because Maven, which is used during the build process, relies on this file.

Below, you’ll see what my settings.xml file looks like:

[spark@spark ~]$ cat ~/.m2/settings.xml
<settings>
 <proxies>
 <proxy>
 <id>example-proxy</id>
 <active>true</active>
 <protocol>http</protocol>
 <host>10.239.9.20</host>
 <port>80</port>
 </proxy>
 </proxies>
</settings>

Then, you are ready to configure the Maven environment and launch the build process:

[spark@spark ~]$ cd /usr/local/share/spark/spark-2.2.0/
[spark@spark spark-2.2.0]$ export MAVEN_OPTS="-Xmx2g -XX:ReservedCodeCacheSize=512m"
[spark@spark spark-2.2.0]$ ./build/mvn -DskipTests clean package

At the end of the process, a summary report is printed.

[spark@spark spark-2.2.0]$ ./build/mvn -DskipTests clean package

.../...

[INFO] Replacing original artifact with shaded artifact.
[INFO] Replacing /usr/local/share/spark/spark-2.2.0/external/kafka-0-10-assembly/target/spark-streaming-kafka-0-10-assembly_2.11-2.2.0.jar with /usr/local/share/spark/spark-2.2.0/external/kafka-0-10-assembly/target/spark-streaming-kafka-0-10-assembly_2.11-2.2.0-shaded.jar
[INFO] Dependency-reduced POM written at: /usr/local/share/spark/spark-2.2.0/external/kafka-0-10-assembly/dependency-reduced-pom.xml
[INFO]
[INFO] --- maven-source-plugin:3.0.1:jar-no-fork (create-source-jar) @ spark-streaming-kafka-0-10-assembly_2.11 ---
[INFO] Building jar: /usr/local/share/spark/spark-2.2.0/external/kafka-0-10-assembly/target/spark-streaming-kafka-0-10-assembly_2.11-2.2.0-sources.jar
[INFO]
[INFO] --- maven-source-plugin:3.0.1:test-jar-no-fork (create-source-jar) @ spark-streaming-kafka-0-10-assembly_2.11 ---
[INFO] Building jar: /usr/local/share/spark/spark-2.2.0/external/kafka-0-10-assembly/target/spark-streaming-kafka-0-10-assembly_2.11-2.2.0-test-sources.jar
[INFO] ------------------------------------------------------------------------
[INFO] Reactor Summary:
[INFO]
[INFO] Spark Project Parent POM ........................... SUCCESS [01:04 min]
[INFO] Spark Project Tags ................................. SUCCESS [ 26.598 s]
[INFO] Spark Project Sketch ............................... SUCCESS [ 6.316 s]
[INFO] Spark Project Networking ........................... SUCCESS [ 17.129 s]
[INFO] Spark Project Shuffle Streaming Service ............ SUCCESS [ 6.836 s]
[INFO] Spark Project Unsafe ............................... SUCCESS [ 9.039 s]
[INFO] Spark Project Launcher ............................. SUCCESS [ 21.286 s]
[INFO] Spark Project Core ................................. SUCCESS [02:24 min]
[INFO] Spark Project ML Local Library ..................... SUCCESS [ 20.021 s]
[INFO] Spark Project GraphX ............................... SUCCESS [ 13.117 s]
[INFO] Spark Project Streaming ............................ SUCCESS [ 33.581 s]
[INFO] Spark Project Catalyst ............................. SUCCESS [01:22 min]
[INFO] Spark Project SQL .................................. SUCCESS [02:56 min]
[INFO] Spark Project ML Library ........................... SUCCESS [02:08 min]
[INFO] Spark Project Tools ................................ SUCCESS [ 3.084 s]
[INFO] Spark Project Hive ................................. SUCCESS [ 51.106 s]
[INFO] Spark Project REPL ................................. SUCCESS [ 4.365 s]
[INFO] Spark Project Assembly ............................. SUCCESS [ 2.109 s]
[INFO] Spark Project External Flume Sink .................. SUCCESS [ 8.062 s]
[INFO] Spark Project External Flume ....................... SUCCESS [ 9.350 s]
[INFO] Spark Project External Flume Assembly .............. SUCCESS [ 2.087 s]
[INFO] Spark Integration for Kafka 0.8 .................... SUCCESS [ 12.043 s]
[INFO] Kafka 0.10 Source for Structured Streaming ......... SUCCESS [ 12.758 s]
[INFO] Spark Project Examples ............................. SUCCESS [ 19.236 s]
[INFO] Spark Project External Kafka Assembly .............. SUCCESS [ 5.637 s]
[INFO] Spark Integration for Kafka 0.10 ................... SUCCESS [ 9.345 s]
[INFO] Spark Integration for Kafka 0.10 Assembly .......... SUCCESS [ 3.909 s]
[INFO] ------------------------------------------------------------------------
[INFO] BUILD SUCCESS
[INFO] ------------------------------------------------------------------------
[INFO] Total time: 14:54 min
[INFO] Finished at: 2017-09-14T12:22:31+02:00
[INFO] Final Memory: 86M/896M
[INFO] ------------------------------------------------------------------------

At this step, if you run some scripts, you will get an error because, even if you have installed Spark in standalone mode, you still need the Hadoop libraries.

That is easy to fix: we just have to download Hadoop and configure our environment as follows (download the Hadoop version you need; I chose 2.8.1, which is the latest stable version of Hadoop 2; I didn't test with Hadoop 3 as it is still in beta):

[spark@spark ~]$ cd /usr/local/share/
[spark@spark share]$ sudo mkdir hadoop
[spark@spark share]$ sudo chown spark:spark hadoop/
[spark@spark share]$ cd hadoop/
[spark@spark hadoop]$ curl -O http://apache.mirrors.ovh.net/ftp.apache.org/dist/hadoop/common/hadoop-2.8.1/hadoop-2.8.1.tar.gz
[spark@spark hadoop]$ tar -xzf hadoop-2.8.1.tar.gz
[spark@spark hadoop]$ cat >> ~/.bashrc
export HADOOP_HOME=/usr/local/share/hadoop/hadoop-2.8.1
export LD_LIBRARY_PATH=${HADOOP_HOME}/lib/native:${LD_LIBRARY_PATH}
export SPARK_HOME=/usr/local/share/spark/spark-2.2.0
export PATH=${SPARK_HOME}/bin:${PATH}
[spark@spark hadoop]$ . ~/.bashrc
[spark@spark hadoop]$ env | egrep 'HADOOP|PATH|SPARK'
SPARK_HOME=/usr/local/share/spark/spark-2.2.0
HADOOP_HOME=/usr/local/share/hadoop/hadoop-2.8.1
LD_LIBRARY_PATH=/usr/local/share/hadoop/hadoop-2.8.1/lib/native:/usr/local/share/hadoop/hadoop-2.8.1/lib/native:
PATH=/usr/local/share/spark/spark-2.2.0/bin:/usr/local/bin:/usr/bin:/usr/local/sbin:/usr/sbin:/home/spark/.local/bin:/home/spark/bin

Now, we can run the SparkPi example:

[spark@spark ~]$ run-example SparkPi 500
Pi is roughly 3.141360702827214

Note: If you want to remove all those crappy INFO messages in the output, run the command below to configure log4j properties:

[spark@spark hadoop]$ cd $SPARK_HOME/conf
[spark@spark conf]$ sed 's/log4j\.rootCategory=INFO, console/log4j\.rootCategory=WARN, console/g' log4j.properties.template > log4j.properties

 

That's done … you are now ready to run your own code on Spark. Below is a small sample written in Scala that creates a DataFrame from an Oracle JDBC data source and runs a groupBy function on it.
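
The content of jdbc_sample.scala is not reproduced in the spark-shell session below, so here is a minimal sketch of what it could look like (the JDBC URL, credentials and table name are placeholders to adapt to your environment; judging by the PROD_* columns, the DataFrame was built on a table similar to the SH.PRODUCTS sample table):

// jdbc_sample.scala -- minimal sketch, not the original file
import java.util.Properties

val connProps = new Properties()
connProps.put("user", "sh")            // placeholder credentials
connProps.put("password", "sh")

// 'spark' is the SparkSession provided by spark-shell
val df = spark.read.jdbc("jdbc:oracle:thin:@//oracle-host:1521/orcl",   // placeholder connect string
                         "PRODUCTS",                                    // placeholder table name
                         connProps)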

[spark@spark ~]$ spark-shell --driver-class-path ojdbc7.jar --jars ojdbc7.jar
Spark context Web UI available at http://192.168.99.14:4040
Spark context available as 'sc' (master = local[*], app id = local-1505397247969).
Spark session available as 'spark'.
Welcome to
      ____              __
     / __/__  ___ _____/ /__
    _\ \/ _ \/ _ `/ __/  '_/
   /___/ .__/\_,_/_/ /_/\_\   version 2.2.0
      /_/

Using Scala version 2.11.8 (Java HotSpot(TM) 64-Bit Server VM, Java 1.8.0_121)
Type in expressions to have them evaluated.
Type :help for more information.

scala> :load jdbc_sample.scala
Loading jdbc_sample.scala...
import java.util.Properties
connProps: java.util.Properties = {}
res0: Object = null
res1: Object = null
df: org.apache.spark.sql.DataFrame = [PROD_ID: decimal(6,0), PROD_NAME: string ... 20 more fields]

scala> df.printSchema
root
 |-- PROD_ID: decimal(6,0) (nullable = false)
 |-- PROD_NAME: string (nullable = false)
 |-- PROD_DESC: string (nullable = false)
 |-- PROD_SUBCATEGORY: string (nullable = false)
 |-- PROD_SUBCATEGORY_ID: decimal(38,10) (nullable = false)
 |-- PROD_SUBCATEGORY_DESC: string (nullable = false)
 |-- PROD_CATEGORY: string (nullable = false)
 |-- PROD_CATEGORY_ID: decimal(38,10) (nullable = false)
 |-- PROD_CATEGORY_DESC: string (nullable = false)
 |-- PROD_WEIGHT_CLASS: decimal(3,0) (nullable = false)
 |-- PROD_UNIT_OF_MEASURE: string (nullable = true)
 |-- PROD_PACK_SIZE: string (nullable = false)
 |-- SUPPLIER_ID: decimal(6,0) (nullable = false)
 |-- PROD_STATUS: string (nullable = false)
 |-- PROD_LIST_PRICE: decimal(8,2) (nullable = false)
 |-- PROD_MIN_PRICE: decimal(8,2) (nullable = false)
 |-- PROD_TOTAL: string (nullable = false)
 |-- PROD_TOTAL_ID: decimal(38,10) (nullable = false)
 |-- PROD_SRC_ID: decimal(38,10) (nullable = true)
 |-- PROD_EFF_FROM: timestamp (nullable = true)
 |-- PROD_EFF_TO: timestamp (nullable = true)
 |-- PROD_VALID: string (nullable = true)

scala> df.groupBy("PROD_CATEGORY").count.show
+--------------------+-----+
|       PROD_CATEGORY|count|
+--------------------+-----+
|      Software/Other|   26|
|               Photo|   10|
|         Electronics|   13|
|Peripherals and A...|   21|
|            Hardware|    2|
+--------------------+-----+

And … that’s it … have fun with Spark 😉

 


Database Load heatmap with AWR and Python

Recently, one of my customers asked me to design a tool to quickly visualize his database load. He had already done that for another RDBMS with a heatmap.

To do that, I split the work into two distinct parts: data extraction and data visualization.

Data Extraction from AWR

As he has licensed his databases with the Diagnostics Pack, I decided to use the AWR repository, and as I'm a bit lazy, I found a query from Marcin Przepiorowski (@pioro) and modified it to pivot the data.


col "00-01_ " for 90.99
col "01-02_ " for 90.99
col "02-03_ " for 90.99
col "03-04_ " for 90.99
col "04-05_ " for 90.99
col "05-06_ " for 90.99
col "06-07_ " for 90.99
col "07-08_ " for 90.99
col "08-09_ " for 90.99
col "09-10_ " for 90.99
col "10-11_ " for 90.99
col "11-12_ " for 90.99
col "12-13_ " for 90.99
col "13-14_ " for 90.99
col "14-15_ " for 90.99
col "15-16_ " for 90.99
col "16-17_ " for 90.99
col "17-18_ " for 90.99
col "18-19_ " for 90.99
col "19-20_ " for 90.99
col "20-21_ " for 90.99
col "21-22_ " for 90.99
col "22-23_ " for 90.99
col "23-24_ " for 90.99

WITH t AS
  (SELECT TO_CHAR(mtime,'YYYY/MM/DD') mtime,
    TO_CHAR(mtime,'HH24') d,
    LOAD AS value
  FROM
    (SELECT to_date(mtime,'YYYY-MM-DD HH24') mtime,
      ROUND(SUM(c1),2) AAS_WAIT,
      ROUND(SUM(c2),2) AAS_CPU,
      ROUND(SUM(cnt),2) AAS,
      ROUND(SUM(load),2) LOAD
    FROM
      (SELECT TO_CHAR(sample_time,'YYYY-MM-DD HH24') mtime,
        DECODE(session_state,'WAITING',COUNT(*),0)/360 c1,
        DECODE(session_state,'ON CPU',COUNT( *),0) /360 c2,
        COUNT(                               *)/360 cnt,
        COUNT(                               *)/360/cpu.core_nb load
      FROM dba_hist_active_sess_history,
        (SELECT value AS core_nb FROM v$osstat WHERE stat_name='NUM_CPU_CORES'
        ) cpu
      WHERE sample_time > sysdate - 30
      GROUP BY TO_CHAR(sample_time,'YYYY-MM-DD HH24'),
        session_state,
        cpu.core_nb
      )
    GROUP BY mtime
    )
  )
SELECT mtime,
  NVL("00-01_ ",0) "00-01_ ",
  NVL("01-02_ ",0) "01-02_ ",
  NVL("02-03_ ",0) "02-03_ ",
  NVL("03-04_ ",0) "03-04_ ",
  NVL("04-05_ ",0) "04-05_ ",
  NVL("05-06_ ",0) "05-06_ ",
  NVL("06-07_ ",0) "06-07_ ",
  NVL("07-08_ ",0) "07-08_ ",
  NVL("08-09_ ",0) "08-09_ ",
  NVL("09-10_ ",0) "09-10_ ",
  NVL("10-11_ ",0) "10-11_ ",
  NVL("11-12_ ",0) "11-12_ ",
  NVL("12-13_ ",0) "12-13_ ",
  NVL("13-14_ ",0) "13-14_ ",
  NVL("14-15_ ",0) "14-15_ ",
  NVL("15-16_ ",0) "15-16_ ",
  NVL("16-17_ ",0) "16-17_ ",
  NVL("17-18_ ",0) "17-18_ ",
  NVL("18-19_ ",0) "18-19_ ",
  NVL("19-20_ ",0) "19-20_ ",
  NVL("20-21_ ",0) "20-21_ ",
  NVL("21-22_ ",0) "21-22_ ",
  NVL("22-23_ ",0) "22-23_ ",
  NVL("23-24_ ",0) "23-24_ "
FROM t pivot( SUM(value) AS " " FOR d IN ('00' AS "00-01",'01' AS "01-02",'02' AS "02-03",'03' AS "03-04",'04' AS "04-05",'05' AS "05-06",'06' AS "06-07",'07' AS "07-08",
                                          '08' AS "08-09",'09' AS "09-10",'10' AS "10-11", '11' AS "11-12",'12' AS "12-13",'13' AS "13-14",'14' AS "14-15",'15' AS "15-16",
                                          '16' AS "16-17",'17' AS "17-18",'18' AS "18-19",'19' AS "19-20",'20' AS "20-21",'21' AS "21-22", '22' AS "22-23",'23' AS "23-24") 
            )
ORDER BY mtime

Note: this query can be used to extract the database load, the average active sessions (AAS), the CPU part of AAS, or the wait part of AAS. You just have to change which column is aliased as value in the WITH block (for example, use AAS_CPU AS value instead of LOAD AS value).

Another point: I didn't look at the execution plan, so I didn't tune the statement … if you have some tricks to tune it, feel free to comment 😉

This query gives this kind of result:

MTIME      00-01_  01-02_  02-03_  03-04_  04-05_  05-06_  06-07_  07-08_  08-09_  09-10_  10-11_  11-12_  12-13_  13-14_  14-15_  15-16_  16-17_  17-18_  18-19_  19-20_  20-21_  21-22_  22-23_  23-24_
---------- ------- ------- ------- ------- ------- ------- ------- ------- ------- ------- ------- ------- ------- ------- ------- ------- ------- ------- ------- ------- ------- ------- ------- -------
2016/11/23    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.03    0.03    0.00    0.04    0.02    0.00    0.00    0.04    0.01    0.01    0.01    0.04    0.01
2016/11/24    0.02    0.03    0.01    0.03    0.05    0.02    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00
2016/12/06    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.01    0.01    0.02    0.02    0.00    0.00    0.00    0.00
2016/12/07    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.01    0.00    0.00    0.04    0.03    0.01    0.02    0.01    0.02    0.01    0.01    0.06    0.03
2016/12/08    0.03    0.05    0.02    0.02    0.03    0.03    0.02    0.02    0.03    0.02    0.01    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00
2016/12/12    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.03    0.02    0.05    0.02
2016/12/13    0.03    0.01    0.01    0.01    0.02    0.03    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00
2016/12/14    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.02    0.02    0.02    0.02    0.18    0.03    0.02    0.01    0.01    0.00    0.00    0.00    0.00    0.00

When the result is greater than 1, your database is loaded, and if it is greater than or equal to 2, it is really overloaded. In both cases, you have to analyze why.

The problem with this kind of result is that it is only numbers, and you have to scan the complete output to detect when a problem occurred. That is where the data visualization part comes in 😉

Data Visualization with Python

In the Oracle database area, someone who has done a lot for the visualization of performance data is Kyle Hailey (@dbavigil), and visualizing your data is very important to quickly identify bottlenecks. So if you are interested in data visualization, visit his blog: you will find a lot of resources to extract AWR data and visualize it (and other stuff).

So I decided to use Python to visualize the database load data extracted previously. To do that, I used Python 3.4.5 with a few packages installed:

  • cx_Oracle to connect to the Oracle database and execute queries
  • numpy for arrays manipulation
  • plotly to graph the heatmap

The Python script is available at this URL: https://app.box.com/s/irjzi64lbne6xhw3t57aixqiy95toxht. You then have to modify it to enter your connection string (line 33 of the script).

My customer asked that the script generate an HTML file to visualize the heatmap in a browser (one of the reasons I used plotly).

The script generates the file in /var/tmp (you can change that at the end of the script) and then opens a browser locally (so don't forget to export your X11 display if you run it on a Linux server).
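
Since only the link to the script is given above, here is a minimal sketch of the approach it takes, assuming the pivoted query shown earlier is stored in a variable named query; the connection string and output path are placeholders, not the original script:

# minimal sketch: run the pivot query with cx_Oracle and render a plotly heatmap to an HTML file
import cx_Oracle
import plotly.offline as py
import plotly.graph_objs as go

conn = cx_Oracle.connect("system/manager@db-host:1521/orcl")   # placeholder connection string
cur = conn.cursor()
cur.execute(query)                          # 'query' holds the pivot statement shown above
rows = cur.fetchall()

days = [r[0] for r in rows]                 # MTIME column -> Y axis (one line per day)
values = [list(r[1:]) for r in rows]        # the 24 hourly load values -> Z matrix
hours = ["%02d-%02d" % (h, h + 1) for h in range(24)]    # X axis labels (hour ranges)

fig = go.Figure(data=[go.Heatmap(z=values, x=hours, y=days, colorscale="Reds")],
                layout=go.Layout(title="Database load heatmap"))
py.plot(fig, filename="/var/tmp/db_load_heatmap.html")    # writes the HTML file and opens the browser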

Below you can see the heatmap (the X axis is the hour range, the Y axis represents the analyzed days, with the most recent day on the first line of the heatmap).

heatmap_s

This heatmap will help my customer quickly visualize his database load; I hope it will help you too ;).

Of course, you can use the tool of your choice to visualize data (Tableau, Excel, Kibana etc.)

 

 

Oracle 12c Application Continuity and its resource usage

During a recent Oracle Meetup in Paris, I met Martin Bach and Bertrand Drouvot. Martin was there to present on Application Continuity, and we had an interesting discussion about this technology. (Bertrand had recently published an interesting post about Application Continuity in a Dataguard configuration … here).

In this post, I will quickly explain how it works, based on a code sample I wrote a few months ago. I will also explain Application Continuity resource allocation: where the resources are consumed, and how big the overhead is.

12c Application Continuity … What is it and how does it work?

Application Continuity is a technology that improves application availability. Before Oracle 12c, Transparent Application Failover (TAF) or Fast Connection Failover (FCF) could help you improve application availability, but developers usually had to write TAF- or FCF-aware code, and usually that wasn't done, for many reasons (test and UAT platforms that weren't RAC platforms, specifications not written with this in mind, etc.).

In 12c, Oracle released Application Continuity. This technology does the same thing more simply and has been designed to handle both planned and unplanned outages.

Application Continuity works with an Oracle 12c database and with applications that use:

  • Oracle Weblogic Server
  • Oracle UCP (Unified Connection Pool)
  • Third-party Java application servers that use the PooledConnection interface
  • The JDBC thin driver

The lab …

To test application continuity, I wrote a simple test case based on :

  • a 12c (12.1.0.2) RAC cluster with a container database (CDB), a pluggable database (PDB), and a database schema (LAURENT) with 2 tables and a “spy” trigger.
  • a small Java Code sample.

On the database side, I created a TEST table and filled it with 500000 rows, and another one (SPY_TABLE) which tracks the ids of the TEST table rows that have been modified:


SQL> create table test as select rownum id, dbms_random.string('U',20) v from dual connect by level <= 500000;

Table created.

SQL> create table spy_table(d date, inst_name varchar2(16), id number);

Table created.

Row update tracking is done by a trigger that fires on update on the TEST table:


SQL> create or replace trigger spy_test
 2  before update on test for each row
 3  declare
 4    pragma autonomous_transaction;
 5  begin
 6    insert into spy_table values(sysdate,sys_context('USERENV', 'INSTANCE_NAME'), :old.id);
 7  commit;
 8  end;
 9 /

Trigger created.

To run the test, we have to create a RAC service for our database with these features:

  • FAILOVER_TYPE has to be set on TRANSACTION to enable Application Continuity
  • COMMIT_OUTCOME has to be set to TRUE (even if it is a parameter related to Transaction Guard, you have to set it)
  • REPLAY_INITIATION_TIMEOUT, FAILOVER_RETRIES, FAILOVER_DELAY should be reviewed to configure the delays for the replay (see this link for more information)
[oracle@rac1 JAVA]$ srvctl add service -d cdb -s app -preferred cdb1 -available cdb2 -pdb pdb -j SHORT -B SERVICE_TIME -z 30 -w 10 -commit_outcome TRUE -e TRANSACTION -replay_init_time 1800 -retention 86400 -notification TRUE
[oracle@rac1 JAVA]$ srvctl start service -s app -d cdb

[oracle@rac1 JAVA]$ srvctl status service -d cdb
Service app is running on instance(s) cdb1

[oracle@rac1 JAVA]$ /u01/app/12.1.0/grid/bin/crsctl stat res ora.cdb.app.svc -t
--------------------------------------------------------------------------------
Name           Target  State        Server                   State details
--------------------------------------------------------------------------------
Cluster Resources
--------------------------------------------------------------------------------
ora.cdb.app.svc
      1        ONLINE  ONLINE       rac1                     STABLE
--------------------------------------------------------------------------------

Note: I deliberately configured the service with a preferred node to be sure my tests always run on the same instance (this matters for the resource measurements). But I recommend configuring it with all instances as “preferred”.

The Java source code is available at this link.

Procedures and functions are:

  • pressAnyKeyToContinue: a private method that pauses the program. This pause is useful to attach JConsole or VisualVM and track JVM memory and CPU.
  • getStatValue: this function returns the value of an Oracle statistic for the current session.
  • doTx: this is the most important function in this program. It performs the transaction, and it contains the most important thing when coding Java with Application Continuity: the callback. The callback is delimited by two calls: beginRequest and endRequest. In the callback, you put all the operations you want to protect against failures. If a planned or unplanned outage occurs during the callback, the transaction will be rolled back and then replayed. Please note that, in the callback area, you have to:
    • call the prepareStatement function to assign the SQL statement, otherwise a SQLRecoverableException will be thrown during the connection failover.
    • disable autoCommit for the connection.
    • put a commit statement to say where the transaction ends.
    private void doTx(Connection c, int numValue) throws SQLException {
        String updsql = "UPDATE test " +
                        "SET v=UPPER(v) " +
                        "WHERE id=?";
        PreparedStatement pstmt = null;

        /* Beginning of the callback
         * AutoCommit has to be disabled in the callback part.
         */
        ((oracle.jdbc.replay.ReplayableConnection)c).beginRequest();
        pstmt=c.prepareStatement(updsql);
        c.setAutoCommit(false);
        for (int i=0;i<numValue;i++) {
            pstmt.setInt(1,i);
            pstmt.executeUpdate();
        }
        c.commit();
        // End of the Callback.
        ((oracle.jdbc.replay.ReplayableConnection)c).endRequest();
        pstmt.close();
    }

  • In the main function, the most important thing is to use the new OracleDataSourceImpl class instead of OracleDataSource (see the short sketch below).
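
As the full source is only linked above, here is a minimal sketch of that datasource setup (the URL and credentials are placeholders based on the connect string used later in this post; this is an illustration, not the original SimpleAC code):

// minimal sketch: obtaining a replay-capable connection for Application Continuity
oracle.jdbc.replay.OracleDataSourceImpl ods = new oracle.jdbc.replay.OracleDataSourceImpl();
ods.setURL("jdbc:oracle:thin:@//rac-scan:1521/app");   // placeholder URL pointing to the "app" service
ods.setUser("laurent");                                // placeholder credentials
ods.setPassword("laurent");
java.sql.Connection c = ods.getConnection();           // this connection is then passed to doTx()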

Then, I compile the code and run it with the number of updates I want to execute in my transaction.

First run

During the run, I executed a “kill -9” against the PMON process of the instance my Java program was connected to:

[oracle@rac1 JAVA]$ ./launchJava.sh SimpleAC 50000
Instance Name = cdb1
Press any key to continue...
Performing transactions

<<<< KILLED THE PMON PROCESS HERE >>>>

Instance Name = cdb2
CPU used by this session = 38643
session pga memory max = 4140696

We can see that, even with the failure of the instance that was running the transaction (of 50000 updates), the failover occurs without any error message.

The spy table shows the following results:

SQL> select inst_name,count(*) from spy_table group by inst_name;

INST_NAME	   COUNT(*)
---------------- ----------
cdb2		      50000
cdb1		       1270

This proves that the transaction has been rolled-back and completely replayed after the connection failover.

Mutable values

When a transaction becomes replayable, a problem arises with what we call mutable values. If you are interested in the definition, you can read this Wikipedia article: http://en.wikipedia.org/wiki/Immutable_object. But I prefer to explain it in a different way.

Imagine we run a test close to the previous one, but instead of updating data, we insert rows into a table which has a timestamp column filled with the value of the systimestamp Oracle function at insert time. Now an outage occurs, the connection is failed over, and the transaction is rolled back and replayed on the other node. If we don't tell Oracle to keep the original value, the replayed rows get the timestamp recorded on the second node and not the original one, and this could cause problems with your data integrity.

This problem occurs with sysdate, systimestamp, sequence values, and another function that depends on your process id (among other things): SYS_GUID.

If you want these values to be kept (treated as immutable) during a replay, you need the KEEP grants:


SQL> show user
USER is "SYS"
SQL> grant keep date time to laurent;

Grant succeeded.

SQL> grant keep sysguid to laurent;

Grant succeeded.

SQL> connect laurent/laurent@rac-scan:1521/app
Connected.

SQL> grant keep sequence on seq1 to laurent;

Grant succeeded.

No pain, no gain … what about resource consumption?

Application Continuity is a good thing because, with few code modifications, your Java app becomes completely fault tolerant. As the documentation mentions, there is an overhead on CPU and memory … so let's check this!

To test this, I used the same environment as previously, but I didn't kill any process; I simply ran the program with and without the callback defined.

Then I measured different things:

  • CPU used by the database session (statistic: CPU used by this session)
  • Memory consumed by the database session (statistic: session pga memory max)
  • CPU used by the java program
  • Memory consumed by the java program (Java heap memory usage)

The CPU and memory consumed by the Java program were measured with good old JConsole.

To measure the Java heap memory usage, I ran the program with a heap size of 512MB, as it was the right value to avoid garbage collection during execution.

Here are the results:

  • With callback defined

[oracle@rac1 JAVA]$ egrep 'beginRequest|endRequest' SimpleAC.java
((oracle.jdbc.replay.ReplayableConnection)c).beginRequest();
((oracle.jdbc.replay.ReplayableConnection)c).endRequest();

[oracle@rac1 JAVA]$ ./compile.sh

[oracle@rac1 JAVA]$ ./launchJava.sh -Xms512M -Xmx512M SimpleAC 50000
Instance Name = cdb1
Press any key to continue...
Performing transactions
Instance Name = cdb1
CPU used by this session = 34859
session pga memory max = 3961144

load1_avecCallback

Note that the heap memory usage high-water mark is 110MB and the average CPU is 0.4%.

  • With no callback defined
[oracle@rac1 JAVA]$ egrep 'beginRequest|endRequest' SimpleAC.java
//((oracle.jdbc.replay.ReplayableConnection)c).beginRequest();
//((oracle.jdbc.replay.ReplayableConnection)c).endRequest();

[oracle@rac1 JAVA]$ ./compile.sh

[oracle@rac1 JAVA]$ ./launchJava.sh -Xms512M -Xmx512M SimpleAC 50000
Instance Name = cdb1
Press any key to continue...
Performing transactions
Instance Name = cdb1
CPU used by this session = 35138
session pga memory max = 3747576

load1_sansCallback

The heap usage high-water mark is 91MB and the average CPU is 0.2%.

To sum up, we can say that there is a memory overhead in the JDBC layer (for my example, which was based on a transaction of 50000 updates). The CPU overhead announced by Oracle seems trivial. On the database side, no overhead was found on CPU or memory.

If you are looking for more information about Application Continuity, you can read this paper by Oracle: http://www.oracle.com/technetwork/database/database-cloud/private/application-continuity-wp-12c-1966213.pdf

SIMD Extensions in and out of Oracle 12.1.0.2

First of all, I would like to thank Tanel Pöder from Enkitec Accenture for his review of this post and the valuable information he gave me.

—-

Recently I posted a link on Twitter which explains the basics of SIMD programming (https://www.kernel.org/pub/linux/kernel/people/geoff/cell/ps3-linux-docs/CellProgrammingTutorial/BasicsOfSIMDProgramming.html), and I got a reply asking me if this was Oracle 12c style, and the answer is … yes and no.

What is a SIMD extension?

A SIMD extension is a set of CPU instructions that process multiple data items in a single instruction (Single Instruction, Multiple Data). Imagine you have 2 arrays of 4 integers, and you want to compute the sum of those 2 arrays. The classical way is to loop over each value, add them one by one, and store the result in another array. This requires 4 add operations.

Now imagine your arrays are each loaded into a vector of 4 integers, those 2 vectors are in fact specific registers, and with only one CPU instruction you add those 2 vectors, producing a single result vector. You divide the number of CPU instructions by 4 … for the same result.

If it's not clear, don't go away … I have written a small C code sample to demonstrate this.

A bit of history

SIMD extensions are not that recent: they first appeared in the 1970s with vector processing.

In 1996, SIMD extensions were widely deployed with the MMX extensions, then AltiVec on Motorola processors and IBM Power systems brought more powerful instructions. Intel then released its SSE extensions in 1999, which were improved by further extensions: SSE2, SSE3, SSSE3, SSE4 and now the AVX, AVX2 and AVX-512 extensions.

So Oracle does not use one specific extension but those available on your platform, because not all CPUs offer the same extensions. For example, modern processors have AVX extensions, but the most recent one (AVX-512) is only available in the Xeon Phi Knights Landing and Xeon Skylake microarchitectures (Broadwell successors).

Data Structures

SIMD extensions are based on data structures or vectors.

A vector is an array data structure (not to be confused with an array datatype) which has a fixed length and which is, in fact, a succession of scalars of a single type.

For example, in a 64-bit (8-byte) vector you can put 2 integers (an integer is 4 bytes on the x86-64 architecture), or 8 chars (1 byte each), but only one double (8 bytes).

Those data structures are located in CPU registers dedicated to those SIMD instructions.

Let's take an example: you want to compute the sum of two vectors on a processor that only has MMX instructions (an old one 😉 ). MMX offers 8 registers (MM0 through MM7), and each register holds 64 bits.

The first vector content is (1,2) and the second one is (1,2). The first vector is copied from memory into the MM0 register and the second into MM1, then the CPU executes the SIMD instruction that stores in MM0 the sum of MM1 and MM0, and finally MM0 is copied back to memory as the result.

Now imagine your vector doesn't hold 64 bits but 128, 256, 512 or 1024 … you can put more data into it, and all of that data is computed in only one operation …

That is one of the keys of SIMD evolution: MMX uses 64-bit registers (MM0 to MM7), SSE (1/2/3 and 4) uses 128-bit registers (XMM), AVX (1/2) uses 256-bit registers (YMM), and AVX-512 uses 512-bit registers (ZMM).

For Intel processors, the vector datatypes are __m64, __m128, __m256, and __m512 (each vector contains single-precision floating point values, aka float); you have equivalents for double-precision values (__m128d, __m256d, __m512d) and for integer types such as int, short and char (__m128i, __m256i, __m512i).

Note that all those types are automatically aligned on 8, 16, 32 or 64-byte boundaries.

Now computing data

Now that you know how your data will be computed, you can perform operations on it: you can add or multiply your vectors, perform bit shifting, etc.

You have the choice of using “classical” operators, or you can use Intel's intrinsics, which are functions that compute a specific operation (basic mathematics, bit shifting, comparisons, etc.). All of Intel's intrinsics are described at this URL: https://software.intel.com/sites/landingpage/IntrinsicsGuide/. On this page you can also see performance information for each function on different processors.

 Examples

For all the examples below, I used the C language.

Compiling “SIMD aware” programs (with GCC)

If you want to compile a SIMD-aware program, you have to include the “immintrin.h” header file which ships with GCC. This header tests which extensions are available and enabled for your compilation (just find this file and open it). Depending on your CPU and compilation options, it will include the appropriate header file:

  • mmintrin.h for MMX instructions and datatypes:
  • xmmintrin.h for SSE
  • emmintrin.h for SSE2
  • pmmintrin.h for SSE3
  • tmmintrin.h for SSSE3
  • smmintrin.h for SSE4.1 and SSE4.2
  • avxintrin.h for AVX

When you compile your program, some extensions are not enabled by default. Even if your CPU supports the AVX extensions, if you don't give the correct option to the compiler, AVX won't be used.

Main options are:

  • -O3: this option enables loop vectorization optimizations.
  • -msse4.1: this option enables the SSE4.1 extension
  • -msse4.2: this option enables the SSE4.2 extension
  • -mavx: this option enables the AVX extension
  • -mavx2: this option enables the AVX2 extension

Other options are available here: https://gcc.gnu.org/onlinedocs/gcc-4.4.7/gcc/i386-and-x86_002d64-Options.html

To demonstrate this, I used a small program:


#include <stdio.h>
#include <stdlib.h>
#include <immintrin.h>

void print_extensions () {
#ifdef __MMX__
printf("MMX ... OK\n");
#else
printf("MMX ... KO\n");
#endif

#ifdef __SSE__
printf("SSE ... OK\n");
#else
printf("SSE ... KO\n");
#endif

#ifdef __SSE2__
printf("SSE2 ... OK\n");
#else
printf("SSE2 ... KO\n");
#endif

#ifdef __SSE3__
printf("SSE3 ... OK\n");
#else
printf("SSE3 ... KO\n");
#endif

#ifdef __SSSE3__
printf("SSSE3 ... OK\n");
#else
printf("SSSE3 ... KO\n");
#endif

#if defined (__SSE4_2__) || defined (__SSE4_1__)
printf("SSE4_1/2 ... OK\n");
#else
printf("SSE4_1/2 ... KO\n");
#endif

#if defined (__AES__) || defined (__PCLMUL__)
printf("AES/PCLMUL ... OK\n");
#else
printf("AES/PCLMUL ... KO\n");
#endif

#ifdef __AVX__
printf("AVX ... OK\n");
#else
printf("AVX ... KO\n");
#endif
}

int main(int argc, char** argv) {
print_extensions();
return 0;
}

If you compile it with only the -O3 optimization, you will get this result:


macbook-laurent:simd $ sysctl -a | egrep 'cpu.*features'
machdep.cpu.features: FPU VME DE PSE TSC MSR PAE MCE CX8 APIC SEP MTRR PGE MCA CMOV PAT PSE36 CLFSH DS ACPI MMX FXSR SSE SSE2 SS HTT TM PBE SSE3 PCLMULQDQ DTES64 MON DSCPL VMX SMX EST TM2 SSSE3 FMA CX16 TPR PDCM SSE4.1 SSE4.2 x2APIC MOVBE POPCNT AES PCID XSAVE OSXSAVE SEGLIM64 TSCTMR AVX1.0 RDRAND F16C
machdep.cpu.leaf7_features: SMEP ENFSTRG RDWRFSGS TSC_THREAD_OFFSET BMI1 HLE AVX2 BMI2 INVPCID RTM
machdep.cpu.extfeatures: SYSCALL 1GBPAGE EM64T LAHF RDTSCP TSCI

macbook-laurent:simd $ cc -O3 -o simd_ext simd_ext.c
macbook-laurent:simd $ ./simd_ext
MMX ... OK
SSE ... OK
SSE2 ... OK
SSE3 ... OK
SSSE3 ... OK
SSE4_1/2 ... KO
AES/PCLMUL ... KO
AVX ... KO

If you compile with the correct options, your program can use the SSE4 or AVX extensions:

macbook-laurent:simd $ cc -O3 -msse4.2 -o simd_ext simd_ext.c
macbook-laurent:simd $ ./simd_ext
MMX ... OK
SSE ... OK
SSE2 ... OK
SSE3 ... OK
SSSE3 ... OK
SSE4_1/2 ... OK
AES/PCLMUL ... KO
AVX ... KO
macbook-laurent:simd $ cc -O3 -mavx -o simd_ext simd_ext.c
macbook-laurent:simd $ ./simd_ext
MMX ... OK
SSE ... OK
SSE2 ... OK
SSE3 ... OK
SSSE3 ... OK
SSE4_1/2 ... OK
AES/PCLMUL ... KO
AVX ... OK

Note that if you enable the AVX extension, the SSE4 extensions are enabled as well.

Example of SSE2 usage in a basic operation (sum)

The C code below shows how to perform the sum of two arrays of 16 integers each, without using Intel intrinsics:


void func2_sse() {
int a[16] = {1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16};
int b[16] = {1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1};
__m128i* aptr;
__m128i* bptr;
int i;
int loopcnt=0;
printf("sizeof(__m128i)=%lu\n",sizeof(__m128i));
printf("sizeof(a)=%lu\n",sizeof(a));

// Here, we cast the integer arrays to pointers to integer vectors

aptr=(__m128i*)a;
bptr=(__m128i*)b;

// and now we compute the sum
for (i=0;i<sizeof(a)/sizeof(__m128i);i++) {
loopcnt++;
bptr[i]=aptr[i]+bptr[i];
}

int* c=(int*)bptr;

printf("loopcount = %d\nresult= ",loopcnt);
for (i=0;i<16;i++) {
printf("%d ",c[i]);
}
printf("\n");
}

And the result: my sum has been computed in only 4 loop iterations:


SSE
--------------------
sizeof(__m128i)=16
sizeof(a)=64
loopcount = 4
result= 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17

Same example with AVX extension:


void func2_avx() {
 int a[16] = {1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16};
 int b[16] = {1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1};
 __m256i* aptr;
 __m256i* bptr;
 int i;
 int loopcnt=0;
 printf("sizeof(__m256i)=%lu\n",sizeof(__m256i));
 printf("sizeof(a)=%lu\n",sizeof(a));
 aptr=(__m256i*)a;
 bptr=(__m256i*)b;

 for (i=0;i<sizeof(a)/sizeof(__m256i);i++) {
 loopcnt++;
 bptr[i]=aptr[i]+bptr[i];
 }

 int* c=(int*)bptr;

 printf("loopcount = %d\nresult= ",loopcnt);
 for (i=0;i<16;i++) {
 printf("%d ",c[i]);
 }
 printf("\n");
}

And the result: my sum has been computed in only 2 loop iterations:


AVX
--------------------
sizeof(__m256i)=32
sizeof(a)=64
loopcount = 2
result= 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17

Now, let's compare two data sets with SIMD extensions

The next code sample concerns an array in which we want to search for the value 10. To do that, we use a comparison function and a function that builds a 256-bit (AVX) vector filled with the value we are searching for. The comparison function works on 32-bit packed values (useful to compare integers) and returns 0xFFFFFFFF if both values are equal, 0x0 otherwise. As it is an AVX function, our initial array of 16 values is processed in only 2 loop iterations.

void func2_compare_32bitsPack() {
    int a[16] = {1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16};
    __m256i* aptr;
    __m256i b;
    int i;
    int loopcnt=0;
    aptr=(__m256i*)a;
    // b is a vector full off int(32bits) equal to 10 (the value we search)
    b=_mm256_set1_epi32(10);

    for (i=0;i<sizeof(a)/sizeof(__m256i);i++) {
        loopcnt++;
        // comparison intrinsic function: packed by 32 bits(specific for int: if equal set 0xFFFFFFFF, 0x0 otherwise)
        aptr[i]=_mm256_cmpeq_epi32(aptr[i],b);
    }

    // print results
    int* c=(int*)aptr;

    printf("loopcount = %d\nresult= ",loopcnt);
    for (i=0;i<16;i++) {
        printf("0x%x   ",c[i]);
    }
    printf("\n");
}

And the result:


macbook-laurent:simd $ ./simd
Comparison
loopcount = 2
result= 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0xffffffff 0x0 0x0 0x0 0x0 0x0 0x0

It becomes easy to identify that the value 10 is located at index 9 (the 10th position) in our initial array.

OK, and how are SIMD extensions used in Oracle 12c In-Memory?

If you have read my last post on how to activate SSE4 extensions on VirtualBox guests (https://laurent-leturgez.com/2015/04/14/enable-simd-sse4-extension-in-oracle-virtualbox/), and Tanel Pöder's post (https://blog.tanelpoder.com/2014/10/05/oracle-in-memory-column-store-internals-part-1-which-simd-extensions-are-getting-used/), you have noticed that Oracle can run In-Memory with only the SSE2 extension (the default), but if your CPUs have SSE4 or AVX extensions, Oracle uses specific libraries built with SSE4 (libshpksse4212.so) or AVX (libshpkavx12.so).

If we have a look at the functions in those libraries, we can see that every function name starts with “kdzk”:

[oracle@oel64-112 ~]$ readelf -a /u01/app/oracle/product/12.1.0/dbhome_1/lib/libshpksse4212.a | grep FUNC
 6: 0000000000000030 256 FUNC LOCAL DEFAULT 3 kdzk_overload_opc_name
 23: 0000000000000130 80 FUNC LOCAL DEFAULT 3 kdzk_flag_name
 26: 0000000000000180 112 FUNC LOCAL DEFAULT 3 kdzk_enc_name
 31: 00000000000001f0 320 FUNC LOCAL DEFAULT 3 kdzk_datawidth_name
 64: 0000000000002b70 544 FUNC LOCAL DEFAULT 3 kdzk_eq_dict_1bit
 65: 0000000000002d90 544 FUNC LOCAL DEFAULT 3 kdzk_lt_dict_1bit
 66: 0000000000002fb0 544 FUNC LOCAL DEFAULT 3 kdzk_gt_dict_1bit
 67: 00000000000031d0 592 FUNC LOCAL DEFAULT 3 kdzk_le_dict_1bit
 68: 0000000000003420 592 FUNC LOCAL DEFAULT 3 kdzk_ge_dict_1bit
 69: 0000000000003670 544 FUNC LOCAL DEFAULT 3 kdzk_ne_dict_1bit
 70: 0000000000003890 224 FUNC LOCAL DEFAULT 3 kdzk_gt_lt_dict_1bit
 71: 0000000000003970 576 FUNC LOCAL DEFAULT 3 kdzk_gt_le_dict_1bit
 72: 0000000000003bb0 576 FUNC LOCAL DEFAULT 3 kdzk_ge_lt_dict_1bit
 73: 0000000000003df0 992 FUNC LOCAL DEFAULT 3 kdzk_ge_le_dict_1bit
 74: 00000000000041d0 512 FUNC LOCAL DEFAULT 3 kdzk_eq_dict_1bit_null
 75: 00000000000043d0 192 FUNC LOCAL DEFAULT 3 kdzk_lt_dict_1bit_null
 76: 0000000000004490 512 FUNC LOCAL DEFAULT 3 kdzk_gt_dict_1bit_null
 77: 0000000000004690 512 FUNC LOCAL DEFAULT 3 kdzk_le_dict_1bit_null
 78: 0000000000004890 464 FUNC LOCAL DEFAULT 3 kdzk_ge_dict_1bit_null
 79: 0000000000004a60 512 FUNC LOCAL DEFAULT 3 kdzk_ne_dict_1bit_null
 80: 0000000000004c60 192 FUNC LOCAL DEFAULT 3 kdzk_gt_lt_dict_1bit_null
 81: 0000000000004d20 528 FUNC LOCAL DEFAULT 3 kdzk_gt_le_dict_1bit_null
 82: 0000000000004f30 192 FUNC LOCAL DEFAULT 3 kdzk_ge_lt_dict_1bit_null
 83: 0000000000004ff0 528 FUNC LOCAL DEFAULT 3 kdzk_ge_le_dict_1bit_null
 84: 0000000000005200 848 FUNC LOCAL DEFAULT 3 kdzk_eq_dict_2bit_selecti
 85: 0000000000005550 960 FUNC LOCAL DEFAULT 3 kdzk_eq_dict_2bit
 89: 0000000000005910 848 FUNC LOCAL DEFAULT 3 kdzk_lt_dict_2bit_selecti
 90: 0000000000005c60 1056 FUNC LOCAL DEFAULT 3 kdzk_lt_dict_2bit
 91: 0000000000006080 848 FUNC LOCAL DEFAULT 3 kdzk_gt_dict_2bit_selecti
 92: 00000000000063d0 1008 FUNC LOCAL DEFAULT 3 kdzk_gt_dict_2bit
 93: 00000000000067c0 848 FUNC LOCAL DEFAULT 3 kdzk_le_dict_2bit_selecti
 94: 0000000000006b10 1024 FUNC LOCAL DEFAULT 3 kdzk_le_dict_2bit
 95: 0000000000006f10 848 FUNC LOCAL DEFAULT 3 kdzk_ge_dict_2bit_selecti
 96: 0000000000007260 1056 FUNC LOCAL DEFAULT 3 kdzk_ge_dict_2bit
 97: 0000000000007680 848 FUNC LOCAL DEFAULT 3 kdzk_ne_dict_2bit_selecti
 98: 00000000000079d0 960 FUNC LOCAL DEFAULT 3 kdzk_ne_dict_2bit
 99: 0000000000007d90 928 FUNC LOCAL DEFAULT 3 kdzk_gt_lt_dict_2bit_sele
 100: 0000000000008130 1328 FUNC LOCAL DEFAULT 3 kdzk_gt_lt_dict_2bit
 101: 0000000000008660 928 FUNC LOCAL DEFAULT 3 kdzk_gt_le_dict_2bit_sele
 102: 0000000000008a00 1296 FUNC LOCAL DEFAULT 3 kdzk_gt_le_dict_2bit
 103: 0000000000008f10 928 FUNC LOCAL DEFAULT 3 kdzk_ge_lt_dict_2bit_sele
 104: 00000000000092b0 1328 FUNC LOCAL DEFAULT 3 kdzk_ge_lt_dict_2bit

kdzk is the Oracle component that manages compression:


SQL> oradebug doc components

.../...

Components in library ADVCMP:
--------------------------
 ADVCMP_MAIN Archive Compression (kdz)
 ADVCMP_COMP Archive Compression: Compression (kdzc, kdzh, kdza)
 ADVCMP_DECOMP Archive Compression: Decompression (kdzd, kdzs)
 ADVCMP_DECOMP_HPK Archive Compression: HPK (kdzk)
 ADVCMP_DECOMP_PCODE Archive Compression: Pcode (kdp)

An interesting thing to see is that, even if you use an Oracle kernel without any SSE4 or AVX extension active (so your process uses neither the libshpksse4212.so nor the libshpkavx12.so library), you still use kdz functions when you query and filter a table that is populated In-Memory.

In a session, I ran the following statements:


SQL> select segment_name,BYTES,BYTES_NOT_POPULATED from v$im_segments

SEGMENT_NAME         BYTES         BYTES_NOT_POPULATED
-------------------- ------------- -------------------
S                         37748736                   0

SQL> select spid from v$process where addr=(select paddr from v$session where sid=sys_context('USERENV','SID'));

SPID
------------------------
3619

SQL> select count(*) from s where amount_sold>1700;

Just before launching the last query, I attached gdb to my server process to catch every call to the kdz functions:


[oracle@oel64-112 ~]$ pmap -x 3619 | egrep 'sse|avx'

[oracle@oel64-112 ~]$ gdb -pid 3619
GNU gdb (GDB) Red Hat Enterprise Linux (7.2-64.el6_5.2)
Copyright (C) 2010 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
.../...
Loaded symbols for /u01/app/oracle/product/12.1.0/dbhome_1/lib/libnque12.so
0x000000362ea0e740 in __read_nocancel () from /lib64/libpthread.so.0
Missing separate debuginfos, use: debuginfo-install glibc-2.12-1.132.el6_5.4.x86_64 libaio-0.3.107-10.el6.x86_64 numactl-2.0.7-8.el6.x86_64
(gdb) rbreak ^kdz

.../...

(gdb) commands
Type commands for breakpoint(s) 1-2165, one per line.
End with a line saying just "end".
>continue
>end

If you study the output, you will see that a lot of functions are called, and in the list, you will find some interesting functions: kdzdcol_get_minval, kdzdcol_get_maxval, kdzk_build_vector etc. Oracle clearly uses vectors to process IM compression units.

In my opinion, it is normal to see functions related to compression because the kernel manipulates “Compression Units”, and it makes sense that the SIMD functions are integrated into those libraries.

A last curiosity with Oracle 12c (12.1.0.2)

OK, now you have had a look at your installation, your machine is “AVX enabled”, and the Oracle processes use the AVX-compatible library (libshpkavx212.so); everything is OK and you think you will use all this stuff.

But if you use objdump on this library, and you search for AVX registers, you won’t find anything:


[oracle@oel64-112 ~]$ grep -i ymm objdump_out.1 | wc -l
0

Tanel Pöder gave me the answer!!! The Oracle database code is compiled to be compatible with RedHat/Oracle Linux 5, so it must be compatible with kernel 2.6.18. But the Linux scheduler can only handle YMM registers from version 2.6.30 onwards.

You can use new instructions without the kernel knowing about them, but you can't use registers that are not yet supported by the kernel.

I think the next version of Oracle will improve this, maybe in 12.2.

To conclude, there is no “Oracle 12c style” for SIMD instructions. Oracle has developed functions that use SIMD instructions: for Intel CPUs they use SSE, SSE2, SSE3, SSE4 or AVX depending on the CPU architecture; on IBM AIX these libraries use the VMX extension (the SIMD instructions on Power), etc.

Sources:

http://blog.tanelpoder.com/2014/10/05/oracle-in-memory-column-store-internals-part-1-which-simd-extensions-are-getting-used/

http://en.wikipedia.org/wiki/Data_structure_alignment

http://en.wikipedia.org/wiki/Advanced_Vector_Extensions

http://en.wikipedia.org/wiki/Streaming_SIMD_Extensions

http://en.wikipedia.org/wiki/SIMD

https://software.intel.com/sites/landingpage/IntrinsicsGuide/

https://www.kernel.org/pub/linux/kernel/people/geoff/cell/ps3-linux-docs/CellProgrammingTutorial/BasicsOfSIMDProgramming.html

https://laurent-leturgez.com/2015/04/14/enable-simd-sse4-extension-in-oracle-virtualbox/

Tools to aggregate Oracle CRS logfiles in one file, filter it, sort it, and display it.

If you have already faced RAC problems, you have had to analyze log files, starting from the entry points, which are the alert.log files.

But as Oracle Clusterware logs its events in many logfiles (usually one per process), and there is a logfile directory on each server, it can be difficult to detect the causes, and especially the root cause, of a specific problem.

Recently I created a little Perl script to help with this. The script is named “crs_log_aggregator.pl” and is available for download at this URL: https://app.box.com/s/xi3h7bpssrqtkp55q43c.

The script works like this: as input, it takes one or more directories to analyze recursively. Those directories contain all the CRS log files of all the cluster nodes you want to analyze.

For each directory, it parses every log file (except the alert.log) and every history log file (file extension: l??). Parsing extracts the timestamp mentioned on each line; if a line has no timestamp (because a message spans several lines, for example), the script reuses the preceding timestamp, and if there is no preceding timestamp at all (in a file header, for example), it flags the timestamp as 'N/A'. Every analyzed line is completed with its origin, represented by the relative file path.

As a result, an analyzed line is represented in the output file by three parts:

  • the timestamp of the message
  • the origin file of the message
  • the message

As a separator, I used a “<SEP>” tag.

So at this step, each line is recorded in a single unsorted file. If a filter expression has been given to the program as input, which is highly recommended, the filter is applied during the parsing process.

Next, the program sorts the lines by timestamp and produces a single output file, filtered or not, which can then be analyzed.
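
To make the logic concrete, here is a minimal sketch of this parse/tag/sort flow; it is a simplified illustration, not the actual crs_log_aggregator.pl (options, error handling and the alert.log detection are reduced to the bare minimum):

#!/usr/bin/perl
# usage sketch: ./aggregate_sketch.pl '<regex filter>' dir1 [dir2 ...]   (use '.' to keep every line)
use strict;
use warnings;
use File::Find;

my ($filter, @dirs) = @ARGV;
my @lines;

find(sub {
    # keep *.log and *.l?? files, skip alert*.log
    return unless -f $_ && /\.(log|l\d\d)$/ && $_ !~ /^alert/;
    my $origin = $File::Find::name;
    open my $fh, '<', $_ or return;
    my $last_ts = 'N/A';
    while (my $line = <$fh>) {
        chomp $line;
        # reuse the preceding timestamp when the current line does not start with one
        $last_ts = $1 if $line =~ /^(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d+)/;
        next unless $line =~ /$filter/;
        push @lines, "$last_ts<SEP>$origin<SEP>$line";
    }
    close $fh;
}, @dirs);

# every line starts with its timestamp, so a plain string sort orders the whole set
print "$_\n" for sort @lines;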

Well, let’s see how it works:

For example, let's consider a two-node RAC (the hostnames are rac1 and rac2).

As the alert.log files are not analyzed by the tool, you need to look at them first to see when the problem occurred.

In my case, I want to analyze 2 days: 2013-05-30 and 2013-05-31.

So the first step is to copy all the logfiles to a filesystem, or onto your desktop.

I recommend putting all the directories related to a single hostname into one directory. Here, I have two directories coming from both cluster nodes ($GRID_HOME/log):

[oracle@rac1 aggregator]$ tree -d .
.
|-- rac1
|   |-- acfslog
|   |-- acfsrepl
|   |-- acfsreplroot
|   |-- acfssec
|   |-- admin
|   |-- agent
|   |   |-- crsd
|   |   |   |-- oraagent_oracle
|   |   |   |-- orarootagent_root
|   |   |   `-- scriptagent_oracle
|   |   `-- ohasd
|   |       |-- oraagent_oracle
|   |       |-- oracssdagent_root
|   |       |-- oracssdmonitor_root
|   |       `-- orarootagent_root
|   |-- client
|   |-- crflogd
|   |-- crfmond
|   |-- crsd
|   |-- cssd
|   |-- ctssd
|   |-- cvu
|   |   |-- cvulog
|   |   `-- cvutrc
|   |-- diskmon
|   |-- evmd
|   |-- gipcd
|   |-- gnsd
|   |-- gpnpd
|   |-- mdnsd
|   |-- ohasd
|   |-- racg
|   |   |-- racgeut
|   |   |-- racgevtf
|   |   `-- racgmain
|   `-- srvm
`-- rac2
    |-- acfslog
    |-- acfsrepl
    |-- acfsreplroot
    |-- acfssec
    |-- admin
    |-- agent
    |   |-- crsd
    |   |   |-- oraagent_oracle
    |   |   |-- orarootagent_root
    |   |   `-- scriptagent_oracle
    |   `-- ohasd
    |       |-- oraagent_oracle
    |       |-- oracssdagent_root
    |       |-- oracssdmonitor_root
    |       `-- orarootagent_root
    |-- client
    |-- crflogd
    |-- crfmond
    |-- crsd
    |-- cssd
    |-- ctssd
    |-- cvu
    |   |-- cvulog
    |   `-- cvutrc
    |-- diskmon
    |-- evmd
    |-- gipcd
    |-- gnsd
    |-- gpnpd
    |-- mdnsd
    |-- ohasd
    |-- racg
    |   |-- racgeut
    |   |-- racgevtf
    |   `-- racgmain
    `-- srvm

74 directories
[oracle@rac1 aggregator]$ ls -l
total 20
drwxr-xr-x 24 oracle oinstall 4096 Sep 8 14:39 rac1
drwxr-xr-x 24 oracle oinstall 4096 Sep 8 14:00 rac2

If you want some help about this tool, you can use the -h option:

[oracle@rac1 aggregator]$ ./crs_log_aggregator.pl -h

RAC Logs consolidator
---------------------------------------------

Usage

./crs_log_aggregator.pl -c dir1[,dir2,dir3,...,dirN] -o outputfile [-f filter]

[MANDATORY]: -c dir1,dir2,...,dirN
                 comma separated values of source directories to analyze
[MANDATORY]: -o outputfile
                 unique sorted and filtered output file
[OPTIONAL] : -f filter
                 regex filter to apply
[OPTIONAL] : -h
                 print this help

Examples:
$ ./crs_log_aggregator.pl -c data/ -o outfile
         will sort without any filter all log files (*.log, *.l??) recursively found in the data directory. outfile is produced in the current directory as a result.
$ ./crs_log_aggregator.pl -c data/rac1/cssd,data/rac1/crsd -o outfile
         will sort without any filter only the content of data/rac1/cssd,data/rac1/crsd  directories.
$ ./crs_log_aggregator.pl -c data/ -o outfile_filtered -f '2012-10-26|2012-10-27'
         will sort with a filter (regex) that will filter 2 days 2012-10-26 and 2012-10-27.

        more information on perl regex: http://perldoc.perl.org/perlre.html

Next, I launch the program with the right options:

[oracle@rac1 aggregator]$ ./crs_log_aggregator.pl -c ./rac1,./rac2 -o output_file -f '2013-05-30|2013-05-31'

---------Alert.log : ./rac1/alertrac1.log excluded from analyze

---------Alert.log : ./rac2/alertrac2.log excluded from analyze

[oracle@rac1 aggregator]$ ls -lh output_file
-rw-r--r-- 1 oracle oinstall 243M Sep  8 15:16 output_file

Please note that if an alert.log file is found, it is excluded from the analysis and mentioned in the standard output.

I highly recommend using a filter to reduce the amount of data to sort (sorting can burn a lot of CPU and memory). The -f option is used for this and takes a regexp-style filter (it is used directly in the Perl script, so if you are not familiar with Perl regexps, have a look at this URL: http://perldoc.perl.org/perlre.html).

As a result, you get this kind of message in the output file:

2013-05-31 09:14:59.573<SEP>../rac2/cssd/ocssd.log<SEP> [    CSSD][1111411008]clssgmRPCBroadcast: rpc(0x2a1002a), status(0), sendcount(0), filtered by specific properties:
2013-05-31 09:14:59.573<SEP>../rac2/cssd/ocssd.log<SEP> [    CSSD][1111411008]clssgmFreeRPCIndex: freeing rpc 673
2013-05-31 09:14:59.573<SEP>../rac2/cssd/ocssd.log<SEP> [    CSSD][1111411008]clssgmDeleteGrock: (0x7f3f38637990) grock(EVMDMAIN2) deleted
2013-05-31 09:14:59.576<SEP>../rac1/crfmond/crfmond.log<SEP> [ COMMCRS][1105488192]clsc_connect: (0x7f23b40531f0) no listener at (ADDRESS=(PROTOCOL=ipc)(KEY=rac1DBG_LOGD))
2013-05-31 09:14:59.576<SEP>../rac1/crfmond/crfmond.log<SEP> [ default][1083722048]clsdmc_send error code: -7
2013-05-31 09:14:59.576<SEP>../rac1/crfmond/crfmond.log<SEP> [ default][1083722048]Mond is being shutdown by default
2013-05-31 09:14:59.695<SEP>../rac1/agent/ohasd/orarootagent_root/orarootagent_root.l01<SEP> [ USRTHRD][1102350656] {0:0:2} Thread:[DaemonCheck:crf]Daemon Select Thread exiting
2013-05-31 09:14:59.695<SEP>../rac1/agent/ohasd/orarootagent_root/orarootagent_root.l01<SEP> [ USRTHRD][1102350656] {0:0:2} Thread:[DaemonCheck:crf]Skipping Agent Initiated a check action
2013-05-31 09:14:59.695<SEP>../rac1/agent/ohasd/orarootagent_root/orarootagent_root.l01<SEP> [ USRTHRD][1102350656] {0:0:2} Thread:[DaemonCheck:crf]isRunning is reset to false here
2013-05-31 09:15:00.270<SEP>../rac2/ctssd/octssd.l01<SEP> [    CTSS][1096657216]sclsctss_gvss1: NTP default config file found
2013-05-31 09:15:00.270<SEP>../rac2/ctssd/octssd.l01<SEP> [    CTSS][1096657216]sclsctss_gvss8: Return [0] and NTP status [2].
2013-05-31 09:15:00.270<SEP>../rac2/ctssd/octssd.l01<SEP> [    CTSS][1096657216]ctss_check_vendor_sw: Vendor time sync software is detected. status [2].
2013-05-31 09:15:00.314<SEP>../rac2/ctssd/octssd.l01<SEP> [    CTSS][1109387584]ctsselect_vermon7: Expecting clssgsevGRPPRIV event. Ignoring 2 event.
2013-05-31 09:15:00.314<SEP>../rac2/ctssd/octssd.l01<SEP> [    CTSS][1109387584]ctsselect_vermon7: Expecting clssgsevGRPPRIV event. Ignoring 2 event.
2013-05-31 09:15:00.314<SEP>../rac2/ctssd/octssd.l01<SEP> [    CTSS][1109387584]ctsselect_vermon7: Expecting clssgsevGRPPRIV event. Ignoring 3 event.
2013-05-31 09:15:00.314<SEP>../rac2/ctssd/octssd.l01<SEP> [    CTSS][1109387584]ctsselect_vermon7: Expecting clssgsevGRPPRIV event. Ignoring 1 event.
2013-05-31 09:15:00.345<SEP>../rac2/gipcd/gipcd.l04<SEP> [ CLSINET][1093364032] Returning NETDATA: 1 interfaces
2013-05-31 09:15:00.345<SEP>../rac2/gipcd/gipcd.l04<SEP> [ CLSINET][1093364032] # 0 Interface 'eth2',ip='10.10.10.12',mac='08-00-27-92-f1-98',mask='255.255.255.0',net='10.10.10.0',use='cluster_interconnect'
2013-05-31 09:15:00.354<SEP>../rac2/agent/ohasd/orarootagent_root/orarootagent_root.l01<SEP> [ora.crf][1089526080] {0:0:613} [stop] (:CLSN00108:) clsn_agent::stop }
2013-05-31 09:15:00.354<SEP>../rac2/agent/ohasd/orarootagent_root/orarootagent_root.l01<SEP> [    AGFW][1089526080] {0:0:613} Command: stop for resource: ora.crf 1 1 completed with status: SUCCESS

Now, you can analyze the output file directly, but I have also developed another app (in Java) to display it. This app is available at this URL: https://app.box.com/s/xi3h7bpssrqtkp55q43c.

To launch it, you need Java 1.7.

This app presents the content of the output file in a nicer interface. Messages coming from the same file are colored with the same color:

[oracle@rac1 aggregator]$ java -jar Aggr_visualisator.jar

 Aggr_visualisator
Update:
  • in v0.2, I implemented a filter function. This filter is active on the “Message” column.
  • I implemented a colorization function for the “Message” column.
    • if the term “error” is found, the cell will be colored in red
    • If the term “warn” is found, the cell will be colored in orange
    • and if the term "fail" is found, the cell will be colored in yellow
v0.2 is available at this URL : https://app.box.com/s/xi3h7bpssrqtkp55q43c
Two snapshots of this version are shown below 😉
  • Messages colorization:
Aggr_visualisator_0.2_1
  • Filtering messages (based on a regexp)
Aggr_visualisator_0.2_2
If you have any suggestions to improve these tools, send me an email or leave a comment; I will look into them.
I hope this will help you 😉