Data … as usual

All things about data by Laurent Leturgez

Monthly Archives: August 2018

Executing a SQL Statement on Oracle with Python

In the first blogpost of this series dedicated to Oracle and Python, I described how to connect a Python script to an Oracle Database (see. connecting python to an oracle database). In this second post, I will describe how to query an Oracle database and gets some results by using most popular Python libraries for this stuff: numpy and pandas.

So, first thing to do, you have to install pandas and numpy libraries in your work environnement (virtualenv, miniconda etc.). And of course, cx_Oracle has to be installed too 😉

In my case, libraries are deployed in my virtualenv hosted in PyCharm project:

In the program I write, in order to load packages and configure basics, I will use the header below:

from __future__ import print_function

import cx_Oracle
import os
import numpy as np
import pandas as pd


oh="D:/tools/Oracle/instantclient_12_2_x8664"
os.environ["ORACLE_HOME"]=oh
os.environ["PATH"]=oh+os.pathsep+os.environ["PATH"]
os.environ["NLS_LANG"]="AMERICAN_AMERICA.AL32UTF8"

First step, write a function to connect the database

So, first thing to write is a function that will return a cx_Oracle connection object (or reuse the function used in the first blogpost 😉 ).

This function will user parameters like username, password, datasource name (or url) and optionaly the mode used (SYSDBA for example).

def connectToOracle(url, username, password, mode=None):
    if mode is not None:
        connection = cx_Oracle.Connection (user=username, password=password, dsn=url, mode=mode)
    else:
        connection = cx_Oracle.Connection (user=username, password=password, dsn=url)
    return connection

 

Get a cursor and execute a statement

Once we have a cx_Oracle connection object, we can create a cursor by executing the cursor() function and then execute a statement.

To do this, I wrote a function with two parameters: the connection object and the statement text, and this returns the cursor that has been executed in the function:


def executeStmt(conn, stmt):
    if conn is not None and isinstance (conn, cx_Oracle.Connection):
        cur = conn.cursor()
        cur.execute(stmt)
    return cur

Once the cursor will be executed, we will be able to fetch one, many or all rows, and to describe it to get more metadata.

 

Describe executed cursor to get some metadata

By describing the cursor, we are able to get some information of its structure (column name and types, number precision and scale, nullable column etc.).

To do that, we use a read only attribute of the cursor (named description). In this attribute, there are 7 items that can be read.

These items are the ones below:

  • Item 1: Column names
  • Item 2: Column types (types are cx_Oracle data types)
  • Item 3: Column displayed sizes
  • Item 4: Column internal sizes
  • Item 5: Column precision
  • Item 6: Column scale
  • Item 7: Nullable column

This function code is the one below:


def describeCursor(cur):
    if cur is not None and isinstance (cur, cx_Oracle.Cursor):
        colnames = [row[0] for row in cur.description]
        coltypes = [row[1] for row in cur.description]
        coldisplay_sz = [row[2] for row in cur.description]
        colinternal_sz = [row[3] for row in cur.description]
        colprecision = [row[4] for row in cur.description]
        colscale = [row[5] for row in cur.description]
        colnullok = [row[6] for row in cur.description]
    print("Column names     : ",colnames)
    print("Column types     : ",coltypes)
    print("Display Size     : ",coldisplay_sz)
    print("Internal Size    : ",colinternal_sz)
    print("Column precision : ",colprecision)
    print("Column Scale     : ",colscale)
    print("Null OK?         : ",colnullok)

It can be used like this:


if __name__ == '__main__':
    c=cx_Oracle.Connection
    stmt="select name,value from v$parameter where name like 'log_archive%' and name not like 'log_archive_dest%' "
    try:
        c=connectToOracle("192.168.99.2:1521/orcl","sys","oracle",mode=cx_Oracle.SYSDBA)
        curs=executeStmt(c,stmt)
        describeCursor(curs)
...
        curs.close()
    except cx_Oracle.DatabaseError as ex:
        err, =ex.args
        print("Error code    = ",err.code)
        print("Error Message = ",err.message)
        os._exit(1)
    c.close()

And will produce this kind of result:


Column names     :  ['NAME', 'VALUE']
Column types     :  [<type 'cx_Oracle.STRING'>, <type 'cx_Oracle.STRING'>]
Display Size     :  [80, 4000]
Internal Size    :  [80, 4000]
Column precision :  [None, None]
Column Scale     :  [None, None]
Null OK?         :  [1, 1]

 

Get results of the cursor execution

Once the cursor executed, you can fetch your cursor row by row, you can fetch a set of n rows, or fetch all the rows.

Fetching the cursor

To do that, you have to use these function (from cursor)

  • fetchone() will return a list of one tuple
  • fetchmany(n) will return  a list of n tuples
  • fetchall() will return a list of n tuples (n is equal to the number of rows in the cursor)

for example, this piece of code:





<pre>if __name__ == '__main__':
    c=cx_Oracle.Connection
    stmt="select name,value from v$parameter where name like 'log_archive%' and name not like 'log_archive_dest%' "
    try:
        c=connectToOracle("192.168.99.2:1521/orcl","sys","oracle",mode=cx_Oracle.SYSDBA)
        curs=executeStmt(c,stmt)
        r=curs.fetchmany(2)
        print("R=",r)
        curs.close()
    except cx_Oracle.DatabaseError as ex:
        err, =ex.args
        print("Error code    = ",err.code)
        print("Error Message = ",err.message)
        os._exit(1)
    c.close()</pre>

will produce this result:

R= [('log_archive_start', 'FALSE'), ('log_archive_duplex_dest', None)]

 

Displaying the cursor results

Once the cursor has been executed and the rows fetched, you can use numpy or pandas to work on the data.

Numpy

Numpy is a library which is primarly used for scientific computing. So, it’s not the best if your resultset is made of string datatypes.

A better way to proceed is:

  1. fetch your cursor
  2. convert the result into a numpy array. And decribe the types of your array.

For example:


if __name__ == '__main__':
    c=cx_Oracle.Connection
    stmt="select SIZE_FOR_ESTIMATE,SIZE_FACTOR,ESTD_PHYSICAL_READS from v$db_cache_advice"
    try:
        c=connectToOracle("192.168.99.2:1521/orcl","sys","oracle",mode=cx_Oracle.SYSDBA)
        curs=executeStmt(c,stmt)
        # Fetch all rows from the cursor
        r=curs.fetchall()
        # convert the r variable into a numpy array and describe it
        n=np.array(r,dtype=[('SIZE_FOR_ESTIMATE','float64'),('SIZE_FACTOR','float64'),('ESTD_PHYSICAL_READS','float64')])
        print("ndim=",n.ndim)
        print("n=",n)
        curs.close()
    except cx_Oracle.DatabaseError as ex:
        err, =ex.args
        print("Error code    = ",err.code)
        print("Error Message = ",err.message)
        os._exit(1)
    c.close()

This piece of code will produce the result above:

n= [(  96., 0.0896, 9690.) ( 192., 0.1791, 9690.) ( 288., 0.2687, 9690.)
 ( 384., 0.3582, 9690.) ( 480., 0.4478, 9690.) ( 576., 0.5373, 9690.)
 ( 672., 0.6269, 9690.) ( 768., 0.7164, 9690.) ( 864., 0.806 , 9690.)
 ( 960., 0.8955, 9690.) (1056., 0.9851, 9690.) (1072., 1.    , 9690.)
 (1152., 1.0746, 9690.) (1248., 1.1642, 9690.) (1344., 1.2537, 9690.)
 (1440., 1.3433, 9690.) (1536., 1.4328, 9690.) (1632., 1.5224, 9690.)
 (1728., 1.6119, 9690.) (1824., 1.7015, 9690.) (1920., 1.791 , 9690.)]

Note: If you want to get more information about type conversion into numpy data type see: 

 

A better way to proceed, specially if you want to execute some joins between two datasets is to use Pandas.

Pandas

Pandas is a python library which provide easy to use data structures. It uses a central data structure named dataframe with which you can execute filters, joins etc.

When you decide to create pandas dataframes from data coming from Oracle (or another database, CSV file, JSON etc.), you have to use pandas function written specifically for this purpose. And in the case of an Oracle database, you have a function read_sql() that can be executed with a sql statement given in parameter. It will create a pandas dataframe and then you will be able to proceed this structure in your python script.

One advantage of pandas dataframes is in its data types because they are converted directly depending on the types returned by the cursor, and you don’y need to create a cursor, execute it and then fetch the rows … everything is automatic.

But let’s have a look to a piece of code (I will publish the complete script to show we don’t need explicit cursors etc.):


from __future__ import print_function

import cx_Oracle
import os
import pandas as pd


oh="D:/tools/Oracle/instantclient_12_2_x8664"
os.environ["ORACLE_HOME"]=oh
os.environ["PATH"]=oh+os.pathsep+os.environ["PATH"]
os.environ["NLS_LANG"]="AMERICAN_AMERICA.AL32UTF8"


def connectToOracle(url, username, password, mode=None):
    if mode is not None:
        connection = cx_Oracle.Connection (user=username, password=password, dsn=url, mode=mode)
    else:
        connection = cx_Oracle.Connection (user=username, password=password, dsn=url)
    return connection

# main
if __name__ == '__main__':
    c=cx_Oracle.Connection

    stmt="select SIZE_FOR_ESTIMATE,SIZE_FACTOR,ESTD_PHYSICAL_READS from v$db_cache_advice"
    try:
        c=connectToOracle("192.168.99.2:1521/orcl","sys","oracle",mode=cx_Oracle.SYSDBA)
        dataframe=pd.read_sql(stmt,con=c)
        print("------\nDF : \n ",dataframe)
        print("------\nDF Data types : \n",dataframe.dtypes)
        print("------\nDF Filtered   : \n",dataframe[dataframe['SIZE_FOR_ESTIMATE']>1440])
    except cx_Oracle.DatabaseError as ex:
        err, =ex.args
        print("Error code    = ",err.code)
        print("Error Message = ",err.message)
        os._exit(1)
    c.close()

We just have to instantiate a cx_Oracle connection, and then use pandas read_sql function to get a pandas dataframe and then process it.

In the previous basic example, I printed it, its datatypes and then run a filter which produces this kind of results:

------
DF : 
      SIZE_FOR_ESTIMATE  SIZE_FACTOR  ESTD_PHYSICAL_READS
0                  96       0.0896                 9766
1                 192       0.1791                 9766
2                 288       0.2687                 9766
3                 384       0.3582                 9766
4                 480       0.4478                 9766
5                 576       0.5373                 9766
6                 672       0.6269                 9766
7                 768       0.7164                 9766
8                 864       0.8060                 9766
9                 960       0.8955                 9766
10               1056       0.9851                 9766
11               1072       1.0000                 9766
12               1152       1.0746                 9766
13               1248       1.1642                 9766
14               1344       1.2537                 9766
15               1440       1.3433                 9766
16               1536       1.4328                 9766
17               1632       1.5224                 9766
18               1728       1.6119                 9766
19               1824       1.7015                 9766
20               1920       1.7910                 9766
------
DF Data types : 
 SIZE_FOR_ESTIMATE        int64
SIZE_FACTOR            float64
ESTD_PHYSICAL_READS      int64
dtype: object
------
DF Filtered   : 
     SIZE_FOR_ESTIMATE  SIZE_FACTOR  ESTD_PHYSICAL_READS
16               1536       1.4328                 9766
17               1632       1.5224                 9766
18               1728       1.6119                 9766
19               1824       1.7015                 9766
20               1920       1.7910                 9766

More information on pandas dataframes and their capacities is available at the following url: https://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.html

In a next blogpost (probably shorter than this one), I will explain on how to use bind variables in statement.

Source code is fully available on my gitlab: https://gitlab.com/lolo115/python/tree/master/oracle

 

That’s it for today !!! 😉