Python and SAP: Part 2 - Getting data from SAP

So now you're connected to SAP (see part 1 here), but what do you do with it?

Well, as the name implies, PyRFC allows you to perform Remote Function Calls from Python. In theory, any valid Function Module in SAP should work. Use Transaction SE37 to view your function module and check the inputs. 

In my case, I wanted to read data from SAP. Therefore, the function module that interested me was RFC_READ_TABLE. In SE37, you can see the inputs for this function module (but I've done the work for you below).

Also, worth noting: You can do a search and see the general fear this solution produces. People get quite alarmist about anything that "queries" SAP. However, let's remember that there are serious controls that you can put in place (see this link). With a good basis team, calling functions and reading data from SAP can be locked down tight. It's clearly a concern, but completely manageable. 

On to the fun part! Let's create a little class for getting data out of SAP. We're going to use a class, rather than a script, so that we can build on it in the next lesson.

The connection class

First, we need to initialize our connection again. 

from pyrfc import Connection
import re

class main():
    def __init__(self):
        ASHOST='my.server.name'
        CLIENT='000'
        SYSNR='00'
        USER='myuser'
        PASSWD='mypassword'
        self.conn = Connection(ashost=ASHOST, sysnr=SYSNR, client=CLIENT, user=USER, passwd=PASSWD)

Ok - pretty simple. When we initialize the class, we connect to SAP. You could pass the login parameters to the class instead - it might even be a good idea (see the quick sketch below).
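
If you'd rather pass the login details in rather than hard-code them, a minimal sketch of that variation could look like this - the keyword names are just pyrfc's usual Connection arguments:

from pyrfc import Connection

class main():
    def __init__(self, ashost, sysnr, client, user, passwd):
        # same connection as before, just built from arguments
        self.conn = Connection(ashost=ashost, sysnr=sysnr, client=client,
                               user=user, passwd=passwd)

s = main('my.server.name', '00', '000', 'myuser', 'mypassword')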

Now we want to create a function to call RFC_READ_TABLE. To do this, we need to pass:

  • Fields: the fields we want to return
  • SQLTable: The table we want to query
  • Where: Any WHERE conditions
  • MaxRows: The maximum number of rows to return
  • FromRow: The starting row of the result set to use

Therefore, our query function should look like this:


    def qry(self, Fields, SQLTable, Where = '', MaxRows=50, FromRow=0):
        """A function to query SAP with RFC_READ_TABLE"""

        # By default, if you send a blank value for fields, you get all of them
        # Therefore, we add a select-all option, to better mimic SQL.
        if Fields[0] == '*':
            Fields = ''
        else:
            Fields = [{'FIELDNAME': x} for x in Fields]  # Notice the format

        # the WHERE part of the query is called "options"
        options = [{'TEXT': x} for x in Where]  # again, notice the format

        # we set a maximum number of rows to return, because it's easy to do and
        # greatly speeds up testing queries.
        rowcount = MaxRows

        # Here is the call to SAP's RFC_READ_TABLE
        tables = self.conn.call("RFC_READ_TABLE", QUERY_TABLE=SQLTable, DELIMITER='|',
                                FIELDS=Fields, OPTIONS=options, ROWCOUNT=rowcount,
                                ROWSKIPS=FromRow)

        # We split out fields and fields_name to hold the data and the column names
        fields = []
        fields_name = []

        data_fields = tables["DATA"]    # pull the data part of the result set
        data_names = tables["FIELDS"]   # pull the field name part of the result set

        headers = [x['FIELDNAME'] for x in data_names]  # the column names
        long_fields = len(data_fields)                  # number of data rows
        long_names = len(data_names)                    # number of columns, if you want it

        # now parse the data fields into a list
        for line in range(0, long_fields):
            fields.append(data_fields[line]["WA"].strip())

        # for each line, split the list by the '|' separator
        fields = [x.strip().split('|') for x in fields]

        # return the 2D list and the headers
        return fields, headers

Easy, right? The comments should give you an idea of what is happening in each step. The result from SAP comes back as plain text, so much of the function is just parsing the data.
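
For example (with invented values), each entry in the DATA table is one pipe-delimited string, and the parsing above boils down to stripping and splitting it:

row = {'WA': '000000000000012345|4006381333931'}   # hypothetical entry from tables["DATA"]
print(row['WA'].strip().split('|'))                # -> ['000000000000012345', '4006381333931']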

So if you wanted to use your new class, you could do something like this:

# Init the class and connect
# I find this can be very slow to do... 
s = main() 

# Choose your fields and table
fields = ['MATNR', 'EAN11']
table = 'MEAN'
# you need to put a where condition in there... could be anything
where = ['MATNR <> 0']

# max number of rows to return
maxrows = 10

# starting row to return
fromrow = 0

# query SAP
results, headers = s.qry(fields, table, where, maxrows, fromrow)

print headers
print results
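
The return value is just a 2D list plus the column names, so if you prefer dictionaries you can pair them up yourself - a purely illustrative line:

rows = [dict(zip(headers, row)) for row in results]
print(rows[0])  # something like {'MATNR': '...', 'EAN11': '...'}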

You can download the full class from here. I suggest you modify it and learn what each portion does before you do any testing on your own system. I've left out error checking and assertions for simplicity.

That's it for now. In the next segment, I'll show you how to write standard SQL and have it converted to the RFC format. In the meantime, ping me if you have questions in the comments.

-ab


Python: Running more than one process at once

Running many scripts at the same time may not come up on a daily basis and may not even be a good idea, but here is a great little code snippet. I had a few scripts that could run in parallel to save time without too much of a performance hit (the scripts grab data from external sources).

from subprocess import Popen
files = ['file1.py',
         'file2.py',
         'file3.py',
         'file4.py',
         'file5.py']

processes = []
for file in files:
    p = Popen(file, shell=True)   # each script runs in its own process
    processes.append(p)

# block until every script has finished
[p.wait() for p in processes]


This code runs five files at the same time and waits for all five to finish before continuing... Lovely.
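
If you also care whether each script succeeded, wait() gives you the exit code, so a small variation (just a sketch, with the same placeholder scripts) could collect them:

from subprocess import Popen

files = ['file1.py', 'file2.py']          # placeholder scripts, as above

procs = [Popen(f, shell=True) for f in files]
codes = [p.wait() for p in procs]         # 0 means a clean exit
failed = [f for f, c in zip(files, codes) if c != 0]
print(failed)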

Enjoy!

Network Diagrams and Python Web Crawlers

I'm fascinated by networks and data visualization. I've always wanted to try my hand at making some of the inspiring images I see on blogs like flowing data. This network diagram is my first amateur attempt.

The code

I started by writing a rather simple web crawler in Python. The logic for the bot was:

1. Open a page

2. Create a list of all the links on that page (capture the total number of links)

3. For each link, create a new bot to follow the link and start the whole process again.

This was a great chance to use the Threading module in Python. I am not an expert in threading or multiprocessing. However, threading allowed me to create a new bot for each link I wanted to follow.

Here is the code for my spider class:

'''
Created on Jun 13, 2012

@author: Alex Baker

'''

# imports
import urllib2, BeautifulSoup, time
from threading import Thread

class spider1():

    def scan(self, url, mem, f):
        # Get the url
        usock = urllib2.urlopen(url)

        # Your current URL is now your "old" url and
        # all the new ones come from the page
        old_url = url

        # Read the data to a variable
        data = usock.read()
        usock.close()

        # Create a Beautiful Soup object to parse the contents
        soup = BeautifulSoup.BeautifulSoup(data)

        # Get the title
        title = soup.title.string

        # Get the total number of links
        count = len(soup.findAll('a'))

        # For each link, create a new bot and follow it.
        for link in soup.findAll('a'):
            try:
                # Clean up the url
                url = link.get('href').strip()

                # Avoid some types of link like # and javascript
                if url[:1] in ['#', '/', '', '?', 'j']:
                    continue
                # Also, avoid following the same link
                elif url == old_url:
                    continue
                else:
                    # Get the domain - not interested in other links
                    url_domain = url.split('/')[2]

                    # Build a domain link for our bot to follow
                    url = "http://%s/" % (url_domain)

                    # Make sure that you have not gone to this domain already
                    if self.check_mem(url, mem) == 0:
                        try:
                            # Create your string to write to file
                            text = "%s,%s,%s\n" % (old_url, url, count)

                            # Write to your file object
                            f.write(text)
                            print text

                            # Add the domain to the "memory" to avoid it going forward
                            mem.append(url)

                            # Spawn a new bot to follow the link
                            spawn = spider1()

                            # Set it loose!
                            Thread(target=spawn.scan, args=(url, mem, f)).start()

                        except Exception, errtxt:
                            # For Threading errors, print the error.
                            print errtxt
                        except:
                            # For any other type of error, give the url.
                            print 'error with url %s' % (url)
            except:
                # Just keep going - avoids allowing the thread to end in error.
                continue

    def check_mem(self, url, mem):
        # Quick function to check the "memory" to see if the domain has already been visited.
        try:
            mem.index(url)
            return 1
        except:
            return 0

As you can see, the code is simplistic - it only considers the domain/sub-domain rather than each individual link. Also, because it checks to make sure that no domain is visited twice, the crawl keeps fanning out to new sites instead of looping back over pages it has already seen.
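
As a side note, if you wanted a slightly sturdier way to pull out the domain than url.split('/')[2], the standard library's urlparse module (Python 2 here, to match the rest of the code) would do it - a small sketch:

from urlparse import urlparse

netloc = urlparse('http://justanasterisk.com/some/page').netloc
print('http://%s/' % netloc)   # -> http://justanasterisk.com/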

To run the class, I used something like this:

mem = []
f = open('output.txt', 'w')
url = 'http://justanasterisk.com'  # write the url here
s = spider1()
s.scan(url, mem, f)

Once started, it doesn't stop - so kill it after a while (or build that in). Running this on my MacBook, I recorded 27,000 links in about 10 minutes.
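
If you did want to build in a stopping point, one simple sketch is to check the size of the shared "memory" list at the top of scan() and bail out once it hits a cap (500 here is arbitrary):

MAX_DOMAINS = 500   # arbitrary cap, just for illustration

class spider1():
    def scan(self, url, mem, f):
        if len(mem) >= MAX_DOMAINS:
            return
        # ... the rest of scan() stays exactly as above ...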

The data

The number of data points is small in comparison to some of the sets I've explored using BigQuery or Amazon SimpleDB. However, I wanted to make a visualization, and I realized that the number of pixels would really define how many data points were useful. I figured that 10 minutes would give me the structure that I wanted. I used my blog justanasterisk.com as the starting point. I won't attach the data (you can create that yourself), but suffice it to say that each line was:

source, destination, # of links on source page

The visualization

Here is where I was out of my element. I browsed a few different tools and the best (read: easiest) solution for my needs was Cytoscape. It is simple to use and has several presets included to make you feel like you've done some serious analysis. For the image above, I used one of the built-in layouts (modified slightly) and a custom visual style.


I won't underwhelm you with further details, but shoot me an email if you want more. I'll probably add a few more images to this post when I get them rendered.

Best,

~ab

SMS Workflow Madness: Twilio to PHP to Python to Dropbox to Autohotkey to Conquer The World

Recently, I've been trying to trigger some Python code using a text message. It has been a complicated little journey, so I thought I'd write it up for you. If you don't want to read through it all, the summary is: Twilio to PHP, which launches Python to put a file in Dropbox, and AutoHotkey to monitor Dropbox and run a Python script. Away we go...

First, Twilio is a great service if you want to develop anything with text messages. At first, I built a quick fix using If This Then That (which you should check out either way). However, I soon realized that the benefit of a text message is that it is nearly instant. IFTTT only checks tasks every 15 minutes, and in a crunch, I would want a response back before then...

So I signed up for Twilio and created my application. The applications can be very complex, but for my purposes, I just needed a few lines of PHP to receive the text from the SMS and then use that information. Here is my test script:

<?php
header("content-type: text/xml");
echo "<?xml version=\"1.0\" encoding=\"UTF-8\"?>\n";
$body = $_REQUEST['Body'];
$from = $_REQUEST['From'];
$path = "/var/www/cgi-bin/addfile.py";
$command = "python ".$path." '$body'";
$command = escapeshellcmd($command);
exec($command,$result);
echo "<Response>
<Sms>Thanks for the message:".$body." your num:".$from." </Sms>
</Response>";
?>

There's a lot going on there, but here is the gist. The first two lines format the response as XML so that Twilio understands what should be done. No surprises here.

header("content-type: text/xml");
echo "<?xml version=\"1.0\" encoding=\"UTF-8\"?>\n";

The next part pulls the data from the text message into body and from and then passes these values to a python script I wrote to interact with dropbox.

$body = $_REQUEST['Body'];
$from = $_REQUEST['From'];
$path = "/var/www/cgi-bin/addfile.py";
$command = "python ".$path." '$body'";
$command = escapeshellcmd($command);
exec($command,$result);

The final part is the XML response, which I pulled straight from the Twilio getting started guide.

echo "<Response>
<Sms>Thanks for the message:".$body." your num:".$from." </Sms>
</Response>";

Ok so now we have a file for Twilio to interact with. Next we need to put some content in that python file. Before you try this out, you'll need to install the dropbox api libraries. I used the command

easy_install dropbox

but you might have to do that differently based on your operating system.

#!/usr/bin/python

# Include the Dropbox SDK libraries
from dropbox import client, rest, session
import sys

name = sys.argv[1]

# Get your app key and secret from the Dropbox developer website
APP_KEY = 'xxxxxxxxxxxxxxx'
APP_SECRET = 'xxxxxxxxxxxxxxx'

# Access type will be defined in your dropbox settings
ACCESS_TYPE = 'app_folder'
sess = session.DropboxSession(APP_KEY, APP_SECRET, ACCESS_TYPE)

# I removed this section after obtaining my access_token
# and access_token_secret, but you'll need to do it once.
# The return value will be a string that you can parse.
#request_token = sess.obtain_request_token()
#url = sess.build_authorize_url(request_token)
#print "url:", url
#print "Please visit this website and press the 'Allow' button."
#raw_input()

access_token = "xxxxxxxxxxxxxxx"
access_token_secret= "xxxxxxxxxxxxxxx"

sess.set_token(access_token, access_token_secret)

client = client.DropboxClient(sess)
print "linked account:", client.account_info()

#create the file if it doesn't exist
#f = open('file.txt', "w")
#f.close()

#open it for reading only...
f = open('file.txt')
# put the file to the app_folder in dropbox
response = client.put_file('/'+name+'.txt', f)
# this is the response passed back to PHP for debugging.
print "uploaded:", response

The file above is a bit of a mess, but the idea is simple: take an argument as the command, authenticate with Dropbox, and put a file in Dropbox with that name. I've tried a few different ways to do this, including a Dropbox PHP class or two... The Python script turned out to be much easier for me - perhaps you have had better luck?

So now, with all that lovely code above, when I send a text message to my Twilio account number, the PHP file takes the SMS message as a command and launches the Python Dropbox script, putting a file with that command name in my folder. The last part is an AutoHotkey script that I use to monitor the app_folder (it's actually sitting in the app folder for simplicity). Here is that file:

#persistent
setTimer check_file,1000
return

check_file:
IfExist, command.txt
{
 filemove command.txt, %A_ScriptDir%\processed\command%A_Now%.txt
 run myprogram.py command
}

This script checks my folder for a file called "command.txt" and, if it finds it, runs a script and moves the file to a processed folder with a time stamp. It's not perfect, as it requires a separate "look" for each command that you want to run, but it does the job for my needs (see the sketch below for one way around that).
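
If the one-"look"-per-command limitation ever bothers you, the same polling idea could be done in a few lines of Python instead. This is only a sketch, not what I actually run - the watch folder and the myprogram.py handler are placeholders matching the names used above:

import glob
import os
import subprocess
import time

WATCH_DIR = '.'                                  # placeholder for the Dropbox app folder
DONE_DIR = os.path.join(WATCH_DIR, 'processed')

if not os.path.isdir(DONE_DIR):
    os.makedirs(DONE_DIR)

while True:
    for path in glob.glob(os.path.join(WATCH_DIR, '*.txt')):
        command = os.path.splitext(os.path.basename(path))[0]
        # run the handler with the command name, then archive the file
        subprocess.call(['python', 'myprogram.py', command])
        os.rename(path, os.path.join(DONE_DIR, os.path.basename(path)))
    time.sleep(1)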

So that's my system. It's not pretty and it has a few more steps than I'd like for efficiency and safety, but it does work. Fast. In fact, a text message can trigger a program on my remote machine within 10 seconds. That is not bad...

Let me know if you've tried something similar or have suggestions on improvements. I'd love to hear it.

-ab