Mostly Linux & Python syntax notes and hyperlinks.

Thursday, December 4, 2014

Windows Batch Script: substring to generate output name from input name without extension

If you want your batch script to create an output file name from an input file name by removing its 3-character extension, then you can use the batch substring method.
The substring syntax :~ is inserted between the opening and closing % signs that surround your variable.

Thus, instead of %complete_variable_name%, you have
%complete_variable_name:~[chars-to-skip],[chars-to-collect]%
  •  if [chars-to-skip] is negative, then it starts from the end of the string
  •  if [chars-to-collect] is negative, then it ends that many characters from the end of the string
e.g.
C:> set filename=abcde.txt 
C:> echo %filename% 
abcde.txt 
C:> echo %filename:~0,-3% 
abcde. 
C:> echo %filename:~-3,3% 
txt 
C:> set outname=%filename:~0,-4%.output 
C:> echo %outname% 
abcde.output
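
For comparison, the skip/collect rules above can be sketched in Python. This is only an illustration of the semantics, not part of the batch script; the helper name batch_substring is made up for this note.

```python
def batch_substring(value, skip, collect=None):
    """Mimic %value:~skip,collect% (hypothetical helper, for illustration only)."""
    # A negative skip counts back from the end of the string.
    start = skip if skip >= 0 else max(len(value) + skip, 0)
    if collect is None:
        end = len(value)            # no count given: take the rest
    elif collect >= 0:
        end = start + collect       # take this many characters
    else:
        end = len(value) + collect  # stop this many characters before the end
    return value[start:end]


filename = "abcde.txt"
print(batch_substring(filename, 0, -3))              # abcde.
print(batch_substring(filename, -3, 3))              # txt
print(batch_substring(filename, 0, -4) + ".output")  # abcde.output
```

In real Python code, os.path.splitext() is usually a better way to drop an extension, since it does not assume the extension is exactly 3 characters long.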

Windows Batch Script: Collecting Y/N inside FOR loop via subroutine

In Windows batch scripting, a variable written as %yn% is expanded once, when the whole FOR block is parsed, so inside the loop it still holds the value it had before the loop started.

To collect data inside a FOR loop you need to do two things:
  1. Use ! ! around your variable instead of % %
  2. At the top of the file: setlocal enabledelayedexpansion
REM next line needed for !yn! to work: 
setlocal enabledelayedexpansion  

@echo off 
FOR %%G IN (a,b) DO (
    echo how about %%G? 
    call :YorN
    if !yn!==Y ( 
        echo yes
        call :eab %%G 
    ) else (echo no)
)

pause 
goto:EOF 

:YorN
    echo Y or N
    set /P yn=Y/N:
    IF /I "%yn%"=="y" (
        set yn=Y
    ) ELSE (
        set yn=N
    )
    echo YorN %yn%
    goto:EOF
:EOF 

References:

http://stackoverflow.com/questions/2514476/the-value-returned-from-a-function-in-a-windows-batch-script-is-ignored-in-a-for
http://stackoverflow.com/questions/12021033/how-do-i-ask-a-for-user-input-and-receive-user-input-in-a-bat-and-use-it-to-run
http://ss64.com/nt/if.html

Thursday, November 6, 2014

sql: notes on CJ Date SQL & Relational Algebra I: The original operators, part 1 (safari video)

SQL is a language which is a user interface for a DBMS. It is not a DBMS itself.

Relational Closure means that the output of one operator can be input to the next operator.
You can have nested algebraic expressions.

JOIN reads relation values and outputs another relation value

Distinguish "relational operators" from "relational algebra operators"
relational operators are all the SQL operators like update, insert, delete.

relational algebra operators have closure and are read-only.
Thus JOIN, SELECT, UNION are relational algebra operators but INSERT, UPDATE, DELETE are not.

Any operation on a relation that does not produce a relation is not a relational operation, by definition, since it would violate closure.
Avoid operations that violate closure, except for the relational inclusion operation, which returns T/F.

A relation has 2 parts: Header + Body
If you know the headers of 2 relations then you can infer the header of the result of their JOIN.

Join is done on attributes of the same name (or correlation name or renamed name)

Relations are JOINABLE iff their attributes of the same name are of the same type
Relations are JOINABLE iff the set theory union of their headings is a legal heading

In SQL, P JOIN S != S JOIN P, because the results of the two joins have different column orders. This is not good.

Intersection is a special case of join where the two input relations have the same heading.

The zero tuple is a tuple that contains no components.
The Cartesian Product is a special kind of join.

Table-Dee is a table (relation) with no attributes and one tuple, the zero tuple.

Table-Dee is an identity w.r.t. the Cartesian product.

0 + x = x + 0 = x means that 0 is an identity w.r.t. addition
1 * x = x * 1 = x means that 1 is an identity w.r.t. multiplication.

r * table-dee = table-dee * r = r

join{r,table-dee} = r
join{} = table-dee
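
A toy model in Python can make these identities concrete. This is a sketch under my own assumptions, not anything from the video: a relation is modelled as a set of tuples, each tuple is a frozenset of (attribute, value) pairs, and the names relation, natural_join, join_all and the sample suppliers/parts data are invented for this note.

```python
from functools import reduce


def relation(*rows):
    """Build a toy relation: a set of tuples, each a frozenset of (name, value)."""
    return {frozenset(row.items()) for row in rows}


def natural_join(r, s):
    """Join tuples that agree on every attribute name they have in common."""
    result = set()
    for t1 in r:
        d1 = dict(t1)
        for t2 in s:
            d2 = dict(t2)
            common = d1.keys() & d2.keys()
            if all(d1[a] == d2[a] for a in common):
                result.add(frozenset({**d1, **d2}.items()))
    return result


# Table-Dee: no attributes, exactly one tuple (the zero tuple).
TABLE_DEE = {frozenset()}


def join_all(relations):
    """join{r1, ..., rn}; the join of no relations at all is Table-Dee."""
    return reduce(natural_join, relations, TABLE_DEE)


suppliers = relation({"sno": "S1", "city": "London"},
                     {"sno": "S2", "city": "Paris"})
parts = relation({"pno": "P1", "city": "London"},
                 {"pno": "P2", "city": "Oslo"})

print(natural_join(suppliers, parts) == natural_join(parts, suppliers))  # True
print(natural_join(suppliers, TABLE_DEE) == suppliers)                   # True
print(join_all([]) == TABLE_DEE)                                         # True
```

Because the tuples here have no left-to-right attribute order, the join is commutative in this model, which is the property the SQL version loses.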

t1 JOIN t2 using (C1...CN) -> resulting table is column ordered with the common columns (C1..CN) first, followed by the other columns of t1 followed by the other columns of t2

recommend:
  • Columns of same name should be of the same type.
  • If you do this, then you can and should use "natural join"
    natural join means join on columns that have the same name

  • Never write code that relies on left to right ordering.
  • Use corresponding (it's part of the standard) if your product supports it.
  • Make sure corresponding columns have the same name and type.
  • Don't use the 'BY' option
  • Never specify 'ALL'.
    (ALL was initially added as an option to UNION as a performance tweak to signal that there were no duplicates to search for & eliminate. It was never supposed to produce duplicates.)


Ajax: Intro notes

 Summary of Wikipedia page

from http://en.wikipedia.org/wiki/Ajax_(programming) :

Asynchronous Javascript And XML

except doesn't need XML: JSON often used instead
Also, doesn't have to be Asynchronous

Group of Web development techniques/technologies.

Javascript accesses DOM to allow user to dynamically interact with information displayed.

  • Data exchanged asynchronously between browser and server via JavaScript and XMLHttpRequest object.
  • Avoid full page reloads.
Technologies used:
  • presentation uses [HTML or XHTML] + CSS
  • Dynamic interactive data uses DOM
  • Interchange of data via XML
  •  Manipulation of data via XSLT
  • Asynchronous communication via XMLHttpRequest object
  • JavaScript to tie it all together
Drawbacks
  • Dynamically updated web pages difficult to bookmark & save in history
  • Web crawlers usually don't execute Javascript so need separate way to get into search engine indices.
  • Asynchronous callback-style programming can be complex, hard to test & debug. 

Wednesday, November 5, 2014

sql: scattered notes on start of C.J.Date "SQL & Relational Theory"


TYPES = set of things we can talk about (like NOUNS)
RELATIONS = true statements about the TYPES (like SENTENCES)

TYPES and RELATIONS are sufficient and necessary to represent all DATA

Information Principle = The entire information content of the database is represented in only one way. Relations are the only way to represent information. There is no documented meaning in a duplicated row.

Use of null violates the Information Principle.

It is a logical flaw to pretend that a TYPE is a certain type of RELATION. (Some Object oriented products do this & thus fail.)

A database with its operators is a Logical System like Euclidean Geometry.
  • Base relations correspond to Axioms
  • Rules of Inference derive new Truths
  • A Query is equivalent to getting the system to prove a Theorem.

Optimizers rephrase queries, that is they perform expression transformation.

variable == can be updated

Assignment Principle: After you assign a value v to a variable V, then v==V is True
  • All operations are at the level of a set
  • Check integrity only after applying the set of updates.
A key is a set of attributes (often a set of 1 attribute), that is, a tuple.

A key must be unique and irreducible.
That is, if you say the key is the combination of [K,L], but [K] by itself is also unique, then [K,L] is reducible, so the key is only [K]. [K,L] is still a superkey (a superset of a key), and so is [K] itself, since every key is also a superkey.
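
The unique-and-irreducible test can be sketched in Python. One loud caveat: a key is really a constraint on every value the relation may ever take, while this sketch only checks one sample set of rows. The function names and the sample data are invented for illustration.

```python
from itertools import combinations


def is_unique(rows, attrs):
    """True if no two rows share the same values for attrs."""
    seen = set()
    for row in rows:
        value = tuple(row[a] for a in attrs)
        if value in seen:
            return False
        seen.add(value)
    return True


def is_key(rows, attrs):
    """Unique, and no proper subset of attrs is unique (irreducible)."""
    attrs = tuple(attrs)
    if not is_unique(rows, attrs):
        return False
    for size in range(len(attrs)):
        for subset in combinations(attrs, size):
            if is_unique(rows, subset):
                return False
    return True


rows = [
    {"K": 1, "L": "a"},
    {"K": 2, "L": "a"},
    {"K": 3, "L": "b"},
]

print(is_unique(rows, ["K", "L"]))  # True: [K,L] is a superkey
print(is_key(rows, ["K", "L"]))     # False: reducible, because [K] alone is unique
print(is_key(rows, ["K"]))          # True
```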

There can be more than one "candidate key".

There really is no logical reason why you must always choose a "primary key" from among a set of valid "candidate keys".

Entity Integrity Rule: A primary key value can not be null.

A "foreign key" is one that references another table/relation.
Referential Integrity Rule: Every foreign key value must exist in the foreign table. You can't have a foreign key that is not matched.

Use of NULL/UNKNOWN means you need 3 value logic. The 3 values are T, F, Unknown.

T & T = T          T | T = T
T & F = F          T | F = T
T & U = U          T | U = T
F & F = F          F | F = F
F & U = F          F | U = U
U & U = U          U | U = U

not T = F
not F = T
not U = U
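
The tables above can be checked with a small Python sketch. Using None to stand for Unknown is my own choice for illustration, as are the function names and3, or3 and not3.

```python
# None stands in for Unknown (an assumption made for this sketch).
U = None


def and3(a, b):
    if a is False or b is False:
        return False      # one False is enough to decide AND
    if a is U or b is U:
        return U
    return True


def or3(a, b):
    if a is True or b is True:
        return True       # one True is enough to decide OR
    if a is U or b is U:
        return U
    return False


def not3(a):
    return U if a is U else (not a)


names = {True: "T", False: "F", None: "U"}
for a, b in [(True, True), (True, False), (True, U),
             (False, False), (False, U), (U, U)]:
    print("%s & %s = %s          %s | %s = %s" % (
        names[a], names[b], names[and3(a, b)],
        names[a], names[b], names[or3(a, b)]))
```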

Closed World Assumption: Everything stated or implied by the DB is true. Everything else is False.

Open World Assumption: Everything else is UNKNOWN. (This leads to nulls & 3value logic & trouble)

Using Closed World Assumption.

predicate = A function that returns True or False when invoked.
headings correspond to predicates
The relation is a set of tuples that are instantiations of 'true propositions'

tortoise svn: drag & drop with right mouse button to move a file from one directory to another

How to do this wasn't obvious so I'd been doing these file moves from within the Repo Browser.

This morning I finally Googled it to see how it could/should be done, and it was so EASY! Just select the file to move from one directory with the RIGHT mouse button and then when you drag & drop it to a new directory, you can get the option to move & rename it.

http://tortoisesvn.net/mostforgottenfeature.html
http://tortoisesvn.net/docs/release/TortoiseSVN_en/tsvn-dug-copy.html

Note that when you next do a SVN Commit, you need to Commit from the parent directory of the two directories that have changed--the parent of both the source and the destination directory from/to which you moved the file. Otherwise there will be a complaint that you need to do both commits together.

Sunday, August 17, 2014

Tuesday, August 12, 2014

python : write a message with a dashed line of the same length above &/or below

testdash.py
def writeWithDashedLine(space_before,msg,before=False,after=True,dash='-'):
    string = ""
    dashes = dash * len(msg)
    if before:
        string += "%s%s\n" % (space_before,dashes)
    string += "%s%s\n" % (space_before,msg)
    if after:
        string += "%s%s\n" % (space_before,dashes)
    return string
mystring = writeWithDashedLine('     ','Hi this is a test')
print mystring

mystring = writeWithDashedLine('   ','Here is another',True,True,'=')
print mystring

output:
$ python testdash.py
     Hi this is a test
     -----------------

   ===============
   Here is another
   ===============

Sunday, August 10, 2014

python code moved to github

Um, OK, sensible thing is to move the code over to a website already set up to store versions of code.

The python for my PyGotham talk "Building flexible tools to store sums and report on CSV data" is now in:  https://github.com/pargery/csv_utilities


python: Newer version of ReportWithLevels class

"""
ReportWithLevels
__author__ = 'Margery Harrison'
__license__ = "Public Domain"
__version__ = "1.1"

To use this:
1) create an object of this class
2) if you don't want to report to stdout, then call open_outfile()
   --> pass to open_outfile the path to the output file
3) Call write_line() to write header lines or any other explanations
4) For each set of messages and sums,
    call print_level(level,message,sum)
    level indicates amount to indent
    message is the part that explains the sum
    sum is right justified as a numeric field.
"""

import sys

class ReportWithLevels():

    def __init__(self):
        """
        Sets the default values for the class
        """
        self.fdout=sys.stdout  #default - writes report to stdout
        self.number_width=10
        self.level_indent=2
        self.total_width=30
        self.debug = False
        self.min_level = 1
        self.max_level = 4

        #levels before which to print a newline
        self.newline_before = [1]

    # open the input path as file to write to
    def open_outfile(self,path):
        try:
            self.fdout = open(path,'w')
        except IOError:
            msg="{0:s} Can't open and write to {1:s}".format(self.__class__.__name__,path)
            sys.stderr.write(msg)

    # print debug statement if debugging turned on
    def debug_print(self,message):
        if self.debug:
            print(message)


    def write_indent(self,message,level):
        if level > 0:
            indent=' ' * (level * self.level_indent)
        else:
            indent='  '

        self.fdout.write('{0:s}{1:s}'.format(indent,message))

    # Write a line out to the output file with newline at end
    # adding indentation level for beginning
    def write_line(self,message,lev=0):
        if lev==0:
            self.fdout.write(message + '\n')
        else:
            self.write_indent(message,lev)
            self.fdout.write('\n')

    # for printing msg,sum pairs when report is getting dense
    def print_on_same_line(self,message,total,level=0):
        """

        :param message: string to go with the total sum
        :param total:   total sum to go with the message
        :param level: OPTIONAL - begins message at level-appropriate indentation
        """
        #create a string of spaces called 'indent'
        if level > 0:
            indent=' ' * (level * self.level_indent)
        else:
            indent='  '
        self.fdout.write('{0:s}{1:s}: {2:d}  '.format(indent,message,total))

    # Print message and total with indentation set by input level
    # default sort of print-message
    def print_on_new_line(self, level, message, total):
        """
        :type level: int indicating indentation level
        :type message: str that goes with int total
        :type total:  int sum that goes with message
        """
        lev=int(level)
        #skip a space before level 1 statements
        if lev in self.newline_before:
            self.write_line('')

        #s1 and s2 are number of spaces for formatting
        s1 = lev * self.level_indent
        s2 = self.total_width - s1

        #initialize fstr to the correct number of spaces
        fstr='{{0:{0:d}s}} {{1:{1:d}s}}'.format(s1,s2)

        # Number format string is right justified within number_width
        number_format='{{2:-{0:d}d}}'.format(self.number_width)
        fstr+=number_format
        self.debug_print('level {0:d} format str= {1:s}'.format(lev,fstr))

        self.write_line(fstr.format(' ', message, int(total)))


    # Print message and total with indentation set by input level
    # dispatches printing to either print_on_same_line or print_on_new_line
    def print_level(self, level, message, total):
        """
        :type level: int
        :type message: str
        :type total:  int
        """
        lev=int(level)
        #assert lev >= self.min_level and lev <= self.max_level,\
        #    "input level not within current limits"
        if lev > self.max_level:
            self.print_on_same_line(message,total)
        else:
            self.print_on_new_line(level,message,total)

if __name__ == '__main__':
    print("Testing ReportWithLevels.print_level()")
    pl=ReportWithLevels()
    pl.open_outfile("testout.txt")
    pl.write_line("This is my report")
    pl.number_width=12
    tot=7
    for level in [1,2,3,2,2,3,4,3,1]:  #range(1,4):
        msg='level {0:d} msg'.format(level)
        tot=tot * 12
        pl.print_level(level,msg,tot)
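
The two-step format string in print_on_new_line() is the least obvious part of the class, so here it is in isolation. This is just a walk-through of the same expressions with the values the test code would use for level 1 (level_indent=2, total_width=30, number_width=12); the standalone variable names are mine.

```python
level_indent = 2
total_width = 30
number_width = 12
level = 1

s1 = level * level_indent   # width of the indent field
s2 = total_width - s1       # width of the message field

# Doubled braces survive the first .format() as single braces,
# so the first pass only fills in the field widths.
fstr = '{{0:{0:d}s}} {{1:{1:d}s}}'.format(s1, s2)
fstr += '{{2:-{0:d}d}}'.format(number_width)
print(repr(fstr))           # '{0:2s} {1:28s}{2:-12d}'

# The second pass fills in the indent, the message and the total.
line = fstr.format(' ', 'level 1 msg', 84)
print(repr(line))
print(len(line))            # 43 = 2 + 1 + 28 + 12
```

Strings are left-justified and integers right-justified by default, which is why the message starts right after the indent while the number lines up on the right.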

Wednesday, August 6, 2014

Linux: find and print the contents of files matching a pattern

find . -iname 'foo*' -exec cat '{}' \;  | more
(Quote the pattern so that the shell does not expand foo* before find sees it.)
To print each file's name before its contents:
find . -iname 'foo*' -print -exec cat '{}' \; | more
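
A rough Python equivalent, in case the same thing is needed inside a script. This is a sketch, not a drop-in replacement for find: it only looks at regular file names (find would also match directories), and the function names are made up.

```python
import fnmatch
import os


def iter_matching_files(root, pattern):
    """Yield paths under root whose file name matches pattern, ignoring case."""
    lowered = pattern.lower()
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in sorted(filenames):
            if fnmatch.fnmatchcase(name.lower(), lowered):
                yield os.path.join(dirpath, name)


def print_matching_files(root, pattern):
    """Print each matching path followed by the file's contents."""
    for path in iter_matching_files(root, pattern):
        print(path)
        with open(path, errors="replace") as handle:
            print(handle.read(), end="")


if __name__ == "__main__":
    print_matching_files(".", "foo*")
```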

Sunday, July 27, 2014

python: Write totals at different indentation levels

"""
ReportWithLevels
__author__ = 'Margery Harrison'
__license__ = "Public Domain"
__version__ = "1.0"
"""

import sys

class ReportWithLevels():

    def __init__(self):
        """
        Sets the default values for the class
        """
        self.fdout=sys.stdout  #default - writes report to stdout
        self.number_width=10
        self.level_indent=2
        self.total_width=30
        self.debug = True
        self.min_level = 1
        self.max_level = 4

        #levels before which to print a newline
        self.newline_before = [1]

    # open the input path as file to write to
    def open_outfile(self,path):
        try:
            self.fdout = open(path,'w')
        except IOError:
            msg="{0:s} Can't open and write to {1:s}".format(self.__class__.__name__,path)
            sys.stderr.write(msg)

    # print debug statement if debugging turned on
    def debugPrint(self,message):
        if self.debug:
            print(message)

    # Write a line out to the output file with newline at end
    def writeLine(self,message):
        self.fdout.write(message + '\n')

    # Print message and total with indentation set by input level
    def printLevel(self, level, message, total):
        lev=int(level)
        assert lev >= self.min_level and lev <= self.max_level,\
            "input level not within current limits"

        #skip a space before level 1 statements
        if lev in self.newline_before:
            self.writeLine('')

        #s1 and s2 are number of spaces for formatting
        s1 = lev * self.level_indent
        s2 = self.total_width - s1

        #initialize fstr to the correct number of spaces
        fstr='{{0:{0:d}s}} {{1:{1:d}s}}'.format(s1,s2)

        # Number format string is right justified within number_width
        number_format='{{2:-{0:d}d}}'.format(self.number_width)
        fstr+=number_format
        self.debugPrint('level {0:d} format str= {1:s}'.format(lev,fstr))

        self.writeLine(fstr.format(' ', message, int(total)))


if __name__ == '__main__':
    print "Testing ReportWithLevels.printLevel()"
    pl=ReportWithLevels()
    pl.open_outfile("testout.txt")
    pl.writeLine("This is my report")
    pl.number_width=12
    tot=7
    for level in [1,2,3,2,2,3,4,3,1]:  #range(1,4):
        msg='level {0:d} msg'.format(level)
        tot=tot * 12
        pl.printLevel(level,msg,tot)


    #pl.printLevel(8,"level 8 msg",88)  #test assert error

assert isinstance() not for file stream parameters?

I'm writing methods in PyCharm, and I'd like to follow its hints for the way I should be structuring my code. Here I'm passing in a parameter of type file stream, which could be sys.stdout or a file descriptor. It prompted me to include an "assert isinstance()" for the input parameter.

    def set_stream(self,fdout):
        """
        :param fdout: 
        """ 
        assert isinstance(fdout,...) 

I was looking up what type to call it for the purposes of isinstance() and I came across  http://dobesland.wordpress.com/2007/10/07/python-isinstance-considered-useful/:
The classic example of this is python’s famous “file-like objects”, which typically implement read and/or write in the same way and are accepted by various python functions.  I believe the DB API is another well-used example of this.  In both cases, there is no common base-class, so it’s impossible to use isinstance() to check whether a particular object is, in fact, file-like or a database object.
I guess I'll leave that assert() out. The code will break reliably enough when someone tries to write to an fdout that isn't the right sort of object.
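
If an early check is still wanted, one option is to test for the behaviour rather than the type. This is a sketch of that idea and not what the class above does; the Reporter class here is a made-up stand-in.

```python
import io
import sys


class Reporter(object):
    """Hypothetical stand-in for a class that writes to a stream."""

    def __init__(self):
        self.fdout = sys.stdout

    def set_stream(self, fdout):
        """
        :param fdout: any file-like object with a write() method
        """
        # Duck-typing check: ask for the method we need, not for a base class.
        if not callable(getattr(fdout, "write", None)):
            raise TypeError("fdout must have a write() method")
        self.fdout = fdout


reporter = Reporter()
buffer = io.StringIO()
reporter.set_stream(buffer)      # accepted: StringIO has write()
reporter.fdout.write(u"hello\n")
print(repr(buffer.getvalue()))

try:
    reporter.set_stream(42)      # rejected: an int has no write()
except TypeError as err:
    print(err)
```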

Sunday, July 20, 2014

python: CSV analysis using Counter and DictReader and format

Here is a simple CSV file:

name,color,size,shape,number
tom,red,big,square,3
mary,blue,big,triangle,5
sally,green,small,square,2
edith,blue,small,triangle,1
wally,red,big,square,7
jon,blue,small,triangle,3

This code reads in the simple CSV and reports on it:


import os.path 
import csv 
import collections


def printLevel(level, message, total):
    """ Print message and totals, with spacing determined by level

    Keyword arguments:
    level   -- integer from 1 to 4
    message -- string, e.g. "Number of happy tomcats"
    total   -- for this version, should be an integer count
    """
    lev=int(level)
        

    if lev==1:
        print("")
    #fstr1='{0:5s} {1:35s} {2:-3d}'
    #fstr2='{0:10s} {1:30s} {2:-3d}'
    s1=5*lev  #5 or 10
    s2=40-s1
    fstr='{{0:{0:d}s}} {{1:{1:d}s}}'.format(s1,s2)
    fstr+=' {2:-3d}'

    print fstr.format(' ', message, int(total))

def print_colors_shapes(c):

    """Prints report on number of shapes of different colors
    keyword argument:
    c -- collection that includes values in shape_list, color_list below
    """
    shape_list= ['square', 'triangle','circle']
    color_list= ['red', 'blue', 'green','yellow']
    #first print shapes
    for shape in shape_list:
        msg = 'Number of '+ shape + 's'
        tot=c.get(shape,"0")
        printLevel(1, msg,tot)
        if int(tot) > 0:
            for color in color_list:
                msg = "Number of " + color + " " + shape + 's'
                tot=c.get(color + '_' + shape,"0")
                printLevel(2, msg, tot)
    for color in color_list:
        msg = 'Total {0} shapes'.format(color)
        tot = c.get(color,'0')
        printLevel(1,msg,tot)


def count_color_shape(my_reader):

    """ create counters for colors and shapes separate and combined
    keyword argument:
    my_reader -- of type csv.DictReader
    """
    c = collections.Counter()
    for row in my_reader:
        print row
        color = row['color']
        shape = row['shape']
        c[color] += 1
        c[shape] += 1
        c[color + '_' + shape] += 1
    return c



def read_dict(path):
    with open(path) as csv_file:
        my_reader = csv.DictReader(csv_file)
        print my_reader.fieldnames
        #now have the read_dict() method return the Counter datastructure
        c = count_color_shape(my_reader)
        print c
        print_colors_shapes(c)

def test_dict():
    datadir = "/Users/margery/Documents/pystuff/pyGotham/demo/data"
    csv_file = 'simpleCSV.txt'
    path = os.path.join(datadir, csv_file)
    print path
    read_dict(path)

test_dict()
 
Here is the output:
/usr/bin/python /Users/margery/PycharmProjects/proj3/TestDict1.py
/Users/margery/Documents/pystuff/pyGotham/demo/data/simpleCSV.txt
['name', 'color', 'size', 'shape', 'number']
{'color': 'red', 'shape': 'square', 'number': '3', 'name': 'tom', 'size': 'big'}
{'color': 'blue', 'shape': 'triangle', 'number': '5', 'name': 'mary', 'size': 'big'}
{'color': 'green', 'shape': 'square', 'number': '2', 'name': 'sally', 'size': 'small'}
{'color': 'blue', 'shape': 'triangle', 'number': '1', 'name': 'edith', 'size': 'small'}
{'color': 'red', 'shape': 'square', 'number': '7', 'name': 'wally', 'size': 'big'}
{'color': 'blue', 'shape': 'triangle', 'number': '3', 'name': 'jon', 'size': 'small'}
Counter({'blue': 3, 'square': 3, 'triangle': 3, 'blue_triangle': 3, 'red_square': 2, 'red': 2, 'green': 1, 'green_square': 1})

      Number of squares                     3
           Number of red squares            2
           Number of blue squares           0
           Number of green squares          1
           Number of yellow squares         0

      Number of triangles                   3
           Number of red triangles          0
           Number of blue triangles         3
           Number of green triangles        0
           Number of yellow triangles       0

      Number of circles                     0

      Total red shapes                      2

      Total blue shapes                     3

      Total green shapes                    1

      Total yellow shapes                   0


Saturday, July 19, 2014

python: csv.DictReader - That's what I should have been using

I learned late Wednesday that I'm supposed to give a talk at PyGotham about my forays into python dictionary land. So I did some research. And what I really should have been using is a csv reader flavor called DictReader. This does exist in python 2.6, so being stuck there is no excuse.

There's a good explanation at http://pymotw.com/2/csv/#using-field-names



Friday, June 20, 2014

python: Links to my posts about python dictionary objects

purpose

The reason for my interest is that we generate output tab-separated files containing values that need to be tabulated into reports. I was looking for a way to generalize the code for creating these reports. In other words:
Given a list of strings describing the counts saved in each column of a CSV file, what would be the best way to automate the creation of reports of counts for individual and common categories?

For example, if a list of voters had values like Party Affiliation (D/R/O/U), Gender, Zip Code, etc, I should be able to feed in these values and get a report listing totals for Female Republicans voters[R][F], Male Democrats with Zip Code 12345 voters[D][M][12345], total unaffiliated voters voters[U], etc. And then I should be able to use the same tool to automate reports for my next list of zoo animals, with columns tabulating Species, Location, Diet...

input

  1. list of column ID's
  2. dictionary mapping column ID's to corresponding strings to use in report
  3. collection of totals desired for the report

output

  1. From input 1. and the CSV file you should be able to produce the dictionary structure containing all the totals.
  2. From 1 and 2 you should be able to produce a function to generate strings for a report line
  3. From 1, 2, 3 you should be able to produce a general sort of report. (I haven't bothered with that yet.)

links

I know there is a better way to organize this blog to enable easier searching by topic but I haven't investigated how yet. Until then, collected here are the posts where I was exploring the use of multidimensional dictionaries:
At some point I thought I had been clever, and posted the way to nest dictionaries. Later, I found that it did not work--my test data had had too many 1's and I hadn't caught that I was adding up the wrong values.

I ended up using a 1-D dictionary to save my multidimensional values. This worked so long as I could use a single character to express all the possibilities for each dimension, with no re-use.
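
One possible way around the single-character restriction is to keep the dictionary 1-D but use tuples as keys instead of concatenated characters. This is an alternative sketch, not what the linked posts do; count_prefixes and the sample voter rows are invented for illustration.

```python
from collections import Counter


def count_prefixes(rows):
    """Count every leading combination of each row's values.

    For ("D", "M", "12345") this adds one to ("D",), ("D", "M")
    and ("D", "M", "12345").
    """
    counts = Counter()
    for row in rows:
        for depth in range(1, len(row) + 1):
            counts[tuple(row[:depth])] += 1
    return counts


# Made-up sample data: (party, gender, zip code)
voters = [
    ("R", "F", "12345"),
    ("D", "M", "12345"),
    ("D", "M", "54321"),
    ("U", "F", "12345"),
]

counts = count_prefixes(voters)
print(counts[("D",)])                # 2
print(counts[("D", "M")])            # 2
print(counts[("D", "M", "12345")])   # 1
print(counts[("R", "M")])            # 0 (a Counter returns 0 for missing keys)
```

Since the values are kept as separate tuple elements, they can be any length and can repeat across dimensions.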

Friday, March 28, 2014

python: 1-d dictionary for counting hierarchical attributes

This dictionary method uses concatenated characters instead of sub-dictionaries to track the hierarchical counts, e.g. number of dogs could be pets['D'], number of red dogs could be pets['DR'], and number of red dogs living in Cambridge could be pets['DRC'].

What follows is the general dictionary-building method, followed by a method written to illustrate how it could be used, followed by the printout from the test method.
# Given a list of lists, build a set of keys that combine fields from each list
# return a 1-d dictionary of these keys initialized to input_default
#
# @param list_of_lists = list of key-lists - Should be single unique characters.
# @param input_default [optional] Value to initialize each dictionary entry
# if input_default parameter not given, then 0 is used
def build1Dictionary(list_of_lists,input_default=0):
    number_of_lists=len(list_of_lists)
    if number_of_lists==0:
        return dict()
    keylist=list_of_lists[0]
    if number_of_lists==1:
        return dict.fromkeys(keylist,input_default)
    for next_list in list_of_lists[1:]:
        new_list=list(keylist)
        for key in keylist:
            for xkey in next_list:
                new_list.append(key+xkey)
        keylist=new_list
    return dict.fromkeys(keylist,input_default)

# Here is a method to test the dictionary:
def count_pets():
    #Create the dictionary
    pets=['D','C','F','B']
    gender=['g','b']
    home=['R','S','U']
    words=dict(D="Dog",C="Cat",F="Fish",B="Bird",\
       g="Girl",b="Boy",R="Rural",S="Suburban",U="City")
    pets_dict=build1Dictionary([pets,gender,home])
   
    #fill the dictionary with some counts
    data=["DgS","DgS","DbR","CgU","FgS","CgR","BbR"]
    for pet in data:
        p=pet[0:1]
        g=pet[1:2]
        h=pet[2:3]
        pets_dict[p]+=1
        pets_dict[p+g]+=1
        pets_dict[p+g+h]+=1
   
    #Print the counts indented by level of hierarchy
    for p in pets:
        print ("%d %s(s)" % (pets_dict[p],words[p]))
        for g in gender:
            if pets_dict[p+g]>0:
                print ("  %d owned by a %s" % \
                   (pets_dict[p+g],words[g]))
                for h in home:
                    if pets_dict[p+g+h] > 0:
                        print("    %d in %s environment." % \
                            (pets_dict[p+g+h],words[h]))
   

if __name__ == '__main__':
    count_pets()
Then when I run it, it prints out:
3 Dog(s)
  2 owned by a Girl
    2 in Suburban environment.
  1 owned by a Boy
    1 in Rural environment.
2 Cat(s)
  2 owned by a Girl
    1 in Rural environment.
    1 in City environment.
1 Fish(s)
  1 owned by a Girl
    1 in Suburban environment.
1 Bird(s)
  1 owned by a Boy
    1 in Rural environment.

Tuesday, March 11, 2014

Python: build and initialize a dictionary from a list of lists

OK, so building on the last post. Here's a general purpose function that takes a list of desired key-lists, and returns a multi-dimensional data dictionary.
##
## Takes dictionary and a list of desired keys
## adds the keys to the dictionary, initializing them to input parameter
#
# @param input_dictionary Name of input dictionary
# @param input_keys List of desired keys
# @param input_default [optional] Value to initialize each dictionary entry
# if input_default parameter not given, then 0 is used
#
#
def setDictionaryDefaults(input_dictionary,input_keys,input_default=0):
    for key in input_keys:
      input_dictionary.setdefault(key,input_default)


##
# Given a list of lists, build a multidimensional dictionary, initialized to optional input parameter
#
# @param list_of_lists = list of key-lists
# @param input_default [optional] Value to initialize each dictionary entry
# if input_default parameter not given, then 0 is used
def buildDictionary(list_of_lists,input_default=0):
    d=dict()
    number_of_lists=len(list_of_lists)
    if number_of_lists==0:
        return d
    setDictionaryDefaults(d,list_of_lists[0],input_default)
    if number_of_lists==1:
        return d
    next_d=dict()
    for next_list in list_of_lists[1:]:
        setDictionaryDefaults(next_d,next_list,d)
        d=next_d
        next_d=dict()
    return d
Here I test it:
 Python 2.4.1 (#2, Jul 24 2007, 12:14:31)
[GCC 3.2.3 20030502 (Red Hat Linux 3.2.3-34)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> from lists_to_dict import buildDictionary
>>> a=['a','b']
>>> c=[a]
>>> d=buildDictionary(c,1)
>>> d
{'a': 1, 'b': 1}
>>> q=['R','S','T']
>>> x=['X','XX','XXX']
>>> y=['Y','YY']
>>> c=[a,q,x,y]
>>> c
[['a', 'b'], ['R', 'S', 'T'], ['X', 'XX', 'XXX'], ['Y', 'YY']]
>>> d=buildDictionary(c,1)
>>> d
{'Y': {'X': {'S': {'a': 1, 'b': 1}, 'R': {'a': 1, 'b': 1}, 'T': {'a': 1, 'b': 1}}, 'XX': {'S': {'a': 1, 'b': 1}, 'R': {'a': 1, 'b': 1}, 'T': {'a': 1, 'b': 1}}, 'XXX': {'S': {'a': 1, 'b': 1}, 'R': {'a': 1, 'b': 1}, 'T': {'a': 1, 'b': 1}}}, 'YY': {'X': {'S': {'a': 1, 'b': 1}, 'R': {'a': 1, 'b': 1}, 'T': {'a': 1, 'b': 1}}, 'XX': {'S': {'a': 1, 'b': 1}, 'R': {'a': 1, 'b': 1}, 'T': {'a': 1, 'b': 1}}, 'XXX': {'S': {'a': 1, 'b': 1}, 'R': {'a': 1, 'b': 1}, 'T': {'a': 1, 'b': 1}}}}
>>>
>>> c=[]
>>> d=buildDictionary(c,1)
>>> d
{}
>>> a=['q','r']
>>> c=[a]
>>> buildDictionary(c)
{'q': 0, 'r': 0}
>>> d=buildDictionary(c)
>>> d
{'q': 0, 'r': 0}
>>>
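
One thing the session above does not show: setDictionaryDefaults() stores the same inner dictionary object under every key, so updating one branch changes all of them. A recursive variant that builds a fresh inner dictionary per key might look like the sketch below. It is written for a newer Python than 2.4 (it uses dict comprehensions), the name build_nested is made up, and unlike buildDictionary() it puts the first list outermost.

```python
def build_nested(list_of_lists, default=0):
    """Nested dict with the first key-list outermost and independent branches."""
    if not list_of_lists:
        return {}
    first, rest = list_of_lists[0], list_of_lists[1:]
    if not rest:
        return {key: default for key in first}
    # Each key gets its own freshly built sub-dictionary.
    return {key: build_nested(rest, default) for key in first}


d = build_nested([['a', 'b'], ['R', 'S', 'T']])
d['a']['R'] = 5
print(d['a']['R'])   # 5
print(d['b']['R'])   # 0: the other branch is untouched
```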

python: building 3-D dictionary

We're using 2.4 so don't have defaultdict. Here's a way to build a nested dictionary using dict() and setdefault():

>>> hid=dict()
>>> hid.setdefault('H',0)
0
>>> hid.setdefault('I',0)
0
>>> bkd=dict()
>>> bkd
{}
>>> bkd.setdefault('Z',hid)
{'I': 0, 'H': 0}
>>> bkd
{'Z': {'I': 0, 'H': 0}}
>>> bkd.setdefault('CH',hid)
{'I': 0, 'H': 0}
>>> bkd
{'CH': {'I': 0, 'H': 0}, 'Z': {'I': 0, 'H': 0}}
>>> epd=dict()
>>> epd.setdefault('E',bkd)
{'CH': {'I': 0, 'H': 0}, 'Z': {'I': 0, 'H': 0}}
>>> epd.setdefault('P',bkd)
{'CH': {'I': 0, 'H': 0}, 'Z': {'I': 0, 'H': 0}}
>>> epd
{'P': {'CH': {'I': 0, 'H': 0}, 'Z': {'I': 0, 'H': 0}}, 'E': {'CH': {'I': 0, 'H': 0}, 'Z': {'I': 0, 'H': 0}}}
Now use it:
>>> ep=['E','P']
>>> bk=['Z','CH']
>>> hi=['H','I']
>>> i=0
>>> for e in ep:
...   for b in bk:
...     for h in hi:
...       epd[e][b][h]=i
...       i+=1
...
>>> epd
{'P': {'CH': {'I': 7, 'H': 6}, 'Z': {'I': 7, 'H': 6}}, 'E': {'CH': {'I': 7, 'H': 6}, 'Z': {'I': 7, 'H': 6}}}
>>> epd['E']['Z']['H']
6
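
Note that every branch in the output above ends up holding the same 6 and 7. That is because setdefault() stored the very same hid object under each key, so there is really only one innermost dictionary. A small sketch of the effect, with copy.deepcopy() as one possible way to get independent branches (the variable names are mine):

```python
import copy

template = {'H': 0, 'I': 0}

# Same object stored under both keys, as in the session above.
shared = {}
for key in ('Z', 'CH'):
    shared.setdefault(key, template)

# A separate copy stored under each key.
independent = {}
for key in ('Z', 'CH'):
    independent.setdefault(key, copy.deepcopy(template))

shared['Z']['H'] = 6
independent['Z']['H'] = 6

print(shared['Z'] is shared['CH'])   # True: one dictionary, two names
print(shared['CH']['H'])             # 6: changed along with shared['Z']
print(independent['CH']['H'])        # 0: untouched
```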
I want to start with lists and build dictionaries from there. Here's the start of that automation:
>>> def setDefaults(k,d):
...   for key in k:
...     d.setdefault(key,0)
...
>>> kay=['A','B','C','D']
>>> dee=dict()
>>> setDefaults(kay,dee)
>>> dee
{'A': 0, 'C': 0, 'B': 0, 'D': 0}
Now update setDefaults to take the default initializer as a parameter:
>>> def setDefaults(k,d,i):
...   for key in k:
...     d.setdefault(key,i)
...
>>> kay
['A', 'B', 'C', 'D']
>>> d2=dict()
>>> setDefaults(kay,d2,2)
>>> d2
{'A': 2, 'C': 2, 'B': 2, 'D': 2}
OK, now here we start with three lists and build a 3-d dictionary using the setDefaults() method:
>>> kay
['A', 'B', 'C', 'D']
>>> ell=['X','Y','Z']
>>> grk=['pi','ro','lamtha','tau']
>>> kay_ell=dict()
>>> kay_d=dict()
>>> setDefaults(kay,kay_d,0)
>>> kay_d
{'A': 0, 'C': 0, 'B': 0, 'D': 0}
>>> setDefaults(ell,kay_ell,kay_d)
>>> ell
['X', 'Y', 'Z']
>>> kay_ell
{'Y': {'A': 0, 'C': 0, 'B': 0, 'D': 0}, 'X': {'A': 0, 'C': 0, 'B': 0, 'D': 0}, 'Z': {'A': 0, 'C': 0, 'B': 0, 'D': 0}}
>>> kay_ell_grk=dict()
>>> setDefaults(grk,kay_ell_grk,kay_ell)
>>> kay_ell_grk
{'tau': {'Y': {'A': 0, 'C': 0, 'B': 0, 'D': 0}, 'X': {'A': 0, 'C': 0, 'B': 0, 'D': 0}, 'Z': {'A': 0, 'C': 0, 'B': 0, 'D': 0}}, 'pi': {'Y': {'A': 0, 'C': 0, 'B': 0, 'D': 0}, 'X': {'A': 0, 'C': 0, 'B': 0, 'D': 0}, 'Z': {'A': 0, 'C': 0, 'B': 0, 'D': 0}}, 'ro': {'Y': {'A': 0, 'C': 0, 'B': 0, 'D': 0}, 'X': {'A': 0, 'C': 0, 'B': 0, 'D': 0}, 'Z': {'A': 0, 'C': 0, 'B': 0, 'D': 0}}, 'lamtha': {'Y': {'A': 0, 'C': 0, 'B': 0, 'D': 0}, 'X': {'A': 0, 'C': 0, 'B': 0, 'D': 0}, 'Z': {'A': 0, 'C': 0, 'B': 0, 'D': 0}}}
>>>

Thursday, January 23, 2014

python: "splat" operator * to unpack a list to use it as method parameters

I had a long list of parameters to be re-used in different methods. 
if condition1:
    method1( a,b,c,d,e,f,g,h,i,j)
elif condition2:
    method2( a,b,c,d,e,f,g,h,i,j)
I wanted to define the list only once.
 parameter_list=a,b,c,d,e,f,g,h,i,j
I didn't know how to use the values in the method call. If you use it directly, i.e.
method1(parameter_list)   <- nope
it's only one parameter, instead of the number of parameters in the list.

I found the answer in:
http://stackoverflow.com/questions/4979542/python-use-list-as-function-parameters

Apply the "splat" operator (*) to the parameter list, and it will work.
Here's an example:
>>> def sum(a,b,c):
...   print a+b+c
...
>>> sum(1,2,3)
6
>>> q=1,2,3
>>> sum(q)
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
TypeError: sum() takes exactly 3 arguments (1 given)
>>> sum(*q)
6
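
Going back to the original problem, here is a short sketch of one parameter list shared by two functions. The function bodies and values are placeholders I made up; the ** form for dictionaries is a related feature that is not covered above.

```python
def method1(a, b, c):
    return a + b + c


def method2(a, b, c):
    return a * b * c


parameter_list = (1, 2, 3)                    # defined only once
keyword_parameters = {"a": 1, "b": 2, "c": 4}

print(method1(*parameter_list))       # 6
print(method2(*parameter_list))       # 6
print(method2(**keyword_parameters))  # 8: ** unpacks a dict into keyword arguments
```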
 
