Roundup Tracker

FirebirdSQL Backend

1. It uses fbd-1.4

2. It bypasses Roundup’s schema.py and initial_data.py. Back_firebirdsql.py will automatically create a new FirebirdSQL database if one isn’t detected. To start again, just delete the current database (FirebirdSQL databases use a single file like SQLite).

3. I took the SQLite Classic database schema and adapted it to work with FirebirdSQL.

4. It will happily import data from another database using the roundup-admin.bat export/import routines.

5. Note that indexer_rdbms.py must also be changed.

Here's the 'config.ini'::

   1 # Settings in this section are used by RDBMS backends only
   2 [rdbms]
   3 
   4 # Name of the database to use.
   5 # Default: roundup
   6 name = C:/Python27/Scripts/test/db/roundup.fdb
   7 
   8 # Database server host.
   9 # Default: localhost
  10 host = localhost
  11 
  12 # TCP port number of the database server.
  13 # Postgresql usually resides on port 5432 (if any),
  14 # for MySQL default port number is 3306.
  15 # Leave this option empty to use backend default
  16 # Default: 
  17 port = 
  18 
  19 # Database user name that Roundup should use.
  20 # Default: roundup
  21 user = sysdba
  22 
  23 # Database user password.
  24 # Default: roundup
  25 password = masterkey
  26 
  27 # Name of the MySQL defaults file.
  28 # Only used in MySQL connections.
  29 # Default: ~/.my.cnf
  30 read_default_file = ~/.my.cnf
  31 
  32 # Name of the group to use in the MySQL defaults file (.my.cnf).
  33 # Only used in MySQL connections.
  34 # Default: roundup
  35 read_default_group = roundup
  36 
  37 # Number of seconds to wait when the SQLite database is locked
  38 # Default: use a 30 second timeout (extraordinarily generous)
  39 # Only used in SQLite connections.
  40 # Default: 30
  41 sqlite_timeout = 30
  42 
  43 # Size of the node cache (in elements)
  44 # Default: 100
  45 cache_size = 100
  46 
  47 # Setting this option to 'no' protects the database against table creations.
  48 # Allowed values: yes, no
  49 # Default: yes
  50 allow_create = no
  51 
  52 # Setting this option to 'no' protects the database against table alterations.
  53 # Allowed values: yes, no
  54 # Default: yes
  55 allow_alter = no
  56 
  57 # Setting this option to 'no' protects the database against table drops.
  58 # Allowed values: yes, no
  59 # Default: yes
  60 allow_drop = no
  61 
  62 # Name of the PostgreSQL template for database creation.
  63 # For database creation the template used has to match
  64 # the character encoding used (UTF8), there are different
  65 # PostgreSQL installations using different templates with
  66 # different encodings. If you get an error:
  67 #   new encoding (UTF8) is incompatible with the encoding of
  68 #   the template database (SQL_ASCII)
  69 #   HINT:  Use the same encoding as in the template database,
  70 #   or use template0 as template.
  71 # then set this option to the template name given in the
  72 # error message.
  73 # Default: 
  74 template = 

Here's the 'back_firebirdsql.py' module::

   1 '''
   2 2014-06-01      W. Robert Kellock
   3 
   4 FirebirdSQL backend via FDB for Roundup. No provision has been made for database schema changes.
   5 You are expected to edit the database directly for any schema alterations. Will happily import and export
   6 data between backends. Have successfully imported a SQLite database into FirebirdSQL. Note that you also 
   7 need to change indexer_rdbms.py for the FirebirdSQL backend to work.
   8 
   9 This is an absolute bare bones implementation of a Roundup backend for FirebirdSQL.  It works like this:
  10 
  11 1.      Roundup opens a new database connection, new cursor and new transaction for every "page" request (cgi.client.py or roundup.mailgw.py).
  12 2.      A number of database queries are run to build the page request under the one transaction.
  13 3.      Detectors can change the information to be written to the database or reject it entirely.
  14 4.      If there are no errors or rejections raised by the detectors the transaction is committed and the database is closed.
  15  
  16 Detectors
  17 =========
  18 Auditors - check information before it enters the database.
  19 Reactors - create new information based on changes they see being made to the database. For rejecting information reactors
  20            rely on the database being closed without the transaction being committed, rather than using rollbacks.
  21 '''
  22 import logging
  23 import fdb
  24 from types import *
  25 from roundup.backends import rdbms_common
  26 
  27 def connection_dict(config, dbnamestr=None):
  28     ''' Read_default_group and read_default_file is MySQL-specific, ignore it.
  29     Use name, host, port (port is not used by the FDB library), user & password 
  30     from config.ini where name needs to be the full path to the *.fdb database file '''
  31     d = rdbms_common.connection_dict(config, dbnamestr)
  32     if 'read_default_group' in d:
  33         del d['read_default_group']
  34     if 'read_default_file' in d:
  35         del d['read_default_file']
  36     return d
  37 
  38 def db_checks(config, action = None):
  39     """Single point of connection.  Database creation must be done in two steps.
  40        First a call to exists, then a call to create
  41     """
  42     db = connection_dict(config, 'database')
  43     logging.getLogger('roundup.hyperdb').debug('Database name %r' % db)
  44     try:
  45         conn = fdb.connect(host=db['host'], database=db['database'], user=db['user'], password=db['password'])
  46         success = 1
  47     except:
  48         success = 0
  49     logging.getLogger('roundup.hyperdb').debug('Action = %r and Success = %i' % (action, success))
  50     if action == 'create' and not success:
  51         try:    
  52                 conn = fdb.create_database(dsn=db['database'], user=db['user'], password=db['password'], page_size=8192)
  53         except:
  54                 return 0
  55     elif action == 'drop' and success:
  56         conn.drop_database()
  57 
  58     if action == 'exists' or action == 'drop':
  59         if success:
  60                 conn.close()
  61         return success
  62     else:
  63         return conn
  64 
  65 class Database(rdbms_common.Database):
  66     arg = '?' # positional arguments in SQL statements
  67 
  68     # used by some code to switch styles of query
  69     implements_intersect = 1
  70 
  71     def sql_mangle(self, value):
  72         ''' FirebirdSQL doesn't like leading or trailing _'s anywhere 
  73             in a SQL statement. They must be surounded by "'s
  74         '''
  75         value = value.replace('=', ' = ')
  76         value = value.replace(',', ' , ')
  77         value = value.replace('.', ' . ')
  78         value = value.replace(' date', ' _date')# hack for journal tables.  Might be forced to change rdbms_common instead
  79 
  80         value = value.replace(' __', ' \"__')
  81         value = value.replace(' _', ' \"_')
  82 
  83         value = value.replace('(__', ' (\"__')
  84         value = value.replace('(_', ' (\"_')
  85 
  86         value = value.replace('=?', ' =? ')     # positional arguments
  87 
  88         z = value.split(' ')
  89         for i,x in enumerate(z):
  90                 y = str(x)
  91                 if y.find('\"') > -1:
  92                         p = y.find(')')
  93                         if p  > -1:
  94                                 y = y.replace(')','\")',1)
  95                         else:
  96                                 y = y + '\"'
  97                         z[i] = y
  98         value = ' '.join(z)
  99         value = value.replace(' . ', '.')
 100         value = value.replace(' TRUE', '1')     # boolean datatype replaced by integer
 101         return value
 102 
 103     def check_tuple_args(self, args):
 104         '''     Another hack to catch the boolean datatype to be replaced by integer,
 105                 but this time it applies to positional arguments when supplied as a tuple
 106         '''
 107         new_args = []
 108         for x in args:
 109            if type(x) is StringType:
 110                    y = x
 111                    if x.upper() == 'TRUE':
 112                            y = 1
 113                    new_args.append(y)
 114            else:
 115                    new_args.append(x)
 116         return tuple(new_args)
 117 
 118     def sql(self, sql, args=None, cursor=None):
 119         """ Execute the sql with the optional args.
 120         """
 121         sql = self.sql_mangle(sql)
 122         if type(args) is TupleType:
 123                 args = self.check_tuple_args(args)
 124         self.log_debug('SQL %r %r'%(sql, args))
 125         if not cursor:
 126             cursor = self.cursor
 127         if args:
 128             cursor.execute(sql, args)
 129         else:
 130             cursor.execute(sql)
 131 
 132     def open_connection(self):
 133         if not db_checks(self.config, 'exists'):
 134             self.conn = db_checks(self.config, 'create')
 135             self.cursor = self.conn.cursor()
 136             self.sql("CREATE TABLE __textids ( _class varchar(30), _itemid varchar(255), _prop varchar(255), _textid integer not null, primary key(_textid))")
 137             self.sql("CREATE TABLE __words(_word varchar(30), _textid integer)")
 138             self.sql("CREATE TABLE _file (_activity VARCHAR(30), _actor INTEGER, _content VARCHAR(255), _creation VARCHAR(30), _creator INTEGER, _name VARCHAR(255), _type VARCHAR(255), id INTEGER PRIMARY KEY, __retired__ INTEGER DEFAULT 0)")
 139             self.sql("CREATE TABLE _issue (_activity VARCHAR(30),_actor INTEGER,_assignedto INTEGER,_creation VARCHAR(30),_creator INTEGER,_priority INTEGER,_status INTEGER,_title VARCHAR(255),id INTEGER PRIMARY KEY,__retired__ INTEGER DEFAULT 0)")
 140             self.sql("CREATE TABLE _keyword (_activity VARCHAR(30),_actor INTEGER,_creation VARCHAR(30),_creator INTEGER,_name VARCHAR(255),id INTEGER PRIMARY KEY,__retired__ INTEGER DEFAULT 0)")
 141 # _nosy_global changed to integer from boolean and _summary to varchar(32765) from varchar(255)
 142             self.sql("CREATE TABLE _msg (_activity VARCHAR(30),_actor INTEGER,_author INTEGER,_content VARCHAR(255),_creation VARCHAR(30),_creator INTEGER,_date VARCHAR(30),_inreplyto VARCHAR(255),_messageid VARCHAR(255),_nosy_global integer,_summary VARCHAR(32765),_type VARCHAR(255),id INTEGER PRIMARY KEY,__retired__ INTEGER DEFAULT 0)")
 143             self.sql("CREATE TABLE _priority (_activity VARCHAR(30),_actor INTEGER,_creation VARCHAR(30),_creator INTEGER,_name VARCHAR(255),_order REAL,id INTEGER PRIMARY KEY,__retired__ INTEGER DEFAULT 0)")
 144             self.sql("CREATE TABLE _query (_activity VARCHAR(30),_actor INTEGER,_creation VARCHAR(30),_creator INTEGER,_klass VARCHAR(255),_name VARCHAR(255),_private_for INTEGER,_url VARCHAR(255),id INTEGER PRIMARY KEY,__retired__ INTEGER DEFAULT 0)")
 145             self.sql("CREATE TABLE _status (_activity VARCHAR(30),_actor INTEGER,_creation VARCHAR(30),_creator INTEGER,_name VARCHAR(255),_order REAL,id INTEGER PRIMARY KEY,__retired__ INTEGER DEFAULT 0)")
 146             self.sql("CREATE TABLE _user (_activity VARCHAR(30),_actor INTEGER,_address VARCHAR(255),_alternate_addresses VARCHAR(255),_creation VARCHAR(30),_creator INTEGER,_lastMessage VARCHAR(30),_organisation VARCHAR(255),_password VARCHAR(255),_phone VARCHAR(255),_realname VARCHAR(255),_roles VARCHAR(255),_timezone VARCHAR(255),_username VARCHAR(255),id INTEGER PRIMARY KEY,__retired__ INTEGER DEFAULT 0)")
 147 # date changed to _date and params to varchar(32765) from text
 148             self.sql("CREATE TABLE file__journal (nodeid integer, _date VARCHAR(30), tag varchar(255),action varchar(255), params varchar(32765))")
 149 # name changed to varchar(80) from varchar
 150             self.sql("CREATE TABLE ids (name varchar(80), num integer)")
 151 # date changed to _date and params to varchar(32765) from text
 152             self.sql("CREATE TABLE issue__journal (nodeid integer, _date VARCHAR(30), tag varchar(255), action varchar(255), params varchar(32765))")
 153             self.sql("CREATE TABLE issue_files (linkid INTEGER, nodeid INTEGER)")
 154             self.sql("CREATE TABLE issue_keyword (linkid INTEGER, nodeid INTEGER)")
 155             self.sql("CREATE TABLE issue_messages (linkid INTEGER, nodeid INTEGER)")
 156             self.sql("CREATE TABLE issue_nosy (linkid INTEGER, nodeid INTEGER)")
 157             self.sql("CREATE TABLE issue_superseder (linkid INTEGER, nodeid INTEGER)")
 158 # date changed to _date and params to varchar(32765) from text
 159             self.sql("CREATE TABLE keyword__journal (nodeid integer, _date VARCHAR(30), tag varchar(255),action varchar(255), params varchar(32765))")
 160 # date changed to _date and params to varchar(32765) from text
 161             self.sql("CREATE TABLE msg__journal (nodeid integer, _date VARCHAR(30), tag varchar(255),action varchar(255), params varchar(32765))")
 162             self.sql("CREATE TABLE msg_files (linkid INTEGER, nodeid INTEGER)")
 163             self.sql("CREATE TABLE msg_recipients (linkid INTEGER, nodeid INTEGER)")
 164 # otk_key & otk_value changed to varchar(80) from varchar
 165             self.sql("CREATE TABLE otks (otk_key varchar(80), otk_value varchar(80), otk_time integer)")
 166 # date changed to _date and params to varchar(32765) from text
 167             self.sql("CREATE TABLE priority__journal (nodeid integer, _date VARCHAR(30), tag varchar(255),action varchar(255), params varchar(32765))")
 168 # date changed to _date and params to varchar(32765) from text
 169             self.sql("CREATE TABLE query__journal (nodeid integer, _date VARCHAR(30), tag varchar(255),action varchar(255), params varchar(32765))")
 170 # session_key & session_value changed to varchar(80) from varchar
 171             self.sql("CREATE TABLE sessions (session_key varchar(80), session_time integer, session_value varchar(80))")
 172 # date changed to _date and params to varchar(32765) from text
 173             self.sql("CREATE TABLE status__journal (nodeid integer, _date VARCHAR(30), tag varchar(255),action varchar(255), params varchar(32765))")
 174 # date changed to _date and params to varchar(32765) from text
 175             self.sql("CREATE TABLE user__journal (nodeid integer, _date VARCHAR(30), tag varchar(255),action varchar(255), params varchar(32765))")
 176             self.sql("CREATE TABLE user_queries (linkid INTEGER, nodeid INTEGER)")
 177             self.sql_commit()
 178             self.sql("INSERT INTO _user VALUES(20140528004410.374,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,'Admin, Staff, User',NULL,'admin',1,0)")
 179             self.sql("INSERT INTO _user VALUES(20140528004410.374,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,'Anonymous',NULL,'anonymous',2,0)")
 180             self.sql("INSERT INTO _priority VALUES(20130301221955.763,1,20130301221955.763,1,'critical',1.0,1,0)")
 181             self.sql("INSERT INTO _priority VALUES(20130301221955.763,1,20130301221955.763,1,'urgent',2.0,2,0)")
 182             self.sql("INSERT INTO _priority VALUES(20130301221955.763,1,20130301221955.763,1,'wish',3.0,3,0)")
 183             self.sql("INSERT INTO _status VALUES(20130301221955.763,1,20130301221955.763,1,'unread',1.0,1,0)")
 184             self.sql("INSERT INTO _status VALUES(20130301221955.778,1,20130301221955.778,1,'deferred',2.0,2,0)")
 185             self.sql("INSERT INTO _status VALUES(20130301221955.778,1,20130301221955.778,1,'chatting',3.0,3,0)")
 186             self.sql("INSERT INTO _status VALUES(20130301221955.778,1,20130301221955.778,1,'need-eg',4.0,4,0)")
 187             self.sql("INSERT INTO _status VALUES(20130301221955.778,1,20130301221955.778,1,'in-progress',5.0,5,0)")
 188             self.sql("INSERT INTO _status VALUES(20130301221955.778,1,20130301221955.778,1,'testing',6.0,6,0)")
 189             self.sql("INSERT INTO _status VALUES(20130301221955.778,1,20130301221955.778,1,'done-cbb',7.0,7,0)")
 190             self.sql("INSERT INTO _status VALUES(20130301221955.778,1,20130301221955.778,1,'resolved',8.0,8,0)")
 191             self.sql("INSERT INTO ids VALUES('__textids',1)")
 192             self.sql("INSERT INTO ids VALUES('status',1)")
 193             self.sql("INSERT INTO ids VALUES('keyword',1)")
 194             self.sql("INSERT INTO ids VALUES('priority',1)")
 195             self.sql("INSERT INTO ids VALUES('user',3)")# has to be 3 since 1 and 2 belong to admin & anonymous
 196             self.sql("INSERT INTO ids VALUES('file',1)")
 197             self.sql("INSERT INTO ids VALUES('msg',1)")
 198             self.sql("INSERT INTO ids VALUES('query',1)")
 199             self.sql("INSERT INTO ids VALUES('issue',1)")
 200             self.sql_commit()
 201             self.sql("CREATE UNIQUE INDEX __textids_by_props ON __textids (_class, _itemid, _prop)")
 202             self.sql("CREATE INDEX _file_retired_idx on _file(__retired__)")
 203             self.sql("CREATE INDEX _issue_retired_idx on _issue(__retired__)")
 204             self.sql("CREATE UNIQUE INDEX _keyword_key_retired_idx on _keyword(__retired__, _name)")
 205             self.sql("CREATE INDEX _keyword_name_idx on _keyword(_name)")
 206             self.sql("CREATE INDEX _keyword_retired_idx on _keyword(__retired__)")
 207             self.sql("CREATE INDEX _msg_retired_idx on _msg(__retired__)")
 208             self.sql("CREATE UNIQUE INDEX _priority_key_retired_idx on _priority(__retired__, _name)")
 209             self.sql("CREATE INDEX _priority_name_idx on _priority(_name)")
 210             self.sql("CREATE INDEX _priority_retired_idx on _priority(__retired__)")
 211             self.sql("CREATE INDEX _query_retired_idx on _query(__retired__)")
 212             self.sql("CREATE UNIQUE INDEX _status_key_retired_idx on _status(__retired__, _name)")
 213             self.sql("CREATE INDEX _status_name_idx on _status(_name)")
 214             self.sql("CREATE INDEX _status_retired_idx on _status(__retired__)")
 215             self.sql("CREATE UNIQUE INDEX _user_key_retired_idx on _user(__retired__, _username)")
 216             self.sql("CREATE INDEX _user_retired_idx on _user(__retired__)")
 217             self.sql("CREATE INDEX _user_username_idx on _user(_username)")
 218             self.sql("CREATE INDEX file_journ_idx on file__journal(nodeid)")
 219             self.sql("CREATE INDEX ids_name_idx on ids(name)")
 220             self.sql("CREATE INDEX issue_files_l_idx on issue_files(linkid)")
 221             self.sql("CREATE INDEX issue_files_n_idx on issue_files(nodeid)")
 222             self.sql("CREATE INDEX issue_journ_idx on issue__journal(nodeid)")
 223             self.sql("CREATE INDEX issue_keyword_l_idx on issue_keyword(linkid)")
 224             self.sql("CREATE INDEX issue_keyword_n_idx on issue_keyword(nodeid)")
 225             self.sql("CREATE INDEX issue_messages_l_idx on issue_messages(linkid)")
 226             self.sql("CREATE INDEX issue_messages_n_idx on issue_messages(nodeid)")
 227             self.sql("CREATE INDEX issue_nosy_l_idx on issue_nosy(linkid)")
 228             self.sql("CREATE INDEX issue_nosy_n_idx on issue_nosy(nodeid)")
 229             self.sql("CREATE INDEX issue_superseder_l_idx on issue_superseder(linkid)")
 230             self.sql("CREATE INDEX issue_superseder_n_idx on issue_superseder(nodeid)")
 231             self.sql("CREATE INDEX keyword_journ_idx on keyword__journal(nodeid)")
 232             self.sql("CREATE INDEX msg_files_l_idx on msg_files(linkid)")
 233             self.sql("CREATE INDEX msg_files_n_idx on msg_files(nodeid)")
 234             self.sql("CREATE INDEX msg_journ_idx on msg__journal(nodeid)")
 235             self.sql("CREATE INDEX msg_recipients_l_idx on msg_recipients(linkid)")
 236             self.sql("CREATE INDEX msg_recipients_n_idx on msg_recipients(nodeid)")
 237             self.sql("CREATE INDEX otks_key_idx on otks(otk_key)")
 238             self.sql("CREATE INDEX priority_journ_idx on priority__journal(nodeid)")
 239             self.sql("CREATE INDEX query_journ_idx on query__journal(nodeid)")
 240             self.sql("CREATE INDEX sessions_key_idx on sessions(session_key)")
 241             self.sql("CREATE INDEX status_journ_idx on status__journal(nodeid)")
 242             self.sql("CREATE INDEX user_journ_idx on user__journal(nodeid)")
 243             self.sql("CREATE INDEX user_queries_l_idx on user_queries(linkid)")
 244             self.sql("CREATE INDEX user_queries_n_idx on user_queries(nodeid)")
 245             self.sql("CREATE INDEX words_by_id ON __words (_textid)")
 246             self.sql("CREATE INDEX words_word_ids ON __words(_word)")
 247             self.sql_commit()
 248             self.conn.close()
 249         
 250         self.conn = db_checks(self.config)
 251         self.cursor = self.conn.cursor()
 252 
 253     def post_init(self):
 254         pass
 255 
 256     # old-skool id generation
 257     def newid(self, classname):
 258         """ Generate a new id for the given class
 259         """
 260         # get the next ID
 261         sql = 'select num from ids where name=%s'%self.arg
 262         self.sql(sql, (classname, ))
 263         newid = int(self.cursor.fetchone()[0])
 264 
 265         # update the counter
 266         sql = 'update ids set num=%s where name=%s'%(self.arg, self.arg)
 267         vals = (int(newid)+1, classname)
 268         self.sql(sql, vals)
 269 
 270         # return as string
 271         return str(newid)
 272 
 273     def setid(self, classname, setid):
 274         """ Set the id counter: used during import of database
 275 
 276         We add one to make it behave like the sequences in postgres.
 277         """
 278         sql = 'update ids set num=%s where name=%s'%(self.arg, self.arg)
 279         vals = (int(setid)+1, classname)
 280         self.sql(sql, vals)
 281 
 282 class FirebirdsqlClass:
 283     pass
 284 
 285 class Class(FirebirdsqlClass, rdbms_common.Class):
 286     pass
 287 class IssueClass(FirebirdsqlClass, rdbms_common.IssueClass):
 288     pass
 289 class FileClass(FirebirdsqlClass, rdbms_common.FileClass):
 290     pass

Here's the 'indexer_rdbms.py' module::

   1 """ This implements the full-text indexer over two RDBMS tables. The first
   2 is a mapping of words to occurance IDs. The second maps the IDs to (Class,
   3 propname, itemid) instances.
   4 
   5 2014-06-03      W. Robert Kellock - changed all instances of cursor.execute to sql since we want all sql to go through one routine
   6                 Also look for cursor.executemany which had to have \"'s inserted to make it work with FirebirdSQL
   7 """
   8 import re
   9 # Python 2.3 ... 2.6 compatibility:
  10 from roundup.anypy.sets_ import set
  11 
  12 from roundup.backends.indexer_common import Indexer as IndexerBase
  13 
  14 class Indexer(IndexerBase):
  15     def __init__(self, db):
  16         IndexerBase.__init__(self, db)
  17         self.db = db
  18         self.reindex = 0
  19 
  20     def close(self):
  21         """close the indexing database"""
  22         # just nuke the circular reference
  23         self.db = None
  24 
  25     def save_index(self):
  26         """Save the changes to the index."""
  27         # not necessary - the RDBMS connection will handle this for us
  28         pass
  29 
  30     def force_reindex(self):
  31         """Force a reindexing of the database.  This essentially
  32         empties the tables ids and index and sets a flag so
  33         that the databases are reindexed"""
  34         self.reindex = 1
  35 
  36     def should_reindex(self):
  37         """returns True if the indexes need to be rebuilt"""
  38         return self.reindex
  39 
  40     def add_text(self, identifier, text, mime_type='text/plain'):
  41         """ "identifier" is  (classname, itemid, property) """
  42         if mime_type != 'text/plain':
  43             return
  44 
  45         # Ensure all elements of the identifier are strings 'cos the itemid
  46         # column is varchar even if item ids may be numbers elsewhere in the
  47         # code. ugh.
  48         identifier = tuple(map(str, identifier))
  49 
  50         # first, find the id of the (classname, itemid, property)
  51         a = self.db.arg
  52         sql = 'select _textid from __textids where _class=%s and '\
  53             '_itemid=%s and _prop=%s'%(a, a, a)
  54         self.db.sql(sql, identifier)
  55         r = self.db.cursor.fetchone()
  56         if not r:
  57             # not previously indexed
  58             id = self.db.newid('__textids')
  59             sql = 'insert into __textids (_textid, _class, _itemid, _prop)'\
  60                 ' values (%s, %s, %s, %s)'%(a, a, a, a)
  61             self.db.sql(sql, (id, ) + identifier)
  62         else:
  63             id = int(r[0])
  64             # clear out any existing indexed values
  65             sql = 'delete from __words where _textid=%s'%a
  66             self.db.sql(sql, (id, ))
  67 
  68         # ok, find all the unique words in the text
  69         if not isinstance(text, unicode):
  70             text = unicode(text, "utf-8", "replace")
  71         text = text.upper()
  72         wordlist = [w.encode("utf-8")
  73                     for w in re.findall(r'(?u)\b\w{%d,%d}\b'
  74                                         % (self.minlength, self.maxlength), text)]
  75         words = set()
  76         for word in wordlist:
  77             if self.is_stopword(word): continue
  78             words.add(word)
  79 
  80         # for each word, add an entry in the db
  81         sql = 'insert into \"__words\" (\"_word\", \"_textid\") values (%s, %s)'%(a, a)
  82         #sql = 'insert into __words (_word, _textid) values (%s, %s)'%(a, a)
  83         words = [(word, id) for word in words]
  84         self.db.cursor.executemany(sql, words)
  85 
  86     def find(self, wordlist):
  87         """look up all the words in the wordlist.
  88         If none are found return an empty dictionary
  89         * more rules here
  90         """
  91         if not wordlist:
  92             return []
  93 
  94         l = [word.upper() for word in wordlist
  95              if self.minlength <= len(word) <= self.maxlength]
  96         l = [word for word in l if not self.is_stopword(word)]
  97 
  98         if not l:
  99             return []
 100 
 101         if self.db.implements_intersect:
 102             # simple AND search
 103             sql = 'select distinct(_textid) from __words where _word=%s'%self.db.arg
 104             sql = '\nINTERSECT\n'.join([sql]*len(l))
 105             self.db.sql(sql, tuple(l))
 106             r = self.db.cursor.fetchall()
 107             if not r:
 108                 return []
 109             a = ','.join([self.db.arg] * len(r))
 110             sql = 'select _class, _itemid, _prop from __textids '\
 111                 'where _textid in (%s)'%a
 112             self.db.sql(sql, tuple([int(row[0]) for row in r]))
 113 
 114         else:
 115             # A more complex version for MySQL since it doesn't implement INTERSECT
 116 
 117             # Construct SQL statement to join __words table to itself
 118             # multiple times.
 119             sql = """select distinct(__words1._textid)
 120                         from __words as __words1 %s
 121                         where __words1._word=%s %s"""
 122 
 123             join_tmpl = ' left join __words as __words%d using (_textid) \n'
 124             match_tmpl = ' and __words%d._word=%s \n'
 125 
 126             join_list = []
 127             match_list = []
 128             for n in xrange(len(l) - 1):
 129                 join_list.append(join_tmpl % (n + 2))
 130                 match_list.append(match_tmpl % (n + 2, self.db.arg))
 131 
 132             sql = sql%(' '.join(join_list), self.db.arg, ' '.join(match_list))
 133             self.db.sql(sql, l)
 134 
 135             r = [x[0] for x in self.db.cursor.fetchall()]
 136             if not r:
 137                 return []
 138 
 139             a = ','.join([self.db.arg] * len(r))
 140             sql = 'select _class, _itemid, _prop from __textids '\
 141                 'where _textid in (%s)'%a
 142 
 143             self.db.sql(sql, tuple(map(int, r)))
 144 
 145         return self.db.cursor.fetchall()