List of articles   Choose language


Path notation vs.
CURSOR, Datalog,
SQL/XML, and recursive SELECT

Vs. CURSOR and Datalog

Nobody from us work with isolated records, we manipulate much more complex constructions - formal acknowledgement of it is presence of foreign keys. There was at least two attempts to go from processing of records to processing of constructions, consisting from them.

One was to create pointer and to move it sequently from record to record. It come even worse in reality: restricted variant of pointer, naming CURSOR, have implemented, which can move not around whole database, but only withing one table [1]. Presence of more than one simultaneously open cursor shows, that we work practically with hierarchical data, instead of Cartesian products. And applying of operator LOOP (we do not use it to form Cartesian products!) says, that we force man to trace work of machine, and our operator is calque from assembler mnemonic LOOP. Solution is frankly unacceptable, coming to unproductive [2] wasting of human time. Way out is manipulating of whole tree at once, intead of processing individual records. I propose to blame using of cursors and to not recommend further usage of them.

Other consisted of refusal from notation WHERE field1=field2 and to address to syntax tablename1(_, X), tablename2(X, _). Even fact, that this syntax is difficult both in studing, and in control of intermediate results of program execution, what was yet known on example of first-born Prolog, did not confuse authors [3]. Fail of japan program of computers of fifth generation only confirm non-usability of this solution. Consequence of this second attempt was truly terrible: construction WITH RECURSIVE have been enterd into standard SQL:1999 - now man begin to trace iterator in prolog style. As though calque from assembler mnemonic was not enough. Of course, i propose to blame using of construction WITH RECURSIVE and to not recommend further usage of it.


Why, when we write address on envelope, we do not use pointer for countries, pointer for cities, pointer for streets, but we list "records of different tables" via comma ? Why, when we write path in shell upon file system, we do not create function, calling itself, but we list "records of one table" via slash? We make the same, when we write path in XPath. Why we in those cases leave particular implementation off-camera, but in case of SQL we do not guess??.

What hamper to repeat this obviously good solution, having specified requirements for records in WHERE and having listed tables, in which records are located, via other sign, different from comma and slash - 'tablename1.tablename2.tablename3 WHERE field1a=field2a and field2b=field3b' [4]? And if tree is branching, to take both its branch into brackets 'tablename1.(tablename2.tablename3   tablename7.tablename8)'. And to specify in operator, for example, UPDATE, in which places of tree should updating of fields be performed 'UPDATE tablename1.tablename2.tablename3 SET field2c=5 WHERE field1a=field2a and field2b=field3b)'.

Vs. SQL/XML and recursive SELECT

Let's consider SQL/XML as way to extract hierarchical data (instead of way to form hierarchical data as xml). Usage of SQL/XML-functions, as well as syntax of proprietary web-server [5], gives very bulky code, writing of which is hard. At the same time, usually records are extracted, which already bound by foreign key - we have a tree, already formed in schema of database. I propose elegant 'select a.b.c' to select data from these already bound tables 'a', 'b', 'c' [6] (it's supposed, that tables refer to each other by only one foreign key, and that two tables don't refer to each other simultaneously [7]).

For example, to find the most cheap path between cities

create table city (
  id   number  primary key,
  name varchar
);
create table price (
  id1  number  references city (id),
  id2  number  references city (id),
  cost money
);
we could write laconically and clearly, using

select sum(@cost)
from   city[@id=5].price*.city[@id=700]
where  previous(price)/@id2=next(price)/@id1;
instead of bulky expression

with recursive res (@id1, @id2, @total) as (
  select  @id1, @id2, @cost
    from  price
    where @id1=5
  union
  select  res/@id1, price/@id2, res/@total+price/@cost
    from  res, price
    where res/@id2=price/@id1
) select  @id1, @id2, @total
    from  res
    where @id2=700;

Footnotes

[1] View, in general case

[2] Man must formulate decision, instead of to trace executing of separate commands. We leave so fact aside, that this approach comes to repeated sending of individual records to server and back, overburdens traffic, narrow bandwidth, devours recources of machines

[3] 1978 year, Herve Gallaire and Jack Minker conduct seminar, devoted this topic; 1980 year, David Maier entered term "datalog" (DATAbase + proLOG, obviously)

[4] Desirable, via anything less letter, that free space would naturally separate table names, e.g. via dot

[5] Turning all relational fields not into xml-attributes, but into xml-elements is suitable for browser, but does not suit for mash-up services and much languages, based on XML

[6] But it's very easy to bind two trees (BTT), as it's shown on p.25

[7] If two tables refer to each other, or one table refers to another by several foreign keys, than this ambiguity is solved in request of tree: Sign '#' and name of necessary referring field (not name of constraint, i.e. name of foreign key) are specified after strictly name of table, containing necessary foreign key. It looks like new table name with symbol '#' in the middle of the name (p.12-14). We shall name this specifying of referring field by term 'refinement'. Similarly, if table contains a list and refers to self by several foreign keys, than this ambiguity is also solved in request of tree: Sign '$' and name of necessary referring field (not name of constraint, i.e. name of foreign key) are specified after strictly table name. It also looks like new table name with symbol '$' in the middle of the name (p.15-16). We shall name this specifying of referring field by term 'refinement' too. Different signs - '#' and '$' - are used in two different types of refinement, that it would possible to use both types of refinement simultaneously: 'table#field1$field2'



Dima Turin, dmitryturin@yandex.ru



List of articles   Choose language


Используются технологии uCoz