Distributed and ambient query

SQL is more flexible and convenient for distributed query (gathering data from several databases and scattering them into several databases), than branded programs; including more convenient for replication, than branded programs. But there is no necessary syntax. We try to fill this gap.

At work with several databases, there are two eigen cases: choice of necessary database on basis of its identifier - let's name this case as distributed query, dQ, and on basis of index value for record - let's name this case as ambient query, aQ [1]. To distinguish further mentioned queries from usual, let's name the last as single query, sQ. We shall name tables with identcal name of different databases as general table.

Distributed query

Important difference of dQ is breaking of unique constraint for society, because records can be duplicated in tables with the same names of different databases. As for syntax, i propose:

let's each database has 'nickname'. Nickname is specified in queries before table name through colon

connect         address=remote.bz;  -- default database, saver of society
create nick db1 address=site.com;   -- nicknames are created in default database
create nick db2 address=data.net;
create nick db3 address=store.org;
create nick db4 address=place.ws;

insert into          tab select * from   db1:tab;  -- copying into default database
insert into      db3:tab select * from   db1:tab;

group of databases is named 'society'. Society is also specified before table name through colon, and means nicknames of all databases of group (name of society must not coincide with any nickname) [2]

create society s;
create society c;
append db1, c, default to   s;   -- default database is appended to society 's'
refuse db2, c          from s;

insert into           tab select * from    s:tab;
insert into       db1:tab select * from    s:tab;
insert into (db1+db4):tab select * from    s:tab;
insert into         s:tab select * from    s:tab;
insert into     (s-c):tab select * from    s:tab;

'default' database is database, in which all nicknames and societies are stored

In purpose of security, i propose concept of whole mistrust of one database to another:

database does not store login of other database (not to give foreign login at unauthorized access)
database does not edit (update, insert, delete) data of other database (temporary access to one database, obtained at unauthorized access, cannot destroy other database)
if it's possible, database does not obtain data from other database (data from another database does not become accessible at unauthorized access)

And concept of client simplicity:

client does not know SQL (can't simplify or decompose SQL)

So one database can't create and enter sql-command into other database neither directly, nor indirectly (asking client to forward command). And client cannot construct new sql-command on base of parsing of entered sql-command.

First database sends special command to client, which force client to make substitution inside last sql-command, sent to first database and memorized in client stack, and to send result of transformation into second database. Special commands must be so limited, that to not allow appearance of sql-command, harm for second database (details are on p.135-161 of separate pdf-document).

SQL must not have possibility to install default society (that query is sent by default not into one table, but into all tables of society), because insertion (INSERT) into all databases still has sense, but visualization (SELECT) of identical data of all databases - has not. Thus used society is installed forcibly by prefix before scheme name [3].

Collection of records from databases of society can be implemented in all databases simultaneously (SELECT ... PARALLELLY, p.139), if we want to compare records, containing in them, or do not know, in which database are they located, but want to obtain answer most quickly - sequentially (SELECT ... SEQUENTIALLY, p.140, first returned record stops fingering of databases), if we want to reduce quantity of operations in all computers of society. SELECT without postfix executed simultaneously or sequentially in dependence of state of switcher, e.g. 'hobo': 'SET HOBO PARALLELLY', 'SET HOBO SEQUENTIALLY'.

Modification of data (UPDATE, DELETE) are always executed simultaneously, instead of sequentially, because no one database can be excluded from processing beforehand.

Ambient query

Declaration of necessity of introduction

Default society, used in all queries, can be installed by instruction, e.g. 'SET AUTOSOCIETY societyname', or installed forcibly by prefix before scheme name via exclamation mark [3]. In any case, query is sent to default database at first, which store definition of society, and which give order to client to make substitution in last SQL-command, saved in client stack, and to act in accordance with result of transformation.

We must specify only in default database [4], scatterring of data should occur on basis of which fields of which general table of which society. Thus besides usual indexes for searching of records, there is a need in default database for indexes of separate kind to search another databases - we shall name them as glossaries and to create them by command 'CREATE GLOSSARY glossaryname ON tablename (fieldname) FOR societyname'.

We shall name indexes for the same field of tables with identcal name of different databases (including default database) of one society as (independently of whether indexes have identical name or not) as general index [5]. Unique indexes - as unique general index. Glossary and unique general index provide unique constraint for general table.

Since the same field are indexed by glossary and general index, i propose to store not indexed meanings themselves in glossary, but only ranges of meanings, belonging to some database. These ranges are not cross for any two databases, othewise we obtain dQ instead of aQ [6].

For case, if records with close indexed values will be inserted (they will come mainly into one range, and burden will lie mainly on one database), i propose to index not values themselves, but their hash-sums; to create glossary by command 'create HASH GLOSSARY glossaryname on tablename (fieldname) for societyname', and to store ranges of hash-summs in them. This glossary scatter records with close indexed values into all databases of society evenly. To use non-hash or hash glossary - it is depend of what aQ-queries are mainly executed: calculating correlation of records (better to use 'create GLOSSARY'), or do not making it (better 'create HASH GLOSSARY'). Calculating of correlation upon close indexed values, located in different databases, will produce big traffic.

Proposal of implementation

It will be necessary to ask for records not into one database after appending/withdrawing of database from society, but into two - both into which glossary pointed before appending/withdrawing, and into which after. That change of one database by two databases occur how much rare, it is disirable, that maximum quantity of ranges of glossary address to former database after appending/withdrawing.

After appending of new database into society ('APPEND nick TO societyname'), part of records, which earlier would be entered into another database (INSERT), will be directed into new database - otherwise it will remain un-used. To save evenly distribution of burden, it is necessary, that 1/(M+1) part of stream of records, entered earlier into each of M databases, will be directed into appended database. So ranges, addressing into them, should be splited into two range in proportion M:1. Only INSERT will executed on new set of ranges - SELECT, UPDATE, DELETE will be executed both on old, and on new sets. Than it is necessary to store all previous sets of ranges and URLs, corresponding to them, in glossary [7].

After withdrawing of database from society ('REFUSE nick FROM societyname'), new records, which earlier would be enterd into this database, will be directed into remained M-1 databases - otherwise records will be lost. To save evenly distribution of burden, it is necessary, that 1/(M-1) part of this stream is directed into each of them. So range, addressed into withdrawn database, should be splited into M-1 equal parts. That SELECT, UPDATE, DELETE will be executed in minimum quantity of databases, range should point to minimum quantity of them during all time of own existence, therefore at repeated appending of withdrawn database, i propose first of all to give ranges, earlier belonging to it, instead of ranges of another databases. And to consider databases as identical, if their URLs are coincide [7]. Than it is necessary once again to store all previous sets of ranges and URLs, corresponding to them, in glossary.

Footnotes

[1] It is necessary to note, that insertion (INSERT) of record into random database of society, or into each next database for each next record, although specifies database of basis of meaning, calculated for record, essentially is case of dQ: it is seen on example of another three operators SELECT, UPDATE, DELETE, which do not know, where records are located, coming under their WHERE

[2] Thus one sql-statement with society means a great number of sql-statements with nicknames. Besides this, that nicknames, several societies or several mentions of one society do not specify the same database simultaneously, we place symbol '%' before them;

insert into        %c:tab select * from   %s:tab;
insert into  %default:tab select * from   %c:tab;

and that several mentions of one society always synchronously specify the same database, we place any word (identical) and symbol '%' before this mentions

insert into m%s:tab
  select * from %s:tab
  -- 'm%s' and '%s' are different societies
  where @fld > (select @f from m%s:tab);
  -- 'into m%s:tab' and 'from m%s:tab' are the same database

[3] Different separators must be used for dQ and aQ, e.g. colon for first and exclamation mark for second: 'societyname:tablename' and 'societyname!tablename'

[4] No in any other database, to avoid even possible back transferring of data

[5] It is possible to create it by dQ-query 'CREATE INDEX societyname:indexname ON tablename (fieldname)'

[6] Glossaries are always 'unique' in this sense, there is no need to write 'create UNIQUE glossary'

[7] Thus glossary is set of records of some system table with three columns: bottom and top border of range, URL. System (i.e. inaccasable for commands 'alter index, drop index') index will not redundant for its field 'URL'. Besides that, if external world see each database schema as separate database (p.176-178), than database-schema is idenfified not by only URL, but by three parameters - URL, OS user, schema. And as consequence, databases-schemas are coincided only at coinciding of all three parameters, and all three must be stored in glossary, and mentioned system index should be built upon all three (so mentioned system table together with user name in OS and path to nested schema will have five columns)

Dima Turin, dmitryturin@yandex.ru

List of articles Choose language

Используются технологии uCoz