SyncQL Tutorial

The Syncbase Query Language

John Kline (jkline@google.com)

Version 1.0

October 2015

Overview

Syncbase is Vanadium’s key/value store and provides persistent data with fine grained access and synchronization across syncbase instances.

SyncQL is a SQL-like query language for syncbase.

This tutorial walks one through the setup of a sample database and then dives into teaching syncQL by running command line queries.

How to Take this Tutorial

This tutorial assumes one has a working Vanadium development environment.

All of the steps are independent, so one can pick up anywhere he wishes. Just perform the setup steps to start/restart and the teardown steps to finish/take a break.

It’s OK to cut and paste the setup and teardown steps, but please consider typing the queries rather than cutting and pasting them. Actually typing the queries (and making mistakes and correcting them) is a great way to learn.

Setup

This set of step-by-step instructions assumes one has a working Vanadium development environment, which includes having the JIRI_ROOT environment variable set.

[Note: For the Vanadium and/or Syncbase literate, it’s OK to skip and/or modify some of these steps.]

  1. Build and install the necessary executables (principal program, mount table daemon, syncbase daemon, syncbase command line program):

    1. jiri go install v.io/x/ref/cmd/principal v.io/x/ref/services/mounttable/mounttabled v.io/x/ref/services/syncbase/syncbased v.io/x/ref/cmd/sb
  2. Create a principal using a self-signed blessing:

    1. $JIRI_ROOT/release/go/bin/principal create /tmp/me $JIRI_ROOT/release/go/bin/principal blessself
  3. Start the mounttable and syncbase daemons (start in foreground to enter root password, then send to background):

    1. $JIRI_ROOT/release/go/bin/mounttabled -v23.tcp.address=:8101

    2. <ctrl-z>

    3. bg

    4. $JIRI_ROOT/release/go/bin/syncbased -name ‘/:8101/syncbase’ -v23.credentials=/tmp/me -root-dir=~

    5. <ctrl-z>

    6. bg

  4. Start sb (syncbase command line program) and create a demo database:

    1. $JIRI_ROOT/release/go/bin/sb sh -create-missing -v23.credentials=/tmp/me demoapp demodb

    2. make-demo;

When you are finished the tutorial (or you want to take a break), execute the steps in the Teardown section (below) to clean up!

Executing Queries in SB

If one has performed the setup steps above, he will be sitting in sb at the ‘?’ prompt.

To make sure everything is running properly, dump the database with the following command (be sure in include the semicolon):

dump;

If a bunch of data prints to the screen, everything is properly setup. If not, execute the teardown steps below and then re-execute the steps above.

Don’t try to understand all of the data that was printed with the dump command. The tables in the demo database are overly complicated in order to demonstrate all of the features of syncQL. We’ll take things a step at a time and only explain the data as needed.

Note: The vdl objects stored in the demo database are described in the following file:

$JIRI_ROOT/release/go/src/v.io/x/ref/cmd/sb/internal/demodb/db_objects.vdl

SyncQL 101

The Basics

SyncQL looks a lot like SQL. Each table in a syncbase database looks like a table with exactly two columns: k and v:

  • k

    • the key portion of key/value pairs in the table

    • always of type string

  • v

    • the value portion of key/value pairs in the table

    • always of type vdl.Value

vdl

A vdl.Value can represent the following types:

A Simple Query

The Customers table stores values of type Customer and of type Invoice.

Let’s select all of the keys in the Customer table. Again, note the semicolon. The semicolon is NOT part of the query, but sb uses it as a marker for end of statement.

? select k from Customers;
+--------+
|      k |
+--------+
| 001    |
| 001001 |
| 001002 |
| 001003 |
| 002    |
| 002001 |
| 002002 |
| 002003 |
| 002004 |
+--------+

Checker Time Errors

Let’s do the above query, but use Customer as the table name (i.e., forget to type the ‘s’ at the end):

? select k from Customer;
Error:
select k from Customer
          	^
15: Table Customer does not exist (or cannot be accessed): syncbased:"demoapp/demodb".Exec: Does not exist: $table:Customer.

The query can be fixed by up-arrowing and fixing it or by simply retyping it.

? select k from Customers;
+--------+
|      k |
+--------+
| 001    |
| 001001 |
| 001002 |
| 001003 |
| 002    |
| 002001 |
| 002002 |
| 002003 |
| 002004 |
+--------+

SyncQL will catch and report the following types of errors before attempting to execute:

  • malformed queries (e.g., forgetting the required from clause)

  • mistyped table names

  • wrong number of arguments to a function

  • wrong type of literal (as an argument to a function or to a like expression)

Unfortunately, mistyping field names will not be caught. This is because syncQL doesn’t know the fields of the values in the database. That’s because syncbase allows anything to be stored in values.

Drilling Into the ‘v’ Column

The Customer type has a Name field. Let’s ask for it by using dot notation:

? select k, v.Name from Customers;
+--------+---------------+
|      k |        v.Name |
+--------+---------------+
| 001    | John Smith    |
| 001001 |               |
| 001002 |               |
| 001003 |               |
| 002    | Bat Masterson |
| 002001 |               |
| 002002 |               |
| 002003 |               |
| 002004 |               |
+--------+---------------+

You will notice that only keys “001” and “002” have values for Name. Let’s see why that is.

Discovering Type of Values

Let’s find out the types of the values in the Customers table:

? select k, Type(v) from Customers;
+--------+--------------------------------------------+
|      k |                                       Type |
+--------+--------------------------------------------+
| 001    | v.io/x/ref/cmd/sb/internal/demodb.Customer |
| 001001 | v.io/x/ref/cmd/sb/internal/demodb.Invoice  |
| 001002 | v.io/x/ref/cmd/sb/internal/demodb.Invoice  |
| 001003 | v.io/x/ref/cmd/sb/internal/demodb.Invoice  |
| 002    | v.io/x/ref/cmd/sb/internal/demodb.Customer |
| 002001 | v.io/x/ref/cmd/sb/internal/demodb.Invoice  |
| 002002 | v.io/x/ref/cmd/sb/internal/demodb.Invoice  |
| 002003 | v.io/x/ref/cmd/sb/internal/demodb.Invoice  |
| 002004 | v.io/x/ref/cmd/sb/internal/demodb.Invoice  |
+--------+--------------------------------------------+

We now have an explanation for why the Name column is missing on some rows. The type is not Customer on these rows. It is Invoice and–it just so happens–Invoice has no field with the name of “Name”. Unresolved fields in the select clause return nil. Sb represents nil as empty in output.

Limiting Rows by Type

Let’s print the name of only rows where the value type is Customer.

? select v.Name from Customers where Type(v) = "v.io/x/ref/cmd/sb/internal/demodb.Customer";
+---------------+
|        v.Name |
+---------------+
| John Smith    |
| Bat Masterson |
+---------------+

Type() is a function available in syncQL. It returns a string representing a fully qualified type. It can be used in a where clause to limit the rows (i.e., the k/v pairs) matching a query.

Having to specify a fully qualified type is rarely necessary. Let’s use the like operator to write a simpler query:

? select v.Name from Customers where Type(v) like "%Customer";
+---------------+
|        v.Name |
+---------------+
| John Smith    |
| Bat Masterson |
+---------------+

The like operator takes a string value on the right-hand-side. In this value, ‘%’ matches 0 or more of any character. (Also, ‘_’ matches any single character.)

The Invoice Type

The Invoice type is defined as:

type Invoice struct {
    	CustId     int64
    	InvoiceNum int64
    	Amount     int64
    	ShipTo     AddressInfo
}

Let’s print CustId, InvoiceNum and Amount for values of type Invoice:

? select v.CustId, v.InvoiceNum, v.Amount from Customers where Type(v) like "%Invoice";
+----------+--------------+----------+
| v.CustId | v.InvoiceNum | v.Amount |
+----------+--------------+----------+
|        1 |         1000 |       42 |
|        1 |         1003 |        7 |
|        1 |         1005 |       88 |
|        2 |         1001 |      166 |
|        2 |         1002 |      243 |
|        2 |         1004 |      787 |
|        2 |         1006 |       88 |
+----------+--------------+----------+

The where clause can contain any number of comparison expressions joined with “and” or “or”. Parentheses are used to specify precedence. For example, let’s make a [nonsensical] query for all customer #1 invoices for an amount < 50 and all customer #2 invoices for an amount > 200:

? select v.CustId, v.InvoiceNum, v.Amount from Customers where Type(v) like "%Invoice" and ((v.CustId = 1 and v.Amount < 50) or (v.CustId = 2 and v.Amount > 200));
+----------+--------------+----------+
| v.CustId | v.InvoiceNum | v.Amount |
+----------+--------------+----------+
|        1 |         1000 |       42 |
|        1 |         1003 |        7 |
|        2 |         1002 |      243 |
|        2 |         1004 |      787 |
+----------+--------------+----------+

Unresolved Fields in the Where Clause

We’ve already seen that unresolved fields in the select clause return nil. Unresolved fields in expressions in the where clause result in the expression evaluating to false.

Let’s see this in action:

The Customer type is defined as:

type Customer struct {
    	Name    string
    	Id      int64
    	Active  bool
    	Address AddressInfo
    	Credit  CreditReport
}

Let’s select Customers with a Id > 0:

? select v.Name from Customers where v.Id > 0;
+---------------+
|        v.Name |
+---------------+
| John Smith    |
| Bat Masterson |
+---------------+

Since v.Id is not resolvable for Invoice records (which do not contain an Id field), the expression “v.Id > 0” returns false for Invoice rows. As such, the query excludes invoices and returns only the Customer rows.

SyncQL 201: A Closer Look

Select Clause

Let’s take a close look at what columns can be specified in the select clause.

A column may be of one of the following categories:

  • k – the key column

  • v – the value column

  • field – a specific field in the value

  • function – a function which takes zero or more arguments (k, v, field or function)

Let’s select a column from each category above:

? select k, v, v.ShipTo.City, Lowercase(v.ShipTo.City) from Customers where Type(v) like "%Invoice";
+--------+---------------------------------------------------------------------------------------------------------------------------+---------------+-----------+
|      k |                                                                                                                         v | v.ShipTo.City | Lowercase |
+--------+---------------------------------------------------------------------------------------------------------------------------+---------------+-----------+
| 001001 | {CustId: 1, InvoiceNum: 1000, Amount: 42, ShipTo: {Street: "1 Main St.", City: "Palo Alto", State: "CA", Zip: "94303"}}   | Palo Alto     | palo alto |
| 001002 | {CustId: 1, InvoiceNum: 1003, Amount: 7, ShipTo: {Street: "2 Main St.", City: "Palo Alto", State: "CA", Zip: "94303"}}    | Palo Alto     | palo alto |
| 001003 | {CustId: 1, InvoiceNum: 1005, Amount: 88, ShipTo: {Street: "3 Main St.", City: "Palo Alto", State: "CA", Zip: "94303"}}   | Palo Alto     | palo alto |
| 002001 | {CustId: 2, InvoiceNum: 1001, Amount: 166, ShipTo: {Street: "777 Any St.", City: "Collins", State: "IA", Zip: "50055"}}   | Collins       | collins   |
| 002002 | {CustId: 2, InvoiceNum: 1002, Amount: 243, ShipTo: {Street: "888 Any St.", City: "Collins", State: "IA", Zip: "50055"}}   | Collins       | collins   |
| 002003 | {CustId: 2, InvoiceNum: 1004, Amount: 787, ShipTo: {Street: "999 Any St.", City: "Collins", State: "IA", Zip: "50055"}}   | Collins       | collins   |
| 002004 | {CustId: 2, InvoiceNum: 1006, Amount: 88, ShipTo: {Street: "101010 Any St.", City: "Collins", State: "IA", Zip: "50055"}} | Collins       | collins   |
+--------+---------------------------------------------------------------------------------------------------------------------------+---------------+-----------+

Depending on how wide your window is, the above mayh be a mess to look at. Selecting v in a program is often useful, but selecting such an aggregate from sb is less so. Let’s do that query again without selecting v as a whole:

? select k, v.ShipTo.City, Lowercase(v.ShipTo.City) from Customers where Type(v) like "%Invoice";
+--------+---------------+-----------+
|      k | v.ShipTo.City | Lowercase |
+--------+---------------+-----------+
| 001001 | Palo Alto     | palo alto |
| 001002 | Palo Alto     | palo alto |
| 001003 | Palo Alto     | palo alto |
| 002001 | Collins       | collins   |
| 002002 | Collins       | collins   |
| 002003 | Collins       | collins   |
| 002004 | Collins       | collins   |
+--------+---------------+-----------+

v.ShipTo.City is interesting because it reaches into a nested struct. In this case, ShipTo is a field in type Invoice. ShipTo is of type AddressInfo, which is defined as:

type AddressInfo struct {
    	Street string
    	City   string
    	State  string
    	Zip    string
}

v.ShipTo resolves to an instance of AddressInfo. v.ShipTo.City resolves to an instance of string since City is a string field in AddressInfo.

The Lowercase function takes a single string argument and simply returns a lowercase version of the argument.

Details on Value Field Specifications

Up until now, we’ve used dot notation to specify fields within a struct (e.g., v.Name). Let’s look at the full picture.

As mentioned before, values in syncbase are vdl.Values. A vdl.Value can represent any of the following types:

Primitives

The following vdl types are primitives and cannot be drilled into further:

  • Bool

  • Byte

  • Complex64

  • Complex128

  • Enum

  • Float32

  • Float64

  • Int16

  • Int32

  • Int64

  • String

  • Time (not a vdl primitive, but treated like primitives in syncQL)

  • TypeObject

  • Uint16

  • Uint32

  • Uint64