The Node

Thoughts of a bytelander

18 Aug 2020

Hyper and In-Memory Databases

AWS has made it easy to get the latest and greatest hardware. You can easily get machines O(TB) of DRAM for a few dollars per hour. r5.16x large instance for example has the following specs:

r5.16xlarge	64vCPU	256ECU	512 GiB	EBS Only	$4.032 per Hour

TPC-H benchmark is a standard dataset is a popular OLAP benchmark. Scale factor 1000 of TPC-H benchmark generates 1TB of data which after removing ununsed columns and dictionary compression come to nearly 280GB of data. This dataset has 6 billion rows ! This dataset can easily be operated on using an in-memory database — one of the best in-memory databases is Hyper. Hyper can operate at memory-bandwidth speed (~150GBps on a Xeon today).

Hyper could processs queries on this dataset sub 2s for aggregation queries and sub 10s for more complex SQL queries. I find it surprising that today when most companies (all except big tech) likely have lesser than 6 billion rows and use things like Redshift / Snowflake to run their workloads. They could make their workloads like BI dashboards, ad-hoc analysis, etc much faster by simply using an in-memory database.

To try Hyper, download Tableau Server distribution and install it. You don’t need a license to run Hyper. Once installed:

# cd to Tableau install dir/packages/hyper.<ver>/

./hyperd --log-dir . -d db --init open --skip-license --no-password configure. 

./hyperd --log-dir . -d db --init open --skip-license --no-password run

# connect using psql
psql -h localhost -U root -p <port>

The instructions work on Tableau Server 20.3.