今天,介绍一个快速入门 FusionDB 的一个GitHub工程,使用 FQL 实现跨不同数据源的联邦查询功能。现提供 Docker 版的 FusionDB,安装有 Docker 的机器都可以快速体验 FusionDB 的功能。

FusionDB FQL Training

This repository provides a training for FusionDB FQL.

In this training you will learn to:

  • run SQL queries on FusionDB
  • run SQL++ queries on FusionDB
  • use FusionDB’s SQL JDBC Server
  • write the result of SQL queries to RDBMS (MySQL、PostgreSQL、Oracle)、S3、ADLS and HDFS

Requirements

  • Remote HDFS or RDBMS (MySQL、PostgreSQL、Oracle)/S3/ADLS/GCP/OSS etc.
  • Optional: Jupyter Notebook、PSequel

You only need Docker to run this training.

You don’t need Java, Scala, or an IDE.

For more information, please refer to FusionDB Document

Quickstart

Fusiondb is a simple and powerful federated database engine.

  • Start FusionDB
1
docker run --name fdb -p 54322:54322 -itd fusiondb/fusiondb:0.1.0-beta
  • Check FusionDB server
1
2
3
docker ps -a|grep fdb

lsof -i :54322
  • Psycopg2 connecting FusionDB
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
import psycopg2
import pandas as pd
connection = psycopg2.connect("host=localhost port=54322 dbname=default user=fdb sslmode=disable")
df = pd.read_sql(sql="SELECT * FROM VALUES (1, 1), (1, 2) AS t(a, b);", con=connection)
df

## Load mysql table

df =pd.read_sql(sql="load 'mysql' options('url'='jdbc:mysql://localhost:53306/fdb_test','dbtable'='person','user'= 'root','password'='root') AS mysql_t2;", con=connection)
df =pd.read_sql(sql="SELECT * FROM mysql_t2;", con=connection)
df.head()

## Load Postgres table

df =pd.read_sql(sql="load 'postgresql' options('url'='jdbc:postgresql://localhost:15430/fdb','dbtable'='person','user'= 'fdb','password'='fdb123') AS gp_t1;", con=connection)
df =pd.read_sql(sql="SELECT * FROM gp_t1;", con=connection)
df.head()


## MySQL Table Join PostgreSQL Table

df =pd.read_sql(sql="CREATE table test as SELECT mysql_t2.* FROM mysql_t2 LEFT JOIN gp_t1 ON mysql_t2.id = gp_t1.id;", con=connection)
df =pd.read_sql(sql="SELECT * FROM test;", con=connection)
df.head()

## Load oracle table

df =pd.read_sql(sql="load oracle options('url'='jdbc:oracle:thin:SYSTEM/oracle@//localhost:49161/xe','dbtable'='FDBTEST20','user'= 'SYSTEM','password'='oracle') AS ora_t1;", con=connection)
df =pd.read_sql(sql="SELECT * FROM ora_t1 limit 10;", con=connection)
df.head()


## Load HDFS parquet

df =pd.read_sql(sql="load 'hdfs://jdp-1:8020/tmp/spark-tpcds-data/web_site' format parquet AS web_site;", con=connection)
df =pd.read_sql(sql="SELECT * FROM web_site limit 10;", con=connection)
df.head()

## Save table to hdfs

df =pd.read_sql(sql="save overwrite web_site TO 'hdfs://jdp-1:8020/tmp/web_site_test' FORMAT parquet;", con=connection)
df =pd.read_sql(sql="load 'hdfs://jpd-1:8020/tmp/web_site_test' format parquet AS mysql_t2_par;", con=connection)
df =pd.read_sql(sql="SELECT * FROM mysql_t2_par limit 10;", con=connection)
df.head()

For more information, please refer to FusionDB Document