1 Problem


Suppose you have two tables, one called ‘orders’, and one called ‘revenue’, as shown below:

  • Orders table:
    Order Id Channel Date Month
    1 Online 2018-09-01 September
    2 Online 2018-09-03 September
    3 In_Store 2018-10-11 October
    4 In_Store 2018-08-21 August
    5 Online 2018-08-13 August
    6 Online 2018-10-29 October
  • Revenue table:
    Order Id Revenue
    1 100
    2 125
    3 200
    4 80
    5 200
    6 100

Using SQL, write a query to show the total revenue by channel for the months of September and October.

2 DB in PostgreSQL


Connecting a database in PostgreSQL to R.

  1. Libraries:
library(DBI)
library(RPostgres)
  1. Connection:
question_61_db <- dbConnect(drv = Postgres(), 
                            dbname = "InterviewQs", 
                            host = "localhost", 
                            user = "postgres", 
                            password = Sys.getenv("PostgreSQL_Password"), 
                            bigint = "integer")
  1. Connecting test:
# Set schema
dbGetQuery(conn = question_61_db, 
           "SET SEARCH_PATH = question_61")
## data frame with 0 columns and 0 rows
# Tables in data base
dbListTables(conn = question_61_db)
## [1] "orders"  "revenue"
# Fields in a table
dbListFields(conn = question_61_db, 
             name = "orders")
## [1] "order_id" "channel"  "date"     "month"
# A simple query
dbGetQuery(conn = question_61_db, 
           "SELECT *
            FROM orders;")
##   order_id  channel       date     month
## 1        1   Online 2018-09-01 September
## 2        2   Online 2018-09-03 September
## 3        3 In_Store 2018-10-11   October
## 4        4 In_Store 2018-08-21    August
## 5        5   Online 2018-08-13    August
## 6        6   Online 2018-10-29   October

3 Solving the Problem


3.1 In SQL

SELECT channel, 
       month,
       SUM(revenue) AS total_revunue
FROM orders
INNER JOIN revenue
  USING(order_id)
WHERE month !~ 'August'
GROUP BY channel, month
ORDER BY channel, month;
3 records
channel month total_revunue
In_Store October 200
Online October 100
Online September 225

3.2 In R

library(dtplyr) # Database manipulation by data.table engine
library(tidyverse)
answer_ques61 <- 
tbl(src = question_61_db, 
    "orders") %>% 
  lazy_dt() %>% 
  inner_join(y = tbl(src = question_61_db, 
                     "revenue") %>% 
               lazy_dt(), 
             by = "order_id") %>% 
  filter(month != "August") %>% 
  group_by(channel, month) %>% 
  summarize(total_revenue = sum(revenue)) %>% 
  ungroup() %>% 
  arrange(month, channel) %>% 
  collect()
Channel Month Total Revenue
In_Store October 200
Online October 100
Online September 225

3.3 In Python

Data:

import pandas as pd

# Importing R object to Python
orders_py = r.orders
revenue_py = r.revenue

answer_ques61_py = pd.merge(left = orders_py, 
                            right = revenue_py, 
                            how = "inner", 
                            on = "order_id")
answer_py = answer_ques61_py \
              .query("month != 'August'") \
              .groupby(by = ['channel', 'month']) \
              .agg(total_revenue = ('revenue', 'sum'))

print(answer_py)
##                     total_revenue
## channel  month                   
## In_Store October            200.0
## Online   October            100.0
##          September          225.0