Suppose you have two tables, one called ‘orders’, and one called ‘revenue’, as shown below:
Order Id | Channel | Date | Month |
---|---|---|---|
1 | Online | 2018-09-01 | September |
2 | Online | 2018-09-03 | September |
3 | In_Store | 2018-10-11 | October |
4 | In_Store | 2018-08-21 | August |
5 | Online | 2018-08-13 | August |
6 | Online | 2018-10-29 | October |
Order Id | Revenue |
---|---|
1 | 100 |
2 | 125 |
3 | 200 |
4 | 80 |
5 | 200 |
6 | 100 |
Using SQL, write a query to show the total revenue by channel for the months of September and October.
Connecting a database in PostgreSQL to R.
library(DBI)
library(RPostgres)
<- dbConnect(drv = Postgres(),
question_61_db dbname = "InterviewQs",
host = "localhost",
user = "postgres",
password = Sys.getenv("PostgreSQL_Password"),
bigint = "integer")
# Set schema
dbGetQuery(conn = question_61_db,
"SET SEARCH_PATH = question_61")
## data frame with 0 columns and 0 rows
# Tables in data base
dbListTables(conn = question_61_db)
## [1] "orders" "revenue"
# Fields in a table
dbListFields(conn = question_61_db,
name = "orders")
## [1] "order_id" "channel" "date" "month"
# A simple query
dbGetQuery(conn = question_61_db,
"SELECT *
FROM orders;")
## order_id channel date month
## 1 1 Online 2018-09-01 September
## 2 2 Online 2018-09-03 September
## 3 3 In_Store 2018-10-11 October
## 4 4 In_Store 2018-08-21 August
## 5 5 Online 2018-08-13 August
## 6 6 Online 2018-10-29 October
SELECT channel,
month,
SUM(revenue) AS total_revunue
FROM orders
INNER JOIN revenue
USING(order_id)
WHERE month !~ 'August'
GROUP BY channel, month
ORDER BY channel, month;
channel | month | total_revunue |
---|---|---|
In_Store | October | 200 |
Online | October | 100 |
Online | September | 225 |
library(dtplyr) # Database manipulation by data.table engine
library(tidyverse)
<-
answer_ques61 tbl(src = question_61_db,
"orders") %>%
lazy_dt() %>%
inner_join(y = tbl(src = question_61_db,
"revenue") %>%
lazy_dt(),
by = "order_id") %>%
filter(month != "August") %>%
group_by(channel, month) %>%
summarize(total_revenue = sum(revenue)) %>%
ungroup() %>%
arrange(month, channel) %>%
collect()
Channel | Month | Total Revenue |
---|---|---|
In_Store | October | 200 |
Online | October | 100 |
Online | September | 225 |
Data:
import pandas as pd
# Importing R object to Python
= r.orders
orders_py = r.revenue
revenue_py
= pd.merge(left = orders_py,
answer_ques61_py = revenue_py,
right = "inner",
how = "order_id") on
= answer_ques61_py \
answer_py "month != 'August'") \
.query(= ['channel', 'month']) \
.groupby(by = ('revenue', 'sum'))
.agg(total_revenue
print(answer_py)
## total_revenue
## channel month
## In_Store October 200.0
## Online October 100.0
## September 225.0