顧客のポイントから評価ごとの人数の推移を抽出する

最近職場で過去2年間の顧客データを抽出する案件があり,自分でも勉強になったので備忘録として記録しておきます.

顧客が宿泊し,宿泊中の顧客のポイントを定期的に記録しています.ポイントは0点から10点までで,4点と7点が評価の境界値となっています.評価ごとの人数の推移を抽出したいというのが今回の案件でした.

その際に以下の条件がつきました.

  1. ポイントは始めの記録日から次のポイント記録日前日まで同じポイントが継続すると見做すこと.
  2. 原則入所日に最初のポイント記録をすることになっていますが,入所日に記録ができなかった場合入所日と最初の記録日の間のポイントは最初のポイントで代用すること.
  3. 退所日はポイントを0点とみなすこと.


SET NOCOUNT ON;
GO
IF OBJECT_ID('T_Admission') IS NOT NULL DROP TABLE dbo.T_Admission;
IF OBJECT_ID('T_TESTSCORE') IS NOT NULL DROP TABLE dbo.T_TESTSCORE;
GO
CREATE TABLE T_Admission
( Customer nchar(8) NOT NULL,
Admdate datetime NOT NULL,
Disdate datetime NOT NULL
);
ALTER TABLE T_Admission
ADD CONSTRAINT PK_Admission PRIMARY KEY (Customer, Admdate);
GO
CREATE TABLE T_TESTSCORE
( Customer nchar(8) NOT NULL,
testdate datetime NOT NULL,
score DEC(3, 1) NOT NULL
);
ALTER TABLE T_TESTSCORE
ADD CONSTRAINT PK_TEST PRIMARY KEY (Customer, testdate);
GO
INSERT INTO T_Admission VALUES ('00000001', '2010/10/01', '2010/10/05');
INSERT INTO T_TESTSCORE VALUES ('00000001', '2010/10/01', 3.0);
INSERT INTO T_TESTSCORE VALUES ('00000001', '2010/10/04', 4.5);

INSERT INTO T_Admission VALUES ('00000001', '2010/10/10', '2010/10/15');
INSERT INTO T_TESTSCORE VALUES ('00000001', '2010/10/10', 3.0);

INSERT INTO T_Admission VALUES ('00000002', '2010/10/07', '2010/10/18');
INSERT INTO T_TESTSCORE VALUES ('00000002', '2010/10/07', 4.5);
INSERT INTO T_TESTSCORE VALUES ('00000002', '2010/10/10', 6.0);
INSERT INTO T_TESTSCORE VALUES ('00000002', '2010/10/15', 0.0);

INSERT INTO T_Admission VALUES ('00000002', '2010/10/20', '2010/10/25');
INSERT INTO T_TESTSCORE VALUES ('00000002', '2010/10/21', 3.0);

INSERT INTO T_Admission VALUES ('00000003', '2010/10/05', '2010/10/09');
INSERT INTO T_TESTSCORE VALUES ('00000003', '2010/10/05', 6.0);

INSERT INTO T_Admission VALUES ('00000003', '2010/10/11', '9999/12/31');
INSERT INTO T_TESTSCORE VALUES ('00000003', '2010/10/11', 7.0);
INSERT INTO T_TESTSCORE VALUES ('00000003', '2010/10/14', 7.0);
INSERT INTO T_TESTSCORE VALUES ('00000003', '2010/10/19', 9.0);
INSERT INTO T_TESTSCORE VALUES ('00000003', '2010/10/31', 10.0);

INSERT INTO T_Admission VALUES ('00000004', '2010/09/25', '2010/10/09');
INSERT INTO T_TESTSCORE VALUES ('00000004', '2010/09/25', 0.0);
INSERT INTO T_TESTSCORE VALUES ('00000004', '2010/10/01', 3.0);

INSERT INTO T_Admission VALUES ('00000005', '2010/09/10', '2010/10/19');
INSERT INTO T_TESTSCORE VALUES ('00000005', '2010/10/08', 4.5);
INSERT INTO T_TESTSCORE VALUES ('00000005', '2010/10/16', 0.0);

INSERT INTO T_Admission VALUES ('00000005', '2010/10/26', '2010/11/03');
INSERT INTO T_TESTSCORE VALUES ('00000005', '2010/10/26', 10.0);

INSERT INTO T_Admission VALUES ('00000006', '2010/10/04', '2010/10/10');
INSERT INTO T_TESTSCORE VALUES ('00000006', '2010/10/05', 6.0);
INSERT INTO T_Admission VALUES ('00000006', '2010/10/14', '2010/10/20');
INSERT INTO T_TESTSCORE VALUES ('00000006', '2010/10/15', 6.0);

INSERT INTO T_Admission VALUES ('00000007', '2010/09/08', '2010/10/12');
INSERT INTO T_TESTSCORE VALUES ('00000007', '2010/10/02', 0.0);
INSERT INTO T_TESTSCORE VALUES ('00000007', '2010/10/09', 0.0);
INSERT INTO T_Admission VALUES ('00000007', '2010/10/14', '2010/10/22');
INSERT INTO T_TESTSCORE VALUES ('00000007', '2010/10/14', 0.0);
INSERT INTO T_TESTSCORE VALUES ('00000007', '2010/10/20', 0.0);
INSERT INTO T_Admission VALUES ('00000007', '2010/10/25', '2010/11/03');
INSERT INTO T_TESTSCORE VALUES ('00000007', '2010/10/27', 0.0);

INSERT INTO T_Admission VALUES ('00000008', '2010/09/21', '2010/10/03');
INSERT INTO T_TESTSCORE VALUES ('00000008', '2010/10/01', 10.0);
INSERT INTO T_Admission VALUES ('00000008', '2010/10/06', '9999/12/31');
INSERT INTO T_TESTSCORE VALUES ('00000008', '2010/10/06', 9.0);
INSERT INTO T_TESTSCORE VALUES ('00000008', '2010/10/13', 10.0);
INSERT INTO T_TESTSCORE VALUES ('00000008', '2010/10/20', 9.0);
INSERT INTO T_TESTSCORE VALUES ('00000008', '2010/10/27', 10.0);

INSERT INTO T_Admission VALUES ('00000009', '2010/09/23', '2010/10/10');
INSERT INTO T_TESTSCORE VALUES ('00000009', '2010/10/02', 4.0);
INSERT INTO T_TESTSCORE VALUES ('00000009', '2010/10/09', 6.0);
INSERT INTO T_Admission VALUES ('00000009', '2010/10/16', '2010/10/25');
INSERT INTO T_TESTSCORE VALUES ('00000009', '2010/10/16', 6.0);
INSERT INTO T_TESTSCORE VALUES ('00000009', '2010/10/23', 7.0);
INSERT INTO T_Admission VALUES ('00000009', '2010/10/29', '2010/11/05');
INSERT INTO T_TESTSCORE VALUES ('00000009', '2010/10/29', 9.0);

INSERT INTO T_Admission VALUES ('00000010', '2010/10/08', '2010/10/12');
INSERT INTO T_TESTSCORE VALUES ('00000010', '2010/10/09', 0.0);
INSERT INTO T_Admission VALUES ('00000010', '2010/10/18', '2010/10/25');
INSERT INTO T_TESTSCORE VALUES ('00000010', '2010/10/18', 3.0);
INSERT INTO T_TESTSCORE VALUES ('00000010', '2010/10/24', 0.0);
INSERT INTO T_Admission VALUES ('00000010', '2010/10/25', '9999/12/31');
INSERT INTO T_TESTSCORE VALUES ('00000010', '2010/10/27', 0.0);

GO
WITH CTE2 (YMD) AS
(SELECT CAST('2010/10/01' AS DATETIME) AS YMD
UNION ALL
SELECT DATEADD(day, 1, YMD)
FROM CTE2
WHERE YMD < '2010/10/31'
)
SELECT YMD
INTO #CALENDAR
FROM CTE2
OPTION (MAXRECURSION 31);
GO

SELECT A.Customer
, A.Admdate
, A.Disdate
, T.testdate
, T.score
INTO #CTE1
FROM T_Admission AS A INNER JOIN T_TESTSCORE AS T
ON A.Customer = T.Customer
AND T.testdate >= A.Admdate
AND T.testdate < A.Disdate;

WITH MINMAXDATE (Customer, admdate, mindate, maxdate) AS
(SELECT Customer
, Admdate
, MIN(testdate)
, MAX(testdate)
FROM #CTE1
GROUP BY Customer, Admdate
),
CTE3 (Customer, Admdate, testdate, UBound) AS
(SELECT C1.Customer
, C1.Admdate
, C1.testdate
, C2.testdate
FROM #CTE1 AS C1 INNER JOIN #CTE1 AS C2
ON C1.Customer = C2.Customer
AND C1.Admdate = C2.Admdate
AND C1.testdate < C2.testdate
WHERE C2.testdate = ( SELECT MIN(testdate)
FROM #CTE1 AS C3
WHERE C3.Customer = C1.Customer
AND C3.Admdate = C1.Admdate
AND C3.testdate > C1.testdate)
),
CTE4 (Customer, Admdate, Disdate, testdate, startdate, enddate, score) AS
(SELECT #CTE1.Customer
, #CTE1.Admdate
, CASE WHEN #CTE1.Disdate = '9999/12/31' THEN GETDATE() ELSE #CTE1.Disdate END
, #CTE1.testdate
, CASE WHEN #CTE1.testdate = MINMAXDATE.mindate
AND #CTE1.Admdate < MINMAXDATE.mindate
THEN #CTE1.Admdate
ELSE #CTE1.testdate END
, CASE WHEN #CTE1.testdate = MINMAXDATE.maxdate
THEN #CTE1.Disdate
ELSE CTE3.UBound END
, #CTE1.score
FROM #CTE1 INNER JOIN MINMAXDATE
ON #CTE1.Customer = MINMAXDATE.Customer
AND #CTE1.Admdate = MINMAXDATE.admdate
LEFT OUTER JOIN CTE3
ON #CTE1.Customer = CTE3.Customer
AND #CTE1.Admdate = CTE3.Admdate
AND #CTE1.testdate = CTE3.testdate
),
CTE5 (YMD, Y, M, Low, Mid, High) AS
(SELECT #CALENDAR.YMD
, MAX(YEAR(#CALENDAR.YMD))
, MAX(MONTH(#CALENDAR.YMD))
, SUM(CASE WHEN CTE4.score >= 0 AND CTE4.score < 4 THEN 1 ELSE 0 END)
, SUM(CASE WHEN CTE4.score >= 4 AND CTE4.score < 7 THEN 1 ELSE 0 END)
, SUM(CASE WHEN CTE4.score >= 7 AND CTE4.score <= 10 THEN 1 ELSE 0 END)
FROM #CALENDAR INNER JOIN CTE4
ON #CALENDAR.YMD >= CTE4.startdate
AND #CALENDAR.YMD < CTE4.enddate
GROUP BY #CALENDAR.YMD
)
SELECT * FROM CTE5;

T_Admissionテーブルは顧客の宿泊を記録し,T_TESTSCOREテーブルは顧客のポイントを記録しています.
CTE2では再帰クエリで2010年10月のカレンダーを作成し一時テーブル#CALENDARに格納しています.
T_AdmissionテーブルとT_TESTSCOREテーブルを結合し,入所中の顧客のポイントを記録日と共に一時テーブル#CTE1に格納します.
MINMAXDATEで最初と最後の記録日を抽出しています.
CTE3では条件1に留意してポイント開始日,ポイント終了日を抽出しています.ここで行間比較により最小上界を求めています.
CTE4では条件2と条件3をCASE式の条件に投入しています.左外部結合を使用するのは該当レコードがない場合でも0件として抽出するためです.
CTE5ではCASE式による特性関数をSUM関数でラップして評価ごとの人数の推移を集計しています.余談ですが,EXCELの論理式をSUMPRODUCT関数の引数にする方法に似ていると思います.

YMD Y M Low Mid High
2010-10-01 00:00:00.000 2010 10 3 2 1
2010-10-02 00:00:00.000 2010 10 3 2 1
2010-10-03 00:00:00.000 2010 10 3 2 0
2010-10-04 00:00:00.000 2010 10 2 4 0
2010-10-05 00:00:00.000 2010 10 2 4 0
2010-10-06 00:00:00.000 2010 10 2 4 1
2010-10-07 00:00:00.000 2010 10 2 5 1
2010-10-08 00:00:00.000 2010 10 3 5 1
2010-10-09 00:00:00.000 2010 10 2 4 1
2010-10-10 00:00:00.000 2010 10 3 2 1
2010-10-11 00:00:00.000 2010 10 3 2 2
2010-10-12 00:00:00.000 2010 10 1 2 2
2010-10-13 00:00:00.000 2010 10 1 2 2
2010-10-14 00:00:00.000 2010 10 2 3 2
2010-10-15 00:00:00.000 2010 10 2 2 2
2010-10-16 00:00:00.000 2010 10 3 2 2
2010-10-17 00:00:00.000 2010 10 3 2 2
2010-10-18 00:00:00.000 2010 10 3 2 2
2010-10-19 00:00:00.000 2010 10 2 2 2
2010-10-20 00:00:00.000 2010 10 3 1 2
2010-10-21 00:00:00.000 2010 10 3 1 2
2010-10-22 00:00:00.000 2010 10 2 1 2
2010-10-23 00:00:00.000 2010 10 2 0 3
2010-10-24 00:00:00.000 2010 10 2 0 3
2010-10-25 00:00:00.000 2010 10 2 0 2
2010-10-26 00:00:00.000 2010 10 2 0 3
2010-10-27 00:00:00.000 2010 10 2 0 3
2010-10-28 00:00:00.000 2010 10 2 0 3
2010-10-29 00:00:00.000 2010 10 2 0 4
2010-10-30 00:00:00.000 2010 10 2 0 4
2010-10-31 00:00:00.000 2010 10 2 0 4