2017-01-14 6 views
2

Filtering by multiple customDimensions and then aggregating

The data exported from Google Analytics raw data into BigQuery looks like this:

|-visitId 
|- date 
|- (....) 
+- hits 
    |- time 
    |- page 
     |- pagePath 
    |- eventInfo 
     |- eventAction 
    +- customDimensions 
     |- index 
     |- value 

I am looking to capture 3 values from the repeated customDimensions, like this:

+---------+---------+-------+-----------+---------------+
| user_id | country | split | page Hits | CTA event hit |
+---------+---------+-------+-----------+---------------+
| 100     | US      | A     | 25000     | 500           |
+---------+---------+-------+-----------+---------------+
| 100     | US      | B     | 8000      | 90            |
+---------+---------+-------+-----------+---------------+
| 200     | ES      | A     | 400       | 2             |
+---------+---------+-------+-----------+---------------+

The first three columns are defined by hits.customDimensions.index 1, 4 and 7.

page Hits is the number of page views that were made; CTA event hit is the count of an event that is fired when the user clicks a button on that page. For the sake of SQL simplicity we can use hits.page.pagePath='tshirt' and hits.eventInfo.eventAction='upsell'.

I have difficulty reading the 3 customDimensions from the same field, and I also struggle to find the events that happened in the same session.

Update for those not familiar with the BQ dataset

In the image below each row is a hit, and multiple hits can be on the same row. In BigQuery this is called a REPEATED field. In the image you see 3 top-level rows. The first row has 8 hits. The image does not contain multiple customDimensions, but there can be several for the same hit. To access the sample DB set up on BigQuery read here, it is free.

enter image description here
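The nesting described above can be reproduced with a few inline rows. This is only a minimal sketch in BigQuery standard SQL: the `sessions` data, its values and the output column names are made up for illustration, and only the shape (hits repeated inside a row, customDimensions repeated inside a hit) mirrors the export.

```sql
-- Hypothetical sample: two session rows, each holding a REPEATED hits field.
WITH sessions AS (
  SELECT 1 AS visitId, [
    STRUCT(1 AS hitNumber,
           [STRUCT(1 AS index, '100' AS value),
            STRUCT(4 AS index, 'US' AS value)] AS customDimensions),
    STRUCT(2 AS hitNumber,
           [STRUCT(7 AS index, 'A' AS value)] AS customDimensions)
  ] AS hits
  UNION ALL
  SELECT 2 AS visitId, [
    STRUCT(1 AS hitNumber,
           [STRUCT(1 AS index, '200' AS value)] AS customDimensions)
  ] AS hits
)
SELECT
  visitId,
  -- Number of hits stored inside this single row.
  ARRAY_LENGTH(hits) AS hits_in_row,
  -- Total number of customDimensions entries across all hits of the row.
  (SELECT SUM(ARRAY_LENGTH(h.customDimensions))
   FROM UNNEST(hits) AS h) AS dimensions_in_row
FROM sessions
ORDER BY visitId;
```

With this sample, visitId 1 should report 2 hits and 3 dimension entries, and visitId 2 should report 1 of each, even though the table itself has only two rows.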

+0

Is what you showed the sample data or the expected result? –

+0

The expected result – Pentium10

+0

Can you show some sample data, please? –

Answers

2

Before answering, I'd like to show the mock data I used as a reference to come up with the solution; hopefully it will be useful:

WITH mock_data AS(
select '0' fullvisitorid, 0 visitid, ARRAY<STRUCT< hitnumber INT64, customdimensions ARRAY<STRUCT<index INT64, value STRING>>, page STRUCT<pagepath STRING>, eventinfo STRUCT<eventcategory STRING, eventlabel STRING, eventaction STRING> >> [STRUCT(0 as hitnumber, [STRUCT(0 as index, '' as value)] as customdimensions, STRUCT('/home' as pagepath) as page, STRUCT('' as eventcategory, '' as eventlabel, '' as eventaction) as eventinfo)] hits union all 
select '0' fullvisitorid, 0 visitid, ARRAY<STRUCT< hitnumber INT64, customdimensions ARRAY<STRUCT<index INT64, value STRING>>, page STRUCT<pagepath STRING>, eventinfo STRUCT<eventcategory STRING, eventlabel STRING, eventaction STRING> >> [STRUCT(1 as hitnumber, [STRUCT(0 as index, '' as value)] as customdimensions, STRUCT('/randompage' as pagepath) as page, STRUCT('' as eventcategory, '' as eventlabel, '' as eventaction) as eventinfo)] hits union all 
select '0' fullvisitorid, 0 visitid, ARRAY<STRUCT< hitnumber INT64, customdimensions ARRAY<STRUCT<index INT64, value STRING>>, page STRUCT<pagepath STRING>, eventinfo STRUCT<eventcategory STRING, eventlabel STRING, eventaction STRING> >> [STRUCT(2 as hitnumber, [STRUCT(1 as index, '000' as value), STRUCT(4 as index, 'US' as value), STRUCT(7 as index, 'A' as value)] as customdimensions, STRUCT('tshirt' as pagepath) as page, STRUCT('' as eventcategory, '' as eventlabel, '' as eventaction) as eventinfo)] hits union all 
select '0' fullvisitorid, 0 visitid, ARRAY<STRUCT< hitnumber INT64, customdimensions ARRAY<STRUCT<index INT64, value STRING>>, page STRUCT<pagepath STRING>, eventinfo STRUCT<eventcategory STRING, eventlabel STRING, eventaction STRING> >> [STRUCT(3 as hitnumber, [STRUCT(1 as index, '000' as value), STRUCT(4 as index, 'US' as value), STRUCT(7 as index, 'A' as value)] as customdimensions, STRUCT('tshirt' as pagepath) as page, STRUCT('' as eventcategory, '' as eventlabel, '' as eventaction) as eventinfo)] hits union all 

select '1' fullvisitorid, 0 visitid, ARRAY<STRUCT< hitnumber INT64, customdimensions ARRAY<STRUCT<index INT64, value STRING>>, page STRUCT<pagepath STRING>, eventinfo STRUCT<eventcategory STRING, eventlabel STRING, eventaction STRING> >> [STRUCT(0 as hitnumber, [STRUCT(0 as index, '' as value)] as customdimensions, STRUCT('/home' as pagepath) as page, STRUCT('' as eventcategory, '' as eventlabel, '' as eventaction) as eventinfo)] hits union all 
select '1' fullvisitorid, 0 visitid, ARRAY<STRUCT< hitnumber INT64, customdimensions ARRAY<STRUCT<index INT64, value STRING>>, page STRUCT<pagepath STRING>, eventinfo STRUCT<eventcategory STRING, eventlabel STRING, eventaction STRING> >> [STRUCT(1 as hitnumber, [STRUCT(0 as index, '' as value)] as customdimensions, STRUCT('/randompage' as pagepath) as page, STRUCT('' as eventcategory, '' as eventlabel, '' as eventaction) as eventinfo)] hits union all 
select '1' fullvisitorid, 0 visitid, ARRAY<STRUCT< hitnumber INT64, customdimensions ARRAY<STRUCT<index INT64, value STRING>>, page STRUCT<pagepath STRING>, eventinfo STRUCT<eventcategory STRING, eventlabel STRING, eventaction STRING> >> [STRUCT(2 as hitnumber, [STRUCT(1 as index, '100' as value), STRUCT(4 as index, 'US' as value), STRUCT(7 as index, 'A' as value)] as customdimensions, STRUCT('tshirt' as pagepath) as page, STRUCT('' as eventcategory, '' as eventlabel, '' as eventaction) as eventinfo)] hits union all 
select '1' fullvisitorid, 0 visitid, ARRAY<STRUCT< hitnumber INT64, customdimensions ARRAY<STRUCT<index INT64, value STRING>>, page STRUCT<pagepath STRING>, eventinfo STRUCT<eventcategory STRING, eventlabel STRING, eventaction STRING> >> [STRUCT(3 as hitnumber, [STRUCT(0 as index, '' as value)] as customdimensions, STRUCT('tshirt' as pagepath) as page, STRUCT('specific_category' as eventcategory, 'specific_label' as eventlabel, 'upsell' as eventaction) as eventinfo)] hits union all 
select '1' fullvisitorid, 0 visitid, ARRAY<STRUCT< hitnumber INT64, customdimensions ARRAY<STRUCT<index INT64, value STRING>>, page STRUCT<pagepath STRING>, eventinfo STRUCT<eventcategory STRING, eventlabel STRING, eventaction STRING> >> [STRUCT(4 as hitnumber, [STRUCT(1 as index, '100' as value), STRUCT(4 as index, 'US' as value), STRUCT(7 as index, 'A' as value)] as customdimensions, STRUCT('tshirt' as pagepath) as page, STRUCT('' as eventcategory, '' as eventlabel, '' as eventaction) as eventinfo)] hits union all 

select '2' fullvisitorid, 0 visitid, ARRAY<STRUCT< hitnumber INT64, customdimensions ARRAY<STRUCT<index INT64, value STRING>>, page STRUCT<pagepath STRING>, eventinfo STRUCT<eventcategory STRING, eventlabel STRING, eventaction STRING> >> [STRUCT(0 as hitnumber, [STRUCT(0 as index, '' as value)] as customdimensions, STRUCT('/home' as pagepath) as page, STRUCT('' as eventcategory, '' as eventlabel, '' as eventaction) as eventinfo)] hits union all 
select '2' fullvisitorid, 0 visitid, ARRAY<STRUCT< hitnumber INT64, customdimensions ARRAY<STRUCT<index INT64, value STRING>>, page STRUCT<pagepath STRING>, eventinfo STRUCT<eventcategory STRING, eventlabel STRING, eventaction STRING> >> [STRUCT(1 as hitnumber, [STRUCT(0 as index, '' as value)] as customdimensions, STRUCT('/randompage' as pagepath) as page, STRUCT('' as eventcategory, '' as eventlabel, '' as eventaction) as eventinfo)] hits union all 
select '2' fullvisitorid, 0 visitid, ARRAY<STRUCT< hitnumber INT64, customdimensions ARRAY<STRUCT<index INT64, value STRING>>, page STRUCT<pagepath STRING>, eventinfo STRUCT<eventcategory STRING, eventlabel STRING, eventaction STRING> >> [STRUCT(2 as hitnumber, [STRUCT(0 as index, '' as value)] as customdimensions, STRUCT('tshirt' as pagepath) as page, STRUCT('specific_category' as eventcategory, 'specific_label' as eventlabel, 'upsell' as eventaction) as eventinfo)] hits union all 

select '3' fullvisitorid, 0 visitid, ARRAY<STRUCT< hitnumber INT64, customdimensions ARRAY<STRUCT<index INT64, value STRING>>, page STRUCT<pagepath STRING>, eventinfo STRUCT<eventcategory STRING, eventlabel STRING, eventaction STRING> >> [STRUCT(0 as hitnumber, [STRUCT(0 as index, '' as value)] as customdimensions, STRUCT('/home' as pagepath) as page, STRUCT('' as eventcategory, '' as eventlabel, '' as eventaction) as eventinfo)] hits union all 
select '3' fullvisitorid, 0 visitid, ARRAY<STRUCT< hitnumber INT64, customdimensions ARRAY<STRUCT<index INT64, value STRING>>, page STRUCT<pagepath STRING>, eventinfo STRUCT<eventcategory STRING, eventlabel STRING, eventaction STRING> >> [STRUCT(1 as hitnumber, [STRUCT(0 as index, '' as value)] as customdimensions, STRUCT('/randompage' as pagepath) as page, STRUCT('' as eventcategory, '' as eventlabel, '' as eventaction) as eventinfo)] hits union all 
select '3' fullvisitorid, 1 visitid, ARRAY<STRUCT< hitnumber INT64, customdimensions ARRAY<STRUCT<index INT64, value STRING>>, page STRUCT<pagepath STRING>, eventinfo STRUCT<eventcategory STRING, eventlabel STRING, eventaction STRING> >> [STRUCT(2 as hitnumber, [STRUCT(1 as index, '300' as value), STRUCT(4 as index, 'US' as value), STRUCT(7 as index, 'A' as value)] as customdimensions, STRUCT('tshirt' as pagepath) as page, STRUCT('' as eventcategory, '' as eventlabel, '' as eventaction) as eventinfo)] hits union all 
select '3' fullvisitorid, 1 visitid, ARRAY<STRUCT< hitnumber INT64, customdimensions ARRAY<STRUCT<index INT64, value STRING>>, page STRUCT<pagepath STRING>, eventinfo STRUCT<eventcategory STRING, eventlabel STRING, eventaction STRING> >> [STRUCT(3 as hitnumber, [STRUCT(1 as index, '300' as value), STRUCT(4 as index, 'US' as value), STRUCT(7 as index, 'A' as value)] as customdimensions, STRUCT('tshirt' as pagepath) as page, STRUCT('' as eventcategory, '' as eventlabel, '' as eventaction) as eventinfo)] hits union all 

select '4' fullvisitorid, 0 visitid, ARRAY<STRUCT< hitnumber INT64, customdimensions ARRAY<STRUCT<index INT64, value STRING>>, page STRUCT<pagepath STRING>, eventinfo STRUCT<eventcategory STRING, eventlabel STRING, eventaction STRING> >> [STRUCT(0 as hitnumber, [STRUCT(0 as index, '' as value)] as customdimensions, STRUCT('/home' as pagepath) as page, STRUCT('' as eventcategory, '' as eventlabel, '' as eventaction) as eventinfo)] hits union all 
select '4' fullvisitorid, 0 visitid, ARRAY<STRUCT< hitnumber INT64, customdimensions ARRAY<STRUCT<index INT64, value STRING>>, page STRUCT<pagepath STRING>, eventinfo STRUCT<eventcategory STRING, eventlabel STRING, eventaction STRING> >> [STRUCT(1 as hitnumber, [STRUCT(0 as index, '' as value)] as customdimensions, STRUCT('/randompage' as pagepath) as page, STRUCT('' as eventcategory, '' as eventlabel, '' as eventaction) as eventinfo)] hits union all 
select '4' fullvisitorid, 0 visitid, ARRAY<STRUCT< hitnumber INT64, customdimensions ARRAY<STRUCT<index INT64, value STRING>>, page STRUCT<pagepath STRING>, eventinfo STRUCT<eventcategory STRING, eventlabel STRING, eventaction STRING> >> [STRUCT(2 as hitnumber, [STRUCT(1 as index, '400' as value), STRUCT(4 as index, 'BR' as value), STRUCT(7 as index, 'B' as value)] as customdimensions, STRUCT('tshirt' as pagepath) as page, STRUCT('' as eventcategory, '' as eventlabel, '' as eventaction) as eventinfo)] hits union all 
select '4' fullvisitorid, 1 visitid, ARRAY<STRUCT< hitnumber INT64, customdimensions ARRAY<STRUCT<index INT64, value STRING>>, page STRUCT<pagepath STRING>, eventinfo STRUCT<eventcategory STRING, eventlabel STRING, eventaction STRING> >> [STRUCT(0 as hitnumber, [STRUCT(0 as index, '' as value)] as customdimensions, STRUCT('tshirt' as pagepath) as page, STRUCT('specific_category' as eventcategory, 'specific_label' as eventlabel, 'upsell' as eventaction) as eventinfo)] hits union all 
select '4' fullvisitorid, 1 visitid, ARRAY<STRUCT< hitnumber INT64, customdimensions ARRAY<STRUCT<index INT64, value STRING>>, page STRUCT<pagepath STRING>, eventinfo STRUCT<eventcategory STRING, eventlabel STRING, eventaction STRING> >> [STRUCT(1 as hitnumber, [STRUCT(1 as index, '400' as value), STRUCT(4 as index, 'BR' as value), STRUCT(7 as index, 'B' as value)] as customdimensions, STRUCT('tshirt' as pagepath) as page, STRUCT('' as eventcategory, '' as eventlabel, '' as eventaction) as eventinfo)] hits union all 

select '4' fullvisitorid, 2 visitid, ARRAY<STRUCT< hitnumber INT64, customdimensions ARRAY<STRUCT<index INT64, value STRING>>, page STRUCT<pagepath STRING>, eventinfo STRUCT<eventcategory STRING, eventlabel STRING, eventaction STRING> >> [STRUCT(0 as hitnumber, [STRUCT(1 as index, '400' as value), STRUCT(4 as index, 'BR' as value), STRUCT(7 as index, 'B' as value)] as customdimensions, STRUCT('tshirt' as pagepath) as page, STRUCT('' as eventcategory, '' as eventlabel, '' as eventaction) as eventinfo)] hits union all 
select '4' fullvisitorid, 2 visitid, ARRAY<STRUCT< hitnumber INT64, customdimensions ARRAY<STRUCT<index INT64, value STRING>>, page STRUCT<pagepath STRING>, eventinfo STRUCT<eventcategory STRING, eventlabel STRING, eventaction STRING> >> [STRUCT(1 as hitnumber, [STRUCT(0 as index, '' as value)] as customdimensions, STRUCT('tshirt' as pagepath) as page, STRUCT('specific_category' as eventcategory, 'specific_label' as eventlabel, 'upsell' as eventaction) as eventinfo)] hits union all 
select '4' fullvisitorid, 2 visitid, ARRAY<STRUCT< hitnumber INT64, customdimensions ARRAY<STRUCT<index INT64, value STRING>>, page STRUCT<pagepath STRING>, eventinfo STRUCT<eventcategory STRING, eventlabel STRING, eventaction STRING> >> [STRUCT(2 as hitnumber, [STRUCT(0 as index, '' as value)] as customdimensions, STRUCT('tshirt' as pagepath) as page, STRUCT('specific_category' as eventcategory, 'specific_label' as eventlabel, 'upsell' as eventaction) as eventinfo)] hits union all 
select '4' fullvisitorid, 3 visitid, ARRAY<STRUCT< hitnumber INT64, customdimensions ARRAY<STRUCT<index INT64, value STRING>>, page STRUCT<pagepath STRING>, eventinfo STRUCT<eventcategory STRING, eventlabel STRING, eventaction STRING> >> [STRUCT(0 as hitnumber, [STRUCT(1 as index, '400' as value), STRUCT(4 as index, 'BR' as value), STRUCT(7 as index, 'A' as value)] as customdimensions, STRUCT('tshirt' as pagepath) as page, STRUCT('' as eventcategory, '' as eventlabel, '' as eventaction) as eventinfo)] hits union all 
select '4' fullvisitorid, 3 visitid, ARRAY<STRUCT< hitnumber INT64, customdimensions ARRAY<STRUCT<index INT64, value STRING>>, page STRUCT<pagepath STRING>, eventinfo STRUCT<eventcategory STRING, eventlabel STRING, eventaction STRING> >> [STRUCT(1 as hitnumber, [STRUCT(0 as index, '' as value)] as customdimensions, STRUCT('tshirt' as pagepath) as page, STRUCT('specific_category' as eventcategory, 'specific_label' as eventlabel, 'upsell' as eventaction) as eventinfo)] hits union all 
select '4' fullvisitorid, 3 visitid, ARRAY<STRUCT< hitnumber INT64, customdimensions ARRAY<STRUCT<index INT64, value STRING>>, page STRUCT<pagepath STRING>, eventinfo STRUCT<eventcategory STRING, eventlabel STRING, eventaction STRING> >> [STRUCT(2 as hitnumber, [STRUCT(1 as index, '400' as value), STRUCT(4 as index, 'BR' as value), STRUCT(7 as index, 'A' as value)] as customdimensions, STRUCT('tshirt' as pagepath) as page, STRUCT('' as eventcategory, '' as eventlabel, '' as eventaction) as eventinfo)] hits 
) 

I simulated 4 different users visiting a website, using the same schema as the BigQuery ga_sessions table.

Some of my assumptions may differ a bit from your actual data. If so, let me know and we can adapt the mock data as a guide for more accurate answers (I actually use these mocks to run integration tests in our production environment, so they can be quite useful).

Assumptions I made (correct me if I'm wrong):

  1. customDimensions are fired only when hits.page.pagepath = 'tshirt'.
  2. They are always fired. That is, every visit to the 'tshirt' page corresponds to a firing of the custom dimensions.
  3. When the eventAction click happens, customDimensions are not fired at the same time (that is, the event is fired in one hitNumber and the custom dimensions in another).
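One way to check these three assumptions against real data could be a query like the following. It is a rough sketch in BigQuery standard SQL: the `sample` rows are invented, the hit struct is flattened (pagepath and eventaction sit directly on the hit rather than under page and eventinfo) to keep it short, and the output column names are hypothetical.

```sql
-- Hypothetical, simplified sample: one session with three hits.
WITH sample AS (
  SELECT '1' AS fullvisitorid, 0 AS visitid, [
    STRUCT('tshirt' AS pagepath, '' AS eventaction,
           [STRUCT(1 AS index, '100' AS value)] AS customdimensions),
    STRUCT('tshirt' AS pagepath, 'upsell' AS eventaction,
           [STRUCT(0 AS index, '' AS value)] AS customdimensions),
    STRUCT('/home' AS pagepath, '' AS eventaction,
           [STRUCT(0 AS index, '' AS value)] AS customdimensions)
  ] AS hits
)
SELECT
  -- Assumption 1: no dimension 1 outside the 'tshirt' page.
  COUNTIF(pagepath != 'tshirt' AND has_user_id) AS dims_outside_tshirt,
  -- Assumption 2: every plain 'tshirt' page view carries dimension 1.
  COUNTIF(pagepath = 'tshirt' AND eventaction = '' AND NOT has_user_id)
    AS tshirt_views_without_dims,
  -- Assumption 3: 'upsell' event hits never carry dimension 1.
  COUNTIF(eventaction = 'upsell' AND has_user_id) AS events_with_dims
FROM (
  SELECT
    h.pagepath,
    h.eventaction,
    EXISTS(SELECT 1 FROM UNNEST(h.customdimensions) AS d
           WHERE d.index = 1) AS has_user_id
  FROM sample, UNNEST(hits) AS h
);
```

If all three counters come back as 0, the data is consistent with the assumptions; any non-zero counter points at the assumption that needs revisiting.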

This may give the expected result:

-- Level 3: final aggregation per user_id, country and split.
select
    user_id,
    country,
    _split,
    sum(page_hits) page_hits,
    sum(CTA_event_hit) CTA_event_hit
from (
    -- Level 2: one row per session (fv, v). max() ignores NULLs, so it merges
    -- the row carrying the dimensions with the row carrying the clicks.
    select
        max(user_id) user_id,
        max(country) country,
        max(_split) _split,
        max(page_hits) page_hits,
        max(CTA_event_hit) CTA_event_hit
    from (
        -- Level 1: page hits and clicks per session and dimension combination.
        select
            fv,
            v,
            user_id,
            country,
            _split,
            count(case when user_id is not null then 1 end) page_hits,
            sum(click_flag) CTA_event_hit
        from (
            -- Level 0: one row per 'tshirt' hit, with dimensions 1, 4 and 7 pulled out.
            select
                fullvisitorid fv,
                visitid v,
                (select custd.value from unnest(hits.customdimensions) custd where custd.index = 1) user_id,
                (select custd.value from unnest(hits.customdimensions) custd where custd.index = 4) country,
                (select custd.value from unnest(hits.customdimensions) custd where custd.index = 7) _split,
                case when hits.eventinfo.eventcategory = 'specific_category' and hits.eventinfo.eventlabel = 'specific_label' and hits.eventinfo.eventaction = 'upsell' then 1 end click_flag
            from mock_data,
                unnest(hits) hits
            where 1 = 1
                and hits.page.pagepath = 'tshirt'
        )
        group by fv, v, user_id, country, _split
    )
    group by fv, v
    having user_id is not null
)
group by user_id, country, _split

This results in:

enter image description here

Basically, a few subselect queries retrieve user_id, country and split. For each session (visitid) the data is aggregated using the MAX operator, and finally the last aggregation is done at the user_id, country and split level.

To run the query on your own dataset, you just need to replace mock_data with the corresponding ga_sessions table you want.

Not sure if this solves your problem, but it might help.

As a final note, it looks like this data is a setup, perhaps for an AB test or some performance analysis of different variations of your website. In that case I'd recommend not letting users change their split value, as this could lead to some data poisoning (which can skew the results).
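The MAX step is the least obvious part, so here is a small sketch of what it does for a single session. The `per_session_rows` input is hypothetical and stands in for the output of the inner GROUP BY for one visitor: one row that carries the dimensions and the page hit count, and one row that carries only the click.

```sql
-- Hypothetical intermediate rows for a single session (fv = '1', v = 0).
WITH per_session_rows AS (
  -- Row produced by the hits that carried the custom dimensions.
  SELECT '1' AS fv, 0 AS v, '100' AS user_id,
         2 AS page_hits, CAST(NULL AS INT64) AS CTA_event_hit
  UNION ALL
  -- Row produced by the 'upsell' event hit, which carried no dimensions.
  SELECT '1', 0, CAST(NULL AS STRING), 0, 1
)
SELECT
  fv,
  v,
  MAX(user_id) AS user_id,              -- NULL is ignored, '100' survives
  MAX(page_hits) AS page_hits,          -- 2 beats 0
  MAX(CTA_event_hit) AS CTA_event_hit   -- NULL is ignored, 1 survives
FROM per_session_rows
GROUP BY fv, v;
```

The two input rows should collapse into a single row ('1', 0, '100', 2, 1), which is how the click ends up attached to the user_id seen in the same session.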

+0

This is pretty much the full result, I just needed to adjust the column names. Can you explain the 'from mock_data, unnest(hits) hits' construct, what is the logic behind it? – Pentium10

+0

Glad it worked :)! The 'unnest' operation is used here to flatten the results: https://cloud.google.com/bigquery/docs/reference/standard-sql/arrays#flattening-arrays. It is basically a way of "opening up" the values inside an array so that they can be queried. As an example: '[{fullvisitorid: 1, visitid: 1, hits: [{hitNumber: 0, type: 'PAGE'}, {hitNumber: 1, type: 'PAGE'}]}]' becomes: 'fullvisitorid: 1, visitid: 1, hits.hitNumber: 0, hits.type: 'PAGE'' and 'fullvisitorid: 1, visitid: 1, hits.hitNumber: 1, hits.type: 'PAGE''. As you can see, it repeats the outer values and opens up the array –
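The flattening described in the comment above can be sketched with the same example values. The `sessions` CTE is made up for illustration. In BigQuery standard SQL the comma in the FROM clause is an implicit CROSS JOIN, and because UNNEST refers to a column of the row on its left, each session row is joined only to its own hits.

```sql
-- Hypothetical sample: one session row holding two hits.
WITH sessions AS (
  SELECT '1' AS fullvisitorid, 1 AS visitid,
         [STRUCT(0 AS hitNumber, 'PAGE' AS type),
          STRUCT(1 AS hitNumber, 'PAGE' AS type)] AS hits
)
SELECT
  fullvisitorid,
  visitid,
  h.hitNumber,
  h.type
-- Equivalent spelling: FROM sessions CROSS JOIN UNNEST(sessions.hits) AS h
FROM sessions, UNNEST(hits) AS h
ORDER BY h.hitNumber;
```

The single input row should come back as two rows, one per hit, with fullvisitorid and visitid repeated on each.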

+0

I know what unnest is, but is the comma operator a cross join? And why is it needed? – Pentium10

1

To make sure I understood the problem, I am only providing a solution here for computing the custom dimension columns and the page hits metric, but not (yet) the CTA event hits. Using the sample GA table and standard SQL, it could look something like this:

SELECT 
    ARRAY(SELECT AS STRUCT c.product, c.color, 1 page_hits 
    FROM t.hits hit CROSS JOIN 
     UNNEST(ARRAY(
     SELECT DISTINCT AS STRUCT 
      if(dim.index = 1, dim.value, NULL) product, 
      if(dim.index = 2, dim.value, NULL) color 
     FROM hit.customDimensions dim 
     WHERE dim.index in (1,2))) c 
) 
FROM `google.com:analytics-bigquery.LondonCycleHelmet.ga_sessions_20130910` t 

Basically, in the inner SELECT we transform customDimensions.index into separate columns (product and color in this example), and then the outer SELECT prepares to count them by setting page_hits to 1 for each hit.
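As a possible next step, the per-hit rows could be grouped to get the actual page hit counts. The sketch below is not the query above: it swaps in one scalar subquery per dimension, with MAX(IF(...)) so that product and color land on the same row, and it runs on invented inline data instead of the sample table.

```sql
-- Hypothetical sample: one session with two hits, each tagged with
-- dimension 1 (product) and dimension 2 (color).
WITH sessions AS (
  SELECT 1 AS visitId, [
    STRUCT(1 AS hitNumber,
           [STRUCT(1 AS index, 'helmet' AS value),
            STRUCT(2 AS index, 'red' AS value)] AS customDimensions),
    STRUCT(2 AS hitNumber,
           [STRUCT(1 AS index, 'helmet' AS value),
            STRUCT(2 AS index, 'blue' AS value)] AS customDimensions)
  ] AS hits
)
SELECT
  product,
  color,
  SUM(page_hits) AS page_hits
FROM (
  SELECT
    -- Pivot each wanted index into its own column, one row per hit.
    (SELECT MAX(IF(dim.index = 1, dim.value, NULL))
     FROM UNNEST(hit.customDimensions) AS dim) AS product,
    (SELECT MAX(IF(dim.index = 2, dim.value, NULL))
     FROM UNNEST(hit.customDimensions) AS dim) AS color,
    1 AS page_hits
  FROM sessions AS t, UNNEST(t.hits) AS hit
)
GROUP BY product, color
ORDER BY color;
```

On this sample it should return one row for helmet/blue and one for helmet/red, each with page_hits = 1.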

+0

Thanks, this really helped – Pentium10
