2016-07-05 3 views
1

Цель

У меня есть столбец «Состояние», который содержит разные символы. Один из персонажей - «Следующий». Я хочу обозначить 1-ю последовательность (последовательные значения) «Следующий» как «1-й», второй - «2-й» и т. Д. Ниже приведен пример данных:Как сгруппировать последовательность значений в столбце в r?

ДАТа

> dput(foo) 
structure(list(Vehicle.ID2 = c("3361-588", "3361-588", "3361-588", 
"3361-588", "3361-588", "3361-588", "3361-588", "3361-588", "3361-588", 
"3361-588", "3361-588", "3361-588", "3361-588", "3361-588", "3361-588", 
"3361-588", "3361-588", "3361-588", "3361-588", "3361-588", "3361-588", 
"3361-588", "3361-588", "3361-588", "3361-588", "3361-588", "3361-588", 
"3361-588", "3361-588", "3361-588", "3361-588", "3361-588", "3361-588", 
"3361-588", "3361-588", "3361-588", "3361-588"), Frame.ID = 2110:2146, 
    State = c("Following", "Following", "Following", "Following", 
    "Following", "Following", "Following", "Following", "Following", 
    "Approaching-fastveh", "Approaching-fastveh", "Approaching-fastveh", 
    "Following", "Following", "Following", "Following", "Following", 
    "Following", "Following", "Following", "Following", "Following", 
    "Following", "Following", "Following", "Following", "Following", 
    "Following", "Following", "Following", "Approaching-fastveh", 
    "Approaching-fastveh", "Approaching-fastveh", "Approaching-fastveh", 
    "Approaching-fastveh", "Approaching-fastveh", "Approaching-fastveh" 
    )), .Names = c("Vehicle.ID2", "Frame.ID", "State"), class = c("tbl_df", 
"data.frame"), row.names = c(NA, -37L)) 

Желаемая выход

structure(list(Vehicle.ID2 = c("3361-588", "3361-588", "3361-588", 
"3361-588", "3361-588", "3361-588", "3361-588", "3361-588", "3361-588", 
"3361-588", "3361-588", "3361-588", "3361-588", "3361-588", "3361-588", 
"3361-588", "3361-588", "3361-588", "3361-588", "3361-588", "3361-588", 
"3361-588", "3361-588", "3361-588", "3361-588", "3361-588", "3361-588", 
"3361-588", "3361-588", "3361-588", "3361-588", "3361-588", "3361-588", 
"3361-588", "3361-588", "3361-588", "3361-588"), Frame.ID = 2110:2146, 
    State = c("Following", "Following", "Following", "Following", 
    "Following", "Following", "Following", "Following", "Following", 
    "Approaching-fastveh", "Approaching-fastveh", "Approaching-fastveh", 
    "Following", "Following", "Following", "Following", "Following", 
    "Following", "Following", "Following", "Following", "Following", 
    "Following", "Following", "Following", "Following", "Following", 
    "Following", "Following", "Following", "Approaching-fastveh", 
    "Approaching-fastveh", "Approaching-fastveh", "Approaching-fastveh", 
    "Approaching-fastveh", "Approaching-fastveh", "Approaching-fastveh" 
    ), grp = c("1st", "1st", "1st", "1st", "1st", "1st", "1st", 
    "1st", "1st", ".", ".", ".", "2nd", "2nd", "2nd", "2nd", 
    "2nd", "2nd", "2nd", "2nd", "2nd", "2nd", "2nd", "2nd", "2nd", 
    "2nd", "2nd", "2nd", "2nd", "2nd", ".", ".", ".", ".", ".", 
    ".", ".")), .Names = c("Vehicle.ID2", "Frame.ID", "State", 
"grp"), row.names = c(NA, -37L), class = c("tbl_df", "data.frame" 
)) 

То, что я пытался

Я попытался с помощью rle, но она может рассчитывать только уникальные значения. Пожалуйста, помогите мне, как я могу решить эту проблему. Обратите внимание, что в исходных данных у меня есть несколько Vehicle.ID2, поэтому я предпочитаю использовать dplyr.

ответ

2

Мы можем использовать rle

inverse.rle(within.list(rle(foo$State=="Following"), { 
       values1 <- values 
       values1[values] <- seq_along(values[values]) 
       values1[!values] <- '.' 
       values <- values1})) 

Если есть разные 'Vehicle.ID2', мы можем использовать ave

with(foo, ave(State == "Following", Vehicle.ID2, FUN = function(x) { 
      inverse.rle(within.list(rle(x), { 
        values1 <- values 
        values1[values] <- seq_along(values[values]) 
        values1[!values] <- '.' 
        values <- values1 
       })) 
      })) 
#[1] "1" "1" "1" "1" "1" "1" "1" "1" "1" "." "." "." "2" "2" "2" "2" "2" 
#[19] "2" "2" "2" "2" "2" "2" "2" "2" "2" "2" "2" "2" "2" "." "." "." "." "." "." "." 
+1

Спасибо! Это быстро! Не знал об этом использовании 'inverse.rle' –

1

В качестве альтернативы @ akrun отвечают (я учусь о rle , спасибо!), вот dplyr -еское предложение:

foo %>% 
    mutate(
    grp = cumsum(State == "Following" & 
       State != lag(State, default="")), 
    grp = ifelse(State == "Following", grp, "-") 
) 
# Source: local data frame [37 x 4] 
# Vehicle.ID2 Frame.ID    State grp 
#   <chr> <int>    <chr> <chr> 
# 1  3361-588  2110   Following  1 
# 2  3361-588  2111   Following  1 
# 3  3361-588  2112   Following  1 
# 4  3361-588  2113   Following  1 
# 5  3361-588  2114   Following  1 
# ..   ...  ...     ... ... 
0

Вы можете сделать это, используя цикл, но это не так элегантно, как размещенные здесь решения rle и dplyr.

temp <- rep(0,37) 
label <- 1 
now_following <- 0 
for (i in 1:length(temp)) { 
    if (foo$State[i] == "Following") { 
    temp[i] <- label 
    now_following <- 1 
    } else { 
    temp[i] <- 0 
    if (now_following) { 
     now_following <- 0 
     label <- label + 1 
    } 
    } 
} 
df <- cbind(foo,temp) 

Результат:

> df 
    Vehicle.ID2 Frame.ID    State temp 
1  3361-588  2110   Following 1 
2  3361-588  2111   Following 1 
3  3361-588  2112   Following 1 
4  3361-588  2113   Following 1 
5  3361-588  2114   Following 1 
6  3361-588  2115   Following 1 
7  3361-588  2116   Following 1 
8  3361-588  2117   Following 1 
9  3361-588  2118   Following 1 
10 3361-588  2119 Approaching-fastveh 0 
11 3361-588  2120 Approaching-fastveh 0 
12 3361-588  2121 Approaching-fastveh 0 
13 3361-588  2122   Following 2 
14 3361-588  2123   Following 2 
15 3361-588  2124   Following 2 
16 3361-588  2125   Following 2 
17 3361-588  2126   Following 2 
18 3361-588  2127   Following 2 
19 3361-588  2128   Following 2 
20 3361-588  2129   Following 2 
21 3361-588  2130   Following 2 
22 3361-588  2131   Following 2 
23 3361-588  2132   Following 2 
24 3361-588  2133   Following 2 
25 3361-588  2134   Following 2 
26 3361-588  2135   Following 2 
27 3361-588  2136   Following 2 
28 3361-588  2137   Following 2 
29 3361-588  2138   Following 2 
30 3361-588  2139   Following 2 
31 3361-588  2140 Approaching-fastveh 0 
32 3361-588  2141 Approaching-fastveh 0 
33 3361-588  2142 Approaching-fastveh 0 
34 3361-588  2143 Approaching-fastveh 0 
35 3361-588  2144 Approaching-fastveh 0 
36 3361-588  2145 Approaching-fastveh 0 
37 3361-588  2146 Approaching-fastveh 0 

Если вы хотите, вы можете упаковать его в функцию:

label_contiguous_strings <- function(string_vector, string) { 
    # Function returns a vector of labels for the string vector input. 
    len_vector <- length(string_vector) 
    temp <- rep(0,len_vector) 
    label <- 1 
    now_following <- 0 
    for (i in 1:len_vector) { 
    if (string_vector[i] == "Following") { 
     temp[i] <- label 
     now_following <- 1 
    } else { 
     temp[i] <- 0 
     if (now_following) { 
     now_following <- 0 
     label <- label + 1 
     } 
    } 
    } 
    return(temp) 
} 

Результат:

> label_contiguous_strings(foo$State) 
[1] 1 1 1 1 1 1 1 1 1 0 0 0 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 0 0 0 0 0 0 0 
1

Вот является второй метод используя rle:

# get rle 
myRle <- rle(df$State) 
# get categories 
counts <- cumsum(myRle$values == "Following") 
# add in missings 
is.na(counts) <- myRle$values != "Following" 
# build variable 
df$grp <- rep(counts, myRle$lengths) 
Смежные вопросы