2017-01-11 4 views
0

Remove duplicated values in one column and return the newest value in another column

I have the following data set, which I created so that I can reproduce my problem. Its Module column (the file names) contains duplicates.

owaspSample <- data.frame(Module=c("AccessDetails.java","AccessDiverse.java","BgField.java","BgStatus.java","CmdDate.java","CmdGameDate.java","CommentDate.java","CostDate.java","EntranceDetails.java","GameDate.java","LdPopDate.java","LeaseCostDate.java","PastApprovalDate.java","ProvisioningDate.java","ReservationDate.java","RefDate.java","ServiceDate.java","StatusDate.java","ProfileDate.java","UpdateCmdDate.java","ViewDate.java","AccessDetails.java","AccessDiverse.java","AuthenticationDate.java","CmdDate.java","CmdSummaryDate.java","CmdViewDate.java","ChangeOrderDate.java","CommentDate.java","CostDate.java","GameDate.java","LdPopDate.java","LeaseCostDate.java","PastApprovalDate.java","ReservationDate.java","RefDate.java","UnderwaterCmdDate.java","WaveDate.java","XmlFormatter.java"), 
Category = c("SQL Injection","SQL Injection","SQL Injection","SQL Injection","SQL Injection","SQL Injection","SQL Injection","SQL Injection","SQL Injection","SQL Injection","SQL Injection","SQL Injection","SQL Injection","SQL Injection","SQL Injection","SQL Injection","SQL Injection","SQL Injection","SQL Injection","SQL Injection","SQL Injection","SQL Injection","SQL Injection","SQL Injection","SQL Injection","SQL Injection","SQL Injection","SQL Injection","SQL Injection","SQL Injection","SQL Injection","SQL Injection","SQL Injection","SQL Injection","SQL Injection","SQL Injection","SQL Injection","SQL Injection","XML External Entity Injection"), 
scanDate=c("2016-10-23","2016-10-23","2016-10-23","2016-10-23","2016-10-23","2016-10-23","2016-10-23","2016-10-23","2016-10-23","2016-10-23","2016-10-23","2016-10-23","2016-10-23","2016-10-23","2016-10-23","2016-10-23","2016-10-23","2016-10-23","2016-10-23","2016-10-23","2016-10-23","2016-10-24","2016-10-24","2016-10-24","2016-10-24","2016-10-24","2016-10-24","2016-10-24","2016-10-24","2016-10-24","2016-10-24","2016-10-24","2016-10-24","2016-10-24","2016-10-24","2016-10-24","2016-10-24","2016-10-24","2016-10-24"), 
VulnCount = c("13","15"," 1"," 3","15"," 2","11","30"," 2"," 2"," 2"," 2"," 4"," 2"," 3"," 9"," 1"," 1"," 1"," 8"," 6","25","28"," 3","30"," 1"," 6"," 5","20","23"," 3"," 3"," 4","10"," 3","17"," 1"," 3"," 2"), 
Owasp = c("A00-SQL Injection","A00-SQL Injection","A00-SQL Injection","A00-SQL Injection","A00-SQL Injection","A00-SQL Injection","A00-SQL Injection","A00-SQL Injection","A00-SQL Injection","A00-SQL Injection","A00-SQL Injection","A00-SQL Injection","A00-SQL Injection","A00-SQL Injection","A00-SQL Injection","A00-SQL Injection","A00-SQL Injection","A00-SQL Injection","A00-SQL Injection","A00-SQL Injection","A00-SQL Injection","A00-SQL Injection","A00-SQL Injection","A00-SQL Injection","A00-SQL Injection","A00-SQL Injection","A00-SQL Injection","A00-SQL Injection","A00-SQL Injection","A00-SQL Injection","A00-SQL Injection","A00-SQL Injection","A00-SQL Injection","A00-SQL Injection","A00-SQL Injection","A00-SQL Injection","A00-SQL Injection","A00-SQL Injection","A01-Injection")) 

I run the following to remove the duplicates, and it appears to work. However, I want to keep the duplicate with the latest date instead. The date must be dynamic, not hard-coded.

owaspSample <- owaspSample[!duplicated(owaspSample$Module),] 

For example, given rows like these:

Module               Category      Date       VulnCount Owasp
CostDate.java        SQL Injection 2016-10-23 30        A00-SQL Injection
EntranceDetails.java SQL Injection 2016-10-23 2         A00-SQL Injection
GameDate.java        SQL Injection 2016-10-23 2         A00-SQL Injection
CostDate.java        SQL Injection 2016-10-24 23        A00-SQL Injection
GameDate.java        SQL Injection 2016-10-24 3         A00-SQL Injection

The expected result should look like this:

Module               Category      Date       VulnCount Owasp
EntranceDetails.java SQL Injection 2016-10-23 2         A00-SQL Injection
CostDate.java        SQL Injection 2016-10-24 23        A00-SQL Injection
GameDate.java        SQL Injection 2016-10-24 3         A00-SQL Injection

Any ideas on how to do this?

+1

See the `fromLast` argument of `duplicated`. – nicola

+0

nicola, thanks. This works, at least in that it returns the modules based on the latest date. However, it removes the files that are not duplicated. With the data set I showed here, the test works fine. I figured out what I was doing wrong.
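To illustrate the `fromLast` suggestion, here is a small sketch on a made-up vector (the values in `modules` are hypothetical and not taken from the data set above). By default `duplicated()` flags every occurrence after the first; with `fromLast = TRUE` it flags every occurrence before the last. Values that occur only once are not flagged in either case.

```r
# Hypothetical toy input, for illustration only
modules <- c("A.java", "B.java", "C.java", "A.java", "C.java")

# Default: the first occurrence of each value is kept
keep_first <- !duplicated(modules)

# fromLast = TRUE: the last occurrence of each value is kept
keep_last <- !duplicated(modules, fromLast = TRUE)

which(keep_first)  # positions 1, 2, 3
which(keep_last)   # positions 2, 4, 5 ("B.java" occurs once and is kept)
```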

Answers

0

I used nicola's suggestion and added one more piece of code, `unique`, and now I do not lose the file names that are not duplicated.

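# Caution: unique() returns Module values; used as a row index they select by factor code or row name, not by module
owaspSample <- owaspSample[unique(owaspSample$Module),]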

owaspSample <- owaspSample[!duplicated(owaspSample$Module, fromLast = TRUE),] 

I thought they did the same thing. However, together they give me the expected results.
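As a hedged alternative: because `duplicated(..., fromLast = TRUE)` keeps modules that occur only once, ordering the rows by date first may be enough to make "latest" independent of the original row order. In the sketch below, the data frame `scans` and the function `latest_per_module` are hypothetical names. The rows are a subset of the example in the question, deliberately shuffled.

```r
# Hypothetical subset of the question's rows, deliberately not in date order
scans <- data.frame(
  Module    = c("CostDate.java", "GameDate.java", "EntranceDetails.java",
                "CostDate.java", "GameDate.java"),
  scanDate  = c("2016-10-24", "2016-10-23", "2016-10-23",
                "2016-10-23", "2016-10-24"),
  VulnCount = c(23, 2, 2, 30, 3),
  stringsAsFactors = FALSE
)

latest_per_module <- function(df) {
  # Sort by date so the last occurrence of each module is its latest scan
  df <- df[order(as.Date(df$scanDate)), ]
  # Keep the last occurrence; modules that appear only once are kept as well
  df[!duplicated(df$Module, fromLast = TRUE), ]
}

result <- latest_per_module(scans)
print(result)
```

Because the latest date is derived from the data itself, nothing needs to change when a new scan date is appended.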

0

We can do this with dplyr. After grouping by 'Module', `slice` the last row in each group.

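library(dplyr)
owaspSample %>%
     group_by(Module) %>%   # one group per file name
     slice(n())             # keep the last row of each group; assumes rows are already in scanDate order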
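If the rows are not guaranteed to be in date order, one possible variant is to sort by `scanDate` before grouping, so that the last row of each group is also the latest scan. This is only a sketch that assumes the `dplyr` package is installed. `scans` and `latest` are hypothetical names, and the rows are a shuffled subset of the question's example.

```r
library(dplyr)

# Hypothetical subset of the question's rows, deliberately not in date order
scans <- data.frame(
  Module    = c("CostDate.java", "GameDate.java", "EntranceDetails.java",
                "CostDate.java", "GameDate.java"),
  scanDate  = c("2016-10-24", "2016-10-23", "2016-10-23",
                "2016-10-23", "2016-10-24"),
  VulnCount = c(23, 2, 2, 30, 3),
  stringsAsFactors = FALSE
)

latest <- scans %>%
  mutate(scanDate = as.Date(scanDate)) %>%  # compare real dates, not strings
  arrange(scanDate) %>%                     # oldest first, newest last
  group_by(Module) %>%
  slice(n()) %>%                            # last row per group = latest scan
  ungroup()

print(latest)
```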