Using `strsplit` and `grep`. First I made an object `para` that holds your paragraph.
toMatch <- c("Martin Luther", "Paul", "Melanchthon")
unlist(strsplit(para,split="\\."))[grep(paste(toMatch, collapse="|"),unlist(strsplit(para,split="\\.")))]
> unlist(strsplit(para,split="\\."))[grep(paste(toMatch, collapse="|"),unlist(strsplit(para,split="\\.")))]
[1] "Opposed as a reformer at Tübingen, he accepted a call to the University of Wittenberg by Martin Luther, recommended by his great-uncle Johann Reuchlin"
[2] " Melanchthon became professor of the Greek language in Wittenberg at the age of 21"
[3] " He studied the Scripture, especially of Paul, and Evangelical doctrine"
[4] " Johann Eck having attacked his views, Melanchthon replied based on the authority of Scripture in his Defensio contra Johannem Eckium"
Or, a little cleaner:
sentences<-unlist(strsplit(para,split="\\."))
sentences[grep(paste(toMatch, collapse="|"),sentences)]
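The results above keep the leading space that each split leaves behind. A minimal sketch of trimming it with base R's `trimws`, assuming a shortened `para` rebuilt from two of the sentences in the output (the original paragraph is not reproduced here):

```r
# Assumed stand-in for `para`, rebuilt from sentences shown in the output above.
para <- paste0(
  "Melanchthon became professor of the Greek language in Wittenberg at the age of 21. ",
  "He studied the Scripture, especially of Paul, and Evangelical doctrine."
)
toMatch <- c("Martin Luther", "Paul", "Melanchthon")

# Split on literal periods, then strip the whitespace each split leaves behind.
sentences <- trimws(unlist(strsplit(para, split = "\\.")))

# Keep the sentences that mention at least one of the names.
hits <- sentences[grep(paste(toMatch, collapse = "|"), sentences)]
print(hits)
```

Note that splitting on every period will also break on abbreviations and decimals, so this is only suitable for simple text like the paragraph here.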
If you want the sentences for each person returned separately, then:
toMatch <- c("Martin Luther", "Paul", "Melanchthon")
sentences<-unlist(strsplit(para,split="\\."))
foo<-function(Match){sentences[grep(Match,sentences)]}
lapply(toMatch,foo)
[[1]]
[1] "Opposed as a reformer at Tübingen, he accepted a call to the University of Wittenberg by Martin Luther, recommended by his great-uncle Johann Reuchlin"
[[2]]
[1] " He studied the Scripture, especially of Paul, and Evangelical doctrine"
[[3]]
[1] " Melanchthon became professor of the Greek language in Wittenberg at the age of 21"
[2] " Johann Eck having attacked his views, Melanchthon replied based on the authority of Scripture in his Defensio contra Johannem Eckium"
EDIT 3: To add each person's name, do something simple such as:
foo<-function(Match){c(Match,sentences[grep(Match,sentences)])}
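An alternative sketch, not part of the approach above: rather than prepending the name to each result, the list elements themselves can be named after the search terms with `sapply(..., simplify = FALSE, USE.NAMES = TRUE)`. The two-element `sentences` vector here is an assumed stand-in for the real data.

```r
# Assumed stand-in data taken from the output shown earlier.
sentences <- c(
  "Melanchthon became professor of the Greek language in Wittenberg at the age of 21",
  "He studied the Scripture, especially of Paul, and Evangelical doctrine"
)
toMatch <- c("Martin Luther", "Paul", "Melanchthon")

# value = TRUE makes grep return the matching sentences instead of their indices.
foo <- function(Match) grep(Match, sentences, value = TRUE)

# For a character vector, USE.NAMES = TRUE names each element after its term.
byPerson <- sapply(toMatch, foo, simplify = FALSE, USE.NAMES = TRUE)
print(byPerson[["Paul"]])
```

With a named list, a person with no matching sentence shows up as an empty `character(0)` entry under their name, and results can be looked up directly, e.g. `byPerson[["Melanchthon"]]`.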
EDIT 4:
And if you want to find sentences that mention several people/places/things (words) at once, just add an entry for that combination, such as:
toMatch <- c("Martin Luther", "Paul", "Melanchthon","(?=.*Melanchthon)(?=.*Scripture)")
and set `perl` to `TRUE`:
foo<-function(Match){c(Match,sentences[grep(Match,sentences,perl = TRUE)])}
> lapply(toMatch,foo)
[[1]]
[1] "Martin Luther"
[2] "Opposed as a reformer at Tübingen, he accepted a call to the University of Wittenberg by Martin Luther, recommended by his great-uncle Johann Reuchlin"
[[2]]
[1] "Paul"
[2] " He studied the Scripture, especially of Paul, and Evangelical doctrine"
[[3]]
[1] "Melanchthon"
[2] " Melanchthon became professor of the Greek language in Wittenberg at the age of 21"
[3] " Johann Eck having attacked his views, Melanchthon replied based on the authority of Scripture in his Defensio contra Johannem Eckium"
[[4]]
[1] "(?=.*Melanchthon)(?=.*Scripture)"
[2] " Johann Eck having attacked his views, Melanchthon replied based on the authority of Scripture in his Defensio contra Johannem Eckium"
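If the lookahead pattern feels opaque, the same "all of these terms" test can be sketched without `perl = TRUE` by combining one `grepl` call per term. `containsAll` is a hypothetical helper name, and the three-element `sentences` vector is an assumed stand-in taken from the output above.

```r
# Assumed stand-in data taken from the output shown earlier.
sentences <- c(
  "Melanchthon became professor of the Greek language in Wittenberg at the age of 21",
  "He studied the Scripture, especially of Paul, and Evangelical doctrine",
  "Johann Eck having attacked his views, Melanchthon replied based on the authority of Scripture in his Defensio contra Johannem Eckium"
)

# Hypothetical helper: TRUE for each sentence that contains every term.
containsAll <- function(terms, x) {
  # One logical vector per term; fixed = TRUE treats each term as a literal string.
  perTerm <- lapply(terms, function(term) grepl(term, x, fixed = TRUE))
  # Element-wise AND across the per-term vectors.
  Reduce(`&`, perTerm)
}

both <- sentences[containsAll(c("Melanchthon", "Scripture"), sentences)]
print(both)
```

This keeps the terms as plain strings, so names containing regex metacharacters need no escaping; the lookahead version is more compact when the combination has to live inside `toMatch`.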
EDIT 5: Answering your other question:
Given:
sentenceR<-"Opposed as a reformer at [[Tübingen]], he accepted a call to the University of [[Wittenberg]] by [[Martin Luther]], recommended by his great-uncle [[Johann Reuchlin]]"
gsub("\\[\\[|\\]\\]", "", regmatches(sentenceR, gregexpr("\\[\\[.*?\\]\\]", sentenceR))[[1]])
gives you the words in double brackets.
> gsub("\\[\\[|\\]\\]", "", regmatches(sentenceR, gregexpr("\\[\\[.*?\\]\\]", sentenceR))[[1]])
[1] "Tübingen" "Wittenberg" "Martin Luther" "Johann Reuchlin"
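The same extraction can be applied to several sentences at once, since `gregexpr` and `regmatches` are vectorised over their input. A small sketch, assuming `sentenceR` is a character vector with one bracket-tagged sentence per element (the two fragments below are cut from the example sentence):

```r
# Assumed input: one tagged sentence fragment per element.
sentenceR <- c(
  "he accepted a call to the University of [[Wittenberg]] by [[Martin Luther]]",
  "recommended by his great-uncle [[Johann Reuchlin]]"
)

# gregexpr returns one set of match positions per input string,
# so regmatches yields a list with one character vector per sentence.
tagged <- regmatches(sentenceR, gregexpr("\\[\\[.*?\\]\\]", sentenceR))

# Strip the surrounding double brackets from every match.
linked <- lapply(tagged, function(m) gsub("\\[\\[|\\]\\]", "", m))
print(linked)
```

Each list element lines up with the sentence it came from, so the extracted names can be matched back to their sentences by position.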
Many thanks, but I noticed that the first and fourth sentences each contain two people's names. If I add a name such as "Johann Eck" or "Johann Reuchlin" to `toMatch` and run the code above, I still get four sentences as output. My new question: how can I get the sentences for each person separately (with overlaps)? – hui
I don't quite understand. Are you asking for (a) only the sentences that contain all of the people's names, or (b) a separate return for each individual name (the sentences that contain Martin Luther, then all the sentences that contain Paul, etc.)?
@hui let me know if the new code answers your question