Lightweight concordancing function to return key words in context (KWIC) in a tidy format.
Arguments
- x
a character vector of tokenized strings, or a single string
- index
a character vector of regex patterns to match, or a numeric vector to use as the index of matches
- n
an integer specifying the number of context tokens on either side of the matched node
- tokenize
a logical, to tokenize the text first or not. If TRUE, a very basic tokenizer is used to split the string on whitespace and punctuation (but not word-internal apostrophes, at marks and hyphens).
- separated
a logical, to separate the context tokens or not
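The basic tokenizer described under tokenize can be approximated in base R. The sketch below is for illustration only and is not the package's actual implementation; the helper name `basic_tokenize` and the regular expression are assumptions.

```r
# Hypothetical helper, not part of the package: a rough approximation of a
# tokenizer that splits on whitespace and punctuation but keeps
# word-internal apostrophes, at marks and hyphens inside a token.
basic_tokenize <- function(text) {
  # A token is either a run of letters/digits, optionally joined by single
  # internal ', @ or - characters, or one non-space punctuation character.
  pattern <- "[[:alnum:]]+(?:['@-][[:alnum:]]+)*|[^[:alnum:][:space:]]"
  regmatches(text, gregexpr(pattern, text, perl = TRUE))[[1]]
}

basic_tokenize("Don't e-mail me, user@host!")
#> [1] "Don't"     "e-mail"    "me"        ","         "user@host" "!"
```

Punctuation that is not word-internal, such as the comma and the exclamation mark above, becomes a token of its own, which is why a comma can show up as a context token in the output.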
Value
A tibble containing:
case - a case number for the match found.
left - tokens immediately adjacent (up to n) to the left of the matched node. If separated = TRUE, the left tokens are separated into left(n):left1.
match - the matched search item, as defined by the index argument.
right - tokens immediately adjacent (up to n) to the right of the matched node. If separated = TRUE, the right tokens are separated into right1:right(n).
index - the position of the matched result in the input (printed as token_id in the example output below).
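How the left and right context windows relate to a match position can be sketched in base R. This is a minimal illustration under stated assumptions, not the package's implementation: the name `kwic_sketch`, the way several patterns are combined with "|", and the use of a plain data frame rather than a tibble are all hypothetical. It covers only the unseparated case.

```r
# Hypothetical sketch, not the package's implementation: build one KWIC row
# per match, padding with NA where the window runs off the input.
kwic_sketch <- function(x, index, n = 5) {
  # Resolve `index` to positions: numbers are used directly, character
  # values are treated as regex patterns (assumption: combined with "|").
  pos <- if (is.numeric(index)) {
    as.integer(index)
  } else {
    grep(paste(index, collapse = "|"), x)
  }
  # Paste the tokens at the given positions; out-of-range positions give NA.
  window <- function(ids) {
    ids[ids < 1L | ids > length(x)] <- NA_integer_
    paste(x[ids], collapse = " ")
  }
  data.frame(
    case     = seq_along(pos),
    token_id = pos,
    left     = vapply(pos, function(p) window(p - rev(seq_len(n))), character(1)),
    match    = x[pos],
    right    = vapply(pos, function(p) window(p + seq_len(n)), character(1)),
    stringsAsFactors = FALSE
  )
}

kwic_sketch(c("The", "cat", "sat", "on", "the", "mat"), c("cat", "sat"), n = 2)
```

For this input the sketch yields the left values "NA The" and "The cat", the matches "cat" and "sat", and the right values "sat on" and "on the". The NA padding appears whenever fewer than n tokens exist on one side of the match.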
Examples
x <- c("The", "cat", "sat", "on", "the", "mat")
index <- c("cat", "sat")
quick_conc(x, index, n = 2)
#> # A tibble: 2 × 5
#>    case token_id left    match right
#>   <int>    <int> <chr>   <chr> <chr>
#> 1     1        2 NA The  cat   sat on
#> 2     2        3 The cat sat   on the
x <- "The dog barked loudly, alerting the neighbors of potential danger.
A nearby park seemed like the perfect spot for the dog and
it quickly made its way there."
quick_conc(x, index = "dog", n = 3, tokenize = TRUE, separated = TRUE)
#> # A tibble: 2 × 9
#>    case token_id left3 left2 left1 match right1 right2 right3
#>   <int>    <int> <chr> <chr> <chr> <chr> <chr>  <chr>  <chr>
#> 1     1        2 NA    NA    The   dog   barked loudly ,
#> 2     2       23 spot  for   the   dog   and    it     quickly
quick_conc(x, index = c(4,8,12), tokenize = TRUE)
#> # A tibble: 3 × 5
#>    case token_id left                              match     right
#>   <int>    <int> <chr>                             <chr>     <chr>
#> 1     1        4 NA NA The dog barked              loudly    , alerting the nei…
#> 2     2        8 barked loudly , alerting the      neighbors of potential dange…
#> 3     3       12 the neighbors of potential danger .         A nearby park seem…