Let’s again say you have the example logs in the file
accesslog.txt.
log_file <- 'vignettes/accesslog.txt'
cat(readr::read_file(log_file))10.0.0.8 - - [2019-01-01:10:58:12 -0500] "https://mysite.com/index.html"
173.28.102.33 - - [2019-01-01:10:58:25 -0500] "https://mysite.com/login"
We first define the template as before.
template <- '{{ ip ip_address }} - - [{{ date date_time }}] "{{ url URL }}"'We then need to define our classes. ip and
url are builtins with the package, but dates come in a
variety of formats so we must explicitly define ours here. Note you can
see all builtins using default_classes()
date_parser <- parser(
'[0-9]{4}\\-[0-9]{2}\\-[0-9]{2}:[0-9]{2}:[0-9]{2}:[0-9]{2}[ ][\\-\\+][0-9]{4}',
function(x) lubridate::as_datetime(x, format = '%Y-%m-%d:%H:%M:%S %z'),
name = 'date'
)
date_parser## Parser: date
## ------------
## Matches:
## [0-9]{4}\-[0-9]{2}\-[0-9]{2}:[0-9]{2}:[0-9]{2}:[0-9]{2}[ ][\-\+][0-9]{4}
## Formatter:
## function (x)
## lubridate::as_datetime(x, format = "%Y-%m-%d:%H:%M:%S %z")
default_classes()[c('ip', 'url')]## $ip
## Parser: ip
## ----------
## Matches:
## [0-9]{1,3}(\.[0-9]{1,3}){3}
## Formatter:
## .Primitive("(")
##
## $url
## Parser: url
## -----------
## Matches:
## (-|(?:http(s)?:\/\/)?[\w.-]+(?:\.[\w\.-]+)+[\w\-\._~:/?#[\]@!\$&'\(\)\*\+,;=.]+)
## Formatter:
## .Primitive("(")
Both ip and url require no formatting, so
they have the identity function, (( in R), as their
formatter.
To get our final output in tabular format, we simply make the follow
call to parse_logs.
# Naming the date_parser 'date' in the list tells Tabulog to use it to parse
# the field with class 'date' in the template.
parse_logs(readLines(log_file), template, classes = list(date = date_parser))## ip_address date_time URL
## 1 10.0.0.8 2019-01-01 15:58:12 https://mysite.com/index.html
## 2 173.28.102.33 2019-01-01 15:58:25 https://mysite.com/login
Note that we only had to pass our custom class date. The
builtin classes ip and url were included by
default.
A more elegant and portable way of completing this task would be to define the template and the custom class in the same file, which can be ported to other Tabulog libraries in other languages, leaving only the formatters to be defined in the R script.
First, we define the template and the
classes in a yaml file
template_file <- 'vignettes/accesslog_template.yml'
cat(readr::read_file(template_file))template: '{{ ip ip_address }} - - [{{ date date_time }}] "{{ url URL }}"'
classes:
date: '[0-9]{4}\-[0-9]{2}\-[0-9]{2}:[0-9]{2}:[0-9]{2}:[0-9]{2}[ ][\-\+][0-9]{4}'
Next, we define the formatters for each of our classes. Here we only have one, but we still put it in a named list, with the name matching the name of the class in the template file.
formatters <- list(
date = function(x) lubridate::as_datetime(x, format = '%Y-%m-%d:%H:%M:%S %z')
)Finally, we make one call to parse_logs_file.
parse_logs_file(log_file, template_file, formatters)## ip_address date_time URL
## 1 10.0.0.8 2019-01-01 15:58:12 https://mysite.com/index.html
## 2 173.28.102.33 2019-01-01 15:58:25 https://mysite.com/login