
Lines in Apache access.log files often use vhost_combined instead of combined. #1594

Open
2 of 3 tasks
AlexisWilke opened this issue Nov 1, 2016 · 5 comments

@AlexisWilke

Environment:

  • Fail2Ban version (including any possible distribution suffixes): Fail2Ban v0.9.3
  • OS, including release name/version: Ubuntu 16.04.1 LTS
  • Fail2Ban installed via OS/distribution mechanisms
  • You have not applied any additional foreign patches to the codebase
  • Some customizations were done to the configuration (provide details below if so)

The issue:

The failregex of the few Apache filters that check access.log files assumes that the input uses the combined LogFormat rather than vhost_combined (my case) or even common.

Steps to reproduce

Create an entry such as:

<VirtualHost>
...
CustomLog /var/log/apache2/my.domain.com-access.log vhost_combined
...
</VirtualHost>

The definition of the vhost_combined format is:

LogFormat "%v:%p %h %l %u %t \"%r\" %>s %O \"%{Referer}i\" \"%{User-Agent}i\"" vhost_combined

As we can see, the format string starts with %v:%p, which is the virtual host's server name, a colon, and the port being accessed. This way you can put the logs of many hosts in a single log file, which is very practical when you handle many hosts with one <VirtualHost> through a CMS like Snap! Websites.
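To see why a combined-style pattern matches nothing on these lines, here is a minimal sketch in Python using the standard `re` module. It assumes a pattern anchored as `^<HOST> -...`, replaces `<HOST>` with a simplified IPv4-only group, and uses made-up sample lines (the hostname, the documentation-range address 203.0.113.5 and the `BadBot` user agent are hypothetical). fail2ban's real `<HOST>` expansion and date handling differ.

```python
import re

# Hypothetical simplification of fail2ban's <HOST> tag: IPv4 only.
HOST = r"(?P<host>\d{1,3}(?:\.\d{1,3}){3})"

# Made-up log lines for illustration (not taken from a real server).
COMBINED = ('203.0.113.5 - - [01/Nov/2016:10:00:00 +0000] '
            '"GET / HTTP/1.1" 200 512 "-" "BadBot/1.0"')
VHOST_COMBINED = "my.domain.com:443 " + COMBINED  # "%v:%p " prepended

# Combined-style pattern: the client address must be the first token.
combined_re = re.compile(r"^" + HOST + r' -.*"(?:GET|POST|HEAD).*HTTP.*BadBot')

print(bool(combined_re.search(COMBINED)))        # True
print(bool(combined_re.search(VHOST_COMBINED)))  # False: line starts with %v:%p
```

The two lines differ only by the `%v:%p ` prefix, and that alone is enough to stop the anchored pattern from matching.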

Expected behavior

The failregex should either work with both formats, or there should be two failregex entries so the administrator can choose the one they need.

Alternatively, a variable could be added and then used in the regex.

Here is an updated version that works for the badbots filter with the vhost_combined format:

[Definition]
failregex = ^.+? <HOST> -.*"(?:GET|POST|HEAD).*HTTP.*(?:%(badbots)s|%(badbotscustom)s)

I added .+? at the start to skip the %v:%p prefix, and I removed the $ at the end because information about the SSL encryption may appear there. I use the same log for non-SSL and SSL connections (I imagine other people do the same) with the following LogFormat, which is an extended vhost_combined:

LogFormat "%v:%p %h %l %u %t \"%r\" %>s %b \"%V\" \"%{Referer}i\" \"%{User-Agent}i\" %{SSL_PROTOCOL}x %{SSL_CIPHER}x" vhost_combined_ssl
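A similar Python sketch, under the same assumptions (IPv4-only stand-in for `<HOST>`, `BadBot` standing in for `%(badbots)s`, made-up sample lines), checks the `.+?` prefix and the effect of a trailing `$` on a vhost_combined_ssl line. It also shows a side effect worth knowing: `^.+? ` requires some prefix before the address, so this variant no longer matches plain combined lines.

```python
import re

# Hypothetical IPv4-only stand-in for fail2ban's <HOST> tag.
HOST = r"(?P<host>\d{1,3}(?:\.\d{1,3}){3})"

# Made-up vhost_combined_ssl line: %b, "%V" and two SSL fields at the end.
SSL_LINE = ('my.domain.com:443 203.0.113.5 - - [01/Nov/2016:10:00:00 +0000] '
            '"GET / HTTP/1.1" 200 512 "my.domain.com" "-" "BadBot/1.0" '
            'TLSv1.2 ECDHE-RSA-AES128-GCM-SHA256')
COMBINED = ('203.0.113.5 - - [01/Nov/2016:10:00:00 +0000] '
            '"GET / HTTP/1.1" 200 512 "-" "BadBot/1.0"')

BODY = r' -.*"(?:GET|POST|HEAD).*HTTP.*BadBot[^"]*"'

# ".+? " skips "%v:%p ", and there is no trailing "$".
prefixed = re.compile(r"^.+? " + HOST + BODY)
# Same pattern, but anchored with "$" right after the closing quote.
anchored = re.compile(r"^.+? " + HOST + BODY + r"$")

print(bool(prefixed.search(SSL_LINE)))  # True
print(bool(anchored.search(SSL_LINE)))  # False: SSL fields follow the quote
print(bool(prefixed.search(COMBINED)))  # False: ".+? " needs a prefix
```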

I suppose one could also use a variable to define whether the log line includes the %v:%p prefix or not:

[Definition]
failregex = ^<vhost><HOST> -.*"(?:GET|POST|HEAD).*HTTP.*(?:%(badbots)s|%(badbotscustom)s)

[Init]
# Set vhost to ".+? " (notice the space at end!) if you use the vhost_combined LogFormat
vhost=
#vhost=.+? 
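Whether the trailing space in `vhost=.+? ` survives depends on the config reader. The following Python sketch assumes fail2ban's reader keeps the behaviour of Python's standard `configparser`, which strips surrounding whitespace from values; that inheritance is an assumption here. The `vhost_s` key, the `build()` helper, the plain string replacement of `<vhost>`, and the IPv4-only host group are hypothetical illustrations, not fail2ban code.

```python
import configparser
import re

# The proposed [Init] value with a deliberate trailing space, plus a "\s" variant.
RAW = "[Init]\nvhost = .+? \nvhost_s = .+?\\s\n"

parser = configparser.ConfigParser()
parser.read_string(RAW)
vhost = parser["Init"]["vhost"]      # trailing space stripped: '.+?'
vhost_s = parser["Init"]["vhost_s"]  # '.+?\s' survives: \s is not whitespace

# Hypothetical, simplified template; "<vhost>" is replaced literally.
TEMPLATE = r"^<vhost>(?P<host>\d{1,3}(?:\.\d{1,3}){3}) -"
COMBINED = '203.0.113.5 - - [01/Nov/2016:10:00:00 +0000] "GET / HTTP/1.1"'
VHOST = "my.domain.com:443 " + COMBINED

def build(value):
    """Compile TEMPLATE with <vhost> replaced by the given regex fragment."""
    return re.compile(TEMPLATE.replace("<vhost>", value))

print(build(vhost_s).search(VHOST).group("host"))  # 203.0.113.5
# Without a separator, ".+?" can swallow the first digit of the address.
print(build(vhost).search(COMBINED).group("host"))  # 03.0.113.5
print(build(vhost_s).search(COMBINED))              # None
```

So if the space is trimmed, spelling the separator as `\s` keeps the prefix and the address apart.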

The trailing space in the vhost=... value may get trimmed. If so, \s could be used instead. I have not tested that theory, though. Another way is a more complex regex that supports all combinations: combined, vhost_combined, vhost_combined_ssl (or other custom formats that append fields at the end):

failregex=^([^:]+:[0-9]+ )?<HOST> -.*"(?:GET|POST|HEAD).*HTTP.*(?:%(badbots)s|%(badbotscustom)s)

This checks for the virtual host <domain>:<port> entry followed by a space, but makes it optional. This is probably the best solution for allowing either LogFormat within the same file, although the -.* part at this location could probably be made more precise.
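The optional-prefix regex can be checked against all three formats with the same kind of Python sketch. Again, the IPv4-only `<HOST>` stand-in, `BadBot` in place of `%(badbots)s|%(badbotscustom)s`, and the sample lines are assumptions; the prefix group is written non-capturing here, which does not change what it matches.

```python
import re

# Hypothetical IPv4-only stand-in for fail2ban's <HOST> tag.
HOST = r"(?P<host>\d{1,3}(?:\.\d{1,3}){3})"
# Optional "<domain>:<port> " prefix, no trailing "$".
FAILREGEX = re.compile(
    r"^(?:[^:]+:[0-9]+ )?" + HOST + r' -.*"(?:GET|POST|HEAD).*HTTP.*(?:BadBot)')

# Made-up sample lines sharing the same request part.
REQUEST = '203.0.113.5 - - [01/Nov/2016:10:00:00 +0000] "GET / HTTP/1.1" 200 512'
LINES = {
    "combined": REQUEST + ' "-" "BadBot/1.0"',
    "vhost_combined": "my.domain.com:443 " + REQUEST + ' "-" "BadBot/1.0"',
    "vhost_combined_ssl": "my.domain.com:443 " + REQUEST
    + ' "my.domain.com" "-" "BadBot/1.0" TLSv1.2 ECDHE-RSA-AES128-GCM-SHA256',
}

for name, line in LINES.items():
    m = FAILREGEX.search(line)
    print(name, m.group("host") if m else None)
```

With this sketch, each of the three lines reports 203.0.113.5 as the host. On a plain combined line the prefix group cannot match (the first colon is inside the timestamp and is not followed by digits and a space), so it is simply skipped.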

I also removed the trailing $ so that custom formats such as vhost_combined_ssl, which adds two more fields at the end, are supported as well (and frankly, adding .*$ at the end of any regex is a waste).
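The aside about `.*$` is easy to check: on a single log line (no embedded newlines), `.*$` can always consume the rest of the line, so appending it never changes whether `re.search()` finds a match, whereas a bare `$` does restrict what may follow. A minimal Python sketch with made-up lines:

```python
import re

PATTERN = r'"(?:GET|POST|HEAD).*HTTP'
SAMPLES = [
    '203.0.113.5 - - [01/Nov/2016:10:00:00 +0000] "GET / HTTP/1.1" 200 512',
    '203.0.113.5 - - [01/Nov/2016:10:00:00 +0000] "OPTIONS * HTTP/1.1" 200 0',
]

def matches(pattern, line):
    """True if the pattern is found anywhere in the line."""
    return re.search(pattern, line) is not None

for line in SAMPLES:
    plain = matches(PATTERN, line)
    padded = matches(PATTERN + r".*$", line)  # always the same as plain
    anchored = matches(PATTERN + r"$", line)  # "HTTP" must end the line
    print(plain, padded, anchored)
```

This prints `True True False` for the GET line and `False False False` for the OPTIONS line.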

Observed behavior

The regex matches nothing, so no ban ever happens.

Any additional information

The current regex also fails on any custom format that adds fields past the normal end of the combined or vhost_combined formats.

Configuration, dump and another helpful excerpts

Stock version.

Any customizations done to /etc/fail2ban/ configuration

The main problem does not come from customization.

The $ at the end of the failregex does, however, prevent simple customizations that add fields at the end of log lines.

@sebres
Contributor

sebres commented Nov 7, 2016

The same conclusion as #1589 (comment)

@galapogos01

It would be great if you could reconsider this defect. Not only does the filter not work, it fails silently. Some doco or a warning would make more sense, as vhost_combined is the default log format for Apache servers hosting multiple domains.

@sebres
Contributor

sebres commented Oct 18, 2023

"It would be great if you could reconsider this defect."

Please provide the excerpt from access-log with this format.

"Not only does the filter not work, it fails silently."

Which filter(s) exactly?
As for "it fails silently": what do you mean? If some RE doesn't match, it simply doesn't find the wanted entries in the log/journal.
That cannot be called "failing silently".

"Some doco or a warning would make more sense, as vhost_combined is the default log format for Apache servers hosting multiple domains."

This is basically outside of the responsibilities of fail2ban.
As for documentation, I'll never say no to more docs, so a PR is welcome.

By the way, monitoring the access log is not recommended at all; see wiki :: Best practice (especially the part Reduce parasitic log-traffic) for details.

@sebres sebres reopened this Oct 18, 2023
@galapogos01

@sebres The filters mentioned in this ticket.

They fail to match any lines, which in the context of a piece of software that is designed to match log lines and provide blocking, means it is failing silently.

It would be great if the default regex could handle either scenario or expressly had a configuration item to do either. As it stands, users need to edit the default filters to make them work, there is no doco, and further upstream updates will not apply to the customized filter.

@sebres
Contributor

sebres commented Oct 19, 2023

"The filters mentioned in this ticket."

I asked for an exact list of filters and an excerpt from the access log with the messages that need to match.
Please provide this info.
I neither use Apache nor have time to search the internet for logs covering all possible constellations (vhosts, etc.) that affect some fail2ban filters.
Savvy?

"They fail to match any lines, which in the context of a piece of software that is designed to match log lines and provide blocking, means it is failing silently."

Well, it is not. Besides, the range of reasons why a filter may stop matching, partially or even completely, is large:

  • changes to the logging format by the developers or the user
  • changes to the logging target (a different logfile or the journal)
  • constellations that introduce several new tokens in the log (like this vhost prefix, etc.)
  • changes to the log messages by the developers
  • etc.

There is no way to detect such a situation... except something like: if there are 1000 messages in the log and none of them matches, generate a warning (not really desirable, because it is extremely error-prone).
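The warning heuristic mentioned here could look roughly like the following Python sketch. It is hypothetical and not an existing fail2ban option: the `NoMatchWatchdog` class, its `threshold` default of 1000 (taken from the comment above) and the use of Python's `warnings` module are illustrative choices.

```python
import re
import warnings


class NoMatchWatchdog:
    """Hypothetical heuristic (not a fail2ban feature): warn once when
    `threshold` consecutive lines match none of the given patterns."""

    def __init__(self, patterns, threshold=1000):
        self.patterns = [re.compile(p) for p in patterns]
        self.threshold = threshold
        self.misses = 0      # length of the current streak of non-matching lines
        self.warned = False  # warn only once per watchdog

    def feed(self, line):
        """Check one log line; return True if any pattern matched."""
        if any(p.search(line) for p in self.patterns):
            self.misses = 0  # any hit resets the streak
            return True
        self.misses += 1
        if self.misses >= self.threshold and not self.warned:
            warnings.warn(f"{self.misses} consecutive lines matched no failregex")
            self.warned = True
        return False
```

On a busy access.log, thousands of legitimate requests in a row match no failregex at all, so such a counter would also fire on perfectly healthy logs, which is exactly why the heuristic is error-prone.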

"It would be great if the default regex could handle either scenario or expressly had a configuration item to do either."

Sure. That's why I reopened the issue.
That said, it is advisable to treat stock filters as examples only and to specify more precise and accurate REs (and other parameters), especially when monitoring huge log files like access-log, exactly as described in our wiki (see the link in my previous comment).
