How to count occurrences of text in a file?
I have a log file sorted by IP addresses, and I want to find the number of occurrences of each unique IP address.
How can I do this with bash? Possibly listing the number of occurrences next to an IP, such as:
5.135.134.16 count: 5
13.57.220.172 count: 30
18.206.226.75 count: 2
and so on.
Here’s a sample of the log:
5.135.134.16 - - [23/Mar/2019:08:42:54 -0400] "GET /wp-login.php HTTP/1.1" 200 2988 "-" "Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:62.0) Gecko/20100101 Firefox/62.0"
5.135.134.16 - - [23/Mar/2019:08:42:55 -0400] "GET /wp-login.php HTTP/1.1" 200 2988 "-" "Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:62.0) Gecko/20100101 Firefox/62.0"
5.135.134.16 - - [23/Mar/2019:08:42:55 -0400] "POST /wp-login.php HTTP/1.1" 200 3836 "-" "Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:62.0) Gecko/20100101 Firefox/62.0"
5.135.134.16 - - [23/Mar/2019:08:42:55 -0400] "POST /wp-login.php HTTP/1.1" 200 3988 "-" "Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:62.0) Gecko/20100101 Firefox/62.0"
5.135.134.16 - - [23/Mar/2019:08:42:56 -0400] "POST /xmlrpc.php HTTP/1.1" 200 413 "-" "Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:62.0) Gecko/20100101 Firefox/62.0"
13.57.220.172 - - [23/Mar/2019:11:01:05 -0400] "GET /wp-login.php HTTP/1.1" 200 2988 "-" "Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:62.0) Gecko/20100101 Firefox/62.0"
13.57.220.172 - - [23/Mar/2019:11:01:06 -0400] "POST /wp-login.php HTTP/1.1" 200 3985 "-" "Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:62.0) Gecko/20100101 Firefox/62.0"
13.57.220.172 - - [23/Mar/2019:11:01:07 -0400] "GET /wp-login.php HTTP/1.1" 200 2988 "-" "Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:62.0) Gecko/20100101 Firefox/62.0"
13.57.220.172 - - [23/Mar/2019:11:01:08 -0400] "POST /wp-login.php HTTP/1.1" 200 3833 "-" "Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:62.0) Gecko/20100101 Firefox/62.0"
13.57.220.172 - - [23/Mar/2019:11:01:09 -0400] "GET /wp-login.php HTTP/1.1" 200 2988 "-" "Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:62.0) Gecko/20100101 Firefox/62.0"
13.57.220.172 - - [23/Mar/2019:11:01:11 -0400] "POST /wp-login.php HTTP/1.1" 200 3836 "-" "Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:62.0) Gecko/20100101 Firefox/62.0"
13.57.220.172 - - [23/Mar/2019:11:01:12 -0400] "GET /wp-login.php HTTP/1.1" 200 2988 "-" "Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:62.0) Gecko/20100101 Firefox/62.0"
13.57.220.172 - - [23/Mar/2019:11:01:15 -0400] "POST /wp-login.php HTTP/1.1" 200 3837 "-" "Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:62.0) Gecko/20100101 Firefox/62.0"
13.57.220.172 - - [23/Mar/2019:11:01:17 -0400] "POST /xmlrpc.php HTTP/1.1" 200 413 "-" "Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:62.0) Gecko/20100101 Firefox/62.0"
13.57.233.99 - - [23/Mar/2019:04:17:45 -0400] "GET / HTTP/1.1" 200 25160 "-" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_12_6) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/61.0.3163.100 Safari/537.36"
18.206.226.75 - - [23/Mar/2019:21:58:07 -0400] "GET /wp-login.php HTTP/1.1" 200 2988 "https://www.google.com/url?3a622303df89920683e4421b2cf28977" "Mozilla/5.0 (Windows NT 6.2; rv:33.0) Gecko/20100101 Firefox/33.0"
18.206.226.75 - - [23/Mar/2019:21:58:07 -0400] "POST /wp-login.php HTTP/1.1" 200 3988 "https://www.google.com/url?3a622303df89920683e4421b2cf28977" "Mozilla/5.0 (Windows NT 6.2; rv:33.0) Gecko/20100101 Firefox/33.0"
18.213.10.181 - - [23/Mar/2019:14:45:42 -0400] "GET /wp-login.php HTTP/1.1" 200 2988 "-" "Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:62.0) Gecko/20100101 Firefox/62.0"
18.213.10.181 - - [23/Mar/2019:14:45:42 -0400] "GET /wp-login.php HTTP/1.1" 200 2988 "-" "Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:62.0) Gecko/20100101 Firefox/62.0"
18.213.10.181 - - [23/Mar/2019:14:45:42 -0400] "GET /wp-login.php HTTP/1.1" 200 2988 "-" "Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:62.0) Gecko/20100101 Firefox/62.0"
Tags: command-line, bash, sort, uniq
With “bash”, do you mean the plain shell or the command line in general?
– dessert
Mar 28 at 21:55
Do you have any database software available to use?
– SpacePhoenix
Mar 29 at 8:58
Related
– Julien Lopez
Mar 30 at 0:17
The log is from an Apache2 server, not really a database. Bash is what I would prefer, in a general use case. I see the Python and Perl solutions; if they are good for someone else, that is great. The initial sorting was done with sort -V, though I think that wasn't required. I sent the top 10 abusers of the login page to the system admin with recommendations for banning respective subnets. For example, one IP hit the login page over 9000 times; that IP and its class D subnet is now blacklisted. I'm sure we could automate this, though that is a different question.
– j0h
Mar 31 at 19:36
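For reference, a top-10 list like the one described can be produced with any of the counting pipelines from the answers below; a sketch using cut, uniq and sort (the file name log is assumed, and the log is assumed to be pre-sorted by IP, as stated):

cut -d ' ' -f1 log | uniq -c | sort -rn | head -10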
8 Answers
You can use grep and uniq for the list of addresses, loop over them and grep again for the count:
for i in $(<log grep -o '^[^ ]*' | uniq); do
    printf '%s count %d\n' "$i" $(<log grep -c "$i")
done
grep -o '^[^ ]*' outputs every character from the beginning (^) until the first space of each line, uniq removes repeated lines, thus leaving you with a list of IP addresses. Thanks to command substitution, the for loop loops over this list, printing the currently processed IP followed by “ count ” and the count. The latter is computed by grep -c, which counts the number of lines with at least one match.
Example run
$ for i in $(<log grep -o '^[^ ]*'|uniq);do printf '%s count %d\n' "$i" $(<log grep -c "$i");done
5.135.134.16 count 5
13.57.220.172 count 9
13.57.233.99 count 1
18.206.226.75 count 2
18.213.10.181 count 3
This solution iterates over the input file repeatedly, once for each IP address, which will be very slow if the file is large. The other solutions using uniq -c or awk only need to read the file once.
– David
Mar 29 at 1:56
@David this is true, but this would have been my first go at it as well, knowing that grep counts. Unless performance is measurably a problem... don't prematurely optimize?
– D. Ben Knoble
Mar 29 at 3:56
I would not call it a premature optimization, given that the more efficient solution is also simpler, but to each their own.
– David
Mar 29 at 5:26
By the way, why is it written as <log grep ... and not grep ... log?
– Santiago
Apr 3 at 17:09
@Santiago Because that’s better in many ways, as Stéphane Chazelas explains here on U&L.
– dessert
Apr 3 at 17:28
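To illustrate the difference, the two invocations below print the same count for a single file; with the redirection the shell opens the file and grep only reads its standard input (a quick sketch; the IP is just one from the sample log):

<log grep -c '5.135.134.16'    # shell opens log, grep reads stdin
grep -c '5.135.134.16' log     # grep opens the named file itself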
You can use the cut and uniq tools:
cut -d ' ' -f1 test.txt | uniq -c
5 5.135.134.16
9 13.57.220.172
1 13.57.233.99
2 18.206.226.75
3 18.213.10.181
Explanation:
cut -d ' ' -f1: extract the first field (IP address)
uniq -c: report repeated lines and display the number of occurrences
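If the exact output format from the question is wanted, a final awk step can swap the count and the address (a sketch; the sed from the comment below achieves the same):

cut -d ' ' -f1 test.txt | uniq -c | awk '{print $2 " count: " $1}'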
One could use sed, e.g. sed -E 's/ *(\S*) *(\S*)/\2 count: \1/' to get the output exactly like the OP wanted.
– dessert
Mar 28 at 22:22
This should be the accepted answer, as the one by dessert needs to read the file repeatedly so is much slower. And you can easily use sort file | cut ... in case you're not sure if the file is already sorted.
– Guntram Blohm
Mar 29 at 8:44
If you don't specifically require the given output format, then I would recommend the already posted cut + uniq based answer.
If you really need the given output format, a single-pass way to do it in Awk would be
awk '{c[$1]++} END{for(i in c) print i, "count: " c[i]}' log
This is somewhat non-ideal when the input is already sorted since it unnecessarily stores all the IPs into memory - a better, though more complicated, way to do it in the pre-sorted case (more directly equivalent to uniq -c) would be:
awk '
NR==1 {last=$1}
$1 != last {print last, "count: " c[last]; last = $1}
{c[$1]++}
END {print last, "count: " c[last]}
'
Ex.
$ awk 'NR==1 {last=$1} $1 != last {print last, "count: " c[last]; last = $1} {c[$1]++} END{print last, "count: " c[last]}' log
5.135.134.16 count: 5
13.57.220.172 count: 9
13.57.233.99 count: 1
18.206.226.75 count: 2
18.213.10.181 count: 3
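Note that for (i in c) visits the keys in no particular order; if output sorted by IP is wanted, the single-pass variant can be piped through a version sort (a sketch, using GNU sort -V as mentioned in the question's comments):

awk '{c[$1]++} END{for(i in c) print i, "count: " c[i]}' log | sort -V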
it would be easy to change the cut + uniq based answer with sed to appear in the demanded format.
– Peter A. Schneider
Mar 29 at 11:12
@PeterA.Schneider yes it would - I believe that was already pointed out in comments to that answer
– steeldriver
Mar 29 at 12:07
Ah, yes, I see.
– Peter A. Schneider
Mar 29 at 12:36
Here is one possible solution:
IN_FILE="file.log"
for IP in $(awk '{print $1}' "$IN_FILE" | sort -u)
do
echo -en "${IP}tcount: "
grep -c "$IP" "$IN_FILE"
done
- replace file.log with the actual file name.
- the command substitution expression $(awk '{print $1}' "$IN_FILE" | sort -u) will provide a list of the unique values of the first column.
- then grep -c will count each of these values within the file.
$ IN_FILE="file.log"; for IP in $(awk '{print $1}' "$IN_FILE" | sort -u); do echo -en "${IP}tcount: "; grep -c "$IP" "$IN_FILE"; done
13.57.220.172 count: 9
13.57.233.99 count: 1
18.206.226.75 count: 2
18.213.10.181 count: 3
5.135.134.16 count: 5
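As a comment below suggests, printf is generally preferable to echo -en; a sketch of the same loop body using printf:

printf '%s\tcount: ' "$IP"
grep -c "$IP" "$IN_FILE"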
Prefer printf ...
– D. Ben Knoble
Mar 29 at 3:58
This means you need to process the entire file multiple times. Once to get the list of IPs and then once more for each of the IPs you find.
– terdon♦
Mar 29 at 16:07
Some Perl:
$ perl -lae '$k{$F[0]}++; }{ print "$_ count: $k{$_}" for keys(%k)' log
13.57.233.99 count: 1
18.206.226.75 count: 2
13.57.220.172 count: 9
5.135.134.16 count: 5
18.213.10.181 count: 3
This is the same idea as Steeldriver's awk approach, but in Perl. The -a causes perl to automatically split each input line into the array @F, whose first element (the IP) is $F[0]. So, $k{$F[0]}++ will create the hash %k, whose keys are the IPs and whose values are the number of times each IP was seen. The }{ is funky perlspeak for "do the rest at the very end, after processing all input". So, at the end, the script will iterate over the keys of the hash and print the current key ($_) along with its value ($k{$_}).
And, just so people don't think that perl forces you to write scripts that look like cryptic scribblings, this is the same thing in a less condensed form:
perl -e '
while (my $line=<STDIN>){
    @fields = split(/ /, $line);
    $ip = $fields[0];
    $counts{$ip}++;
}
foreach $ip (keys(%counts)){
    print "$ip count: $counts{$ip}\n"
}' < log
Maybe this is not what the OP wants; however, if we know that the IP address length will be limited to 15 characters, a quicker way to display the counts with unique IPs from a huge log file can be achieved using the uniq command alone:
$ uniq -w 15 -c log
5 5.135.134.16 - - [23/Mar/2019:08:42:54 -0400] ...
9 13.57.220.172 - - [23/Mar/2019:11:01:05 -0400] ...
1 13.57.233.99 - - [23/Mar/2019:04:17:45 -0400] ...
2 18.206.226.75 - - [23/Mar/2019:21:58:07 -0400] ...
3 18.213.10.181 - - [23/Mar/2019:14:45:42 -0400] ...
Options:
-w N compares no more than N characters in lines
-c will prefix lines by the number of occurrences
Alternatively, for exact formatted output I prefer awk (should also work for IPv6 addresses), ymmv.
$ awk 'NF { print $1 }' log | sort -h | uniq -c | awk '{printf "%s count: %d\n", $2,$1 }'
5.135.134.16 count: 5
13.57.220.172 count: 9
13.57.233.99 count: 1
18.206.226.75 count: 2
18.213.10.181 count: 3
Note that uniq won't detect repeated lines in the input file if they are not adjacent, so it may be necessary to sort the file.
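For the unsorted case it would be enough to group on the first field before counting (a sketch):

sort -k1,1 log | uniq -w 15 -c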
Likely good enough in practice, but worth noting the corner cases: only 6 probably constant characters follow the IP (" - - ["). In theory the address could be up to 8 characters shorter than the maximum, so a change of date could split the count for such an IP. And as you hint, this won't work for IPv6.
– Martin Thornton
Mar 29 at 23:17
I like it, I didn't know uniq could count!
– j0h
Mar 31 at 12:57
FWIW, Python 3:
from collections import Counter

with open('sample.log') as file:
    # count the first whitespace-separated field (the IP) of each line
    counts = Counter(line.split()[0] for line in file)

for ip_address, count in counts.items():
    print('%-15s count: %d' % (ip_address, count))
Output:
13.57.233.99 count: 1
18.213.10.181 count: 3
5.135.134.16 count: 5
18.206.226.75 count: 2
13.57.220.172 count: 9
cut -f1 -d- my.log | sort | uniq -c
Explanation: Take the first field of my.log, splitting on dashes (-), and sort it. uniq needs sorted input. -c tells it to count occurrences.
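With the sample log from the question, this would print something like the following (the split on - leaves a trailing space after each address, which is harmless for counting):

$ cut -f1 -d- my.log | sort | uniq -c
      9 13.57.220.172
      1 13.57.233.99
      2 18.206.226.75
      3 18.213.10.181
      5 5.135.134.16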
add a comment |
Your Answer
StackExchange.ready(function() {
var channelOptions = {
tags: "".split(" "),
id: "89"
};
initTagRenderer("".split(" "), "".split(" "), channelOptions);
StackExchange.using("externalEditor", function() {
// Have to fire editor after snippets, if snippets enabled
if (StackExchange.settings.snippets.snippetsEnabled) {
StackExchange.using("snippets", function() {
createEditor();
});
}
else {
createEditor();
}
});
function createEditor() {
StackExchange.prepareEditor({
heartbeatType: 'answer',
autoActivateHeartbeat: false,
convertImagesToLinks: true,
noModals: true,
showLowRepImageUploadWarning: true,
reputationToPostImages: 10,
bindNavPrevention: true,
postfix: "",
imageUploader: {
brandingHtml: "Powered by u003ca class="icon-imgur-white" href="https://imgur.com/"u003eu003c/au003e",
contentPolicyHtml: "User contributions licensed under u003ca href="https://creativecommons.org/licenses/by-sa/3.0/"u003ecc by-sa 3.0 with attribution requiredu003c/au003e u003ca href="https://stackoverflow.com/legal/content-policy"u003e(content policy)u003c/au003e",
allowUrls: true
},
onDemand: true,
discardSelector: ".discard-answer"
,immediatelyShowMarkdownHelp:true
});
}
});
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
StackExchange.ready(
function () {
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2faskubuntu.com%2fquestions%2f1129521%2fhow-to-count-occurrences-of-text-in-a-file%23new-answer', 'question_page');
}
);
Post as a guest
Required, but never shown
8 Answers
8
active
oldest
votes
8 Answers
8
active
oldest
votes
active
oldest
votes
active
oldest
votes
You can use grep
and uniq
for the list of addresses, loop over them and grep
again for the count:
for i in $(<log grep -o '^[^ ]*' | uniq); do
printf '%s count %dn' "$i" $(<log grep -c "$i")
done
grep -o '^[^ ]*'
outputs every character from the beginning (^
) until the first space of each line, uniq
removes repeated lines, thus leaving you with a list of IP addresses. Thanks to command substitution, the for
loop loops over this list printing the currently processed IP followed by “ count ” and the count. The latter is computed by grep -c
, which counts the number of lines with at least one match.
Example run
$ for i in $(<log grep -o '^[^ ]*'|uniq);do printf '%s count %dn' "$i" $(<log grep -c "$i");done
5.135.134.16 count 5
13.57.220.172 count 9
13.57.233.99 count 1
18.206.226.75 count 2
18.213.10.181 count 3
12
This solution iterates over the input file repeatedly, once for each IP address, which will be very slow if the file is large. The other solutions usinguniq -c
orawk
only need to read the file once,
– David
Mar 29 at 1:56
1
@David this is true, but this would have been my first go at it as well, knowing that grep counts. Unless performance is measurably a problem... dont prematurely optimize?
– D. Ben Knoble
Mar 29 at 3:56
3
I would not call it a premature optimization, given that the more efficient solution is also simpler, but to each their own.
– David
Mar 29 at 5:26
By the way, why is it written as<log grep ...
and notgrep ... log
?
– Santiago
Apr 3 at 17:09
@Santiago Because that’s better in many ways, as Stéphane Chazelas explains here on U&L.
– dessert
Apr 3 at 17:28
add a comment |
You can use grep
and uniq
for the list of addresses, loop over them and grep
again for the count:
for i in $(<log grep -o '^[^ ]*' | uniq); do
printf '%s count %dn' "$i" $(<log grep -c "$i")
done
grep -o '^[^ ]*'
outputs every character from the beginning (^
) until the first space of each line, uniq
removes repeated lines, thus leaving you with a list of IP addresses. Thanks to command substitution, the for
loop loops over this list printing the currently processed IP followed by “ count ” and the count. The latter is computed by grep -c
, which counts the number of lines with at least one match.
Example run
$ for i in $(<log grep -o '^[^ ]*'|uniq);do printf '%s count %dn' "$i" $(<log grep -c "$i");done
5.135.134.16 count 5
13.57.220.172 count 9
13.57.233.99 count 1
18.206.226.75 count 2
18.213.10.181 count 3
12
This solution iterates over the input file repeatedly, once for each IP address, which will be very slow if the file is large. The other solutions usinguniq -c
orawk
only need to read the file once,
– David
Mar 29 at 1:56
1
@David this is true, but this would have been my first go at it as well, knowing that grep counts. Unless performance is measurably a problem... dont prematurely optimize?
– D. Ben Knoble
Mar 29 at 3:56
3
I would not call it a premature optimization, given that the more efficient solution is also simpler, but to each their own.
– David
Mar 29 at 5:26
By the way, why is it written as<log grep ...
and notgrep ... log
?
– Santiago
Apr 3 at 17:09
@Santiago Because that’s better in many ways, as Stéphane Chazelas explains here on U&L.
– dessert
Apr 3 at 17:28
add a comment |
You can use grep
and uniq
for the list of addresses, loop over them and grep
again for the count:
for i in $(<log grep -o '^[^ ]*' | uniq); do
printf '%s count %dn' "$i" $(<log grep -c "$i")
done
grep -o '^[^ ]*'
outputs every character from the beginning (^
) until the first space of each line, uniq
removes repeated lines, thus leaving you with a list of IP addresses. Thanks to command substitution, the for
loop loops over this list printing the currently processed IP followed by “ count ” and the count. The latter is computed by grep -c
, which counts the number of lines with at least one match.
Example run
$ for i in $(<log grep -o '^[^ ]*'|uniq);do printf '%s count %dn' "$i" $(<log grep -c "$i");done
5.135.134.16 count 5
13.57.220.172 count 9
13.57.233.99 count 1
18.206.226.75 count 2
18.213.10.181 count 3
You can use grep
and uniq
for the list of addresses, loop over them and grep
again for the count:
for i in $(<log grep -o '^[^ ]*' | uniq); do
printf '%s count %dn' "$i" $(<log grep -c "$i")
done
grep -o '^[^ ]*'
outputs every character from the beginning (^
) until the first space of each line, uniq
removes repeated lines, thus leaving you with a list of IP addresses. Thanks to command substitution, the for
loop loops over this list printing the currently processed IP followed by “ count ” and the count. The latter is computed by grep -c
, which counts the number of lines with at least one match.
Example run
$ for i in $(<log grep -o '^[^ ]*'|uniq);do printf '%s count %dn' "$i" $(<log grep -c "$i");done
5.135.134.16 count 5
13.57.220.172 count 9
13.57.233.99 count 1
18.206.226.75 count 2
18.213.10.181 count 3
edited Mar 28 at 23:11
answered Mar 28 at 22:08
dessertdessert
25.4k673107
25.4k673107
12
This solution iterates over the input file repeatedly, once for each IP address, which will be very slow if the file is large. The other solutions usinguniq -c
orawk
only need to read the file once,
– David
Mar 29 at 1:56
1
@David this is true, but this would have been my first go at it as well, knowing that grep counts. Unless performance is measurably a problem... dont prematurely optimize?
– D. Ben Knoble
Mar 29 at 3:56
3
I would not call it a premature optimization, given that the more efficient solution is also simpler, but to each their own.
– David
Mar 29 at 5:26
By the way, why is it written as<log grep ...
and notgrep ... log
?
– Santiago
Apr 3 at 17:09
@Santiago Because that’s better in many ways, as Stéphane Chazelas explains here on U&L.
– dessert
Apr 3 at 17:28
add a comment |
12
This solution iterates over the input file repeatedly, once for each IP address, which will be very slow if the file is large. The other solutions usinguniq -c
orawk
only need to read the file once,
– David
Mar 29 at 1:56
1
@David this is true, but this would have been my first go at it as well, knowing that grep counts. Unless performance is measurably a problem... dont prematurely optimize?
– D. Ben Knoble
Mar 29 at 3:56
3
I would not call it a premature optimization, given that the more efficient solution is also simpler, but to each their own.
– David
Mar 29 at 5:26
By the way, why is it written as<log grep ...
and notgrep ... log
?
– Santiago
Apr 3 at 17:09
@Santiago Because that’s better in many ways, as Stéphane Chazelas explains here on U&L.
– dessert
Apr 3 at 17:28
12
12
This solution iterates over the input file repeatedly, once for each IP address, which will be very slow if the file is large. The other solutions using
uniq -c
or awk
only need to read the file once,– David
Mar 29 at 1:56
This solution iterates over the input file repeatedly, once for each IP address, which will be very slow if the file is large. The other solutions using
uniq -c
or awk
only need to read the file once,– David
Mar 29 at 1:56
1
1
@David this is true, but this would have been my first go at it as well, knowing that grep counts. Unless performance is measurably a problem... dont prematurely optimize?
– D. Ben Knoble
Mar 29 at 3:56
@David this is true, but this would have been my first go at it as well, knowing that grep counts. Unless performance is measurably a problem... dont prematurely optimize?
– D. Ben Knoble
Mar 29 at 3:56
3
3
I would not call it a premature optimization, given that the more efficient solution is also simpler, but to each their own.
– David
Mar 29 at 5:26
I would not call it a premature optimization, given that the more efficient solution is also simpler, but to each their own.
– David
Mar 29 at 5:26
By the way, why is it written as
<log grep ...
and not grep ... log
?– Santiago
Apr 3 at 17:09
By the way, why is it written as
<log grep ...
and not grep ... log
?– Santiago
Apr 3 at 17:09
@Santiago Because that’s better in many ways, as Stéphane Chazelas explains here on U&L.
– dessert
Apr 3 at 17:28
@Santiago Because that’s better in many ways, as Stéphane Chazelas explains here on U&L.
– dessert
Apr 3 at 17:28
add a comment |
You can use cut
and uniq
tools:
cut -d ' ' -f1 test.txt | uniq -c
5 5.135.134.16
9 13.57.220.172
1 13.57.233.99
2 18.206.226.75
3 18.213.10.181
Explanation :
cut -d ' ' -f1
: extract first field (ip address)
uniq -c
: report repeated lines and display the number of occurences
6
One could usesed
, e.g.sed -E 's/ *(S*) *(S*)/2 count: 1/'
to get the output exactly like OP wanted.
– dessert
Mar 28 at 22:22
2
This should be the accepted answer, as the one by dessert needs to read the file repeatedly so is much slower. And you can easily usesort file | cut ....
in case you're not sure if the file is already sorted.
– Guntram Blohm
Mar 29 at 8:44
add a comment |
You can use cut
and uniq
tools:
cut -d ' ' -f1 test.txt | uniq -c
5 5.135.134.16
9 13.57.220.172
1 13.57.233.99
2 18.206.226.75
3 18.213.10.181
Explanation :
cut -d ' ' -f1
: extract first field (ip address)
uniq -c
: report repeated lines and display the number of occurences
6
One could usesed
, e.g.sed -E 's/ *(S*) *(S*)/2 count: 1/'
to get the output exactly like OP wanted.
– dessert
Mar 28 at 22:22
2
This should be the accepted answer, as the one by dessert needs to read the file repeatedly so is much slower. And you can easily usesort file | cut ....
in case you're not sure if the file is already sorted.
– Guntram Blohm
Mar 29 at 8:44
add a comment |
You can use cut
and uniq
tools:
cut -d ' ' -f1 test.txt | uniq -c
5 5.135.134.16
9 13.57.220.172
1 13.57.233.99
2 18.206.226.75
3 18.213.10.181
Explanation :
cut -d ' ' -f1
: extract first field (ip address)
uniq -c
: report repeated lines and display the number of occurences
You can use cut
and uniq
tools:
cut -d ' ' -f1 test.txt | uniq -c
5 5.135.134.16
9 13.57.220.172
1 13.57.233.99
2 18.206.226.75
3 18.213.10.181
Explanation :
cut -d ' ' -f1
: extract first field (ip address)
uniq -c
: report repeated lines and display the number of occurences
edited Mar 28 at 22:34
answered Mar 28 at 22:04
Mikael FloraMikael Flora
441117
441117
6
One could usesed
, e.g.sed -E 's/ *(S*) *(S*)/2 count: 1/'
to get the output exactly like OP wanted.
– dessert
Mar 28 at 22:22
2
This should be the accepted answer, as the one by dessert needs to read the file repeatedly so is much slower. And you can easily usesort file | cut ....
in case you're not sure if the file is already sorted.
– Guntram Blohm
Mar 29 at 8:44
add a comment |
6
One could usesed
, e.g.sed -E 's/ *(S*) *(S*)/2 count: 1/'
to get the output exactly like OP wanted.
– dessert
Mar 28 at 22:22
2
This should be the accepted answer, as the one by dessert needs to read the file repeatedly so is much slower. And you can easily usesort file | cut ....
in case you're not sure if the file is already sorted.
– Guntram Blohm
Mar 29 at 8:44
6
6
One could use
sed
, e.g. sed -E 's/ *(S*) *(S*)/2 count: 1/'
to get the output exactly like OP wanted.– dessert
Mar 28 at 22:22
One could use
sed
, e.g. sed -E 's/ *(S*) *(S*)/2 count: 1/'
to get the output exactly like OP wanted.– dessert
Mar 28 at 22:22
2
2
This should be the accepted answer, as the one by dessert needs to read the file repeatedly so is much slower. And you can easily use
sort file | cut ....
in case you're not sure if the file is already sorted.– Guntram Blohm
Mar 29 at 8:44
This should be the accepted answer, as the one by dessert needs to read the file repeatedly so is much slower. And you can easily use
sort file | cut ....
in case you're not sure if the file is already sorted.– Guntram Blohm
Mar 29 at 8:44
add a comment |
If you don't specifically require the given output format, then I would recommend the already posted cut
+ uniq
based answer
If you really need the given output format, a single-pass way to do it in Awk would be
awk '{c[$1]++} END{for(i in c) print i, "count: " c[i]}' log
This is somewhat non-ideal when the input is already sorted since it unnecessarily stores all the IPs into memory - a better, though more complicated, way to do it in the pre-sorted case (more directly equivalent to uniq -c
) would be:
awk '
NR==1 {last=$1}
$1 != last {print last, "count: " c[last]; last = $1}
{c[$1]++}
END {print last, "count: " c[last]}
'
Ex.
$ awk 'NR==1 {last=$1} $1 != last {print last, "count: " c[last]; last = $1} {c[$1]++} END{print last, "count: " c[last]}' log
5.135.134.16 count: 5
13.57.220.172 count: 9
13.57.233.99 count: 1
18.206.226.75 count: 2
18.213.10.181 count: 3
it would be easy to change the cut + uniq based answer with sed to appear in the demanded format.
– Peter A. Schneider
Mar 29 at 11:12
@PeterA.Schneider yes it would - I believe that was already pointed out in comments to that answer
– steeldriver
Mar 29 at 12:07
Ah, yes, I see.
– Peter A. Schneider
Mar 29 at 12:36
add a comment |
If you don't specifically require the given output format, then I would recommend the already posted cut
+ uniq
based answer
If you really need the given output format, a single-pass way to do it in Awk would be
awk '{c[$1]++} END{for(i in c) print i, "count: " c[i]}' log
This is somewhat non-ideal when the input is already sorted since it unnecessarily stores all the IPs into memory - a better, though more complicated, way to do it in the pre-sorted case (more directly equivalent to uniq -c
) would be:
awk '
NR==1 {last=$1}
$1 != last {print last, "count: " c[last]; last = $1}
{c[$1]++}
END {print last, "count: " c[last]}
'
Ex.
$ awk 'NR==1 {last=$1} $1 != last {print last, "count: " c[last]; last = $1} {c[$1]++} END{print last, "count: " c[last]}' log
5.135.134.16 count: 5
13.57.220.172 count: 9
13.57.233.99 count: 1
18.206.226.75 count: 2
18.213.10.181 count: 3
it would be easy to change the cut + uniq based answer with sed to appear in the demanded format.
– Peter A. Schneider
Mar 29 at 11:12
@PeterA.Schneider yes it would - I believe that was already pointed out in comments to that answer
– steeldriver
Mar 29 at 12:07
Ah, yes, I see.
– Peter A. Schneider
Mar 29 at 12:36
add a comment |
If you don't specifically require the given output format, then I would recommend the already posted cut
+ uniq
based answer
If you really need the given output format, a single-pass way to do it in Awk would be
awk '{c[$1]++} END{for(i in c) print i, "count: " c[i]}' log
This is somewhat non-ideal when the input is already sorted since it unnecessarily stores all the IPs into memory - a better, though more complicated, way to do it in the pre-sorted case (more directly equivalent to uniq -c
) would be:
awk '
NR==1 {last=$1}
$1 != last {print last, "count: " c[last]; last = $1}
{c[$1]++}
END {print last, "count: " c[last]}
'
Ex.
$ awk 'NR==1 {last=$1} $1 != last {print last, "count: " c[last]; last = $1} {c[$1]++} END{print last, "count: " c[last]}' log
5.135.134.16 count: 5
13.57.220.172 count: 9
13.57.233.99 count: 1
18.206.226.75 count: 2
18.213.10.181 count: 3
If you don't specifically require the given output format, then I would recommend the already posted cut
+ uniq
based answer
If you really need the given output format, a single-pass way to do it in Awk would be
awk '{c[$1]++} END{for(i in c) print i, "count: " c[i]}' log
This is somewhat non-ideal when the input is already sorted since it unnecessarily stores all the IPs into memory - a better, though more complicated, way to do it in the pre-sorted case (more directly equivalent to uniq -c
) would be:
awk '
NR==1 {last=$1}
$1 != last {print last, "count: " c[last]; last = $1}
{c[$1]++}
END {print last, "count: " c[last]}
'
Ex.
$ awk 'NR==1 {last=$1} $1 != last {print last, "count: " c[last]; last = $1} {c[$1]++} END{print last, "count: " c[last]}' log
5.135.134.16 count: 5
13.57.220.172 count: 9
13.57.233.99 count: 1
18.206.226.75 count: 2
18.213.10.181 count: 3
edited Mar 28 at 22:36
answered Mar 28 at 22:12
steeldriversteeldriver
70.6k11115187
70.6k11115187
it would be easy to change the cut + uniq based answer with sed to appear in the demanded format.
– Peter A. Schneider
Mar 29 at 11:12
@PeterA.Schneider yes it would - I believe that was already pointed out in comments to that answer
– steeldriver
Mar 29 at 12:07
Ah, yes, I see.
– Peter A. Schneider
Mar 29 at 12:36
add a comment |
it would be easy to change the cut + uniq based answer with sed to appear in the demanded format.
– Peter A. Schneider
Mar 29 at 11:12
@PeterA.Schneider yes it would - I believe that was already pointed out in comments to that answer
– steeldriver
Mar 29 at 12:07
Ah, yes, I see.
– Peter A. Schneider
Mar 29 at 12:36
it would be easy to change the cut + uniq based answer with sed to appear in the demanded format.
– Peter A. Schneider
Mar 29 at 11:12
it would be easy to change the cut + uniq based answer with sed to appear in the demanded format.
– Peter A. Schneider
Mar 29 at 11:12
@PeterA.Schneider yes it would - I believe that was already pointed out in comments to that answer
– steeldriver
Mar 29 at 12:07
@PeterA.Schneider yes it would - I believe that was already pointed out in comments to that answer
– steeldriver
Mar 29 at 12:07
Ah, yes, I see.
– Peter A. Schneider
Mar 29 at 12:36
Ah, yes, I see.
– Peter A. Schneider
Mar 29 at 12:36
add a comment |
Here is one possible solution:
IN_FILE="file.log"
for IP in $(awk '{print $1}' "$IN_FILE" | sort -u)
do
echo -en "${IP}tcount: "
grep -c "$IP" "$IN_FILE"
done
- replace
file.log
with the actual file name. - the command substitution expression
$(awk '{print $1}' "$IN_FILE" | sort -u)
will provide a list of the unique values of the first column. - then
grep -c
will count each of these values within the file.
$ IN_FILE="file.log"; for IP in $(awk '{print $1}' "$IN_FILE" | sort -u); do echo -en "${IP}tcount: "; grep -c "$IP" "$IN_FILE"; done
13.57.220.172 count: 9
13.57.233.99 count: 1
18.206.226.75 count: 2
18.213.10.181 count: 3
5.135.134.16 count: 5
1
Preferprintf
...
– D. Ben Knoble
Mar 29 at 3:58
1
This means you need to process the entire file multiple times. Once to get the list of IPs and then once more for each of the IPs you find.
– terdon♦
Mar 29 at 16:07
add a comment |
Here is one possible solution:
IN_FILE="file.log"
for IP in $(awk '{print $1}' "$IN_FILE" | sort -u)
do
echo -en "${IP}tcount: "
grep -c "$IP" "$IN_FILE"
done
- replace
file.log
with the actual file name. - the command substitution expression
$(awk '{print $1}' "$IN_FILE" | sort -u)
will provide a list of the unique values of the first column. - then
grep -c
will count each of these values within the file.
$ IN_FILE="file.log"; for IP in $(awk '{print $1}' "$IN_FILE" | sort -u); do echo -en "${IP}tcount: "; grep -c "$IP" "$IN_FILE"; done
13.57.220.172 count: 9
13.57.233.99 count: 1
18.206.226.75 count: 2
18.213.10.181 count: 3
5.135.134.16 count: 5
1
Preferprintf
...
– D. Ben Knoble
Mar 29 at 3:58
1
This means you need to process the entire file multiple times. Once to get the list of IPs and then once more for each of the IPs you find.
– terdon♦
Mar 29 at 16:07
add a comment |
Here is one possible solution:
IN_FILE="file.log"
for IP in $(awk '{print $1}' "$IN_FILE" | sort -u)
do
echo -en "${IP}tcount: "
grep -c "$IP" "$IN_FILE"
done
- replace
file.log
with the actual file name. - the command substitution expression
$(awk '{print $1}' "$IN_FILE" | sort -u)
will provide a list of the unique values of the first column. - then
grep -c
will count each of these values within the file.
$ IN_FILE="file.log"; for IP in $(awk '{print $1}' "$IN_FILE" | sort -u); do echo -en "${IP}tcount: "; grep -c "$IP" "$IN_FILE"; done
13.57.220.172 count: 9
13.57.233.99 count: 1
18.206.226.75 count: 2
18.213.10.181 count: 3
5.135.134.16 count: 5
Here is one possible solution:
IN_FILE="file.log"
for IP in $(awk '{print $1}' "$IN_FILE" | sort -u)
do
echo -en "${IP}tcount: "
grep -c "$IP" "$IN_FILE"
done
- replace
file.log
with the actual file name. - the command substitution expression
$(awk '{print $1}' "$IN_FILE" | sort -u)
will provide a list of the unique values of the first column. - then
grep -c
will count each of these values within the file.
$ IN_FILE="file.log"; for IP in $(awk '{print $1}' "$IN_FILE" | sort -u); do echo -en "${IP}tcount: "; grep -c "$IP" "$IN_FILE"; done
13.57.220.172 count: 9
13.57.233.99 count: 1
18.206.226.75 count: 2
18.213.10.181 count: 3
5.135.134.16 count: 5
edited Mar 28 at 22:20
answered Mar 28 at 22:07
pa4080pa4080
14.8k52872
14.8k52872
1
Preferprintf
...
– D. Ben Knoble
Mar 29 at 3:58
1
This means you need to process the entire file multiple times. Once to get the list of IPs and then once more for each of the IPs you find.
– terdon♦
Mar 29 at 16:07
add a comment |
1
Preferprintf
...
– D. Ben Knoble
Mar 29 at 3:58
1
This means you need to process the entire file multiple times. Once to get the list of IPs and then once more for each of the IPs you find.
– terdon♦
Mar 29 at 16:07
1
1
Prefer
printf
...– D. Ben Knoble
Mar 29 at 3:58
Prefer
printf
...– D. Ben Knoble
Mar 29 at 3:58
1
1
This means you need to process the entire file multiple times. Once to get the list of IPs and then once more for each of the IPs you find.
– terdon♦
Mar 29 at 16:07
This means you need to process the entire file multiple times. Once to get the list of IPs and then once more for each of the IPs you find.
– terdon♦
Mar 29 at 16:07
add a comment |
Some Perl:
$ perl -lae '$k{$F[0]}++; }{ print "$_ count: $k{$_}" for keys(%k)' log
13.57.233.99 count: 1
18.206.226.75 count: 2
13.57.220.172 count: 9
5.135.134.16 count: 5
18.213.10.181 count: 3
This is the same idea as Steeldriver's awk approach, but in Perl. The -a
causes perl to automatically split each input line into the array @F
, whose first element (the IP) is $F[0]
. So, $k{$F[0]}++
will create the hash %k
, whose keys are the IPs and whose values are the number of times each IP was seen. The }{
is funky perlspeak for "do the rest at the very end, after processing all input". So, at the end, the script will iterate over the keys of the hash and print the current key ($_
) along with its value ($k{$_}
).
And, just so people don't think that perl forces you to write script that look like cryptic scribblings, this is the same thing in a less condensed form:
perl -e '
while (my $line=<STDIN>){
@fields = split(/ /, $line);
$ip = $fields[0];
$counts{$ip}++;
}
foreach $ip (keys(%counts)){
print "$ip count: $counts{$ip}n"
}' < log
add a comment |
Some Perl:
$ perl -lae '$k{$F[0]}++; }{ print "$_ count: $k{$_}" for keys(%k)' log
13.57.233.99 count: 1
18.206.226.75 count: 2
13.57.220.172 count: 9
5.135.134.16 count: 5
18.213.10.181 count: 3
This is the same idea as Steeldriver's awk approach, but in Perl. The -a
causes perl to automatically split each input line into the array @F
, whose first element (the IP) is $F[0]
. So, $k{$F[0]}++
will create the hash %k
, whose keys are the IPs and whose values are the number of times each IP was seen. The }{
is funky perlspeak for "do the rest at the very end, after processing all input". So, at the end, the script will iterate over the keys of the hash and print the current key ($_
) along with its value ($k{$_}
).
And, just so people don't think that perl forces you to write script that look like cryptic scribblings, this is the same thing in a less condensed form:
perl -e '
while (my $line=<STDIN>){
@fields = split(/ /, $line);
$ip = $fields[0];
$counts{$ip}++;
}
foreach $ip (keys(%counts)){
print "$ip count: $counts{$ip}n"
}' < log
add a comment |
Some Perl:
$ perl -lae '$k{$F[0]}++; }{ print "$_ count: $k{$_}" for keys(%k)' log
13.57.233.99 count: 1
18.206.226.75 count: 2
13.57.220.172 count: 9
5.135.134.16 count: 5
18.213.10.181 count: 3
This is the same idea as Steeldriver's awk approach, but in Perl. The -a
causes perl to automatically split each input line into the array @F
, whose first element (the IP) is $F[0]
. So, $k{$F[0]}++
will create the hash %k
, whose keys are the IPs and whose values are the number of times each IP was seen. The }{
is funky perlspeak for "do the rest at the very end, after processing all input". So, at the end, the script will iterate over the keys of the hash and print the current key ($_
) along with its value ($k{$_}
).
And, just so people don't think that perl forces you to write script that look like cryptic scribblings, this is the same thing in a less condensed form:
perl -e '
while (my $line=<STDIN>){
@fields = split(/ /, $line);
$ip = $fields[0];
$counts{$ip}++;
}
foreach $ip (keys(%counts)){
print "$ip count: $counts{$ip}n"
}' < log
Some Perl:
$ perl -lae '$k{$F[0]}++; }{ print "$_ count: $k{$_}" for keys(%k)' log
13.57.233.99 count: 1
18.206.226.75 count: 2
13.57.220.172 count: 9
5.135.134.16 count: 5
18.213.10.181 count: 3
This is the same idea as Steeldriver's awk approach, but in Perl. The -a
causes perl to automatically split each input line into the array @F
, whose first element (the IP) is $F[0]
. So, $k{$F[0]}++
will create the hash %k
, whose keys are the IPs and whose values are the number of times each IP was seen. The }{
is funky perlspeak for "do the rest at the very end, after processing all input". So, at the end, the script will iterate over the keys of the hash and print the current key ($_
) along with its value ($k{$_}
).
And, just so people don't think that perl forces you to write script that look like cryptic scribblings, this is the same thing in a less condensed form:
perl -e '
while (my $line=<STDIN>){
@fields = split(/ /, $line);
$ip = $fields[0];
$counts{$ip}++;
}
foreach $ip (keys(%counts)){
print "$ip count: $counts{$ip}n"
}' < log
answered Mar 29 at 16:14
terdon♦terdon
67.6k13139223
67.6k13139223
add a comment |
add a comment |
Maybe this is not what the OP want; however, if we know that the IP address length will be limited to 15 characters, a quicker way to display the counts with unique IPs from a huge log file can be achieved using uniq
command alone:
$ uniq -w 15 -c log
5 5.135.134.16 - - [23/Mar/2019:08:42:54 -0400] ...
9 13.57.220.172 - - [23/Mar/2019:11:01:05 -0400] ...
1 13.57.233.99 - - [23/Mar/2019:04:17:45 -0400] ...
2 18.206.226.75 - - [23/Mar/2019:21:58:07 -0400] ...
3 18.213.10.181 - - [23/Mar/2019:14:45:42 -0400] ...
Options:
-w N
compares no more than N
characters in lines
-c
will prefix lines by the number of occurrences
Alternatively, For exact formatted output I prefer awk
(should also work for IPV6 addresses), ymmv.
$ awk 'NF { print $1 }' log | sort -h | uniq -c | awk '{printf "%s count: %dn", $2,$1 }'
5.135.134.16 count: 5
13.57.220.172 count: 9
13.57.233.99 count: 1
18.206.226.75 count: 2
18.213.10.181 count: 3
Note that uniq
won't detect repeated lines in the input file if they are not adjacent, so it may be necessary to sort
the file.
edited Mar 31 at 12:13
answered Mar 29 at 18:38
Y. Pradhan
Likely good enough in practice, but worth noting the corner cases: only 6 probably-constant characters follow the IP (` - - [`), but in theory the address could be up to 8 characters shorter than the maximum, so a change of date could split the count for such an IP. And as you hint, this won't work for IPv6.
– Martin Thornton
Mar 29 at 23:17
I like it, I didn't know uniq could count!
– j0h
Mar 31 at 12:57
FWIW, Python 3:
from collections import Counter

with open('sample.log') as file:
    counts = Counter(line.split()[0] for line in file)

for ip_address, count in counts.items():
    print('%-15s count: %d' % (ip_address, count))
Output:
13.57.233.99 count: 1
18.213.10.181 count: 3
5.135.134.16 count: 5
18.206.226.75 count: 2
13.57.220.172 count: 9
edited Mar 31 at 17:34
answered Mar 31 at 17:25
wjandrea
cut -f1 -d- my.log | sort | uniq -c
Explanation: take the first field of my.log, splitting on dashes (-), and sort it. uniq needs sorted input; -c tells it to count occurrences.
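As a caveat, with -d- the first field keeps the trailing space before the first dash. Splitting on spaces instead yields the bare IP; a variant sketch of the same idea:
cut -f1 -d' ' my.log | sort | uniq -c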
edited Mar 31 at 17:04
wjandrea
answered Mar 30 at 18:01
PhD
With "bash", do you mean the plain shell or the command line in general?
– dessert
Mar 28 at 21:55
Do you have any database software available to use?
– SpacePhoenix
Mar 29 at 8:58
Related
– Julien Lopez
Mar 30 at 0:17
The log is from an Apache2 server, not really a database. Bash is what I would prefer in a general use case. I see the Python and Perl solutions; if they are good for someone else, that is great. The initial sorting was done with sort -V, though I think that wasn't required. I sent the top 10 abusers of the login page to the system admin with recommendations for banning the respective subnets. For example, one IP hit the login page over 9000 times; that IP and its class D subnet are now blacklisted. I'm sure we could automate this, though that is a different question.
– j0h
Mar 31 at 19:36
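For reference, the workflow described in that last comment, extracting the top 10 IPs hitting the login page, could be scripted along these lines (a sketch only; it assumes the log file is named log as in the question):
$ grep 'wp-login.php' log | awk '{ print $1 }' | sort | uniq -c | sort -rn | head -n 10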