2016-12-05

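Clustering Premier League teams by goal-scoring time period

Machine learning

Overview

I have recently been working on machine learning, little by little, with the goal of predicting sports results.

Today I try clustering with k-means.

The data is the goal rate per time period in soccer, which I use to cluster the teams.

The test data was taken from Soccerway (Live scores, results, fixtures, tables, statistics and news - Soccerway) and then processed.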

gist.github.com

For each team, it shows the share of total goals scored in each 15-minute period.
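
As an illustration of how such shares can be derived, here is a minimal sketch. The column names match the CSV used in the script further down, but the goal minutes are invented, and folding stoppage-time goals into the period that just ended is my assumption, not something the original data preparation is known to do.

```python
from collections import Counter

# Column names follow the CSV in this post; the goal minutes below are made up.
PERIODS = ["0to15", "15to30", "30to45", "45to60", "60to75", "75to90"]


def goal_share_by_period(goal_minutes):
    """Return the share of goals scored in each 15-minute period.

    Assumption: stoppage-time goals are recorded as minute 45 or 90 and are
    counted in the period that just ended.
    """
    counts = Counter()
    for minute in goal_minutes:
        # Minutes 1-15 -> bucket 0, 16-30 -> bucket 1, ..., 76-90 -> bucket 5.
        bucket = min((max(minute, 1) - 1) // 15, len(PERIODS) - 1)
        counts[PERIODS[bucket]] += 1
    total = len(goal_minutes)
    return {p: (counts[p] / total if total else 0.0) for p in PERIODS}


shares = goal_share_by_period([3, 14, 44, 52, 77, 88, 90, 90])
print(shares)
```

Each team then becomes one row of six numbers that sum to 1, which is the shape k-means expects.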

Implementation

There are methods such as X-means for determining the optimal number of clusters for k-means, but that looked time-consuming, so I arbitrarily set the number to 4.
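
A cheaper way to sanity-check the cluster count than X-means is the elbow method: run k-means for several values of k and look for the point where the within-cluster sum of squares (what scikit-learn exposes as `inertia_`) stops dropping sharply. The sketch below is a small pure-Python version on toy 2-D points, not the Soccerway data. The farthest-first initialization is my own choice to keep the run deterministic; scikit-learn uses k-means++ by default.

```python
def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def farthest_first(points, k):
    """Deterministic initialization: start at points[0], then keep adding the
    point that is farthest from all centroids chosen so far."""
    centroids = [points[0]]
    while len(centroids) < k:
        centroids.append(
            max(points, key=lambda p: min(squared_distance(p, c) for c in centroids))
        )
    return centroids


def kmeans_inertia(points, k, max_iterations=100):
    """Run Lloyd's algorithm and return the within-cluster sum of squares."""
    centroids = farthest_first(points, k)
    for _ in range(max_iterations):
        # Assignment step: index of the nearest centroid for every point.
        labels = [
            min(range(k), key=lambda c: squared_distance(p, centroids[c]))
            for p in points
        ]
        # Update step: each centroid moves to the mean of its members.
        updated = []
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:
                updated.append(tuple(sum(axis) / len(members) for axis in zip(*members)))
            else:
                updated.append(centroids[c])  # keep the centroid of an empty cluster
        if updated == centroids:
            break
        centroids = updated
    return sum(squared_distance(p, centroids[label]) for p, label in zip(points, labels))


# Toy data: three tight groups of three points each.
points = [
    (0, 0), (0, 1), (1, 0),
    (10, 10), (10, 11), (11, 10),
    (20, 0), (20, 1), (21, 0),
]

for k in range(1, 6):
    print(k, round(kmeans_inertia(points, k), 2))
```

On this toy data the value falls steeply up to k=3 and only slightly afterwards, which is the "elbow". With real goal-rate data the bend is usually far less clear, so this is a rough guide at best.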

#coding: utf-8
import pandas as pd
import numpy as np
from sklearn.cluster import KMeans
import matplotlib.pyplot as plt

df = pd.read_csv('uk-soccer-goal-rate-per-time-20161205.csv')
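# One list per 15-minute period; transposed below into one row per team
array = np.array([
    df['0to15'].tolist(),
    df['15to30'].tolist(),
    df['30to45'].tolist(),
    df['45to60'].tolist(),
    df['60to75'].tolist(),
    df['75to90'].tolist(),
], float)  # np.float was removed in NumPy 1.24; the builtin float is equivalent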

array = array.T
predict = KMeans(n_clusters=4).fit_predict(array)
df['cluster_id'] = predict

print(df.sort_values(["cluster_id"])[["cluster_id", "Team"]])

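# Mean goal share per period for each cluster
clusterinfo = pd.DataFrame()
for i in range(4):
  # numeric_only=True skips the Team column, which recent pandas no longer drops silently
  clusterinfo['cluster' + str(i)] = df[df['cluster_id'] == i].mean(numeric_only=True)
clusterinfo = clusterinfo.drop('cluster_id')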
my_plot = clusterinfo.T.plot(kind='bar', stacked=True, title="Stacked bar by cluster")
plt.show()

Running this script prints the cluster IDs and team names.

python k-means.py
    cluster_id               Team
17           0         sunderland
2            1          liverpool
5            1  manchester_united
0            2            chelsea
16           2    west_ham_united
15           2       middlebrough
13           2            burnley
12           2     crystal_palace
11           2        southampton
9            2    afc_bournemauth
6            2      west_bromwich
3            2    manchester_city
1            2            arsenal
19           2       swansea_city
10           3            watford
8            3         stoke_city
7            3            everton
14           3     leicester_city
4            3          tottenham
18           3          hull_city

A graph is also displayed. It looked like this.

[Figure: stacked bar chart of goal share per period for each cluster]

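About all this shows is that some teams are stronger in the first half and others in the second half; there does not seem to be much correlation with actual results. Ideally I would like data that can be used to predict match results, but that still looks difficult.

References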

scikit-learn でクラスタ分析 (K-means 法) – Python でデータサイエンス

2016-11-23

Implementing authentication and authorization for Kibana 5.0.0

Kibana NGINX OpenResty Lua

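Overview

When someone other than the administrator uses Kibana, you may want to authenticate that user and restrict which indices they can view.

A paid plugin called Shield apparently provides authentication and authorization, but I wanted to do this without spending money.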

dev.classmethod.jp

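This time, I added authentication and authorization using OpenResty.

Authentication uses Nginx Basic authentication.
Authorization customizes Nginx with Lua to control the HTTP requests of authenticated users.

Environment

OS
- Ubuntu 16.04
Kibana 5.0.0
- In this article, it runs on 192.168.0.2.

Nginx and Kibana run on the same server.

Blocking direct access to Kibana

If Kibana's URL is accessed directly, authentication and authorization are not applied, so close port 5601 to external access.

Access through the reverse proxy is still allowed.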

# iptables -A INPUT -i lo -j ACCEPT
# iptables -A OUTPUT -o lo -j ACCEPT
# iptables -A INPUT -p tcp --dport 5601 -j REJECT
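
To check that the rule has the intended effect, one option is a small TCP probe run from another machine. This is a minimal sketch using only the standard library; the host and port are the ones used in this article, and the helper name `port_is_open` is hypothetical.

```python
import socket


def port_is_open(host, port, timeout=1.0):
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts.
        return False


if __name__ == "__main__":
    # Run this from a machine other than the Kibana server.
    # After the REJECT rule is in place, False is the expected result.
    print(port_is_open("192.168.0.2", 5601))
```

Run on the Kibana server itself, the probe should still succeed, because the loopback interface is accepted before the REJECT rule is evaluated.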

Setting up OpenResty

I found a nice Gist and used it as a reference.

Easy install openresty (used and tested on Ubuntu 14.04, 15.10 and 16.04) · GitHub

Get the latest version from the official site.

OpenResty - Download

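Authentication

I covered the Nginx reverse proxy settings for Kibana in an earlier article, so I link it here.

togattti.hateblo.jp

Authentication is nothing more than Basic authentication.

Steps

Install the required package (apache2-utils provides htpasswd).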

# apt install apache2-utils

Issue a password for each user. For now I created the users sales, support, and tech.

# htpasswd -c -b /etc/nginx/.htpasswd sales salespasswd
# htpasswd -b /etc/nginx/.htpasswd support supportpasswd
# htpasswd -b /etc/nginx/.htpasswd tech techpasswd
# cat /etc/nginx/.htpasswd 
sales:$apr1$BMU.bsHb$c/jXRc1T3.keiYTtmdtua/
support:$apr1$Yvlq26fj$vtrYnqrc/XW/2WSRG6vlN.
tech:$apr1$Zqt0uiD3$xWO72SD10EFYB4Fq.JEw.1

Add auth_basic and auth_basic_user_file to the location directive, then restart nginx.

(snip)
location ~ (/app/kibana|/bundles/|/status|/elasticsearch|/plugins|/timelion|/console|/api/) {
                auth_basic "Restricted";
                auth_basic_user_file "/etc/nginx/.htpasswd";
                proxy_pass http://192.168.0.2:5601;
                proxy_set_header        Host $host;
                proxy_set_header        X-Real-IP $remote_addr;
                proxy_set_header        X-Forwarded-For $proxy_add_x_forwarded_for;
                proxy_set_header        X-Forwarded-Proto $scheme;
                proxy_set_header        X-Forwarded-Host $http_host;
}
(snip)
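
With this in place, every request must carry an `Authorization` header. The following sketch shows how that header is formed for the `sales` user created above; the helper names are hypothetical. Note that the credentials are only base64-encoded, not encrypted.

```python
import base64


def basic_auth_header(user, password):
    """Build the Authorization header value a client sends for Basic auth."""
    token = base64.b64encode(f"{user}:{password}".encode("utf-8")).decode("ascii")
    return f"Basic {token}"


def parse_basic_auth(header_value):
    """Recover (user, password) from a Basic Authorization header value."""
    scheme, _, token = header_value.partition(" ")
    if scheme != "Basic":
        raise ValueError("not a Basic auth header")
    user, _, password = base64.b64decode(token).decode("utf-8").partition(":")
    return user, password


header = basic_auth_header("sales", "salespasswd")
print(header)
print(parse_basic_auth(header))
```

Because the header can be decoded by anyone who sees it, serving the proxy over HTTPS is advisable outside a trusted network.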

Authorization

Given the indices sales-*, support-*, and tech-*, I want each user to be able to view only the indices related to them.

For example, the user sales can view the index sales-* but not support-* or tech-*.

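Steps

Here is a sample written in Lua.

When Kibana requests an index from Elasticsearch, the body of the HTTP request contains the index name.

The script uses that to deny access.

no_allow_indexes lists each user name together with the indices that user must not access.

ngx.var.remote_user returns the authenticated user.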

-- /etc/nginx/custom/kibana_simple_acl.lua
local no_allow_indexes = {
    sales = {
    "support-*",
    "tech-*"
    },
    support = {
    "tech-*",
    "sales-*"
    },
    tech = {
    "sales-*",
    "support-*"
    }
}

local user_ = ngx.var.remote_user
if user_ == nil then
    ngx.header.content_type = "text/plain"
    ngx.log(ngx.STDERR, "no user.")
    -- ngx.status is a field, not a function, so it has to be assigned
    ngx.status = 403
    ngx.say("403 Forbidden: You do not have access to this page.")
    return ngx.exit(403)
end

local user_check = false
for user, indexes in pairs(no_allow_indexes) do
    -- Compare names exactly; string.match would treat user_ as a Lua pattern
    local p = (user == user_)
    ngx.log(ngx.STDERR, string.format("user: %s, user_: %s", user, user_))
    if p then
        user_check = true
        ngx.req.read_body()
        local body_data = ngx.req.get_body_data()
        if body_data == nil then return end
        for _, index in pairs(indexes) do
            local matcher = ngx.re.match(body_data, index)
            if matcher then
                ngx.log(ngx.STDERR, string.format("User does not have access to %s", index))
                return ngx.exit(403)
            end
        end
    end
end

if not user_check then
    ngx.header.content_type = "text/plain"
    ngx.log(ngx.STDERR, string.format("invalid user: %s", user_))
    ngx.status = 403
    ngx.say("403 Forbidden: You do not have access to this page.")
    return ngx.exit(403)
end
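
The decision logic of the script can be exercised offline before deploying it. The sketch below mirrors it in Python under the assumption that `re.search` is a fair stand-in for `ngx.re.match` (both look for the pattern anywhere in the text); the function name `is_forbidden` and the sample request bodies are hypothetical.

```python
import re

# Same deny table as the Lua script above.
NO_ALLOW_INDEXES = {
    "sales": ["support-*", "tech-*"],
    "support": ["tech-*", "sales-*"],
    "tech": ["sales-*", "support-*"],
}


def is_forbidden(user, body):
    """Return True when the request should be answered with 403."""
    if user is None or user not in NO_ALLOW_INDEXES:
        return True  # unauthenticated or unknown user
    if not body:
        return False  # requests without a body are passed through
    # Each entry is treated as a regular expression, so "support-*" means
    # "support" followed by zero or more hyphens, anywhere in the body.
    return any(re.search(pattern, body) for pattern in NO_ALLOW_INDEXES[user])


print(is_forbidden("sales", '{"index":["support-*"]}'))
print(is_forbidden("sales", '{"index":["sales-*"]}'))
```

One consequence of matching on the raw body is that a sales user whose query text merely contains the word "support" is also rejected, so this is a coarse filter rather than a precise access control.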

Add access_by_lua_file to the location directive, then restart.

(snip)
location ~ (/app/kibana|/bundles/|/status|/elasticsearch|/plugins|/timelion|/console|/api/) {
                auth_basic "Restricted";
                auth_basic_user_file "/etc/nginx/.htpasswd";
                access_by_lua_file /etc/nginx/custom/kibana_simple_acl.lua;
                proxy_pass http://192.168.0.2:5601;
                proxy_set_header        Host $host;
                proxy_set_header        X-Real-IP $remote_addr;
                proxy_set_header        X-Forwarded-For $proxy_add_x_forwarded_for;
                proxy_set_header        X-Forwarded-Proto $scheme;
                proxy_set_header        X-Forwarded-Host $http_host;
}
(snip)

Confirm that an authenticated user cannot view Discover or Dashboard pages that include indices the user is not authorized for.

2016-11-19

Running Kibana behind an Nginx reverse proxy

Elasticsearch Kibana NGINX

Environment

Ubuntu 16.04
Elasticsearch 5.0.0 GA
Kibana 5.0.0

The IP address used is 192.168.0.2.

Notes

After some trial and error, it worked with the location directive in the configuration file set as follows.

location ~ (/app/kibana|/bundles/|/status|/elasticsearch|/plugins|/timelion|/console) {
           proxy_pass http://192.168.0.2:5601;
           proxy_set_header        Host $host;
           proxy_set_header        X-Real-IP $remote_addr;
           proxy_set_header        X-Forwarded-For $proxy_add_x_forwarded_for;
           proxy_set_header        X-Forwarded-Proto $scheme;
           proxy_set_header        X-Forwarded-Host $http_host;
        }

2016-11-17

Configuring a shared repository for an Elasticsearch cluster

Elasticsearch

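This post shows how to set up the shared repository used by the Snapshot feature.

Overview

To set up a shared repository for a cluster, the file system must be shared so that every node can access it. This time I built a shared file server with NFS and configured the shared repository on it.

If the file system is not shared, Elasticsearch complains as follows.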

RemoteTransportException...
This might indicate that the store [(snip)] is not shared between this node and the master node or that permissions on the store don't allow reading files written by the master node
RepositoryVerificationException[[snip] a file written by master to the store [snip] cannot be accessed on the node
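
As the message says, the check amounts to the master writing a file and the other nodes trying to read it. The following is a rough local analogy of that idea, not Elasticsearch's actual code: two directories stand in for what two nodes see at the repository path, and the function name is hypothetical.

```python
import os
import tempfile
import uuid


def store_is_shared(master_path, node_path):
    """Write a marker via master_path and check that it is readable via node_path."""
    name = f"verify-{uuid.uuid4().hex}.tmp"
    marker = os.path.join(master_path, name)
    with open(marker, "w") as f:
        f.write(name)
    try:
        candidate = os.path.join(node_path, name)
        if not os.path.exists(candidate):
            return False
        with open(candidate) as f:
            return f.read() == name
    finally:
        os.remove(marker)  # clean up the marker in every case


with tempfile.TemporaryDirectory() as shared, tempfile.TemporaryDirectory() as private:
    print(store_is_shared(shared, shared))   # both "nodes" see the same directory
    print(store_is_shared(shared, private))  # the second "node" has its own directory
```

The second call corresponds to the situation in the error above: the path exists on both nodes, but each node has its own unshared copy.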

Environment

Ubuntu 16.04
Elasticsearch 5.0.0 GA
- Node 1 (192.168.0.2)
- Node 2 (192.168.0.3)

Node 1 and node 2 form a cluster.

Steps

Getting started

Create the repository location on both node 1 and node 2.

# mkdir -p /var/elasticsearch/snapshot
# chown nobody:nogroup /var/elasticsearch/snapshot

NFS setup

This time, the NFS server runs on node 1 and holds the repository data. Node 2 runs the NFS client.

Start with node 1. Install the NFS server.

# apt-get update && apt-get install nfs-kernel-server

Append the following to /etc/exports.

/var/elasticsearch/snapshot  192.168.0.3(rw,sync,no_subtree_check)

Restart the NFS server.

# systemctl restart nfs-kernel-server

Next, set up the NFS client on node 2. Install the NFS client.

# apt-get update && apt-get install nfs-common

Mount the directory exported by the NFS server.

# mount 192.168.0.2:/var/elasticsearch/snapshot /var/elasticsearch/snapshot
# df -h
Filesystem                                 Size  Used Avail Use% Mounted on
(snip)
192.168.0.2:/var/elasticsearch/snapshot   63G   15G   46G  25% /var/elasticsearch/snapshot

Creating the repository

Add the following to elasticsearch.yml on both node 1 and node 2.

path.repo: ["/var/elasticsearch/snapshot"]

Restart Elasticsearch.

# systemctl restart elasticsearch

Create the repository on node 1. If /var/elasticsearch/snapshot is left as nobody:nogroup, snapshot1 does not end up owned by elasticsearch:elasticsearch, so the ownership is changed by hand.

# mkdir /var/elasticsearch/snapshot/snapshot1
# chown elasticsearch:elasticsearch /var/elasticsearch/snapshot/snapshot1
# curl -XPUT 'http://192.168.0.2:9200/_snapshot/snapshot1?pretty' -d '{
  "type": "fs",
  "settings": {
    "location" : "/var/elasticsearch/snapshot/snapshot1",
    "compress": true
  }}'
# curl -XGET http://192.168.0.2:9200/_snapshot/snapshot1?pretty
{
  "snapshot1" : {
    "type" : "fs",
    "settings" : {
      "compress" : "true",
      "location" : "/var/elasticsearch/snapshot/snapshot1"
    }
  }
}
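
The same registration request can be built from a script. This is a minimal sketch with the standard library that constructs the request without sending it; the function name is hypothetical, and the explicit Content-Type header is my addition (the curl command above does not set one).

```python
import json
import urllib.request


def build_repository_request(base_url, name, location, compress=True):
    """Build (but do not send) the PUT request that registers an fs repository."""
    body = {"type": "fs", "settings": {"location": location, "compress": compress}}
    return urllib.request.Request(
        url=f"{base_url}/_snapshot/{name}?pretty",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},  # assumption, not in the curl example
        method="PUT",
    )


req = build_repository_request(
    "http://192.168.0.2:9200", "snapshot1", "/var/elasticsearch/snapshot/snapshot1"
)
print(req.get_method(), req.full_url)
print(req.data.decode("utf-8"))
# To actually send it: urllib.request.urlopen(req)
```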

2016-11-09

Backing up Elasticsearch data and restoring it on another node

Elasticsearch

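While operating Elasticsearch, I looked for a way to back up the data on one node and restore it on another.

In the past there were apparently tools such as elasticsearch-knapsack and esclient.

However, Elasticsearch itself now has backup and restore built in.

Technology moves fast!!

Getting started

192.168.0.2
- Source node
192.168.0.3
- Destination node, prepared in a freshly installed state.

Both are built with Ubuntu 16.04.1 LTS and Elasticsearch 2.4.1.

Backing up the data

The backup uses Snapshot. The work is done on the source node.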

www.elastic.co

Create the backup directory

# mkdir -p /var/elasticsearch/backups
# chown -R elasticsearch:elasticsearch /var/elasticsearch

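Create the repository

An fs repository can only be registered once its location is listed under path.repo, so if this request is rejected, apply the configuration edit and restart described below first.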

# curl -XPUT -uadmin:admin 'http://192.168.0.2:9200/_snapshot/backup1' -d '{
  "type": "fs",
  "settings": {
    "location" : "/var/elasticsearch/backups/backup1",
    "compress": true
  }
}'

Edit the configuration file

# diff -u  /etc/elasticsearch/elasticsearch.yml.20161109 /etc/elasticsearch/elasticsearch.yml
(snip)
+path.repo: ["/var/elasticsearch/backups"]
(snip)

Restart Elasticsearch

# systemctl restart elasticsearch

Take the snapshot

# curl -XPUT 'http://192.168.0.2:9200/_snapshot/backup1/backup-2016.11.09?wait_for_completion=true' -d '{
"indices": "*",
"ignore_unavailable": true,
"include_global_state": false
}'
# curl -XGET -uadmin:admin 'http://192.168.0.2:9200/_snapshot/backup1/backup-2016.11.09?pretty'
{
  "snapshots" : [ {
    "snapshot" : "backup-2016.11.09",
    "version_id" : 2040199,
    "version" : "2.4.1",
    "indices" : [ (snip) ],
    "state" : "SUCCESS",
    "start_time" : "2016-11-09T01:47:12.035Z",
    "start_time_in_millis" : 1478656032035,
    "end_time" : "2016-11-09T01:52:47.587Z",
    "end_time_in_millis" : 1478656367587,
    "duration_in_millis" : 335552,
    "failures" : [ ],
    "shards" : {
      "total" : 717,
      "failed" : 0,
      "successful" : 717
    }
  } ]
}

Confirm that state is "SUCCESS".
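
If the backup runs unattended, this check can be scripted. The sketch below parses a response of the shape shown above and also requires that no shard failed; the function name is hypothetical and the sample is a trimmed copy of that response.

```python
import json


def snapshot_succeeded(response_text, snapshot_name):
    """Return True if the named snapshot finished with no failed shards."""
    for snap in json.loads(response_text).get("snapshots", []):
        if snap.get("snapshot") == snapshot_name:
            return snap.get("state") == "SUCCESS" and snap.get("shards", {}).get("failed") == 0
    return False  # the snapshot is not listed at all


# Trimmed copy of the response shown above.
sample = """
{"snapshots": [{"snapshot": "backup-2016.11.09", "state": "SUCCESS",
                "shards": {"total": 717, "failed": 0, "successful": 717}}]}
"""
print(snapshot_succeeded(sample, "backup-2016.11.09"))
```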

# ls -l /var/elasticsearch/backups/backup1/
total 32
-rw-r--r--   1 elasticsearch elasticsearch    59 11月  9 10:52 index
drwxr-xr-x 147 elasticsearch elasticsearch 12288 11月  9 10:46 indices
-rw-r--r--   1 elasticsearch elasticsearch   103 11月  9 10:45 meta-snapshot-2016.11.09.dat
-rw-r--r--   1 elasticsearch elasticsearch  2133 11月  9 10:46 snap-snapshot-2016.11.09.dat

Confirm that data has been created in the backup directory.

Transfer the backup data to the destination node

# scp -pr /var/elasticsearch 192.168.0.3:/var/

Restoring the data

The following steps are done on the destination node.

Change owner and group

# chown -R elasticsearch:elasticsearch /var/elasticsearch

Create the repository

# curl -XPUT 'http://192.168.0.3:9200/_snapshot/backup1' -d '{
  "type": "fs",
  "settings": {
    "location" : "/var/elasticsearch/backups/backup1",
    "compress": true
  }
}'

Edit the configuration file

# diff -u  /etc/elasticsearch/elasticsearch.yml.20161109 /etc/elasticsearch/elasticsearch.yml
(snip)
+path.repo: ["/var/elasticsearch/backups"]
(snip)

Restart Elasticsearch

# systemctl restart elasticsearch

Restore

# curl -XPOST 'http://192.168.0.3:9200/_snapshot/backup1/backup-2016.11.09/_restore' -d '{
"indices": "*",
"ignore_unavailable": true,
"include_global_state": false
}'

References

Elasticsearchのバックアップとリストア - Qiita

togattti's Engineer Notes

Please don't expect too much.
