2016-12-05

ファイルの各行にUUIDを付加するスニペット

雑記 C

ログ収集ツールのfluentdのuuid_keyを使う場合、ログにUUIDを振る必要がある。

こんなときシェルスクリプトで、whileやfor、uuidgenを組み合わせて、追記していたが、行が膨大な場合、処理の遅延が気になったので、C言語で処理するプログラムを書いた。

プログラム

Ubuntu環境で、動かした。

依存ライブラリをインストールする。

$ sudo apt install uuid-dev

Cのプログラムは、こんな感じ。

#include <stdio.h>
#include <string.h>
#include <uuid/uuid.h>

int main(int argc, char *argv[]) {
  FILE *source, *target;
  char line[256];
  uuid_t uuid;
  char uuid_str[37];
  int len;

  source = fopen(argv[1], "r");
  target = fopen(argv[2], "w");
  while(fgets(line, 256, source)) {
    len = strlen(line);
    if (line[len-1] == '\n') {
      line[len-1] = 0;
    }
    uuid_generate_time_safe(uuid);
    uuid_unparse_lower(uuid, uuid_str);
    fprintf(target, "%s %s\n", line, uuid_str);
  }
  fclose(source);
  fclose(target);
  return 0;
 }

コンパイルする。

$ gcc -o append_uuid append_uuid.c -luuid

ベンチマーク計測

比較用のファイルを用意する。

$ for i in `seq 10000`; do echo "abc" >> uuid_by_bash.txt; echo "abc" >> uuid_by_c.txt; done
$ wc -l uuid_by_bash.txt uuid_by_c.txt
10000 uuid_by_bash.txt
10000 uuid_by_c.txt
20000 total

まずは、Bashでやる場合(これがBashのベストプラクティスかどうかは知らない。。)

$ time cat uuid_by_bash.txt | while read line; do uuid=$(uuidgen); echo "$line $uuid" >> uuid_by_bash.txt.new; done

real    0m25.741s
user    0m1.268s
sys 0m7.632s
$ tail -1 uuid_by_bash.txt.new
abc 006b69fd-9ad0-4749-afe1-eef73c6ffad0

次は、Cで書いたプログラム。

$ time ./append_uuid uuid_by_c.txt uuid_by_c.txt.new
uuid_by_c.txt uuid_by_c.txt.new
real    0m0.020s
user    0m0.016s
sys 0m0.000s
$ tail -1 uuid_by_c.txt.new
abc d05dc0b5-bad5-11e6-81bd-000c291c1598

クソ速い。

2016-12-05

プレミアリーグを得点時間帯ごとにクラスタリングする

機械学習

概要

スポーツの結果予想のために、最近、少しずつ機械学習に取り組んでいる。

今日は、機械学習 k-means（k平均法）を用いたクラスタリングを試してみる。

素材は、サッカーの時間帯別の得点率で、これを使いチームをクラスタリングする。

テスト用に作成した素材はLive scores, results, fixtures, tables, statistics and news - Soccerwayから、取得して、加工した。

gist.github.com

これは、総得点のうち、15分刻みの特定の時間帯に得点する割合を示している。

実装

K-meansでクラスタリングする最適なクラスタ数の決定方法には、X-meansという方法もあるが、

時間がかかりそうだったので、適当に4に設定してみた。

#coding: utf-8
import pandas as pd
import numpy as np
from sklearn.cluster import KMeans
import matplotlib.pyplot as plt

df = pd.read_csv('uk-soccer-goal-rate-per-time-20161205.csv')
array = np.array([
df['0to15'].tolist(),
df['15to30'].tolist(),
df['30to45'].tolist(),
df['45to60'].tolist(),
df['60to75'].tolist(),
df['75to90'].tolist(),
], np.float)

array = array.T
predict = KMeans(n_clusters=4).fit_predict(array)
df['cluster_id'] = predict

print(df.sort_values(["cluster_id"])[["cluster_id", "Team"]])

clusterinfo = pd.DataFrame()
for i in range(4):
  clusterinfo['cluster' + str(i)] = df[df['cluster_id'] == i].mean()
clusterinfo = clusterinfo.drop('cluster_id')
my_plot = clusterinfo.T.plot(kind='bar', stacked=True, title="Stacked bar by cluster")
plt.show()

このスクリプトを実行すると、クラスタのIDとチーム名が表示される。

python k-means.py
    cluster_id               Team
17           0         sunderland
2            1          liverpool
5            1  manchester_united
0            2            chelsea
16           2    west_ham_united
15           2       middlebrough
13           2            burnley
12           2     crystal_palace
11           2        southampton
9            2    afc_bournemauth
6            2      west_bromwich
3            2    manchester_city
1            2            arsenal
19           2       swansea_city
10           3            watford
8            3         stoke_city
7            3            everton
14           3     leicester_city
4            3          tottenham
18           3          hull_city

また、グラフも表示され、こんな感じになった。

f:id:togattti1990:20161205080058p:plain

前半、後半に強いチームがあるのが分かるくらいで、実際の成績ともそこまで、相関はなさそう。本当なら、試合結果を予測するのに使えるデータがほしいが、まだ、難しそうだ。

# 参考

scikit-learn でクラスタ分析 (K-means 法) – Python でデータサイエンス

2016-11-23

Kibana5.0.0に、認証、認可を実装する

Kibana NGINX OpenResty Lua

概要

Kibanaを管理者以外の誰かに使わせる場合、そのユーザを認証し、閲覧できるインデックスを限定したいことがある。

Shieldという有償プラグインで認証、認可が可能らしいが、お金をかけずに実現したかった。

dev.classmethod.jp

今回は、OpenRestyを使い、認証、認可を付け加えた。

認証は、NginxのBasic認証を利用する。
認可は、luaで、Nginxをカスタマイズして、認証を経たユーザのHTTPリクエストを制御する。

環境

OS
- Ubuntu 16.04
Kibana5.0.0
- この記事では、192.168.0.2で動作する。

NginxとKibanaは同一のサーバ内で動作している。

Kibanaへの直アクセス禁止

KibanaのURLへ直接アクセスされると、認証、認可が適用されないので、外部から5601へのアクセスを閉じておく。

ただし、リバースプロキシ経由のアクセスは許可する。

# iptables -A INPUT -i lo -j ACCEPT
# iptables -A OUTPUT -o lo -j ACCEPT
# iptables -A INPUT -p tcp --dport 5601 -j REJECT

OpenRestyのセットアップ

良さげなGistがあったので、参考にした。

Easy install openresty (used and tested on Ubuntu 14.04, 15.10 and 16.04) · GitHub

最新版は、本家から持ってくる。

OpenResty - Download

認証

KibanaへのNginxのリバースプロキシ設定は、記事で書いたので貼っておく。

togattti.hateblo.jp

認証は、Basic認証をつけてるだけ。

手順

必要なモジュールを入手する。

# apt install apache2-utils

ユーザにパスワードを発行する。とりあえず、sales、support、techユーザにした。

# htpasswd -c -b /etc/nginx/.htpasswd sales salespasswd
# htpasswd -b /etc/nginx/.htpasswd support supportpasswd
# htpasswd -b /etc/nginx/.htpasswd tech techpasswd
# cat /etc/nginx/.htpasswd 
sales:$apr1$BMU.bsHb$c/jXRc1T3.keiYTtmdtua/
support:$apr1$Yvlq26fj$vtrYnqrc/XW/2WSRG6vlN.
tech:$apr1$Zqt0uiD3$xWO72SD10EFYB4Fq.JEw.1

locationディレクティブに、auth_basicとauth_basic_user_fileを追加して、nginxを再起動する。

(snip)
location ~ (/app/kibana|/bundles/|/status|/elasticsearch|/plugins|/timelion|/console|/api/) {
                auth_basic "Restricted";
                auth_basic_user_file "/etc/nginx/.htpasswd";
                proxy_pass http://192.168.0.2:5601;
                proxy_set_header        Host $host;
                proxy_set_header        X-Real-IP $remote_addr;
                proxy_set_header        X-Forwarded-For $proxy_add_x_forwarded_for;
                proxy_set_header        X-Forwarded-Proto $scheme;
                proxy_set_header        X-Forwarded-Host $http_host;
}
(snip)

認可

インデックスsales-* 、support-* 、tech-*がある場合に、ユーザは関連するインデックスしか閲覧できないようにしたい。

f:id:togattti1990:20161123155035p:plain

例えば、ユーザsalesは、インデックスsales-*を閲覧できるが、インデックスsupport-* 、tech-* は閲覧できないようにする。

f:id:togattti1990:20161123155042p:plain

手順

Luaで書いたサンプルを示す。

Kibanaが、Elasticsearchに、インデックスを要求する際、HTTPリクエストのボディ部分に、インデックス名を含む。

それを利用して、アクセス拒否するようにした。

no_allow_indexesに、ユーザ名とアクセス制限するインデックスを記述する。

ngx.var.remote_userで、認証されたユーザを取得できる。

-- /etc/nginx/custom/kibana_simple_acl.lua
local no_allow_indexes = {
    sales = {
    "support-*",
    "tech-*"
    },
    support = {
    "tech-*",
    "sales-*"
    },
    tech = {
    "sales-*",
    "support-*"
    }
}

local user_ = ngx.var.remote_user
if user_ == nil then
    ngx.header.content_type = "text/plain"
    ngx.log(ngx.STDERR, "no user.")
    ngx.status(403)
    ngx.say("403 Forbidden: You do not have access to this page.")
    return ngx.exit(403)
end

user_check = false
for user, indexes in pairs(no_allow_indexes) do
    local p = string.match(user, user_)
    ngx.log(ngx.STDERR, string.format("user: %s, user_: %s", user, user_))
    if p then
        user_check = true
        ngx.req.read_body()
        local body_data = ngx.req.get_body_data()
        if body_data == nil then return end
        for _, index in pairs(indexes) do
            local matcher = ngx.re.match(body_data, index)
            if matcher then
                ngx.log(ngx.STDERR, string.format("User does not have access to %s", index))
                return ngx.exit(403)
            end
        end
    end
end

if not user_check then
    ngx.header.content_type = "text/plain"
    ngx.log(ngx.STDERR, string.format("invalid user: %s", user_))
    ngx.status = 403
    ngx.say("403 Forbidden: You do not have access to this page.")
    return ngx.exit(403)
end

locationディレクティブに、access_by_lua_fileを追加して、再起動する。

(snip)
location ~ (/app/kibana|/bundles/|/status|/elasticsearch|/plugins|/timelion|/console|/api/) {
                auth_basic "Restricted";
                auth_basic_user_file "/etc/nginx/.htpasswd";
                access_by_lua_file /etc/nginx/custom/kibana_simple_acl.lua;
                proxy_pass http://192.168.0.2:5601;
                proxy_set_header        Host $host;
                proxy_set_header        X-Real-IP $remote_addr;
                proxy_set_header        X-Forwarded-For $proxy_add_x_forwarded_for;
                proxy_set_header        X-Forwarded-Proto $scheme;
                proxy_set_header        X-Forwarded-Host $http_host;
}
(snip)

認証されたユーザが認可されてないインデックスを含むDiscoverやDashBoardが閲覧できないことを確認する。

f:id:togattti1990:20161123155631p:plain

2016-11-19

KibanaをNginxのリバースプロキシで動かす

Elasticsearch Kibana NGINX

環境

Ubuntu 16.04
Elasticsearch 5.0.0 GA
Kibana 5.0.0

使用するIPは、192.168.0.2とする。

メモ

試行錯誤した結果、設定ファイルのLocationディレクティブをこうすると動作した。

location ~ (/app/kibana|/bundles/|/status|/elasticsearch|/plugins|/timelion|/console) {
           proxy_pass http://192.168.0.2:5601;
           proxy_set_header        Host $host;
           proxy_set_header        X-Real-IP $remote_addr;
           proxy_set_header        X-Forwarded-For $proxy_add_x_forwarded_for;
           proxy_set_header        X-Forwarded-Proto $scheme;
           proxy_set_header        X-Forwarded-Host $http_host;
        }

2016-11-17

Elasticsearchクラスタの共有リポジトリ設定

Elasticsearch

Snapshot機能で、使用する共有リポジトリの設定方法を示す。

概要

クラスタに、共有リポジトリを設定するためには、どのノードからもアクセスできるようにファイルシステムを共有する必要がある。今回は、NFSで、共有ファイルサーバを構築し、共有リポジトリを設定することにした。

ファイルシステムを共有しないと、下記のように怒られる。

RemoteTransportException...
This might indicate that the store [(snip)] is not shared between this node and the master node or that permissions on the store don't allow reading files written by the master node
RepositoryVerificationException[[snip] a file written by master to the store [snip] cannot be accessed on the node

環境

Ubuntu 16.04
Elasticsearch 5.0.0 GA
- ノード1(192.168.0.2)
- ノード2(192.168.0.3)

ノード1とノード2は、クラスタを組ませている。

手順

はじめに

リポジトリの置き場所をノード1、ノード2にそれぞれ生成する。

# mkdir -p /var/elasticsearch/snapshot
# chown nobody:nogroup /var/elasticsearch/snapshot

NFS周りの設定

今回は、ノード1上で、NFSサーバを起動させて、リポジトリデータを置くことにする。そして、ノード2上では、NFSクライアントを起動させる。

ノード1から作業する。 NFSサーバをインストールする。

# apt-get update && apt-get install nfs-kernel-server

/etc/exportsに追記する。

/var/elasticsearch/snapshot  192.168.0.3(rw,sync,no_subtree_check)

NFSサーバを再起動する。

# systemctl restart nfs-kernel-server

次に、ノード2で、NFSクライアントの設定をする。 NFSクライアントをインストールする。

# apt-get update && apt-get install nfs-common

NFSサーバ側のディレクトリをマウントする。

# mount 192.168.0.2:/var/elasticsearch/snapshot /var/elasticsearch/snapshot
# df -h
Filesystem                                 Size  Used Avail Use% Mounted on
(snip)
192.168.0.2:/var/elasticsearch/snapshot   63G   15G   46G  25% /var/elasticsearch/snapshot

リポジトリ生成

ノード1、ノード2それぞれのelasticsearch.ymlに、下記を追記する。

path.repo: ["/var/elasticsearch/snapshot"]

Elasticsearchを再起動する。

# systemctl restart elasticsearch

ノード1で、リポジトリを生成する。 /var/elasticsearch/snapshotがnobody:nogroupのままだと、snapshot1の権限がelasticsearch:elasticsearchにならないので、手動で変更しています。

# mkdir /var/elasticsearch/snapshot/snapshot1
# chown elasticsearch:elasticsearch /var/elasticsearch/snapshot/snapshot1
# curl -XPUT 'http://192.168.0.2:9200/_snapshot/snapshot1?pretty' -d '{
  "type": "fs",
  "settings": {
    "location" : "/var/elasticsearch/snapshot/snapshot1",
    "compress": true
  }}'
# curl -XGET http://192.168.0.2:9200/_snapshot/snapshot1?pretty
{
  "snapshot1" : {
    "type" : "fs",
    "settings" : {
      "compress" : "true",
      "location" : "/var/elasticsearch/snapshot/snapshot1"
    }
  }
}

togatttiのエンジニアメモ

過度な期待はしないでください.