I have an AWS EC2 instance, which has a docker container running elasticsearch. Every day, some process pushes new documents into elasticsearch. I want to push those documents to an S3 bucket the day after they enter elasticsearch.
I found this solution online:
https://github.com/AckeeDevOps/elasticsearch-backup-to-s3
I am new to elasticsearch, docker, and AWS, so the steps provided in the git repository are not detailed enough for me to know what to do. I am wondering if someone could verify my understanding of the steps I need to take, listed below:
0) Clone the above git repository and set the following parameters:
ELASTICSEARCH_URL - url with port where your elasticsearch runs, for example localhost:9200
S3_URL - address in S3 where to store backups, bucket-name/directory
S3_ACCESS_KEY
S3_SECRET_KEY
CRON_SCHEDULE - cron schedule string, default '0 2 * * *'
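For example, I assume concrete values would look something like this (the bucket name, path, and keys are made up):

ELASTICSEARCH_URL=localhost:9200
S3_URL=my-es-backups/daily
S3_ACCESS_KEY=AKIA...
S3_SECRET_KEY=...
CRON_SCHEDULE=0 2 * * *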
1) On my EC2 instance, do
docker build https://github.com/AckeeDevOps/elasticsearch-backup-to-s3
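I assume I should also tag the image so I can refer to it in step 3, something like this (the tag name is my own choice):

docker build -t elasticsearch-backup-to-s3 https://github.com/AckeeDevOps/elasticsearch-backup-to-s3.git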
2) Then, I need to change the existing docker container that is running so that it has additional options. The command I copied from the README appears to have lost its beginning, but I believe it should read:
docker run -v /var/backup/elasticsearch -p 9200:9200 -d elasticsearch -Des.path.repo=/var/backup/elasticsearch
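Spelled out with the pieces I think are needed (the container name and host path are my assumptions; I believe the name matters because step 3 uses --link elasticsearch:elasticsearch, and mounting a host directory would let the backup container see the same snapshot folder):

docker run -d --name elasticsearch \
  -v /var/backup/elasticsearch:/var/backup/elasticsearch \
  -p 9200:9200 \
  elasticsearch \
  -Des.path.repo=/var/backup/elasticsearch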
3) I do not understand what this line does:
docker run --link elasticsearch:elasticsearch -e ELASTICSEARCH_URL="elasticsearch:9200" -e SNAPSHOT_VOLUME="/var/backup/elasticsearch" -e S3_URL="your S3 url" -e S3_ACCESS_KEY="your S3 access key" -e S3_SECRET_KEY="your S3 secret key"
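My partial understanding is that --link elasticsearch:elasticsearch lets this new container reach the elasticsearch container under the hostname elasticsearch, and the -e flags pass in the settings from step 0, but the line as quoted seems to be missing the image name at the end. My guess at the complete command, reusing the tag from step 1 and the shared host directory from step 2 (all names and values here are my assumptions):

docker run -d \
  --link elasticsearch:elasticsearch \
  -v /var/backup/elasticsearch:/var/backup/elasticsearch \
  -e ELASTICSEARCH_URL="elasticsearch:9200" \
  -e SNAPSHOT_VOLUME="/var/backup/elasticsearch" \
  -e S3_URL="my-es-backups/daily" \
  -e S3_ACCESS_KEY="AKIA..." \
  -e S3_SECRET_KEY="..." \
  -e CRON_SCHEDULE="0 2 * * *" \
  elasticsearch-backup-to-s3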
Is there a more accepted/documented way of continuously pushing elasticsearch data to an external bucket? Any advice, or clarification of the steps needed to use the repository above, would be appreciated.