Elasticsearch
Elasticsearch migrations come from our own code and from an external gem. These were developed for two basic cases and extended into a third one.
Change data in the current index.
Pipe data into a new index with a different mapping.
Workflow
Create a new migration class
Override the needed methods
I suggest you read the overridable methods and the initialize method of Oma::Elasticsearch::DataMigration so you understand what the options are.
Set up a rake task to run the migration
Run the migration
Commit the migration files
Push them to GitHub
SSH to worker_1
Check out the oma-models code with the migration in it
Optional - pause processing:
bundle exec rake console OMA_ENV=production
Oma::Resque.pause
Run it
OMA_ENV=production nohup bundle exec rake es:migrations:add_keyword_in_url_to_rank >> /var/log/app/add_keyword_in_url_to_rank.log 2>&1 &
The migration will run in the background and produce logs that are viewable on Papertrail. The logs might be buffered, so they may not appear in real time.
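The first three workflow steps can be sketched roughly as follows. This is only an illustration under stated assumptions: the real base class is Oma::Elasticsearch::DataMigration, whose overridable methods and initialize options are not reproduced here, so the sketch uses a hypothetical stand-in (`SketchMigration`) with an assumed `migrate_document` hook. The field names `keyword`, `url` and `keyword_in_url` are guesses based on the task name. Only the rake task name `es:migrations:add_keyword_in_url_to_rank` is taken from the run command above.

```ruby
require "rake"

# Hypothetical stand-in for Oma::Elasticsearch::DataMigration.
# The real class has its own overridable methods and options; read it first.
class SketchMigration
  def initialize(documents)
    @documents = documents
  end

  # Subclasses override this to change a single document (assumed hook name).
  def migrate_document(doc)
    doc
  end

  # Applies the hook to a copy of every document and returns the copies.
  def run
    @documents.map { |doc| migrate_document(doc.dup) }
  end
end

# Assumed example migration: flag documents whose keyword appears in the URL.
class AddKeywordInUrlToRank < SketchMigration
  def migrate_document(doc)
    keyword = doc["keyword"].to_s.downcase
    url = doc["url"].to_s.downcase
    doc["keyword_in_url"] = !keyword.empty? && url.include?(keyword)
    doc
  end
end

# Rake task with the same name as in the run command above.
namespace :es do
  namespace :migrations do
    desc "Add keyword_in_url to rank documents (sketch)"
    task :add_keyword_in_url_to_rank do
      # Placeholder input; a real migration would read from Elasticsearch.
      sample = [{ "keyword" => "shoes", "url" => "http://example.com/shoes" }]
      puts AddKeywordInUrlToRank.new(sample).run.inspect
    end
  end
end
```

Keeping the per-document change in one small overridable method makes it easy to exercise without a running cluster.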
Other notes
*_You can and should unit test migrations._* Developing with a unit test really speeds up writing a migration and making sure it works. See the code for examples.
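As a rough idea of what such a test can look like, here is a minimal sketch using Minitest, which ships with Ruby. The test framework actually used in the codebase is not stated here, and `KeywordInUrl.apply` is a hypothetical per-document transformation invented for illustration; the real examples in the code are the authoritative reference.

```ruby
require "minitest/autorun"

# Hypothetical transformation that a migration might apply to each document.
# In the real code this logic would live in a DataMigration subclass.
module KeywordInUrl
  def self.apply(doc)
    keyword = doc.fetch("keyword", "").to_s.downcase
    url = doc.fetch("url", "").to_s.downcase
    # Returns a new hash and leaves the input untouched.
    doc.merge("keyword_in_url" => !keyword.empty? && url.include?(keyword))
  end
end

class KeywordInUrlTest < Minitest::Test
  def test_sets_flag_when_keyword_is_in_url
    doc = { "keyword" => "shoes", "url" => "http://example.com/shoes" }
    assert_equal true, KeywordInUrl.apply(doc)["keyword_in_url"]
  end

  def test_clears_flag_when_keyword_is_missing_from_url
    doc = { "keyword" => "hats", "url" => "http://example.com/shoes" }
    assert_equal false, KeywordInUrl.apply(doc)["keyword_in_url"]
  end

  def test_leaves_other_fields_untouched
    doc = { "keyword" => "shoes", "url" => "http://example.com/", "rank" => 3 }
    assert_equal 3, KeywordInUrl.apply(doc)["rank"]
  end
end
```

Testing the transformation on plain hashes keeps the feedback loop short, since nothing needs to connect to Elasticsearch.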
You should not run migrations while connecting to Elasticsearch with authentication. At the time of writing, authentication is implemented with an Apache proxy that limits HTTP request sizes. Since migrations work in batches, that limit will prevent some data from being stored. That is one of the reasons we don't run them from development machines.
Migrating to a new index is problematic because it requires us to pause data generation. It would be better to migrate the existing data first and then keep moving newly generated data until the switch. That could be done by fetching based on the updated_at field. This would allow us to pause processing only for the time needed to switch to the new index.
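The catch-up idea could be sketched as a search body that selects only documents changed since the bulk copy started. This is an untested proposal, not existing behavior: the helper name `updated_since_query`, the batch size, and the sort order are assumptions, and only the `updated_at` field comes from the text above. The `range` query with `gte` is standard Elasticsearch query DSL.

```ruby
require "time"

# Builds a search body selecting documents changed at or after `since`.
# `updated_at` comes from the text; batch size and sort are assumptions.
def updated_since_query(since, batch_size: 500)
  {
    size: batch_size,
    sort: [{ updated_at: { order: "asc" } }],
    query: { range: { updated_at: { gte: since.utc.iso8601 } } }
  }
end

# Example: once the bulk copy that began at `copy_started_at` has finished,
# fetch only what changed in the meantime and move it to the new index.
copy_started_at = Time.utc(2015, 1, 1, 12, 0, 0)
body = updated_since_query(copy_started_at)
puts body[:query].inspect
```

Repeating this with the timestamp of the previous pass would shrink the remaining delta each time, so processing only needs to be paused for the final pass and the index switch.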
**WHEN YOU CREATE A MIGRATION, PLEASE NAME IT WITH A DATE SO WE KNOW APPROXIMATELY WHEN IT WAS RUN.**
Also, tmux is installed on the main worker box. If you are running multiple migrations, it might make sense to start a tmux session and tail the various workers to make sure they are working. A basic tmux tutorial that covers splitting terminals is here: http://lukaszwrobel.pl/blog/tmux-tutorial-split-terminal-windows-easily