Deploying processors
(by andrej, last updated: oct 2015)
TLDR
$ git push origin master:deploy
Migrations
When your code needs to change the database, make sure you run the migrations before the deploy. Migrating is not part of the deploy by default, so you need to run it on your local machine in the production env prior to the deploy.
Postgres
Run migrations in the oma repo and copy db/schema.rb over to the other two (oma-models/db and oma-processors/db). Make sure you're deploying the updated schema.
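The copy step above can be sketched as a small shell helper. This is only a sketch: the function name sync_schema is made up for illustration, and it assumes the three repos are checked out side by side under one parent directory, which may not match your layout.

```shell
# Hypothetical helper: copy the freshly migrated schema from the oma checkout
# into the two sibling repos.
# Assumption: oma, oma-models and oma-processors all live under "$1".
sync_schema() {
  parent="${1:-.}"
  src="$parent/oma/db/schema.rb"
  if [ ! -f "$src" ]; then
    echo "missing $src; run the migrations in oma first" >&2
    return 1
  fi
  for repo in oma-models oma-processors; do
    # Stop at the first failed copy so a half-synced state is noticed.
    cp "$src" "$parent/$repo/db/schema.rb" || return 1
  done
}
```

Run it as sync_schema /path/to/parent after migrating, then commit the updated schema.rb in each repo before pushing to deploy.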
Elasticsearch
Adding a new field is easy:
Mention13.update_mapping only: [:new_field, :another_new_field]
Changing the configuration of an existing field is not. That requires re-indexing, which means piping all data from the current index into a new one with the correct settings. Ask someone more experienced for help.
Putting the App into Pause
Pause the back end first. To do this, open any production console and run:
Oma::Resque.pause
Deploy
$ git push origin master:deploy
Watch Papertrail for hot_update.sh kicking in.
Post deploy tasks
Once you see that the chef recipe has ended on worker_2 (10.0.31.35), SSH to that machine from the aud_server and edit the hot_update.sh file, changing the line from:
'{"run_list": [ "["recipe[processor_box::worker_2_attributes]", "recipe[processor_box::deploy]", "recipe[processor_box::hotdeploy]"]","["role[processor_box_worker_2]"]" ]}'
to
'{"run_list": ["recipe[processor_box::worker_2_attributes]", "recipe[processor_box::deploy]", "recipe[processor_box::hotdeploy]"]}'
This is a temporary fix; the reasons for it are listed in the App maintenance crash course notes.
You can check that the deploy has finished on all the machines by searching for the string 'end chef' in the Papertrail log above. The number of unique instances of the string should match the number of workers configured as spot instances plus 2 (worker_1 and worker_2).
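If you have the log saved to a file, the count can be checked with a short shell sketch. Both function names are made up for illustration. The sketch reads "unique instances" as one per host, and it assumes syslog-style lines where the host name is the 4th whitespace-separated field; adjust the field number if your Papertrail export looks different.

```shell
# Expected number of hosts that should report 'end chef':
# the spot instances plus the two permanent workers (worker_1, worker_2).
expected_chef_runs() {
  echo $(( $1 + 2 ))
}

# Count distinct hosts that logged 'end chef' in the given log file.
# Assumption: lines look like "Mon DD HH:MM:SS host program: message",
# so the host is field 4.
count_finished_hosts() {
  awk '/end chef/ { print $4 }' "$1" | sort -u | wc -l | tr -d ' '
}
```

With 8 spot instances, for example, count_finished_hosts deploy.log should print the same number as expected_chef_runs 8.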
Check the Resque worker count on this page. It should be 10 per spot instance, plus a few for the two permanent worker instances.
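As a rough sanity check, the arithmetic can be written down as below. The function name is hypothetical, and since the number of workers on the two permanent instances is not specified here, the sketch only checks the lower bound of 10 per spot instance.

```shell
# Succeeds when the observed Resque worker count is at least 10 per spot
# instance. Workers on worker_1 and worker_2 come on top of that, but their
# exact number is not given, so only the lower bound is checked.
resque_count_plausible() {
  observed="$1"
  spot_instances="$2"
  [ "$observed" -ge $(( spot_instances * 10 )) ]
}
```

For example, resque_count_plausible 34 3 succeeds, while resque_count_plausible 25 3 fails because three spot instances alone should account for 30 workers.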
Checking that everything is OK
Once everything has upgraded, the paused machines should report their version. Look for pause statements that don't show the correct version of the code, then SSH to those boxes and sort out the problem.
Strategies to fix:
Kill zombie processes that didn't upgrade. Our boxes have pkill on them, so you can run pkill -f "grep statement".
Reboot the box. First make sure it isn't worker_1 with migrations still running on it.
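Because pkill -f matches against the full command line, it can be worth listing what a pattern matches before killing anything. The helper below is a sketch with a made-up name; it uses only pgrep and ps and never kills a process.

```shell
# Hypothetical helper: print PID and full command line for every process
# whose command line matches the pattern, without killing anything.
list_matching() {
  # pgrep exits non-zero when nothing matches; -d, joins the PIDs with commas.
  pids=$(pgrep -d, -f "$1") || return 1
  ps -o pid= -o args= -p "$pids"
}
```

Once the list shows only the stale processes, run pkill -f with the same pattern.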