Deploying processors

(by andrej, last updated: oct 2015)

TLDR

$ git push origin master:deploy

Migrations

When your code needs to make changes to the database, make sure you run the migrations before the deploy. Migrating is not part of the deploy by default, so you need to run it on your local machine in the production env prior to the deploy.

Postgres

Run migrations in the oma repo and copy db/schema.rb over to the other two repos (oma-models/db and oma-processors/db). Make sure you're deploying the updated schema.
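
The copy step above is easy to get half right, so here is a minimal Ruby sketch that copies the schema and reports any copy that still differs. It assumes the three repos are checked out side by side under one directory; the sync_schema helper and that layout are illustrative assumptions, not part of our tooling.

```ruby
require "fileutils"

# Copy oma/db/schema.rb into the two sibling repos and return the list of
# copies that still differ from the source (an empty list means success).
# ASSUMPTION: oma, oma-models and oma-processors sit side by side under `root`.
def sync_schema(root)
  source = File.join(root, "oma", "db", "schema.rb")
  targets = %w[oma-models oma-processors].map do |repo|
    File.join(root, repo, "db", "schema.rb")
  end

  # Overwrite each target with the freshly migrated schema.
  targets.each { |target| FileUtils.cp(source, target) }

  # Byte-for-byte comparison of every copy against the source.
  targets.reject { |target| FileUtils.compare_file(source, target) }
end
```

For example, sync_schema(File.expand_path("..")) run from inside the oma checkout should return an empty array. The updated schema.rb still has to be committed in each repo before deploying.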

Elasticsearch

Adding a new field is easy:

Mention13.update_mapping only: [:new_field, :another_new_field]

Changing the configuration of an existing field is not. That requires re-indexing, which literally means piping all data from the current index into a new one with the correct settings. Ask someone more experienced for help.

Putting the App into Pause

Pause the back end first. To do this, open any production console and run:

Oma::Resque.pause

Deploy

$ git push origin master:deploy

Watch Papertrail for hot_update.sh kicking in.

Post deploy tasks

  • Once you see that the Chef recipe has finished on worker_2 (10.0.31.35), SSH to that machine from the aud_server and edit hot_update.sh, changing the line from:

'{"run_list": [ "["recipe[processor_box::worker_2_attributes]", "recipe[processor_box::deploy]", "recipe[processor_box::hotdeploy]"]","["role[processor_box_worker_2]"]" ]}'

to

'{"run_list": ["recipe[processor_box::worker_2_attributes]", "recipe[processor_box::deploy]", "recipe[processor_box::hotdeploy]"]}'

This is a temporary fix; the reasons for it are listed in the App maintenance crash course notes.
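
Hand-editing a JSON string inside a shell script is error-prone, so it can help to sanity-check the edited value before the next Chef run. The Ruby sketch below is an illustration only: valid_run_list? is a hypothetical helper, and it expects the JSON text without the surrounding single quotes.

```ruby
require "json"

# Return true when `text` is valid JSON whose "run_list" is a flat,
# non-empty array of "recipe[...]" or "role[...]" strings.
def valid_run_list?(text)
  data = JSON.parse(text)
  return false unless data.is_a?(Hash)

  run_list = data["run_list"]
  return false unless run_list.is_a?(Array) && !run_list.empty?

  # Every entry must be a plain string such as "recipe[cookbook::name]".
  run_list.all? do |item|
    item.is_a?(String) && item.match?(/\A(recipe|role)\[[^\[\]]+\]\z/)
  end
rescue JSON::ParserError
  # Unbalanced quotes or brackets end up here.
  false
end
```

Passing it the target line shown above should return true; a string with stray quotes or nested brackets returns false.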

End Chef On Papertrail

  • You can check that the deploy has finished on all the machines by searching for the string 'end chef' in the Papertrail log. The number of unique hosts logging that string should match the number of workers configured as spot instances plus 2 (worker_1 and worker_2).

  • Check the Resque worker count on this page to make sure it is as expected: 10 per spot instance, plus a few for the two permanent worker instances.
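
The 'end chef' count can also be checked from exported log lines rather than by eye. This is a rough Ruby sketch under a stated assumption: each line has the shape "Oct 12 10:15:01 <host> <program>: <message>", so the host is the fourth whitespace-separated field. Both helper names are made up for illustration; adjust the field index if your export looks different.

```ruby
# Return the distinct hosts that logged 'end chef'.
# ASSUMPTION: the host is the fourth whitespace-separated field of each line.
def hosts_finished(log_lines)
  log_lines
    .select { |line| line.include?("end chef") }
    .map { |line| line.split[3] }
    .compact
    .uniq
end

# The deploy is complete when every spot instance plus the two permanent
# workers (worker_1 and worker_2) has logged 'end chef'.
def deploy_complete?(log_lines, spot_instances)
  hosts_finished(log_lines).size == spot_instances + 2
end
```

With the search results saved to a file, deploy_complete?(File.readlines("papertrail.log"), spot_count) gives a quick yes or no, and hosts_finished shows which machines have reported so far.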

Checking that everything is OK

Once everything has upgraded, the paused machines should report their version. Look for pause statements that don't show the correct version of the code, then SSH to those boxes and sort out the problem.

Strategies to fix:

  • Kill zombie processes that didn't upgrade. Our boxes have pkill on them, so you can run pkill -f "grep statement" (the pattern is matched against the full command line).

  • Reboot the box. Be careful: make sure it isn't worker_1 with migrations still running on it.
