{"id":1035,"date":"2015-09-01T15:33:17","date_gmt":"2015-09-01T13:33:17","guid":{"rendered":"https:\/\/elkano.org\/blog\/?p=1035"},"modified":"2015-09-01T15:33:17","modified_gmt":"2015-09-01T13:33:17","slug":"recover-ceph-backed-file-system-openstack-instance","status":"publish","type":"post","link":"https:\/\/elkano.org\/blog\/recover-ceph-backed-file-system-openstack-instance\/","title":{"rendered":"Recover a Ceph-backed file system in an OpenStack instance"},"content":{"rendered":"<p>I had problems with one of my instances today. The instance did not reboot properly, and when I checked it I found the root of the problem: the file system was corrupted. To fix the file system on the instance disk I followed these steps:<\/p>\n<p>First, go to the compute node and see which instance is the one you are looking for. You can get the libvirt instance name from the <strong>nova show &lt;uuid&gt;<\/strong> command output and look it up in the <strong>virsh list<\/strong> output. In my case the instance had Id 63 (instance-00000177).<\/p>\n<pre class=\"EnlighterJSRAW\" data-enlighter-language=\"raw\" data-enlighter-linenumbers=\"false\" data-enlighter-theme=\"enlighter\"># virsh list\r\nId Name State\r\n----------------------------------------------------\r\n15 instance-00000111 running\r\n34 instance-00000174 running\r\n61 instance-0000017d running\r\n63 instance-00000177 running\r\n<\/pre>\n<p>Because you are going to modify the file system, it is a good idea to stop or suspend the instance first:<\/p>\n<pre class=\"EnlighterJSRAW\" data-enlighter-language=\"raw\" data-enlighter-linenumbers=\"false\" data-enlighter-theme=\"enlighter\"># virsh suspend 63\r\nDomain 63 suspended\r\n<\/pre>\n<p>The ephemeral disk of the instance is in a Ceph pool, so you need to know the name of the Ceph image used by the instance. 
You can check this in the XML definition file of the instance:<\/p>\n<pre class=\"EnlighterJSRAW\" data-enlighter-language=\"raw\" data-enlighter-linenumbers=\"false\" data-enlighter-theme=\"enlighter\">\/var\/lib\/nova\/instances\/7d6e2893-7151-4ce0-b706-6bab230ad586\/libvirt.xml\r\n<\/pre>\n<p>You will see a line like this:<\/p>\n<pre class=\"EnlighterJSRAW\" data-enlighter-language=\"raw\" data-enlighter-linenumbers=\"false\" data-enlighter-theme=\"enlighter\">&lt;source protocol=&quot;rbd&quot; name=&quot;vmtier-10\/7d6e2893-7151-4ce0-b706-6bab230ad586_disk&quot;&gt;\r\n<\/pre>\n<p>Now you need to map the image with the rbd kernel module. Do this on a server (it may be different from the compute node) that runs a recent kernel and has access to the Ceph cluster.<\/p>\n<pre class=\"EnlighterJSRAW\" data-enlighter-language=\"raw\" data-enlighter-linenumbers=\"false\" data-enlighter-theme=\"enlighter\"># rbd -p vmtier-10 map 7d6e2893-7151-4ce0-b706-6bab230ad586_disk\r\n<\/pre>\n<p>You can list your mapped devices:<\/p>\n<pre class=\"EnlighterJSRAW\" data-enlighter-language=\"raw\" data-enlighter-linenumbers=\"false\" data-enlighter-theme=\"enlighter\"># rbd showmapped\r\nid pool image snap device\r\n1 vmtier-10 7d6e2893-7151-4ce0-b706-6bab230ad586_disk - \/dev\/rbd1\r\n<\/pre>\n<p>Assuming the corrupted file system is ext4 and is on the first partition of the disk, you can fix it with:<\/p>\n<pre class=\"EnlighterJSRAW\" data-enlighter-language=\"raw\" data-enlighter-linenumbers=\"false\" data-enlighter-theme=\"enlighter\"># fsck.ext4 -f -y \/dev\/rbd1p1\r\n<\/pre>\n<p>Once the file system is fixed, you can unmap the device:<\/p>\n<pre class=\"EnlighterJSRAW\" data-enlighter-language=\"raw\" data-enlighter-linenumbers=\"false\" data-enlighter-theme=\"enlighter\"># rbd unmap \/dev\/rbd1\r\n<\/pre>\n<p>Then resume the instance, or start it if you stopped it earlier.<\/p>\n<pre class=\"EnlighterJSRAW\" data-enlighter-language=\"raw\" data-enlighter-linenumbers=\"false\" 
data-enlighter-theme=\"enlighter\"># virsh resume 63\r\nDomain 63 resumed\r\n<\/pre>\n","protected":false},"excerpt":{"rendered":"<p>I had problems with one of my instances today. The instance did not reboot properly, and when I checked it I found the root of the problem: the file system was corrupted. To fix the file system on the instance disk I followed these steps: First, go to the compute node and see [&hellip;]<\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":[],"categories":[128],"tags":[121,149],"_links":{"self":[{"href":"https:\/\/elkano.org\/blog\/wp-json\/wp\/v2\/posts\/1035"}],"collection":[{"href":"https:\/\/elkano.org\/blog\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/elkano.org\/blog\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/elkano.org\/blog\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/elkano.org\/blog\/wp-json\/wp\/v2\/comments?post=1035"}],"version-history":[{"count":14,"href":"https:\/\/elkano.org\/blog\/wp-json\/wp\/v2\/posts\/1035\/revisions"}],"predecessor-version":[{"id":1049,"href":"https:\/\/elkano.org\/blog\/wp-json\/wp\/v2\/posts\/1035\/revisions\/1049"}],"wp:attachment":[{"href":"https:\/\/elkano.org\/blog\/wp-json\/wp\/v2\/media?parent=1035"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/elkano.org\/blog\/wp-json\/wp\/v2\/categories?post=1035"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/elkano.org\/blog\/wp-json\/wp\/v2\/tags?post=1035"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}