
Tuesday, March 7, 2017

Distributing a Jupyter notebook with images

Jupyter is an awesome tool. It's an easy way to share information mixed with code, and the Jupyter extension for Chrome makes inserting images even easier.
However, when an image is inserted into a notebook, the notebook file only contains a link to the image file:
 <img src="QERDFASRTSFDGSDTRWERDSFG.PNG"/>
This means that if you send only the .ipynb file, the images will appear broken.

The first solution that comes to mind is to zip the notebook and its images into a single archive and distribute that. It works perfectly well, but it is rather inconvenient when there are many images, and keeping track of images that were deleted from the notebook is quite painful.
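For completeness, here is roughly what that first approach involves. This is a minimal sketch, not code from the original post. It assumes that images are referenced as `<img src="...">` tags in Markdown cells, with paths relative to the notebook. `zip_notebook` is a hypothetical helper name. Because it only collects images that are still referenced, images deleted from the notebook are simply left out of the archive.

```python
import json
import os
import re
import zipfile

# Image links as inserted in Markdown cells, e.g. <img src="QERD.PNG"/>
IMG_SRC = re.compile(r'<img\s+src="([^"]+)"')


def zip_notebook(notebook_path, archive_path):
    """Pack a notebook and the local images it references into one zip file.

    Returns the sorted list of image paths that were added.
    """
    folder = os.path.dirname(os.path.abspath(notebook_path))
    with open(notebook_path, encoding='utf-8') as f:
        notebook = json.load(f)

    images = set()
    for cell in notebook['cells']:
        if cell['cell_type'] == 'markdown':
            # 'source' may be a list of lines or a single string
            text = ''.join(cell['source'])
            for src in IMG_SRC.findall(text):
                if not src.startswith('data:'):  # inline images need no file
                    images.add(src)

    with zipfile.ZipFile(archive_path, 'w', zipfile.ZIP_DEFLATED) as archive:
        archive.write(notebook_path, os.path.basename(notebook_path))
        for src in sorted(images):
            # Keep the relative path so the links still resolve once unzipped
            archive.write(os.path.join(folder, src), src)
    return sorted(images)
```

For example, `zip_notebook('mynotebook.ipynb', 'mynotebook.zip')` writes the notebook plus its referenced images. The recipient still has to unzip everything before opening the notebook.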

The second solution is to embed the image data directly in the tag, using a data URI as the image source:
<img src="data:image/png;base64,<data>"/>

Unfortunately, that doesn't work: Jupyter sanitizes the Markdown cell and removes the data, so the result is still a broken image.
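The data URI format is still what the workaround relies on, so it is worth seeing how one is built. It is just the MIME type followed by the base64-encoded file bytes. This is a minimal sketch. `to_data_uri` and `img_tag` are hypothetical helper names, and the MIME type has to match the real image format (a JPEG would need `image/jpeg`).

```python
import base64


def to_data_uri(data: bytes, mime: str = 'image/png') -> str:
    """Encode raw image bytes as a data URI usable in an <img src="..."> attribute."""
    encoded = base64.b64encode(data).decode('ascii')
    return f'data:{mime};base64,{encoded}'


def img_tag(path: str, mime: str = 'image/png') -> str:
    """Read an image file and return an <img> tag that carries its bytes inline."""
    with open(path, 'rb') as f:
        return '<img src="%s"/>' % to_data_uri(f.read(), mime)
```

For instance, `to_data_uri(b'abc')` returns `'data:image/png;base64,YWJj'`.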

Here's a simple solution based on the fact that Jupyter doesn't sanitize Python code cells. It is therefore possible to include an image by writing it in a code cell:
display(HTML('<img src="data:image/png;base64,iVBORw0KGgoAAAANSUhEUgAAACAAAAAgCAMAAABEpIrGAAAAYFBMVEX///82aZT/zj7/2m7/1FZDcpv/88+btMra4+v/55//8MKCobxchahoj6//5JP/3Xv/12K0x9fy9vj/6qv/0UqnvdD/9tv/+ef/4Ib/7bdPfKHB0d6Oq8N0l7Xn7fLN2eQE1eizAAAA2klEQVQ4jZ3SSRaDIBAE0KYBFQxmUDMP979lVAiQUMkitXBhfeHRSPTKbrw5MUW79kEgjRYpLQArUVMbRV0CIZZHiEZgS2PaBIG3/AE+YP7qshJliJkHs/RbUHvA3E/9Tv8AwwRG2AfAFWXTy+OsB2YeMYhupAfr4vxLbk3ne5YZqO/x4DZ8z6wSaOyGy0wruPC9BfUCrmEBBYGJgyTY8yFOEgOZ7gKDyt+mO0OgZJVfenj7/UcBm/8NejIIpEHNel2CUwTHQ8dVCcimNRTqZ9JJxRtp9t/PCvIEy2AHqB9InbEAAAAASUVORK5CYII="/>'))
In short, the idea is to convert every image into a code cell like the one above.
Doing this by hand on a real notebook is rather tedious, so here is a script that does the job and also adds some code to hide the added cells:
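Concretely, converting an image to a code cell means replacing a Markdown cell in the notebook JSON with a code cell whose source is a single display(HTML(...)) line. This is a sketch that assumes the nbformat 4 cell layout (`cell_type`, `execution_count`, `metadata`, `outputs`, `source`). `image_code_cell` is a hypothetical name and is not part of the script.

```python
import base64


def image_code_cell(image_bytes, mime='image/png'):
    """Build an nbformat-4 code cell that re-displays an image from inline data."""
    data = base64.b64encode(image_bytes).decode('ascii')
    # base64 output only uses A-Z, a-z, 0-9, '+', '/' and '=', so it can be
    # placed inside the quoted Python/HTML string without any escaping.
    line = "display(HTML('<img src=\"data:%s;base64,%s\"/>'))" % (mime, data)
    return {
        "cell_type": "code",
        "execution_count": None,  # not executed yet
        "metadata": {},
        "outputs": [],            # the image appears once the cell is run
        "source": [line],
    }


cells = [{"cell_type": "markdown", "metadata": {}, "source": ['<img src="pic.png"/>']}]
# In a real notebook the bytes would come from reading pic.png in binary mode.
cells[0] = image_code_cell(b'abc')
print(cells[0]["source"][0])
# display(HTML('<img src="data:image/png;base64,YWJj"/>'))
```

The resulting cell shows nothing until it is executed, because its `outputs` list is empty. The image is rebuilt from the inline data when the reader runs it.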
import base64
import json
import mimetypes
import os
import re
import sys

# Marker used to recognise (and hide) the code cells added by this script
MARKER = '#dispimage__'
# A markdown cell that contains nothing but an image link, e.g. <img src="QERD.PNG"/>
file_re = re.compile(r'\s*<img +src="([^"]*)" */>\s*$')


def code_cell(source):
    """Return a collapsed, not yet executed code cell with the given source lines."""
    return {"cell_type": "code", "execution_count": None,
            "metadata": {"collapsed": True}, "outputs": [], "source": source}


def embed_images(notebookname):
    folder, basename = os.path.split(notebookname)
    with open(notebookname, encoding='utf-8') as f:
        notebook = json.load(f)
    cells = notebook['cells']

    # Add the helper cells only once, even if the notebook was already processed
    already_done = any(
        cell['cell_type'] == 'code' and ''.join(cell['source']).startswith(MARKER)
        for cell in cells)
    if not already_done:
        cells.insert(0, code_cell([
            MARKER + '\n',
            'from IPython.display import HTML, display\n']))
        # Source lines are concatenated as-is, so each one must end with '\n'
        cells.append(code_cell([
            '# Embedded image display: all images are included in python code\n',
            '# so the file can be distributed without attached image files\n',
            'display(HTML("""\n',
            '<script>function sel(){return $("div.input:contains(\'%s\')")};\n' % MARKER,
            '$(function(){setTimeout(function() {sel().hide()}, 3000)});\n',
            '</script><button onclick="sel().toggle()">Show/hide</button>\n',
            '"""))']))

    # Replace every markdown cell made of a single <img> tag by a code cell
    for i, cell in enumerate(cells):
        if cell['cell_type'] != 'markdown':
            continue
        m = file_re.match(''.join(cell['source']))
        if m:
            filename = m.group(1)
            print(filename)
            # Image paths are relative to the notebook, not to the current directory
            with open(os.path.join(folder, filename), 'rb') as f:
                image = base64.b64encode(f.read()).decode('ascii')
            mime = mimetypes.guess_type(filename)[0] or 'image/png'
            # The marker is a Python comment, so it is not rendered next to the image
            cells[i] = code_cell([
                "display(HTML('<img src=\"data:%s;base64,%s\"/>'))  %s" % (mime, image, MARKER)])

    with open(os.path.join(folder, 'embed_' + basename), 'w', encoding='utf-8') as f:
        json.dump(notebook, f, indent=1)


if __name__ == '__main__':
    if len(sys.argv) != 2:
        sys.exit('usage: python %s yournotebook.ipynb' % sys.argv[0])
    embed_images(sys.argv[1])
To use it, save the code above to a file, then run:
python [thatfilename.py] [yournotebook.ipynb]

It creates a new notebook whose file name is the original one prefixed with embed_. The script also adds a Show/hide button at the end of the notebook to hide or display the added code.

2 comments:

  1. Very useful, Guillaume! Excellent!

    Unfortunately I am not sure whether I am running your code correctly; I get the following error message when I run it:

    File "C:/Users/reda.merzouki/.spyder-py3/image_embbeding.py", line 70, in <module>
    embed_images(sys.argv[1])

    IndexError: list index out of range

    Thanks for your help

    Kind regards,

    Reda

    1. Hi Reda,

      did you add the name of the IPython notebook file to the command? The error means that sys.argv has no item at index 1.
      Assuming that the code above was saved in the file "embed.py", make sure to call the program as follows:
      python embed.py path/to/yournotebook.ipynb

      Kind regards
      Guillaume
