Implementing my own base64 encoding and decoding of files in Python

I have a problem with my own implementation of base64 encoding. So far I have the code below. It seems to work only for text files containing plain English letters. When a PDF file, for instance, is encoded and then decoded, the result differs from the original in individual characters.

def base64Encode(data):
    alphabet = ["A", "B", "C", "D", "E", "F", "G", "H", "I", "J", "K", "L", "M", "N", "O", "P","Q","R","S","T","U","V","W","X","Y","Z","a","b","c","d","e","f","g","h","i","j","k","l","m","n","o","p","q","r","s","t","u","v","w","x","y","z","0","1","2","3","4","5","6","7","8","9","+","/"]
    bit_str = ""      
    base64_str = ""

    for char in data:
        bin_char = bin(char).lstrip("0b")
        bin_char = bin_char.zfill(8)
        bit_str += bin_char 

    brackets = [bit_str[x:x+6] for x in range(0,len(bit_str),6)]

    for bracket in brackets:
        if(len(bracket) < 6):
            bracket = bracket + (6-len(bracket))*"0" 
        base64_str += alphabet[int(bracket,2)]

    # print(brackets[-4:])
    #if(bracket[-1:)
    #print(len(base64_str))
    #if(len(base64_str) != 76):
    #    base64_str += "="

    return base64_str

def base64Decode(text):
        alphabet = ["A","B","C","D","E","F","G","H","I","J","K","L","M","N","O","P","Q","R","S","T","U","V","W","X","Y","Z","a","b","c","d","e","f","g","h","i","j","k","l","m","n","o","p","q","r","s","t","u","v","w","x","y","z","0","1","2","3","4","5","6","7","8","9","+","/"]
        bit_str = ""
        text_str = ""

        for char in text:
            if char in alphabet:
                bin_char = bin(alphabet.index(char)).lstrip("0b")
                bin_char = bin_char.zfill(6)
                bit_str += bin_char

        brackets = [bit_str[x:x+8] for x in range(0,len(bit_str),8)]

        for bracket in brackets:
            text_str += chr(int(bracket,2))

        return text_str.encode("UTF-8")

w = open("encode.txt", "w") 
with open("bla.txt", "rb") as f:
    byte = f.read(57)
    while byte:
        w.write(base64Encode(byte))
        w.write("\n")
        byte = f.read(57)
    w.close()
f.close()

w = open("decode.txt", "wb") 
with open("encode.txt", "r") as f:
    byte = f.read(77)
    while byte:
        w.write(base64Decode(byte))
        byte = f.read(77)
    w.close()
f.close()

In my opinion, the line "return text_str.encode("UTF-8")" should not encode to UTF-8 at all. However, if I leave only "return text_str", I get an error: TypeError: 'str' does not support the buffer interface.

bla.txt:

Phil Mercer reports on Cyclone Pam which has ravaged the Pacific nation of Vanuatu. Video courtesy of YouTube/Isso Nihmei at 350.org

Save the Children's Vanuatu country director Tom Skirrow said on Saturday: "The scene here this morning is complete devastation - houses are destroyed, trees are down, roads are blocked and people are wandering the streets looking for help.

ĄŚĆŹŻÓ

encode.txt

UGhpbCBNZXJjZXIgcmVwb3J0cyBvbiBDeWNsb25lIFBhbSB3aGljaCBoYXMgcmF2YWdlZCB0aGUg
UGFjaWZpYyBuYXRpb24gb2YgVmFudWF0dS4gVmlkZW8gY291cnRlc3kgb2YgWW91VHViZS9Jc3Nv
IE5paG1laSBhdCAzNTAub3JnDQoNClNhdmUgdGhlIENoaWxkcmVuJ3MgVmFudWF0dSBjb3VudHJ5
IGRpcmVjdG9yIFRvbSBTa2lycm93IHNhaWQgb24gU2F0dXJkYXk6ICJUaGUgc2NlbmUgaGVyZSB0
aGlzIG1vcm5pbmcgaXMgY29tcGxldGUgZGV2YXN0YXRpb24gLSBob3VzZXMgYXJlIGRlc3Ryb3ll
ZCwgdHJlZXMgYXJlIGRvd24sIHJvYWRzIGFyZSBibG9ja2VkIGFuZCBwZW9wbGUgYXJlIHdhbmRl
cmluZyB0aGUgc3RyZWV0cyBsb29raW5nIGZvciBoZWxwLg0KDQrEhMWaxIbFucW7w5M

decode.txt

Phil Mercer reports on Cyclone Pam which has ravaged the Pacific nation of Vanuatu. Video courtesy of YouTube/Isso Nihmei at 350.org

Save the Children's Vanuatu country director Tom Skirrow said on Saturday: "The scene here this morning is complete devastation - houses are destroyed, trees are down, roads are blocked and people are wandering the streets looking for help.

ÄÅÄŹŻÃ

The same text encoded by this page: http://www.motobit.com/util/base64-decoder-encoder.asp

UGhpbCBNZXJjZXIgcmVwb3J0cyBvbiBDeWNsb25lIFBhbSB3aGljaCBoYXMgcmF2YWdlZCB0aGUg
UGFjaWZpYyBuYXRpb24gb2YgVmFudWF0dS4gVmlkZW8gY291cnRlc3kgb2YgWW91VHViZS9Jc3Nv
IE5paG1laSBhdCAzNTAub3JnDQoNClNhdmUgdGhlIENoaWxkcmVuJ3MgVmFudWF0dSBjb3VudHJ5
IGRpcmVjdG9yIFRvbSBTa2lycm93IHNhaWQgb24gU2F0dXJkYXk6ICJUaGUgc2NlbmUgaGVyZSB0
aGlzIG1vcm5pbmcgaXMgY29tcGxldGUgZGV2YXN0YXRpb24gLSBob3VzZXMgYXJlIGRlc3Ryb3ll
ZCwgdHJlZXMgYXJlIGRvd24sIHJvYWRzIGFyZSBibG9ja2VkIGFuZCBwZW9wbGUgYXJlIHdhbmRl
cmluZyB0aGUgc3RyZWV0cyBsb29raW5nIGZvciBoZWxwLg0KDQrEhMWaxIbFucW7w5M=

It is the same except for the trailing "=" padding, which I have not implemented yet because of the error described at the beginning.

And a sample of the original PDF file:

%PDF-1.5
%µµµµ
1 0 obj
<</Type/Catalog/Pages 2 0 R/Lang(pl-PL) /StructTreeRoot 8 0 R/MarkInfo<</Marked true>>>>
endobj
2 0 obj
<</Type/Pages/Count 1/Kids[ 3 0 R] >>
endobj
3 0 obj
<</Type/Page/Parent 2 0 R/Resources<</Font<</F1 5 0 R>>/ProcSet[/PDF/Text/ImageB/ImageC/ImageI] >>/MediaBox[ 0 0 595.32 841.92] /Contents 4 0 R/Group<</Type/Group/S/Transparency/CS/DeviceRGB>>/Tabs/S/StructParents 0>>
endobj
4 0 obj
<</Filter/FlateDecode/Length 110>>
stream
xœUÌ­
€@ྰï0QËÝ®Èiž?(†kb°hòý«ZD˜4ßÀΨ*;…¡xº  ¨#"íªFrÄI!w…˜2ËQ81®D<™ÇS=Ó’léŠ82µ·>^åŒÊO-  >[´SÀ 
endstream
endobj
5 0 obj
<</Type/Font/Subtype/TrueType/Name/F1/BaseFont/ABCDEE+Calibri/Encoding/WinAnsiEncoding/FontDescriptor 6 0 R/FirstChar 32/LastChar 97/Widths 15 0 R>>
endobj
6 0 obj
<</Type/FontDescriptor/FontName/ABCDEE+Calibri/Flags 32/ItalicAngle 0/Ascent 750/Descent -250/CapHeight 750/AvgWidth 521/MaxWidth 1743/FontWeight 400/XHeight 250/StemV 52/FontBBox[ -503 -250 1240 750] /FontFile2 16 0 R>>
endobj
7 0 obj

And after executing the script:

%PDF-1.5
%µµµµ
1 0 obj
<</Type/Catalog/Pages 2 0 R/Lang(pl-PL) /StructTreeRoot 8 0 R/MarkInfo<</Marked true>>>>
endobj
2 0 obj
<</Type/Pages/Count 1/Kids[ 3 0 R] >>
endobj
3 0 obj
<</Type/Page/Parent 2 0 R/Resources<</Font<</F1 5 0 R>>/ProcSet[/PDF/Text/ImageB/ImageC/ImageI] >>/MediaBox[ 0 0 595.32 841.92] /Contents 4 0 R/Group<</Type/Group/S/Transparency/CS/DeviceRGB>>/Tabs/S/StructParents 0>>
endobj
4 0 obj
<</Filter/FlateDecode/Length 110>>
stream
xUÌ­
@ྰï0QËÝ®Èi?(kb°hòý«ZD4ßÀΨ*;¡xº  ¨#íªFrÄI!w2ËQ81®D<ÇS=Ólé82µ·>^åÊO-  >[´SÀ 
endstream
endobj
5 0 obj
<</Type/Font/Subtype/TrueType/Name/F1/BaseFont/ABCDEE+Calibri/Encoding/WinAnsiEncoding/FontDescriptor 6 0 R/FirstChar 32/LastChar 97/Widths 15 0 R>>
endobj
6 0 obj
<</Type/FontDescriptor/FontName/ABCDEE+Calibri/Flags 32/ItalicAngle 0/Ascent 750/Descent -250/CapHeight 750/AvgWidth 521/MaxWidth 1743/FontWeight 400/XHeight 250/StemV 52/FontBBox[ -503 -250 1240 750] /FontFile2 16 0 R>>
endobj
7 0 obj

The differences are, for instance, at the beginning of lines 15 and 16.

My goal is to load a file, encode it in base64, then decode it and obtain an identical, usable file. I suppose the error is in reading the data, writing it, or the character encoding. Any suggestions?

My first suggestion is troubleshooting: determine whether you are failing to encode properly, to decode properly, or both. Encode the file with a working utility and with your app, and compare the results. Then decode a properly encoded file with your app and with a working utility, and compare again. Second suggestion: deal with the data as individual bytes, not as text that may be interpreted as UTF-8.
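One way to run that comparison is to use Python's standard base64 module as the known-good utility. The sketch below is only an illustration: the helper name first_mismatch and the sample bytes are made up, and the "suspect" value simulates the question's decode step (build a str with chr(), then encode it as UTF-8).

```python
import base64

def first_mismatch(expected: bytes, actual: bytes):
    """Return the index of the first differing byte, or None if equal."""
    for i, (e, a) in enumerate(zip(expected, actual)):
        if e != a:
            return i
    if len(expected) != len(actual):
        # One input is a prefix of the other.
        return min(len(expected), len(actual))
    return None

# Hypothetical sample containing bytes >= 0x80, like the PDF stream.
sample = b"x\x9cU\xcc\xad"

# Known-good reference: the standard library round-trips the bytes exactly.
reference = base64.b64encode(sample)
assert base64.b64decode(reference) == sample

# Simulated output of the question's decoder: chr() per byte, then UTF-8.
suspect = "".join(chr(b) for b in sample).encode("UTF-8")

print(first_mismatch(sample, suspect))  # 1: byte 0x9c became 0xc2 0x9c
```

If the encoded text matches the reference but the decoded bytes do not, the problem is on the decoding or writing side rather than in the encoder.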

Open the PDF file in binary mode. See "Reading binary file in Python and looping over each byte" for how to do that. Pass the raw bytes to your base64Encode. Do not use the bin function to convert from string to binary.
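As a rough sketch of what working on raw bytes end to end could look like, the following uses integer shifts instead of bit strings and returns bytes from both directions, so no text encoding is involved. The names ALPHABET, encode_bytes and decode_bytes are illustrative and not from the original post; the standard base64 module is imported only to cross-check the result.

```python
import base64  # standard library, used only as a reference check

ALPHABET = b"ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/"

def encode_bytes(data: bytes) -> bytes:
    """Base64-encode raw bytes, three input bytes at a time."""
    out = bytearray()
    for i in range(0, len(data), 3):
        chunk = data[i:i + 3]
        # Pack up to 3 bytes into a 24-bit integer, zero-filled on the right.
        n = int.from_bytes(chunk.ljust(3, b"\x00"), "big")
        # Split the 24 bits into four 6-bit alphabet indexes.
        quad = bytes(ALPHABET[(n >> shift) & 0x3F] for shift in (18, 12, 6, 0))
        # A short final chunk yields len(chunk) + 1 real characters;
        # the remaining positions are "=" padding.
        keep = len(chunk) + 1
        out += quad[:keep] + b"=" * (4 - keep)
    return bytes(out)

def decode_bytes(text: bytes) -> bytes:
    """Decode base64 text to raw bytes, skipping newlines and padding."""
    index = {c: i for i, c in enumerate(ALPHABET)}
    sextets = [index[c] for c in text if c in index]
    out = bytearray()
    for i in range(0, len(sextets), 4):
        group = sextets[i:i + 4]
        n = 0
        for s in group:
            n = (n << 6) | s
        # Left-align a short final group to 24 bits.
        n <<= 6 * (4 - len(group))
        # 4 sextets carry 3 bytes, 3 carry 2, 2 carry 1.
        out += n.to_bytes(3, "big")[:len(group) - 1]
    return bytes(out)

# Cross-check against the standard library on every possible byte value.
sample = bytes(range(256))
assert encode_bytes(sample) == base64.b64encode(sample)
assert decode_bytes(encode_bytes(sample)) == sample
```

Because decode_bytes returns bytes, its result can be written straight to a file opened with "wb".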

I was able to accomplish this task. Replacing .encode("UTF-8") with .encode("latin-1") works, at least for PDF files.
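A likely explanation: chr(int(bracket, 2)) produces code points 0 to 255, and Latin-1 maps each of those to the single byte with the same value, whereas UTF-8 expands every code point from 128 upward into two bytes. A small sketch of the difference, with made-up sample values; building the result with bytes() would avoid the detour through a string altogether.

```python
# Byte values as they would be recovered from the bit string (sample data).
values = [0x78, 0x9C, 0x55]

# The question's approach: one chr() per value, joined into a str.
as_text = "".join(chr(v) for v in values)

print(as_text.encode("latin-1"))  # b'x\x9cU'      one byte per value
print(as_text.encode("UTF-8"))    # b'x\xc2\x9cU'  0x9c grew to two bytes
print(bytes(values))              # b'x\x9cU'      no string step needed
```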

I've modified the original code. This works on text, PNG, and PDF files; I haven't tried other file types, but I expect it will work on them too.

#!/usr/bin/env python3
# -*- coding: utf-8 -*-
"""
Created on Sat Mar 16 07:38:19 2019

@author: tracyanne
"""
import os

class Base64():

    def __init__(self):

        ## We only need to do this once
        self.b64 = ["A", "B", "C", "D", "E", "F", "G", "H", "I", "J", "K", "L", "M", "N", "O", "P","Q","R","S","T","U","V","W","X","Y","Z","a","b","c","d","e","f","g","h","i","j","k","l","m","n","o","p","q","r","s","t","u","v","w","x","y","z","0","1","2","3","4","5","6","7","8","9","+","/"]


    def Encode(self, data):
        alphabet = self.b64
        bit_str = ""
        base64_str = ""

        for char in data:
            bin_char = bin(char).lstrip("0b")
            bin_char = bin_char.zfill(8)
            bit_str += bin_char

        brackets = [bit_str[x:x+6] for x in range(0,len(bit_str),6)]

        for bracket in brackets:
            if(len(bracket) < 6):
                bracket = bracket + (6-len(bracket))*"0"
            base64_str += alphabet[int(bracket,2)]

        ##Add padding characters to maintain compatibility with forced padding
        padding_indicator = len(base64_str) % 4
        if padding_indicator == 3:
            base64_str += "="
        elif  padding_indicator == 2:
            base64_str += "=="

        return base64_str

    def Decode(self, text, eof):
        alphabet = self.b64
        bit_str = ""
        text_str = ""

        for char in text:
            if char in alphabet:
                bin_char = bin(alphabet.index(char)).lstrip("0b")
                bin_char = bin_char.zfill(6)
                bit_str += bin_char

        brackets = [bit_str[x:x+8] for x in range(0,len(bit_str),8)]

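        for bracket in brackets:
            ## A group shorter than 8 bits is leftover fill from the
            ## 6-bit encoding, not data, so drop it (this removes \x00).
            ## Checking the length avoids also dropping real bytes whose
            ## bits happen to equal the last group when eof is set.
            if len(bracket) < 8:
                pass
            else:
                text_str += chr(int(bracket,2))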

        ## encode string as Latin-1 == ISO-8859-1
        return text_str.encode("ISO-8859-1")

    def base64Encode(self, inFile, outFile):
        w = open(outFile, "w")
        with open(inFile, "rb") as f:
            byte = f.read(57)
            while byte:
                w.write(self.Encode(byte))
                w.write("\n")
                byte = f.read(57)
            w.close()
        f.close()

    def base64Decode(self, inFile, outFile):
        ## Get size of input file for later comparison
        fsize = os.path.getsize(inFile)
        incsize = 0
        eof = False

        w = open(outFile, "wb")
        with open(inFile, "r") as f:
            byte = f.read(77)
            while byte:
                ## keep current dataread and if current data read ==
                ## input file size set eof True
                incsize += len(byte)
                if fsize - incsize == 0:
                    eof = True
                ## Pass in eof to Decode
                w.write(self.Decode(byte, eof))
                byte = f.read(77)
            w.close()
        f.close()
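To confirm that a round trip really returns the same file, one option is a byte-for-byte comparison of the original and the decoded output. This is only a sketch: the helper name same_contents and the file names are hypothetical, and the two throwaway files stand in for a real original and its decoded copy.

```python
import filecmp
import os
import tempfile

def same_contents(path_a, path_b):
    """True if both files hold identical bytes.

    shallow=False makes filecmp compare file contents instead of
    trusting matching os.stat() signatures.
    """
    return filecmp.cmp(path_a, path_b, shallow=False)

# Demonstration with two temporary files; in practice pass the original
# file and the file written by the decoder.
with tempfile.TemporaryDirectory() as tmp:
    original = os.path.join(tmp, "original.bin")
    decoded = os.path.join(tmp, "decoded.bin")
    with open(original, "wb") as f:
        f.write(b"%PDF-1.5\n\x9c\x80\xff")
    with open(decoded, "wb") as f:
        # Same data after a mistaken UTF-8 encode of bytes >= 0x80.
        f.write(b"%PDF-1.5\n\xc2\x9c\xc2\x80\xc3\xbf")
    print(same_contents(original, decoded))  # False
```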

Comments
  • I have shown above that the encoding seems to be correct: I compared it with the external base64 encoder given in the link. I tried to write bytes, but there is the problem I described above. Am I using the wrong function?
  • Could you explain how it works and fix your editing?
  • Not sure what you mean by fix the editing. But save the code as MyBase64.py; the following is the code I used to test it