How to set the primary key when writing a pandas DataFrame to a SQLite database table using df.to_sql

I have created a SQLite database using pandas' df.to_sql; however, accessing it seems considerably slower than just reading in the 500 MB CSV file.

I need to:

  1. set the primary key for each table using the df.to_sql method
  2. tell the SQLite database what datatype each of the columns in my dataframe is - can I pass a list like [integer, integer, text, text]?

My code so far (the format code button is not working):

if ext == ".csv":
    df = pd.read_csv("/Users/data/" + filename)
    columns = [i.replace(' ', '_') for i in df.columns]

    df.columns = columns
    df.to_sql(name, con, flavor='sqlite', schema=None, if_exists='replace',
              index=True, index_label=None, chunksize=None, dtype=None)

Unfortunately there is no way right now to set a primary key in the pandas df.to_sql() method. Additionally, just to make things more of a pain, there is no way to add a primary key to a column in SQLite after a table has been created.

However, a workaround at the moment is to create the table in SQLite with the pandas df.to_sql() method, then create a duplicate table with your primary key set, copy your data over, and finally drop the old table to clean up.

It would be something along the lines of this:

import pandas as pd
import sqlite3

#connect to the database
conn = sqlite3.connect('database')

df = pd.read_csv("/Users/data/" + filename)
columns = [i.replace(' ', '_') for i in df.columns]

#write the pandas dataframe to a sqlite table
df.columns = columns
df.to_sql(name, conn, if_exists='replace', index=True)

c = conn.cursor()

c.executescript('''
    PRAGMA foreign_keys=off;

    BEGIN TRANSACTION;
    ALTER TABLE my_table RENAME TO old_table;

    /*create a new table under the original name, with the same column
    names and types, while defining a primary key for the desired column*/
    CREATE TABLE my_table (col_1 TEXT PRIMARY KEY NOT NULL,
                           col_2 TEXT);

    INSERT INTO my_table SELECT * FROM old_table;

    DROP TABLE old_table;
    COMMIT TRANSACTION;

    PRAGMA foreign_keys=on;''')

#close out the connection
c.close()
conn.close()

I have done this in the past when I faced this issue; I just wrapped the whole thing in a function to make it more convenient...

In my limited experience with SQLite I have found that not being able to add a primary key after a table has been created, not being able to perform upserts (at least before SQLite 3.24.0 added INSERT ... ON CONFLICT ... DO UPDATE), and the lack of UPDATE ... JOIN have caused a lot of frustration and some unconventional workarounds; a sketch of the newer upsert syntax follows.
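As an aside, a minimal sketch of that upsert syntax, using a made-up table (requires SQLite 3.24.0 or later):

import sqlite3

con = sqlite3.connect("example.db")
con.execute("CREATE TABLE IF NOT EXISTS kv (id INTEGER PRIMARY KEY, value TEXT)")

# update the row if the key already exists, insert it otherwise
# (ON CONFLICT ... DO UPDATE requires SQLite 3.24.0+)
con.execute("""
    INSERT INTO kv (id, value) VALUES (?, ?)
    ON CONFLICT(id) DO UPDATE SET value = excluded.value
""", (1, "new"))
con.commit()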

Lastly, the pandas df.to_sql() method has a dtype keyword argument that can take a dictionary mapping column names to types, e.g. dtype={'col_1': 'TEXT'}.
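For example, a minimal sketch (the table and column names are made up); with a plain sqlite3 connection the types are given as strings, while with a SQLAlchemy engine they would be SQLAlchemy types:

import sqlite3
import pandas as pd

df = pd.DataFrame({"col_1": ["a", "b"], "col_2": [1, 2]})
con = sqlite3.connect("example.db")

# dtype maps column names to SQL type strings (sqlite3 connection)
df.to_sql("my_table", con, if_exists="replace", index=False,
          dtype={"col_1": "TEXT", "col_2": "INTEGER"})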

Another approach: I iterate through the dict of DataFrames, get a list of the columns to use for the primary key (i.e. those containing id), use get_schema to create the empty tables, then append the DataFrame to the table.
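A minimal sketch of that approach (the dataframes dict, table name, and connection here are made up for illustration):

import sqlite3
import pandas as pd

# hypothetical dict of DataFrames keyed by table name
dataframes = {"users": pd.DataFrame({"user_id": [1, 2], "name": ["a", "b"]})}

con = sqlite3.connect("example.db")

for table_name, df in dataframes.items():
    # columns containing "id" become the primary key
    pk_cols = [c for c in df.columns if "id" in c]

    # get_schema generates a CREATE TABLE statement; keys= adds the PRIMARY KEY
    create_stmt = pd.io.sql.get_schema(df, table_name, keys=pk_cols, con=con)

    con.execute("DROP TABLE IF EXISTS {}".format(table_name))
    con.execute(create_stmt)

    # append the data into the empty table created above
    df.to_sql(table_name, con, if_exists="append", index=False)

con.commit()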

Building on Chris Guarino's answer, here are some functions that provide a more general solution. See the example at the bottom for how to use them.

import re

def get_create_table_string(tablename, connection):
    # look up the CREATE TABLE statement stored in sqlite_master
    sql = "select sql from sqlite_master where name = ? and type = 'table'"
    result = connection.execute(sql, (tablename,))

    create_table_string = result.fetchone()[0]
    return create_table_string

def add_pk_to_create_table_string(create_table_string, colname):
    # append PRIMARY KEY to the matching column definition
    regex = r"(\n.+{}[^,]+)(,)".format(colname)
    return re.sub(regex, r"\1 PRIMARY KEY,", create_table_string, count=1)

def add_pk_to_sqlite_table(tablename, index_column, connection):
    # rename the existing table, recreate it with the PRIMARY KEY
    # added, copy the rows back, and drop the renamed original
    cts = get_create_table_string(tablename, connection)
    cts = add_pk_to_create_table_string(cts, index_column)
    template = """
    BEGIN TRANSACTION;
        ALTER TABLE {tablename} RENAME TO {tablename}_old_;

        {cts};

        INSERT INTO {tablename} SELECT * FROM {tablename}_old_;

        DROP TABLE {tablename}_old_;

    COMMIT TRANSACTION;
    """

    create_and_drop_sql = template.format(tablename = tablename, cts = cts)
    connection.executescript(create_and_drop_sql)

# Example:

# import pandas as pd 
# import sqlite3

# df = pd.DataFrame({"a": [1,2,3], "b": [2,3,4]})
# con = sqlite3.connect("deleteme.db")
# df.to_sql("df", con, if_exists="replace")

# add_pk_to_sqlite_table("df", "index", con)
# r = con.execute("select sql from sqlite_master where name = 'df' and type = 'table'")
# print(r.fetchone()[0])

There is a gist of this code here.


In SQLite, with a normal rowid table, unless the primary key is a single INTEGER column (see ROWIDs and the INTEGER PRIMARY KEY in the documentation), it is equivalent to a UNIQUE index, because the real primary key of a normal table is the rowid.

Notes from the documentation for rowid tables:

The PRIMARY KEY of a rowid table (if there is one) is usually not the true primary key for the table, in the sense that it is not the unique key used by the underlying B-tree storage engine. The exception to this rule is when the rowid table declares an INTEGER PRIMARY KEY. In the exception, the INTEGER PRIMARY KEY becomes an alias for the rowid.

The true primary key for a rowid table (the value that is used as the key to look up rows in the underlying B-tree storage engine) is the rowid.

The PRIMARY KEY constraint for a rowid table (as long as it is not the true primary key or INTEGER PRIMARY KEY) is really the same thing as a UNIQUE constraint. Because it is not a true primary key, columns of the PRIMARY KEY are allowed to be NULL, in violation of all SQL standards.

So you can easily fake a primary key after creating the table with:

CREATE UNIQUE INDEX mytable_fake_pk ON mytable(pk_column)

Besides the NULL issue, you won't get the benefits of an INTEGER PRIMARY KEY if your column is supposed to hold integers, such as taking up less space and auto-generating values on insert when omitted, but it will otherwise work for most purposes.
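A sketch of how that might look combined with df.to_sql (the table and column names are made up):

import sqlite3
import pandas as pd

df = pd.DataFrame({"pk_column": [1, 2, 3], "value": ["a", "b", "c"]})
con = sqlite3.connect("example.db")

df.to_sql("mytable", con, if_exists="replace", index=False)

# behaves like a PRIMARY KEY constraint on a rowid table,
# apart from still allowing NULLs
con.execute("CREATE UNIQUE INDEX mytable_fake_pk ON mytable(pk_column)")
con.commit()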
