Thursday, 25 October 2018

Mysql: MyISAM versus InnoDB

I'm working on a projects which involves a lot of database writes, I'd say (70% inserts and 30% reads). This ratio would also include updates which I consider to be one read and one write. The reads can be dirty (e.g. I don't need 100% accurate information at the time of read).
The task in question will be doing over 1 million database transactions an hour.
I've read a bunch of stuff on the web about the differences between MyISAM and InnoDB, and MyISAM seems like the obvious choice to me for the particular database/tables that I'll be using for this task. From what I seem to be reading, InnoDB is good if transactions are needed since row level locking is supported.
Does anybody have any experience with this type of load (or higher)? Is MyISAM the way to go?

 Answers

now its time for Database. the major thing that beginners are curioues to know what are the difference between InnoDB and MyISAM. when where to use when.

The Simple is that to show By The Table.

My ISAMInnoDB
Required full text SearchYes
Require TransactionsYes
frequent select queriesYes
frequent insert,update,deleteYes
Row Locking (multi processing on single table)Yes
Relational base designYes


if there is any doubt which to use when. then MyISAM is your default choice.
you can also mix different table types in same database. Hence it  is recommended.

I have briefly discussed above this question in a table so you can conclude whether to go with InnoDB or MyISAM.
Here is a small overview of which db storage engine you should use in which situation:
                                                 MyISAM   InnoDB
----------------------------------------------------------------
Required full-text search                        Yes      5.6.4
----------------------------------------------------------------
Require transactions                                      Yes
----------------------------------------------------------------
Frequent select queries                          Yes      
----------------------------------------------------------------
Frequent insert, update, delete                           Yes
----------------------------------------------------------------
Row locking (multi processing on single table)            Yes
----------------------------------------------------------------
Relational base design                                    Yes
To summarize:
Frequent reading, almost no writing   => MyISAM
Full-text search in MySQL <= 5.5      => MyISAM
In all other circumstances, InnoDB is usually the best way to go.



People often talk about performance, reads vs. writes, foreign keys, etc. but there's one other must-have feature for a storage engine in my opinion: atomic updates.
Try this:
  1. Issue an UPDATE against your MyISAM table that takes 5 seconds.
  2. While the UPDATE is in progress, say 2.5 seconds in, hit Ctrl-C to interrupt it.
  3. Observe the effects on the table. How many rows were updated? How many were not updated? Is the table even readable, or was it corrupted when you hit Ctrl-C?
  4. Try the same experiment with UPDATE against an InnoDB table, interrupting the query in progress.
  5. Observe the InnoDB table. Zero rows were updated. InnoDB has assured you have atomic updates, and if the full update could not be committed, it rolls back the whole change. Also, the table is not corrupt. This works even if you use killall -9 mysqld to simulate a crash.
Performance is desirable of course, but not losing data should trump that.



For a load with more writes and reads, you will benefit from InnoDB. Because InnoDB provides row-locking rather than table-locking, your SELECTs can be concurrent, not just with each other but also with many INSERTs. However, unless you are intending to use SQL transactions, set the InnoDB commit flush to 2 (innodb_flush_log_at_trx_commit). This gives you back a lot of raw performance that you would otherwise lose when moving tables from MyISAM to InnoDB.
Also, consider adding replication. This gives you some read scaling and since you stated your reads don't have to be up-to-date, you can let the replication fall behind a little. Just be sure that it can catch up under anything but the heaviest traffic or it will always be behind and will never catch up. If you go this way, however, I strongly recommend you isolate reading from the slaves and replication lag management to your database handler. It is so much simpler if the application code does not know about this.
Finally, be aware of different table loads. You will not have the same read/write ratio on all tables. Some smaller tables with near 100% reads could afford to stay MyISAM. Likewise, if you have some tables that are near 100% write, you may benefit from INSERT DELAYED, but that is only supported in MyISAM (the DELAYED clause is ignored for an InnoDB table).
But benchmark to be sure.



To add to the wide selection of responses here covering the mechanical differences between the two engines, I present an empirical speed comparison study.
In terms of pure speed, it is not always the case that MyISAM is faster than InnoDB but in my experience it tends to be faster for PURE READ working environments by a factor of about 2.0-2.5 times. Clearly this isn't appropriate for all environments - as others have written, MyISAM lacks such things as transactions and foreign keys.
I've done a bit of benchmarking below - I've used python for looping and the timeit library for timing comparisons. For interest I've also included the memory engine, this gives the best performance across the board although it is only suitable for smaller tables (you continually encounter The table 'tbl' is full when you exceed the MySQL memory limit). The four types of select I look at are:
  1. vanilla SELECTs
  2. counts
  3. conditional SELECTs
  4. indexed and non-indexed sub-selects
Firstly, I created three tables using the following SQL
CREATE TABLE
    data_interrogation.test_table_myisam
    (
        index_col BIGINT NOT NULL AUTO_INCREMENT,
        value1 DOUBLE,
        value2 DOUBLE,
        value3 DOUBLE,
        value4 DOUBLE,
        PRIMARY KEY (index_col)
    )
    ENGINE=MyISAM DEFAULT CHARSET=utf8
with 'MyISAM' substituted for 'InnoDB' and 'memory' in the second and third tables.

1) Vanilla selects

Query: SELECT * FROM tbl WHERE index_col = xx
Result: draw
The speed of these is all broadly the same, and as expected is linear in the number of columns to be selected. InnoDB seems slightly faster than MyISAM but this is really marginal.
Code:
import timeit
import MySQLdb
import MySQLdb.cursors
import random
from random import randint

db = MySQLdb.connect(host="...", user="...", passwd="...", db="...", cursorclass=MySQLdb.cursors.DictCursor)
cur = db.cursor()

lengthOfTable = 100000

# Fill up the tables with random data
for x in xrange(lengthOfTable):
    rand1 = random.random()
    rand2 = random.random()
    rand3 = random.random()
    rand4 = random.random()

    insertString = "INSERT INTO test_table_innodb (value1,value2,value3,value4) VALUES (" + str(rand1) + "," + str(rand2) + "," + str(rand3) + "," + str(rand4) + ")"
    insertString2 = "INSERT INTO test_table_myisam (value1,value2,value3,value4) VALUES (" + str(rand1) + "," + str(rand2) + "," + str(rand3) + "," + str(rand4) + ")"
    insertString3 = "INSERT INTO test_table_memory (value1,value2,value3,value4) VALUES (" + str(rand1) + "," + str(rand2) + "," + str(rand3) + "," + str(rand4) + ")"

    cur.execute(insertString)
    cur.execute(insertString2)
    cur.execute(insertString3)

db.commit()

# Define a function to pull a certain number of records from these tables
def selectRandomRecords(testTable,numberOfRecords):

    for x in xrange(numberOfRecords):
        rand1 = randint(0,lengthOfTable)

        selectString = "SELECT * FROM " + testTable + " WHERE index_col = " + str(rand1)
        cur.execute(selectString)

setupString = "from __main__ import selectRandomRecords"

# Test time taken using timeit
myisam_times = []
innodb_times = []
memory_times = []

for theLength in [3,10,30,100,300,1000,3000,10000]:

    innodb_times.append( timeit.timeit('selectRandomRecords("test_table_innodb",' + str(theLength) + ')', number=100, setup=setupString) )
    myisam_times.append( timeit.timeit('selectRandomRecords("test_table_myisam",' + str(theLength) + ')', number=100, setup=setupString) )
    memory_times.append( timeit.timeit('selectRandomRecords("test_table_memory",' + str(theLength) + ')', number=100, setup=setupString) )

2) Counts

Query: SELECT count(*) FROM tbl
Result: MyISAM wins
This one demonstrates a big difference between MyISAM and InnoDB - MyISAM (and memory) keeps track of the number of records in the table, so this transaction is fast and O(1). The amount of time required for InnoDB to count increases super-linearly with table size in the range I investigated. I suspect many of the speed-ups from MyISAM queries that are observed in practice are due to similar effects.
Code:
myisam_times = []
innodb_times = []
memory_times = []

# Define a function to count the records
def countRecords(testTable):

    selectString = "SELECT count(*) FROM " + testTable
    cur.execute(selectString)

setupString = "from __main__ import countRecords"

# Truncate the tables and re-fill with a set amount of data
for theLength in [3,10,30,100,300,1000,3000,10000,30000,100000]:

    truncateString = "TRUNCATE test_table_innodb"
    truncateString2 = "TRUNCATE test_table_myisam"
    truncateString3 = "TRUNCATE test_table_memory"

    cur.execute(truncateString)
    cur.execute(truncateString2)
    cur.execute(truncateString3)

    for x in xrange(theLength):
        rand1 = random.random()
        rand2 = random.random()
        rand3 = random.random()
        rand4 = random.random()

        insertString = "INSERT INTO test_table_innodb (value1,value2,value3,value4) VALUES (" + str(rand1) + "," + str(rand2) + "," + str(rand3) + "," + str(rand4) + ")"
        insertString2 = "INSERT INTO test_table_myisam (value1,value2,value3,value4) VALUES (" + str(rand1) + "," + str(rand2) + "," + str(rand3) + "," + str(rand4) + ")"
        insertString3 = "INSERT INTO test_table_memory (value1,value2,value3,value4) VALUES (" + str(rand1) + "," + str(rand2) + "," + str(rand3) + "," + str(rand4) + ")"

        cur.execute(insertString)
        cur.execute(insertString2)
        cur.execute(insertString3)

    db.commit()

    # Count and time the query
    innodb_times.append( timeit.timeit('countRecords("test_table_innodb")', number=100, setup=setupString) )
    myisam_times.append( timeit.timeit('countRecords("test_table_myisam")', number=100, setup=setupString) )
    memory_times.append( timeit.timeit('countRecords("test_table_memory")', number=100, setup=setupString) )

3) Conditional selects

Query: SELECT * FROM tbl WHERE value1<0.5 AND value2<0.5 AND value3<0.5 AND value4<0.5
Result: MyISAM wins
Here, MyISAM and memory perform approximately the same, and beat InnoDB by about 50% for larger tables. This is the sort of query for which the benefits of MyISAM seem to be maximised.
Code:
myisam_times = []
innodb_times = []
memory_times = []

# Define a function to perform conditional selects
def conditionalSelect(testTable):
    selectString = "SELECT * FROM " + testTable + " WHERE value1 < 0.5 AND value2 < 0.5 AND value3 < 0.5 AND value4 < 0.5"
    cur.execute(selectString)

setupString = "from __main__ import conditionalSelect"

# Truncate the tables and re-fill with a set amount of data
for theLength in [3,10,30,100,300,1000,3000,10000,30000,100000]:

    truncateString = "TRUNCATE test_table_innodb"
    truncateString2 = "TRUNCATE test_table_myisam"
    truncateString3 = "TRUNCATE test_table_memory"

    cur.execute(truncateString)
    cur.execute(truncateString2)
    cur.execute(truncateString3)

    for x in xrange(theLength):
        rand1 = random.random()
        rand2 = random.random()
        rand3 = random.random()
        rand4 = random.random()

        insertString = "INSERT INTO test_table_innodb (value1,value2,value3,value4) VALUES (" + str(rand1) + "," + str(rand2) + "," + str(rand3) + "," + str(rand4) + ")"
        insertString2 = "INSERT INTO test_table_myisam (value1,value2,value3,value4) VALUES (" + str(rand1) + "," + str(rand2) + "," + str(rand3) + "," + str(rand4) + ")"
        insertString3 = "INSERT INTO test_table_memory (value1,value2,value3,value4) VALUES (" + str(rand1) + "," + str(rand2) + "," + str(rand3) + "," + str(rand4) + ")"

        cur.execute(insertString)
        cur.execute(insertString2)
        cur.execute(insertString3)

    db.commit()

    # Count and time the query
    innodb_times.append( timeit.timeit('conditionalSelect("test_table_innodb")', number=100, setup=setupString) )
    myisam_times.append( timeit.timeit('conditionalSelect("test_table_myisam")', number=100, setup=setupString) )
    memory_times.append( timeit.timeit('conditionalSelect("test_table_memory")', number=100, setup=setupString) )

4) Sub-selects

Result: InnoDB wins
For this query, I created an additional set of tables for the sub-select. Each is simply two columns of BIGINTs, one with a primary key index and one without any index. Due to the large table size, I didn't test the memory engine. The SQL table creation command was
CREATE TABLE
    subselect_myisam
    (
        index_col bigint NOT NULL,
        non_index_col bigint,
        PRIMARY KEY (index_col)
    )
    ENGINE=MyISAM DEFAULT CHARSET=utf8;
where once again, 'MyISAM' is substituted for 'InnoDB' in the second table.
In this query, I leave the size of the selection table at 1000000 and instead vary the size of the sub-selected columns.
Here the InnoDB wins easily. After we get to a reasonable size table both engines scale linearly with the size of the sub-select. The index speeds up the MyISAM command but interestingly has little effect on the InnoDB speed. subSelect.png
Code:
myisam_times = []
innodb_times = []
myisam_times_2 = []
innodb_times_2 = []

def subSelectRecordsIndexed(testTable,testSubSelect):
    selectString = "SELECT * FROM " + testTable + " WHERE index_col in ( SELECT index_col FROM " + testSubSelect + " )"
    cur.execute(selectString)

setupString = "from __main__ import subSelectRecordsIndexed"

def subSelectRecordsNotIndexed(testTable,testSubSelect):
    selectString = "SELECT * FROM " + testTable + " WHERE index_col in ( SELECT non_index_col FROM " + testSubSelect + " )"
    cur.execute(selectString)

setupString2 = "from __main__ import subSelectRecordsNotIndexed"

# Truncate the old tables, and re-fill with 1000000 records
truncateString = "TRUNCATE test_table_innodb"
truncateString2 = "TRUNCATE test_table_myisam"

cur.execute(truncateString)
cur.execute(truncateString2)

lengthOfTable = 1000000

# Fill up the tables with random data
for x in xrange(lengthOfTable):
    rand1 = random.random()
    rand2 = random.random()
    rand3 = random.random()
    rand4 = random.random()

    insertString = "INSERT INTO test_table_innodb (value1,value2,value3,value4) VALUES (" + str(rand1) + "," + str(rand2) + "," + str(rand3) + "," + str(rand4) + ")"
    insertString2 = "INSERT INTO test_table_myisam (value1,value2,value3,value4) VALUES (" + str(rand1) + "," + str(rand2) + "," + str(rand3) + "," + str(rand4) + ")"

    cur.execute(insertString)
    cur.execute(insertString2)

for theLength in [3,10,30,100,300,1000,3000,10000,30000,100000]:

    truncateString = "TRUNCATE subselect_innodb"
    truncateString2 = "TRUNCATE subselect_myisam"

    cur.execute(truncateString)
    cur.execute(truncateString2)

    # For each length, empty the table and re-fill it with random data
    rand_sample = sorted(random.sample(xrange(lengthOfTable), theLength))
    rand_sample_2 = random.sample(xrange(lengthOfTable), theLength)

    for (the_value_1,the_value_2) in zip(rand_sample,rand_sample_2):
        insertString = "INSERT INTO subselect_innodb (index_col,non_index_col) VALUES (" + str(the_value_1) + "," + str(the_value_2) + ")"
        insertString2 = "INSERT INTO subselect_myisam (index_col,non_index_col) VALUES (" + str(the_value_1) + "," + str(the_value_2) + ")"

        cur.execute(insertString)
        cur.execute(insertString2)

    db.commit()

    # Finally, time the queries
    innodb_times.append( timeit.timeit('subSelectRecordsIndexed("test_table_innodb","subselect_innodb")', number=100, setup=setupString) )
    myisam_times.append( timeit.timeit('subSelectRecordsIndexed("test_table_myisam","subselect_myisam")', number=100, setup=setupString) )

    innodb_times_2.append( timeit.timeit('subSelectRecordsNotIndexed("test_table_innodb","subselect_innodb")', number=100, setup=setupString2) )
    myisam_times_2.append( timeit.timeit('subSelectRecordsNotIndexed("test_table_myisam","subselect_myisam")', number=100, setup=setupString2) )
I think the take-home message of all of this is that if you are really concerned about speed, you need to benchmark the queries that you're doing rather than make any assumptions about which engine will be more suitable.



InnoDB offers:
ACID transactions
row-level locking
foreign key constraints
automatic crash recovery
table compression (read/write)
spatial data types (no spatial indexes)
In InnoDB all data in a row except for TEXT and BLOB can occupy 8,000 bytes at most. No full text indexing is available for InnoDB. In InnoDB the COUNT(*)s (when WHERE, GROUP BY, or JOIN is not used) execute slower than in MyISAM because the row count is not stored internally. InnoDB stores both data and indexes in one file. InnoDB uses a buffer pool to cache both data and indexes.
MyISAM offers:
fast COUNT(*)s (when WHERE, GROUP BY, or JOIN is not used)
full text indexing
smaller disk footprint
very high table compression (read only)
spatial data types and indexes (R-tree)
MyISAM has table-level locking, but no row-level locking. No transactions. No automatic crash recovery, but it does offer repair table functionality. No foreign key constraints. MyISAM tables are generally more compact in size on disk when compared to InnoDB tables. MyISAM tables could be further highly reduced in size by compressing with myisampack if needed, but become read-only. MyISAM stores indexes in one file and data in another. MyISAM uses key buffers for caching indexes and leaves the data caching management to the operating system.
Overall I would recommend InnoDB for most purposes and MyISAM for specialized uses only. InnoDB is now the default engine in new MySQL versions.



Please note that my formal education and experience is with Oracle, while my work with MySQL has been entirely personal and on my own time, so if I say things that are true for Oracle but are not true for MySQL, I apologize. While the two systems share a lot, the relational theory/algebra is the same, and relational databases are still relational databases, there are still plenty of differences!!
I particularly like (as well as row-level locking) that InnoDB is transaction-based, meaning that you may be updating/inserting/creating/altering/dropping/etc several times for one "operation" of your web application. The problem that arises is that if only some of those changes/operations end up being committed, but others do not, you will most times (depending on the specific design of the database) end up with a database with conflicting data/structure.
Note: With Oracle, create/alter/drop statements are called "DDL" (Data Definition) statements, and implicitly trigger a commit. Insert/update/delete statements, called "DML" (Data Manipulation), are not committed automatically, but only when a DDL, commit, or exit/quit is performed (or if you set your session to "auto-commit", or if your client auto-commits). It's imperative to be aware of that when working with Oracle, but I am not sure how MySQL handles the two types of statements. Because of this, I want to make it clear that I'm not sure of this when it comes to MySQL; only with Oracle.

An example of when transaction-based engines excel:

Let's say that I or you are on a web-page to sign up to attend a free event, and one of the main purposes of the system is to only allow up to 100 people to sign up, since that is the limit of the seating for the event. Once 100 sign-ups are reached, the system would disable further signups, at least until others cancel.
In this case, there may be a table for guests (name, phone, email, etc.), and a second table which tracks the number of guests that have signed up. We thus have two operations for one "transaction". Now suppose that after the guest info is added to the GUESTS table, there is a connection loss, or an error with the same impact. The GUESTS table was updated (inserted into), but the connection was lost before the "available seats" could be updated.
Now we have a guest added to the guest table, but the number of available seats is now incorrect (for example, value is 85 when it's actually 84).
Of course there are many ways to handle this, such as tracking available seats with "100 minus number of rows in guests table," or some code that checks that the info is consistent, etc.... But with a transaction-based database engine such as InnoDB, either ALL of the operations are committed, or NONE of them are. This can be helpful in many cases, but like I said, it's not the ONLY way to be safe, no (a nice way, however, handled by the database, not the programmer/script-writer).
That's all "transaction-based" essentially means in this context, unless I'm missing something -- that either the whole transaction succeeds as it should, or nothing is changed, since making only partial changes could make a minor to SEVERE mess of the database, perhaps even corrupting it...
But I'll say it one more time, it's not the only way to avoid making a mess. But it is one of the methods that the engine itself handles, leaving you to code/script with only needing to worry about "was the transaction successful or not, and what do I do if not (such as retry)," instead of manually writing code to check it "manually" from outside of the database, and doing a lot more work for such events.

Lastly, a note about table-locking vs row-locking:

DISCLAIMER: I may be wrong in all that follows in regard to MySQL, and the hypothetical/example situations are things to look into, but I may be wrong in what exactly is possible to cause corruption with MySQL. The examples are however very real in general programming, even if MySQL has more mechanisms to avoid such things...
Anyway, I am fairly confident in agreeing with those who have argued that how many connections are allowed at a time does not work around a locked table. In fact, multiple connections are the entire point of locking a table!! So that other processes/users/apps are not able to corrupt the database by making changes at the same time.
How would two or more connections working on the same row make a REALLY BAD DAY for you?? Suppose there are two processes both want/need to update the same value in the same row, let's say because the row is a record of a bus tour, and each of the two processes simultaneously want to update the "riders" or "available_seats" field as "the current value plus 1."
Let's do this hypothetically, step by step:
  1. Process one reads the current value, let's say it's empty, thus '0' so far.
  2. Process two reads the current value as well, which is still 0.
  3. Process one writes (current + 1) which is 1.
  4. Process two should be writing 2, but since it read the current value before process one write the new value, it too writes 1 to the table.
I'm not certain that two connections could intermingle like that, both reading before the first one writes... But if not, then I would still see a problem with:
  1. Process one reads the current value, which is 0.
  2. Process one writes (current + 1), which is 1.
  3. Process two reads the current value now. But while process one DID write (update), it has not committed the data, thus only that same process can read the new value that it updated, while all others see the older value, until there is a commit.
Also, at least with Oracle databases, there are isolation levels, which I will not waste our time trying to paraphrase. Here is a good article on that subject, and each isolation level having it's pros and cons, which would go along with how important transaction-based engines may be in a database...
Lastly, there may likely be different safeguards in place within MyISAM, instead of foreign-keys and transaction-based interaction. Well, for one, there is the fact that an entire table is locked, which makes it less likely that transactions/FKs are needed.
And alas, if you are aware of these concurrency issues, yes you can play it less safe and just write your applications, set up your systems so that such errors are not possible (your code is then responsible, rather than the database itself). However, in my opinion, I would say that it is always best to use as many safeguards as possible, programming defensively, and always being aware that human error is impossible to completely avoid. It happens to everyone, and anyone who says they are immune to it must be lying, or hasn't done more than write a "Hello World" application/script. ;-)
I hope that SOME of that is helpful to some one, and even more-so, I hope that I have not just now been a culprit of assumptions and being a human in error!! My apologies if so, but the examples are good to think about, research the risk of, and so on, even if they are not potential in this specific context.
Feel free to correct me, edit this "answer," even vote it down. Just please try to improve, rather than correcting a bad assumption of mine with another. ;-)
This is my first response, so please forgive the length due to all the disclaimers, etc... I just don't want to sound arrogant when I am not absolutely certain!






The Question and most of the Answers are out of date.
Yes, it is an old wives' tale that MyISAM is faster than InnoDB. notice the Question's date: 2008; it is now almost a decade later. InnoDB has made significant performance strides since then.
The dramatic graph was for the one case where MyISAM wins: COUNT(*) without a WHERE clause. But is that really what you spend your time doing?
If you run concurrency test, InnoDB is very likely to win, even against MEMORY.
If you do any writes while benchmarking SELECTs, MyISAM and MEMORY are likely to lose because of table-level locking.
In fact, Oracle is so sure that InnoDB is better that they removed MyISAM from 8.0!
The Question was written early in the days of 5.1. Since then, these major versions were marked "General Availability":
  • 2010: 5.5 (.8 in Dec.)
  • 2013: 5.6 (.10 in Feb.)
  • 2015: 5.7 (.9 in Oct.)
  • [TBD, maybe 2018], 8.0
Bottom line: Don't use MyISAM



In my experience, MyISAM was a better choice as long as you don't do DELETEs, UPDATEs, a whole lot of single INSERT, transactions, and full-text indexing. BTW, CHECK TABLE is horrible. As the table gets older in terms of the number of rows, you don't know when it will end.



Every application has it's own performance profile for using a database, and chances are it will change over time.
The best thing you can do is to test your options. Switching between MyISAM and InnoDB is trivial, so load some test data and fire jmeter against your site and see what happens.



myisam is a NOGO for that type of workload (high concurrency writes), i dont have that much experience with innodb (tested it 3 times and found in each case that the performance sucked, but it's been a while since the last test) if you're not forced to run mysql, consider giving postgres a try as it handles concurrent writes MUCH better



For that ratio of read/writes I would guess InnoDB will perform better. Since you are fine with dirty reads, you might (if you afford) replicate to a slave and let all your reads go to the slave. Also, consider inserting in bulk, rather than one record at a time.



If it is 70% inserts and 30% reads then it is more like on the InnoDB side.



bottomline: if you are working offline with selects on large chunks of data, MyISAM will probably give you better (much better) speeds.
there are some situations when MyISAM is infinitely more efficient than InnoDB: when manipulating large data dumps offline (because of table lock).
example: I was converting a csv file (15M records) from NOAA which uses VARCHAR fields as keys. InnoDB was taking forever, even with large chunks of memory available.
this an example of the csv (first and third fields are keys).
USC00178998,20130101,TMAX,-22,,,7,0700
USC00178998,20130101,TMIN,-117,,,7,0700
USC00178998,20130101,TOBS,-28,,,7,0700
USC00178998,20130101,PRCP,0,T,,7,0700
USC00178998,20130101,SNOW,0,T,,7,
since what i need to do is run a batch offline update of observed weather phenomena, i use MyISAM table for receiving data and run JOINS on the keys so that i can clean the incoming file and replace VARCHAR fields with INT keys (which are related to external tables where the original VARCHAR values are stored).

0 comments:

Post a Comment