CSV to MySQL Queries PHP Function

Need to upload a really big CSV file into your MySQL database? This will help. If you search for help on the topic, everyone points to command line functions or to increasing the max_upload_size so you can just dump the giant CSV into phpMyAdmin. But, if you're on a shared web host, you might not have the ability to do either of those. Finding myself in a similar situation, I built this function that converts a CSV file into text files (each smaller than your max_upload_size) with MySQL queries to populate a table.

Here's the function:

<?php

// CSV to MySQL Function
// Copyright (c) 2015, Ink Plant
// this version was last updated November 6, 2015
// https://inkplant.com/code/csv-to-mysql

ini_set('auto_detect_line_endings',true);
function csv_to_mysql($args=array()) {
	//key => default
	$fields = array(
		'csv_file'=>false, //REQUIRED: the full path of the csv file you're reading
		'mysql_file'=>false, //REQUIRED: the full path of the mysql file you're writing (note: depending on size of import, this might get split into multiple files)
		'mysql_table'=>false, //REQUIRED: the name of the table you\'re importing into
		'mysql_cols'=>false, //REQUIRED: the names of the columns you\'re importing into. needs to match up with columns in CSV. use false to skip a column
		'skip_rows'=>0, //the number of rows at the beginning of the file to skip
		'skip_if'=>false, //the name of a function that returns TRUE if row should be skipped, FALSE if not. see example_skip_if below
		'numeric_headers'=>false, //if true, ignore the names in the header row and use integers as array keys instead
		'trim_headers'=>true, //trim whitespace around header row values
		'trim_values'=>true, //trim whitespace around all non-header row values
		'max_file_size'=>8, //in megabytes; this determines how big the export files can get before they\'re split up
		'max_rows_per_query'=>false, //if set, this will split the query (but not necessarily the file) after X rows
		'max_rows_total'=>false, //if set, this will stop the process completely (not just split to a new file) after X rows. for debug purposes mainly
		'mysql_intro'=>false, //a MySQL command (string) to create the table (and/or drop a previous table) before we begin
		'mysql_tail'=>false, //a MySQL command (string) to run at the end of the file
		'lb'=>"\n", //line break character
		'delimiter'=>',', //what is separating the fields
		'enclosure'=>'"', //what is enclosing the fields
		'require_enclosure'=>false, //if true, an error will be thrown on any data not inside an enclosure
		'escape'=>'\\', //just one, actually, but we have to escape the escape here
		'debug'=>false, //set to true while testing if you run into troubles
	);
	foreach ($fields as $key => $default) {
		if (array_key_exists($key,$args)) { $$key = $args[$key]; }
		else { $$key = $default; }
	}

	if (!$csv_file) { custom_die('The csv_to_mysql function was not given a CSV file to read.'); }
	elseif (!is_readable($csv_file)) { custom_die('The CSV file given to the csv_to_mysql function is unreadable. Check permissions.'); }

	if (!$mysql_file) { custom_die('The csv_to_mysql function was not given a MySQL file to write to.'); }
	elseif (substr($mysql_file,-4) != '.sql') { custom_die('The csv_to_mysql function requires a MySQL file to write to that ends in .sql.'); }
	if (file_exists($mysql_file)) {
		if (!is_writeable($mysql_file)) { custom_die('The MySQL file given to the csv_to_mysql function is unwriteable. Check permissions.'); }
	} else { //try to create a new blank file there
		$fp = @fopen($mysql_file, 'w') or custom_die('Could not create new MySQL file for writing.');
		if (!$fp) { custom_die('Could not create new MySQL file for writing.'); }
		@fwrite($fp, 'csv_to_mysql error!') or custom_die('Could not write to new MySQL file.');; //this will get replaced later
		fclose($fp);
	}
	
	if (!$mysql_table) { custom_die('The csv_to_mysql function requires a MySQL table name.'); }
	if ((!$mysql_cols) || (!is_array($mysql_cols))) { custom_die('The csv_to_mysql function requires a MySQL columns array.'); }

	if ($max_file_size <= 0) { custom_die('The csv_to_mysql function was not given a valid max_file_size.'); }
	$max_file_size = round($max_file_size * 1048576); //convert to bytes
	$max_file_size = $max_file_size - 1024; //a little buffer
	
	if ($skip_if) {
		if (!function_exists($skip_if)) { custom_die('The skip_if function does not exist.'); }
	}

	if ($debug) {
		$echo_lines = false; //change this to true to see the imported lines on your screen
		$echo_queries = false; //change this to true to see the queries on your screen
	} else {
		$echo_lines = false; //change this to true to see the imported lines on your screen
		$echo_queries = false; //change this to true to see the queries on your screen
	}	

	if ($debug) {
		echo '<p>Opening '.htmlspecialchars($csv_file).'&hellip;</p>';
	}	

	$data = array();

	$mysql_cols_wrapped = array();
	foreach ($mysql_cols as $key => $col) {
		if ($col) { $mysql_cols_wrapped[$key] = '`'.$col.'`'; } //here, we surround col names with ` to prevent errors
	}

	$lb = "\n"; //line break

	$file_count = 1;
	$new_mysql_file = $mysql_file; //same here, but this changes later
	$fp = @fopen($new_mysql_file, 'w') or custom_die('Could not open '.$new_mysql_file.' for writing.');
	$bytes = 0;
	$comma = ' '; //after first row, start placing a comma and line break between entries

	//create the table (if MySQL to do so is provided)
	if ($mysql_intro) {
		$q = trim($mysql_intro);
		if (substr($q,-1) != ';') { $q .= ';'; }
		$q .= $lb.$lb;
		$bytes = $bytes + mb_strlen($q);
		if (($debug) && ($echo_queries)) { echo $q.'<br>'; }
		fwrite($fp,$q);
	}

	$first_query = 'INSERT INTO `'.$mysql_table.'` ('.implode(',',$mysql_cols_wrapped).') VALUES '.$lb; //this is used later too
	$q = $first_query;
	$bytes = $bytes + mb_strlen($q);
	if (($debug) && ($echo_queries)) { echo $q.'<br>'; }
	fwrite($fp,$q);

	$row = 0; // row number of the file we're reading
	$qrow = 1; // row number of the mysql query we're writing
	if (($handle = fopen($csv_file,'r')) !== false) {
		while (($line = fgets($handle)) !== false) {
			$row++;
			if ($row > $skip_rows) {
				if ($skip_if) { $skip = $skip_if($line); }
				else { $skip = false; }
				if (!$skip) {
					$qrow++;
					if (($debug) && ($echo_lines)) { echo $line.'<br>'; }

					$values = str_getcsv($line,$delimiter,$enclosure,$escape);
					foreach ($values as $key => $value) {
						if ($mysql_cols[$key]) { //if we're importing this column
							$values[$key] = dbPrepare($value); //clean up the values so that they\'re safe to use
						} else {
							unset($values[$key]);
						}
					}

					$q = $comma.'('.implode(',',$values).')';
					$bytes = $bytes + mb_strlen($q);

					if ($bytes > $max_file_size) { //if we have exceeded the maximumum size for the file, split it into a new one
				
						//remove comma from current query
						$q = substr($q,strlen($comma));

						//first, finish up old file
						$q2 = ';'; //we use $q2 for these new queries so that waiting $q doesn't get overwritten
						$bytes = $bytes + mb_strlen($q2);
						if (($debug) && ($echo_queries)) { echo $q2.'<br>'; }
						fwrite($fp,$q2);
						fclose($fp);
						if ($debug) { echo '<p>'.number_format($bytes).' bytes were written to '.$new_mysql_file.'.</p>'; }

						//then, start new one
						$file_count++;
						//if ($file_count > 3) { break; } //uncomment this if you want a limit for testing
						$new_mysql_file = substr($mysql_file,0,-4).'_'.$file_count.'.sql';
						$fp = @fopen($new_mysql_file, 'w') or custom_die('Could not open '.$new_mysql_file.' for writing.');
						$bytes = 0;
						$comma = '';
					
						$q2 = $first_query; //we use $q2 for these new queries so that waiting $q doesn't get overwritten
						$bytes = $bytes + mb_strlen($q2);
						if (($debug) && ($echo_queries)) { echo $q2.'<br>'; }
						fwrite($fp,$q2);
						$qrow = 0; //reset query line counter

					} elseif (($max_rows_per_query) && ($qrow > $max_rows_per_query)) { //if the query needs to be split up within the same file

						//add on a semi-colon
						$q .= ';'.$lb;

						//create the new insert command row
						$q .= $first_query;
						$bytes = $bytes + mb_strlen($q);
						if (($debug) && ($echo_queries)) { echo $q.'<br>'; }
						fwrite($fp,$q);

						$qrow = 1; //reset query line counter
						$comma = ' '; //reset comma
						$q = false; //keep query from being written again

					}

					if ($q !== false) { //as long as the query is ready, write it
						if (($debug) && ($echo_queries)) { echo $q.'<br>'; }
						fwrite($fp,$q);
						$comma = ','.$lb.' ';
					}
				}
			}
			if (($max_rows_total) && ($row > $max_rows_total)) { break; }
		}
		fclose($handle);
	} else {
		custom_die('There was an error opening the file.');
	}

	//close up the existing file
	if (is_resource($fp)) {
		$q = ';';
		if ($mysql_tail) { $q .= $lb.$mysql_tail; } //add on tail if it's present
		$bytes = $bytes + mb_strlen($q);
		if (($debug) && ($echo_queries)) { echo $q.'<br>'; }
		fwrite($fp,$q);

		fclose($fp);
		if ($debug) { echo '<p>'.number_format($bytes).' bytes were written to '.$new_mysql_file.'.</p>'; }
	}

	return true;
}

//we can use a query to skip certain lines
function example_skip_if($line) {
	$req_string = 'test'; //your required string
	if (strpos($line,$req_string) === false) { return true; } //skip
	else { return false; } //don't skip
}

?>

Note that we use str_getcsv instead of fgetcsv. That's because we need to process the huge file line by line, rather than trying to create a huge array and exceeding our memory allocations. If you get an error because of that, make sure you're running PHP 5.3.0 or later.

If you'd rather just dump that CSV file into one big PHP array, check out our CSV File to Array PHP function instead. It's easier to use than this, but doesn't handle really big files nearly as well.


Comments

Loading…

This post was published on November 3rd, 2015 by Robert James Reese in the following categories: MySQL and PHP. Before using any of the code or other content in this post, you must read and agree to our terms of use.