CSV to MySQL Queries PHP Function

Need to load a really big CSV file into your MySQL database? This will help. If you search for help on the topic, everyone points to command line tools or to increasing PHP's upload_max_filesize so you can just dump the giant CSV into phpMyAdmin. But if you're on a shared web host, you might not be able to do either of those. Finding myself in that situation, I built this function, which converts a CSV file into a series of text files (each smaller than your upload limit) containing the MySQL queries needed to populate a table.

Here's the function:

<?php

// CSV to MySQL Function
// Copyright (c) 2015, Ink Plant
// this version was last updated November 6, 2015
// https://inkplant.com/code/csv-to-mysql

ini_set('auto_detect_line_endings',true);
function csv_to_mysql($args=array()) {
	//key => default
	$fields = array(
		'csv_file'=>false, //REQUIRED: the full path of the csv file you're reading
		'mysql_file'=>false, //REQUIRED: the full path of the mysql file you're writing (note: depending on size of import, this might get split into multiple files)
		'mysql_table'=>false, //REQUIRED: the name of the table you're importing into
		'mysql_cols'=>false, //REQUIRED: the names of the columns you're importing into. needs to match up with columns in CSV. use false to skip a column
		'skip_rows'=>0, //the number of rows at the beginning of the file to skip
		'skip_if'=>false, //the name of a function that returns TRUE if row should be skipped, FALSE if not. see example_skip_if below
		'numeric_headers'=>false, //if true, ignore the names in the header row and use integers as array keys instead
		'trim_headers'=>true, //trim whitespace around header row values
		'trim_values'=>true, //trim whitespace around all non-header row values
		'max_file_size'=>8, //in megabytes; this determines how big the export files can get before they're split up
		'max_rows_per_query'=>false, //if set, this will split the query (but not necessarily the file) after X rows
		'max_rows_total'=>false, //if set, this will stop the process completely (not just split to a new file) after X rows. for debug purposes mainly
		'mysql_intro'=>false, //a MySQL command (string) to create the table (and/or drop a previous table) before we begin
		'mysql_tail'=>false, //a MySQL command (string) to run at the end of the file
		'lb'=>"\n", //line break character
		'delimiter'=>',', //what is separating the fields
		'enclosure'=>'"', //what is enclosing the fields
		'require_enclosure'=>false, //if true, an error will be thrown on any data not inside an enclosure
		'escape'=>'\\', //just one character, actually, but we have to escape the escape here
		'debug'=>false, //set to true while testing if you run into trouble
	);
	foreach ($fields as $key => $default) {
		if (array_key_exists($key,$args)) { $$key = $args[$key]; }
		else { $$key = $default; }
	}

	if (!$csv_file) { custom_die('The csv_to_mysql function was not given a CSV file to read.'); }
	elseif (!is_readable($csv_file)) { custom_die('The CSV file given to the csv_to_mysql function is unreadable. Check permissions.'); }

	if (!$mysql_file) { custom_die('The csv_to_mysql function was not given a MySQL file to write to.'); }
	elseif (substr($mysql_file,-4) != '.sql') { custom_die('The csv_to_mysql function requires a MySQL file to write to that ends in .sql.'); }
	if (file_exists($mysql_file)) {
		if (!is_writeable($mysql_file)) { custom_die('The MySQL file given to the csv_to_mysql function is unwriteable. Check permissions.'); }
	} else { //try to create a new blank file there
		$fp = @fopen($mysql_file, 'w') or custom_die('Could not create new MySQL file for writing.');
		if (!$fp) { custom_die('Could not create new MySQL file for writing.'); }
		@fwrite($fp, 'csv_to_mysql error!') or custom_die('Could not write to new MySQL file.'); //this will get replaced later
		fclose($fp);
	}

	if (!$mysql_table) { custom_die('The csv_to_mysql function requires a MySQL table name.'); }
	if ((!$mysql_cols) || (!is_array($mysql_cols))) { custom_die('The csv_to_mysql function requires a MySQL columns array.'); }

	if ($max_file_size <= 0) { custom_die('The csv_to_mysql function was not given a valid max_file_size.'); }
	$max_file_size = round($max_file_size * 1048576); //convert to bytes
	$max_file_size = $max_file_size - 1024; //a little buffer

	if ($skip_if) {
		if (!function_exists($skip_if)) { custom_die('The skip_if function does not exist.'); }
	}

	//these only take effect when debug is true
	$echo_lines = false; //change this to true to see the imported lines on your screen
	$echo_queries = false; //change this to true to see the queries on your screen

	if ($debug) {
		echo '<p>Opening '.htmlspecialchars($csv_file).'&hellip;</p>';
	}

	$data = array();

	$mysql_cols_wrapped = array();
	foreach ($mysql_cols as $key => $col) {
		if ($col) { $mysql_cols_wrapped[$key] = '`'.$col.'`'; } //here, we surround col names with ` to prevent errors
	}

	$file_count = 1;
	$new_mysql_file = $mysql_file; //starts out the same as mysql_file, but changes if the export gets split into multiple files
	$fp = @fopen($new_mysql_file, 'w') or custom_die('Could not open '.$new_mysql_file.' for writing.');
	$bytes = 0;
	$comma = ' '; //after first row, start placing a comma and line break between entries

	//create the table (if MySQL to do so is provided)
	if ($mysql_intro) {
		$q = trim($mysql_intro);
		if (substr($q,-1) != ';') { $q .= ';'; }
		$q .= $lb.$lb;
		$bytes = $bytes + mb_strlen($q);
		if (($debug) && ($echo_queries)) { echo $q.'<br>'; }
		fwrite($fp,$q);
	}

	$first_query = 'INSERT INTO `'.$mysql_table.'` ('.implode(',',$mysql_cols_wrapped).') VALUES '.$lb; //this is used later too
	$q = $first_query;
	$bytes = $bytes + mb_strlen($q);
	if (($debug) && ($echo_queries)) { echo $q.'<br>'; }
	fwrite($fp,$q);

	$row = 0; // row number of the file we're reading
	$qrow = 1; // row number of the mysql query we're writing
	if (($handle = fopen($csv_file,'r')) !== false) {
		while (($line = fgets($handle)) !== false) {
			$row++;
			if ($row > $skip_rows) {
				if ($skip_if) { $skip = $skip_if($line); }
				else { $skip = false; }
				if (!$skip) {
					$qrow++;
					if (($debug) && ($echo_lines)) { echo $line.'<br>'; }

					$values = str_getcsv($line,$delimiter,$enclosure,$escape);
					foreach ($values as $key => $value) {
						if ($mysql_cols[$key]) { //if we're importing this column
							$values[$key] = dbPrepare($value); //clean up the values so that they're safe to use
						} else {
							unset($values[$key]);
						}
					}

					$q = $comma.'('.implode(',',$values).')';
					$bytes = $bytes + mb_strlen($q);

					if ($bytes > $max_file_size) { //if we have exceeded the maximum size for the file, split it into a new one

						//remove comma from current query
						$q = substr($q,strlen($comma));

						//first, finish up old file
						$q2 = ';'; //we use $q2 for these new queries so that waiting $q doesn't get overwritten
						$bytes = $bytes + mb_strlen($q2);
						if (($debug) && ($echo_queries)) { echo $q2.'<br>'; }
						fwrite($fp,$q2);
						fclose($fp);
						if ($debug) { echo '<p>'.number_format($bytes).' bytes were written to '.$new_mysql_file.'.</p>'; }

						//then, start new one
						$file_count++;
						//if ($file_count > 3) { break; } //uncomment this if you want a limit for testing
						$new_mysql_file = substr($mysql_file,0,-4).'_'.$file_count.'.sql';
						$fp = @fopen($new_mysql_file, 'w') or custom_die('Could not open '.$new_mysql_file.' for writing.');
						$bytes = 0;
						$comma = '';

						$q2 = $first_query; //we use $q2 for these new queries so that waiting $q doesn't get overwritten
						$bytes = $bytes + mb_strlen($q2);
						if (($debug) && ($echo_queries)) { echo $q2.'<br>'; }
						fwrite($fp,$q2);
						$qrow = 0; //reset query line counter

					} elseif (($max_rows_per_query) && ($qrow > $max_rows_per_query)) { //if the query needs to be split up within the same file

						//add on a semi-colon
						$q .= ';'.$lb;

						//create the new insert command row
						$q .= $first_query;
						$bytes = $bytes + mb_strlen($q);
						if (($debug) && ($echo_queries)) { echo $q.'<br>'; }
						fwrite($fp,$q);

						$qrow = 1; //reset query line counter
						$comma = ' '; //reset comma
						$q = false; //keep query from being written again

					}

					if ($q !== false) { //as long as the query is ready, write it
						if (($debug) && ($echo_queries)) { echo $q.'<br>'; }
						fwrite($fp,$q);
						$comma = ','.$lb.' ';
					}
				}
			}
			if (($max_rows_total) && ($row > $max_rows_total)) { break; }
		}
		fclose($handle);
	} else {
		custom_die('There was an error opening the file.');
	}

	//close up the existing file
	if (is_resource($fp)) {
		$q = ';';
		if ($mysql_tail) { $q .= $lb.$mysql_tail; } //add on tail if it's present
		$bytes = $bytes + mb_strlen($q);
		if (($debug) && ($echo_queries)) { echo $q.'<br>'; }
		fwrite($fp,$q);

		fclose($fp);
		if ($debug) { echo '<p>'.number_format($bytes).' bytes were written to '.$new_mysql_file.'.</p>'; }
	}

	return true;
}

//we can use a function like this to skip certain lines
function example_skip_if($line) {
	$req_string = 'test'; //your required string
	if (strpos($line,$req_string) === false) { return true; } //skip
	else { return false; } //don't skip
}

?>
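To give you an idea of how a call might look, here's a minimal sketch. The file paths, table name, and column list are all hypothetical placeholders; swap in your own.

```php
<?php
// Hypothetical example call to csv_to_mysql (defined above).
// mysql_cols maps the CSV's columns, in order, onto table columns;
// false means "skip this CSV column entirely."
csv_to_mysql(array(
	'csv_file'    => '/home/example/data/products.csv', // the CSV being read
	'mysql_file'  => '/home/example/data/products.sql', // base output file; may become products_2.sql, etc.
	'mysql_table' => 'products',
	'mysql_cols'  => array('sku', 'name', false, 'price'), // skip the third CSV column
	'skip_rows'   => 1, // skip the header row
	'mysql_intro' => 'TRUNCATE TABLE `products`;', // run before the inserts
));
?>
```

Once it finishes, you'd import the resulting .sql file (or files) one at a time through phpMyAdmin.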

Note that we read the file with fgets and parse each line with str_getcsv, rather than fgetcsv. That's because we need to process the huge file one line at a time, and hand the raw line to skip_if, instead of loading everything into one giant array and exceeding our memory allocation. str_getcsv wasn't added until PHP 5.3.0, so if you get an undefined function error, make sure you're running PHP 5.3.0 or later.
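If you haven't used str_getcsv before, here's a standalone illustration with made-up sample data. Enclosed fields can contain the delimiter, and doubled quotes inside an enclosure become literal quotes:

```php
<?php
// str_getcsv parses one CSV line (such as a line read via fgets) into an array.
// Arguments: the line, the delimiter, the enclosure, and the escape character.
$line = '1,"Widget, large","He said ""hi"""';
$values = str_getcsv($line, ',', '"', '\\');
print_r($values);
// Array ( [0] => 1 [1] => Widget, large [2] => He said "hi" )
?>
```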

If you'd rather just dump that CSV file into one big PHP array, check out our CSV File to Array PHP function instead. It's easier to use than this, but doesn't handle really big files nearly as well.



This post was published on November 3rd, 2015 by Robert James Reese in the following categories: MySQL and PHP. Before using any of the code or other content in this post, you must read and agree to our terms of use.