
Working on 2D data
Let's leave tabular data here and move on to 2D data structures. As images are among the most commonly used types of data in popular machine learning problems, we will show you some useful methods included in the SciPy stack.
The following code is designed to run in a Jupyter notebook with inline graphics. You will find the source code in the file Dataset_IO.ipynb:
import scipy.misc
from matplotlib import pyplot as plt
%matplotlib inline
testimg = scipy.misc.imread("data/blue_jay.jpg")
plt.imshow(testimg)
Importing a single image basically consists of importing the corresponding modules, using the imread method to read the indicated image into a matrix, and showing it with matplotlib. The line starting with % is an IPython magic command that changes a notebook setting; it indicates that the matplotlib graphics that follow should be shown inline in the notebook. The result is as follows (the axes correspond to pixel numbers):

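Note that scipy.misc.imread relies on the Pillow package and has been removed from recent SciPy releases. If it is not available in your environment, a minimal sketch of the same loading step using the imageio package instead (an assumption on our part, not part of the original listing) would look like this:
# Sketch: the same loading step without scipy.misc, assuming the
# imageio package is installed (pip install imageio)
import imageio
from matplotlib import pyplot as plt
testimg = imageio.imread("data/blue_jay.jpg")  # height x width x 3 array
plt.imshow(testimg)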
The testimg variable will contain a height x width x channels array, with the red, green, and blue values for each image pixel. Let's get this information:
testimg.shape
The interpreter will display the following:
(1416, 1920, 3)
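Because the image is a regular NumPy array, standard indexing applies. As a quick sketch (the pixel coordinates below are arbitrary and chosen only for illustration), we can inspect the values stored for a single pixel:
testimg[100, 200]     # the (red, green, blue) triplet at row 100, column 200
testimg[100, 200, 0]  # the red value of that pixel alone
testimg.dtype         # typically uint8, so each channel ranges from 0 to 255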
We could also separate the channels and plot each one with a red, green, or blue colormap to get an idea of the color patterns in the image:
plt.subplot(131)
plt.imshow(testimg[:, :, 0], cmap="Reds")
plt.title("Red channel")
plt.subplot(132)
plt.imshow(testimg[:, :, 1], cmap="Greens")
plt.title("Green channel")
plt.subplot(133)
plt.imshow(testimg[:, :, 2], cmap="Blues")
plt.title("Blue channel")
In the previous example, we create three subplots, indicating the grid structure and position with a three-digit code: the first digit gives the number of rows, the second the number of columns, and the last the plot's position within that grid. The cmap parameter indicates the colormap assigned to each graphic.
The output will be as follows:

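Beyond visual inspection, the same channel slices can be summarized numerically. The following short sketch (not part of the original listing) computes the mean intensity of each channel, giving a rough numeric counterpart to the plots above:
# Mean intensity per channel: larger values indicate a stronger
# overall contribution of that color to the image
for position, name in enumerate(["Red", "Green", "Blue"]):
    print(name, "channel mean:", testimg[:, :, position].mean())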
This section is a simplified introduction to the different methods of loading datasets. In the following chapters, we will see more advanced ways to obtain datasets, including loading and training on different batches of sample sets.