A convolutional layer preforms a convolution on an input matrix with a kernel matrix (that is learned during training). The input matrix is composed of values given by input nodes. In essence, every input value in the matrix represents one input node into the convolutional layer. The relationship between several input nodes and an output node is known as a Receptive Field. In the gif below, the receptive field is represented by the red square.
This is the representation of a 2D convolutional layer, but nD convlutional layers can exist. The input nodes, kernel and output nodes simply need to be organized in a 3D manner, rather than a 2D one.
Convolutional layers are particularly useful for image data and are used to learn if the presence of a feature (edges, cars, cats) are present in the receptive field. Each kernel matrix represents one feature, though multiple kernel matrices can be learned for a single convolutional layer.
As convolutional layers are stacked, features can be more complex as the size of the total receptive field increases. For example, the first layer may identify curves and edges, while the second layer may identify numbers (for something like MNIST).