Custom AI: Data and software engineering – getting it right from the start

Identifying the right technological foundation for data capture, management and analysis is key to any successful AI project. As a first step, a data pipeline must be laid to a central data lake, and care must be taken during "ingestion" that the data is properly processed: timestamps are cleaned up, for example, and faulty data is rejected.
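A minimal sketch of such an ingestion step, assuming a hypothetical sensor feed with "timestamp" and "value" columns and a sentinel value of -999.0 marking a faulty reading:

```python
import pandas as pd

# Hypothetical raw sensor data arriving at the data lake.
raw = pd.DataFrame({
    "timestamp": ["2024-01-01 12:00:00", "not-a-date", "2024-01-01 12:01:00"],
    "value": [21.5, 22.0, -999.0],  # -999.0 marks a faulty sensor reading
})

# Clean up timestamps: unparseable entries become NaT and are rejected.
raw["timestamp"] = pd.to_datetime(raw["timestamp"], errors="coerce")
clean = raw.dropna(subset=["timestamp"])

# Reject faulty readings (here: the assumed sentinel value -999.0).
clean = clean[clean["value"] != -999.0]
```

Only the first reading survives both checks; the corrupt timestamp and the faulty measurement are filtered out before the data reaches the lake.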

Once the data has been collected, machine learning (ML) techniques such as neural networks can be used to create what are known as "models": functions that generate a particular output from a given input, such as flagging a set of collated measurements as "faulty". In such cases, the parameters of the function are learnt autonomously by the machine from a large volume of predefined input/output pairs.
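The idea can be illustrated with a deliberately simple stand-in for a neural network: a linear model whose two parameters are learnt from example input/output pairs (the training data here is invented for illustration):

```python
import numpy as np

# Predefined input/output pairs; the outputs follow y = 2x + 1.
x = np.array([0.0, 1.0, 2.0, 3.0])
y = np.array([1.0, 3.0, 5.0, 7.0])

# The machine learns the parameters a, b by least-squares fitting.
a, b = np.polyfit(x, y, deg=1)

def model(inp):
    """The learnt "model": maps an input to an output using the fitted parameters."""
    return a * inp + b
```

A real project would replace the linear fit with a neural network trained on far more data, but the principle is the same: the function's parameters come from the examples, not from hand-written rules.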

Like all other software, AI models have to be developed and new versions created. Even after the first models have been signed off for production, the ML algorithms are continually fed new data so that the models can be refined. Because the creation and use of new models, in particular, must scale to a very high degree, implementation on a public cloud is almost a prerequisite for an economically viable and successful AI project.
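This continuous refinement cycle can be sketched as a simple retrain-and-version loop; the registry and version labels below are illustrative, not a specific product:

```python
import numpy as np

# Hypothetical model registry: version label -> learnt parameters.
registry = {}

def train(x, y, version):
    """Re-fit the model parameters on the current data and register a new version."""
    a, b = np.polyfit(x, y, deg=1)
    registry[version] = (a, b)

# Initial training run, signed off as v1.
x = np.array([0.0, 1.0, 2.0])
train(x, 2 * x + 1, "v1")

# New measurements arrive after go-live; retrain and publish v2 alongside v1.
x2 = np.concatenate([x, [3.0, 4.0]])
train(x2, 2 * x2 + 1, "v2")
```

Keeping earlier versions in the registry allows a rollback if a retrained model underperforms, which is one reason the scalable storage and compute of a public cloud is so useful here.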