Python Lotto Predictor Code

Based on the video "Predict Lottery Numbers using Artificial Intelligent Neural Network in Kera, Python" (https://youtu.be/vN_EuIfD42g), which was in turn based on the article "How to Guess Accurately 3 Lottery Numbers Out of 6 using LSTM Model" (https://medium.com/@polanitzer/how-to-guess-accurately-3-lottery-numbers-out-of-6-using-lstm-model-e148d1c632d6) by Roi Polanitzer.

ChatGPT was also used to add explanatory comments throughout the code.

It is important to note that predicting lottery numbers is a very challenging problem because it is a highly random process with very low predictability. The outcome of a lottery draw is typically determined by random chance, and there is no inherent pattern or rule that can be used to predict the outcome with high accuracy.
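
To put that randomness into numbers, here is a rough sketch of the odds. It assumes a draw of 6 main balls from a pool of 59, which matches the six columns and the maximum ball number of 59 used in the code further down. The function names are made up for this illustration.

```python
from math import comb


def jackpot_combinations(pool_size: int = 59, balls_drawn: int = 6) -> int:
    """Number of distinct ways to draw `balls_drawn` balls from `pool_size`."""
    return comb(pool_size, balls_drawn)


def match_probability(matches: int, pool_size: int = 59, balls_drawn: int = 6) -> float:
    """Probability that one ticket matches exactly `matches` of the drawn balls."""
    # Choose which drawn balls are matched, then fill the rest of the ticket
    # with balls that were not drawn.
    favourable = comb(balls_drawn, matches) * comb(pool_size - balls_drawn, balls_drawn - matches)
    return favourable / comb(pool_size, balls_drawn)


print(jackpot_combinations())  # 45057474
print(match_probability(3))
```

Under these assumptions a single ticket has one chance in 45,057,474 of matching all six balls, which is the baseline any predictor would have to beat.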

LotteryScraper.py

# Class to scrape the current and expired Lotto results into a Pandas DataFrame

import requests
import pandas as pd
import numpy as np
import io

class LotteryScraper:
    def __init__(self, url):
        self.url = url

    def getHistoricDraws(self):
        html = requests.get(self.url).text
        strFind = "<HR><B>All lotteries below have exceeded the 180 days expiry date</B><HR>"
        start_index = html.find(strFind) + len(strFind)
        result = html[start_index:]
        lines = result.split("\n")
        lines = lines[:-6]
        csv_text = "\n".join(lines)
        df = pd.read_csv(io.StringIO(csv_text)).iloc[:, 5:11]
        return df

    def getRecentDraws(self):
        data = requests.get(self.url).text
        lines = data.strip().split("\n")[6:-6]
        data = [line.split(",")[5:11] for line in lines]
        header = data[0]
        data = np.array(data[1:])
        df = pd.DataFrame(data, columns=header)
        return df
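
One fragile spot in getHistoricDraws is that str.find returns -1 when the marker text is missing, so the slice would silently start in the wrong place if the page layout ever changes. A minimal sketch of a stricter helper follows. The name text_after_marker and the sample page text are hypothetical and not part of the original class.

```python
EXPIRY_MARKER = "<HR><B>All lotteries below have exceeded the 180 days expiry date</B><HR>"


def text_after_marker(page_text: str, marker: str = EXPIRY_MARKER) -> str:
    """Return everything after `marker`, raising if the marker is missing."""
    index = page_text.find(marker)
    if index == -1:
        # str.find signals "not found" with -1 rather than raising
        raise ValueError("expiry marker not found; the page layout may have changed")
    return page_text[index + len(marker):]


# Made-up page text, only to show the behaviour
sample_page = "recent rows\n" + EXPIRY_MARKER + "\nolder rows"
print(repr(text_after_marker(sample_page)))  # '\nolder rows'
```

Failing loudly here is a design choice: a scraper that returns a wrongly sliced page would otherwise feed bad rows into pd.read_csv without any warning.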

Lotto.py

# Importing required libraries
import pandas as pd
import numpy as np
from LotteryScraper import LotteryScraper

# Importing preprocessing and modeling libraries from scikit-learn and Keras
from sklearn.preprocessing import StandardScaler
from keras.models import Sequential
from keras.layers import LSTM, Dense, Bidirectional, Dropout


# Import Historic Lotto Results into DataFrame.
url = "http://lottery.merseyworld.com/cgi-bin/lottery?days=2&Machine=Z&Ballset=0&order=1&show=1&year=0&display=CSV"
scraper = LotteryScraper(url)
df = scraper.getHistoricDraws()
arr = df.values

# Fit the StandardScaler to the values of the Pandas DataFrame to normalize the features
scaler = StandardScaler().fit(df.values)

# Transform the values in the DataFrame 'df' using a StandardScaler object
# to have zero mean and unit variance. Store the result in the 'transformed_dataset' variable.
'''
"Zero mean and unit variance" is a way of standardizing data. Each column is shifted so that its average is zero and rescaled so that its standard deviation (and therefore its variance) is one.
As an example, consider the test scores of a group of students, ranging from 0 to 100 with an average of around 50. To standardize them, subtract the average score from each student's score and then divide the result by the standard deviation of the scores. The transformed scores then have zero mean and unit variance: they are centred on zero, and a value of 1.0 means "one standard deviation above the average".
In a similar way, standardizing the lotto data puts every column on the same scale, which makes the values easier to compare and analyze.
'''
transformed_dataset = scaler.transform(df.values)

# Create a new DataFrame 'transformed_dataframe' using the values in 'transformed_dataset'
# with the same indices as the original DataFrame 'df'
transformed_dataframe = pd.DataFrame(data=transformed_dataset, index=df.index)

# Get the number of historic draws and store the result in the 'intHistoricDraws' variable
intHistoricDraws = arr.shape[0]

# Determine the optimal number of previous draws to use for prediction
'''
A good starting point might be to consider the last 100-200 draws as previous results
This allows the model to have enough data to learn from while still keeping the size of the dataset manageable
The exact number may vary depending on the characteristics of the data and the model being used
Cross-validation may be used to assess the performance and determine the optimal number
'''
intDrawsToConsiderForPrediction = 200

# Get the number of Lotto Balls and store the result in the 'intLottoBalls' variable
intLottoBalls = arr.shape[1]

# Create empty 3D array for training data with defined shape and data type float
'''
Train the machine learning model using a large dataset
The model learns patterns and relationships in the data during the training process
The goal is to train the model so that it can make accurate predictions on new, unseen data
It's like studying for a test or playing a game of 20 questions - the more data used to train the model, the better it becomes at making predictions
'''
train = np.empty(
[intHistoricDraws - intDrawsToConsiderForPrediction, intDrawsToConsiderForPrediction, intLottoBalls], dtype=float
)

# Create empty array for labels with defined shape and data type float
'''
In a machine learning model, labels represent the correct answers or outcomes that the model tries to predict.
Labels are used to evaluate the accuracy of the model's predictions. For example, in a lottery draw prediction model,
labels would be the numbers drawn in each draw. The model learns patterns and relationships in past draw results through
training data and uses that information to make predictions on future draws. The closer the predictions match the labels,
the more accurate the model is. Labels serve as the ground truth for evaluating and refining the model.
'''
label = np.empty([intHistoricDraws - intDrawsToConsiderForPrediction,intLottoBalls], dtype=float)

# Assign slice of transformed_dataframe to "train" and "label" arrays based on iteration and intDrawsToConsiderForPrediction, intLottoBalls.
'''
Loop through the range of historic draws minus the number of draws considered for prediction
For each iteration, assign the corresponding rows from the transformed dataframe to "train" and "label" arrays
"train" array consists of rows from the transformed dataframe from current iteration to current iteration + intDrawsToConsiderForPrediction,
and only the first intLottoBalls columns
"label" array consists of rows from the transformed dataframe from current iteration + intDrawsToConsiderForPrediction to
current iteration + intDrawsToConsiderForPrediction + 1, and only the first intLottoBalls columns
'''
for i in range(intHistoricDraws - intDrawsToConsiderForPrediction):
    # Input window: the intDrawsToConsiderForPrediction draws starting at row i
    train[i] = transformed_dataframe.iloc[i:i + intDrawsToConsiderForPrediction, 0:intLottoBalls]

    # Target: the single draw that immediately follows the window
    label[i] = transformed_dataframe.iloc[i + intDrawsToConsiderForPrediction, 0:intLottoBalls]

# Set Batch Size
batch_size = 100

# Set MAX Number for a Lotto Ball
intMaxLottoNumber = 59

# Initializing a Sequential model in Keras.
'''
A Sequential model is a linear stack of layers in a neural network.
The model can be created by passing a list of layer instances to the constructor or by using the add() method to add layers one at a time.

This type of model is simple and suitable for most use cases, especially for single-input, single-output stacks of layers.
'''
model = Sequential()

# Adding a bidirectional LSTM layer with 240 units, with a defined shape, with return sequences set to True, followed by a dropout layer with rate 0.2 to the model.
'''
The code adds two layers to the Sequential model. The first layer is a
Bidirectional Long Short-Term Memory (LSTM) layer with 240 units in each direction.
The input shape describes a single training sample: the number of draws to consider for prediction is the first dimension (time steps)
and the number of lotto balls is the second dimension (features). The batch dimension is left out, so the full input is a 3D tensor of shape (batch, draws, balls).
The return_sequences argument is set to True, which means that the layer outputs one vector per time step rather than a single vector, as the next LSTM layer requires.
The second layer is a Dropout layer, which is a regularization technique used in deep learning to reduce overfitting.
The dropout rate is set to 0.2, meaning that 20% of the values passed to this layer are randomly set to zero during each training step.
This helps to prevent the model from relying too heavily on any one set of neurons, improving the overall robustness of the model.
'''
model.add(Bidirectional(LSTM(240, input_shape=(
intDrawsToConsiderForPrediction, intLottoBalls), return_sequences=True)))
model.add(Dropout(0.2))

# Adding a bidirectional LSTM layer with 240 units, with a defined shape, with return sequences set to True, followed by a dropout layer with rate 0.2 to the model.
model.add(Bidirectional(LSTM(240, input_shape=(
intDrawsToConsiderForPrediction, intLottoBalls), return_sequences=True)))
model.add(Dropout(0.2))

# Adding a bidirectional LSTM layer with 240 units, with a defined shape, with return sequences set to True.
model.add(Bidirectional(LSTM(240, input_shape=(
intDrawsToConsiderForPrediction, intLottoBalls), return_sequences=True)))

# Adding a Bidirectional LSTM layer to the model with input shape (intDrawsToConsiderForPrediction, intLottoBalls) and return sequences set to False, using only last output in the sequence.
model.add(Bidirectional(LSTM(240, input_shape=(
intDrawsToConsiderForPrediction, intLottoBalls), return_sequences=False)))

# Adding two dense layers with MAX Lotto Number of neurons in the first layer and intLottoBalls neurons in the second layer for lotto prediction.
model.add(Dense(intMaxLottoNumber))
model.add(Dense(intLottoBalls))

# Compiling the model with mean squared error as the loss function, RMSprop as the optimizer, and accuracy as a reported metric
'''
The code tells the machine learning model how to improve itself during training.
It uses a mathematical formula for measuring errors (called "mean squared error" or "mse") and a method
for adjusting the model's parameters to reduce these errors (called "rmsprop" optimization).
The accuracy metric is only reported, not optimized. Because the model outputs continuous scaled values rather than classes,
the loss is the more meaningful number to watch.
'''
model.compile(loss='mse', optimizer='rmsprop', metrics=['accuracy'])

# Training the model using the fit method with the defined batch size and 50 epochs.
'''
Training the model with the training data and corresponding labels
using a batch size of 100, meaning that 100 training samples will be used to
update the model's weights once, before moving on to the next batch.
The training will be run for a total of 50 epochs, meaning that the entire training
dataset will be passed through the model 50 times.
'''
model.fit(train, label, batch_size=batch_size, epochs=50)

# Fetch the most recent draws and load them into a NumPy array to use as the model's input sequence for prediction.
url = "http://lottery.merseyworld.com/cgi-bin/lottery?days=2&Machine=Z&Ballset=0&order=1&show=1&year=-1&display=CSV"
scraper = LotteryScraper(url)
df = scraper.getRecentDraws()
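# getRecentDraws() builds its DataFrame from split strings, so convert the values to floats before scaling
to_predict = df.values.astype(float)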

# Use the trained model to predict the results of the lotto draws, while first transforming the inputs using the scaler
scaled_predicted_output_1 = model.predict(
np.array([scaler.transform(to_predict)]))

# Print the inverse transformed result of the scaled predicted output.
'''
This code prints the final result of the prediction.
The model works with lotto numbers that have been scaled, so the numbers it predicts are scaled as well.
To see the prediction as ordinary lotto numbers, we use the "scaler.inverse_transform" function,
which changes the scaled values back to their original range.
Finally, ".astype(int)" converts the result to whole numbers by dropping the decimal part (it truncates rather than rounds).
The "print" function then displays the predicted lotto numbers on the screen.
'''
print()
print("Draw Prediction:")
print(scaler.inverse_transform(scaled_predicted_output_1).astype(int)[0])
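
The loop that fills the train and label arrays can also be written as a small standalone function, which makes the resulting shapes easy to check on a toy array. This is a minimal sketch and not part of the original script. The name make_windows and the toy data are hypothetical, chosen only for illustration.

```python
import numpy as np


def make_windows(draws: np.ndarray, window: int):
    """Split a (n_draws, n_balls) array into overlapping windows and next-draw labels."""
    n_samples = draws.shape[0] - window
    if n_samples <= 0:
        raise ValueError("need more draws than the window length")
    # Sample i holds rows i .. i + window - 1
    train = np.stack([draws[i:i + window] for i in range(n_samples)])
    # Label i is the row that immediately follows window i
    label = draws[window:]
    return train, label


# Toy data: 10 "draws" of 6 values each (not real lottery results)
toy = np.arange(10 * 6, dtype=float).reshape(10, 6)
train, label = make_windows(toy, window=4)
print(train.shape, label.shape)  # (6, 4, 6) (6, 6)
```

With 10 rows and a window of 4 there are 6 samples, each pairing 4 consecutive rows with the single row that comes next. This is the same layout the script builds with intDrawsToConsiderForPrediction = 200.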