I wrote a Python program that trains a neural network to predict characters in Alice in Wonderland. I would like to break the program down in terms of the Buddhist framework of consciousness.
Step 1: Download the text of Alice's Adventures in Wonderland, which serves as the training data for the program. I import the built-in function urlopen (from Python's urllib.request module), which fetches the text from a public URL. Since the network works on numbers rather than text, each character is converted to an integer index. This step is similar to the five sense consciousnesses in Buddhism, where the built-in function, together with my code, acts as a sensor that perceives information.
import string
import numpy as np
from urllib.request import urlopen

url = 'https://gist.githubusercontent.com/phillipj/4944029/raw/75ba2243dd5ec2875f629bf5d79f6c1e4b5a8b46/alice_in_wonderland.txt'
text = urlopen(url).read().decode('utf-8')
text = text.lower()
text = [c for c in text if c not in string.punctuation]  # drop punctuation

chars = sorted(set(text))  # the distinct characters in the cleaned text
char_to_index = {}
for i in range(len(chars)):
    char_to_index[chars[i]] = i  # map each character to an integer index

text_indices = []
for c in text:
    text_indices.append(char_to_index[c])  # the whole text as integer indices
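Since the model in Step 3 hard-codes 30 classes, it is worth checking that the cleaned text really does contain exactly 30 distinct characters; a quick sanity check (the value 30 is an assumption carried over from that model):
print(len(chars))  # expected to be 30 for this text; the model below assumes it
print(chars[:10])  # a peek at the first few characters in the vocabulary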
Step 2: Create feature sequences and targets. The purpose of the algorithm is to find the patterns in the writing and imitate them. One way of finding patterns is to create a list of lists X and a list y. Each sublist in X corresponds to 100 consecutive characters in the text (windows are taken every 3 characters), and each entry in y is the character that immediately follows those 100 characters. This grouping essentially prepares the data for studying the relationship between a 100-character context and the character that follows it. This step processes data through grouping, performing the role of the 6th consciousness. In real-life programming, this step has two limitations (a toy illustration of the windowing appears after the list):
1) The data set is too small. The text of Alice in Wonderland has only 46,722 words, which does not give the machine enough material to learn much about writing.
2) The grouping, which tries to predict one character from the hundred characters before it, might not be the best model. It also disregards essential details such as punctuation, which was stripped out in Step 1.
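To make the grouping concrete, here is a toy sketch of the same windowing on a short sequence; a window length of 4 and a stride of 3 stand in for the real 100 and 3, and the names are illustrative only:
toy = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
windows, targets = [], []
for i in range(0, len(toy) - 4, 3):  # slide a 4-element window with stride 3
    windows.append(toy[i:i + 4])     # 4 consecutive indices
    targets.append(toy[i + 4])       # the index that follows them
print(windows)  # [[0, 1, 2, 3], [3, 4, 5, 6]]
print(targets)  # [4, 7]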
from tensorflow.keras.utils import to_categorical

X = []
y = []
for i in range(0, len(text_indices) - 100, 3):  # slide a 100-character window, stride 3
    X.append(text_indices[i:i + 100])  # 100 consecutive character indices
    y.append(text_indices[i + 100])    # the character that follows them

X = to_categorical(np.array(X), num_classes=30)  # one-hot encode the windows
y = to_categorical(np.array(y), num_classes=30)  # one-hot encode the targets
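A quick way to confirm that the grouping worked is to check the array shapes; assuming 30 distinct characters, X should come out as (num_windows, 100, 30) and y as (num_windows, 30):
print(X.shape)  # (num_windows, 100, 30)
print(y.shape)  # (num_windows, 30)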
Step 3: Create an LSTM model. This model is a recurrent neural network (RNN), which means it processes the input sequence one step at a time while carrying a hidden state forward, rather than treating the 100 characters as independent features. After specifying the input shape, we fit the model to X and y for 50 epochs. This is similar to developing the neural network of the 7th consciousness. However, this step is essentially pre-determined and no consciousness is formed: I, as the programmer, supervise the modelling and only allow the algorithm to compute with the LSTM architecture, which in my opinion is the best method here.
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import LSTM, Dense

model = Sequential()
model.add(LSTM(128, input_shape=(100, 30)))  # 100 time steps, 30-way one-hot input
model.add(Dense(30, activation='softmax'))   # probability distribution over the 30 characters
model.compile(optimizer='adam', loss='mean_squared_error')  # categorical_crossentropy is the more usual loss for a softmax classifier
model.fit(X, y, epochs=50, batch_size=128)
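Because the data set is small (limitation 1 above), it can help to hold out part of it and watch the validation loss across the 50 epochs; a minimal sketch of an alternative to the plain model.fit call above:
model.summary()  # inspect layer output shapes and parameter counts
history = model.fit(X, y, epochs=50, batch_size=128,
                    validation_split=0.1)   # hold out 10% to monitor overfitting
print(history.history['val_loss'][-5:])     # the last few validation losses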
Step 4: Use the fitted model to generate text. The model is seeded with 100 random character indices; on each step it predicts the next character, appends it to the window, and drops the oldest one, repeating 1000 times.
seq = [np.random.randint(0, 30) for i in range(100)]  # 100 random indices as the initial seed
seq = to_categorical(np.array(seq), num_classes=30)   # one-hot encode the seed sequence
newtext = ''
for i in range(1000):
    index_pred = np.argmax(model.predict(seq.reshape(1, 100, 30)))  # most likely next character
    newtext += chars[index_pred]                                    # append the corresponding character
    seq = np.vstack([seq, to_categorical(index_pred, num_classes=30)])  # add element to end of sequence
    seq = seq[1:]  # drop the first element so the window stays at length 100
print(newtext)  # display the generated text
And here is the generated text:
ues and the dormouse with at all cather alice did not cant dind the king said alice
ill all it worle said the day suided
thinking she was said the door and the the wlat ont
conging
of the soof first to geventthing the world campers and the world she was that suad
lice that the rabbit said to the king went on ablaice in and dind
and gook that the mouse was said alice in at the diele
think the white rabbit was as she was soop
thes the the mock turtle gean of the world alice all it
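The output is recognisably Alice-flavoured but repetitive (note the recurring "the the" and "said alice"). One common remedy, not part of the original program, is to sample the next character from the softmax distribution with a temperature instead of always taking the argmax; a minimal sketch, where sample_next and temperature are names introduced here for illustration:
def sample_next(probs, temperature=0.8):
    probs = np.log(probs + 1e-8) / temperature     # rescale the log-probabilities
    probs = np.exp(probs) / np.sum(np.exp(probs))  # renormalise so they sum to 1
    return np.random.choice(len(probs), p=probs)   # draw one character index

probs = model.predict(seq.reshape(1, 100, 30))[0]  # softmax output for the current window
index_pred = sample_next(probs)                    # stochastic choice instead of argmax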