Thursday, December 16, 2021

Insufficient Data - NaNo Follow Up

Fig. 1.1 Words Per Minute of each successive trial. 
This year's NaNo goal was to write 30 handwritten pages of fiction in the form of writing sprints lasting anywhere from 90 seconds to 10 minutes. The point was to get a good sense of how well these sprints, reminiscent of the writing I used to do while waiting for things to print or upload at my last full-time job, actually work for my writing process. The results so far are promising but ultimately inconclusive, because I only managed to do 26 sprints, comprising some six and a half handwritten pages. As a fraction of the goal, this is the worst NaNo on record, but I learned something, and isn't that what matters?

Most obviously, as you can see from the first graph, I got faster as I went on. Writing against a tight-but-unknown time limit is a skill like any other, and in this graph you can literally see me getting better at it over time. (Yes, I know the x-axis isn't labeled. It ought to say "Sprints" or something.) I am weirdly into imposing a kind of soft digital Taylorism on myself, so knowing that I can get faster even with not actually that much practice is important. The average words per minute across all 26 sprints was 13.87, so aiming to beat that every time is reasonable, at least for a while, and when I reach the point where I can no longer reliably beat my average even when I work at it, that will also be good information to have.

The biggest problem I discovered in the course of this experiment is that there's no obvious point at which to spend an unknown amount of time, somewhere between 90 seconds and 10 minutes, writing by hand as quickly as possible. It doesn't fit neatly into your day. There are very few times when you go "Ah, yes, this would be an excellent time to do this." Within such tight time constraints, being spoken to for more than a moment is what the people my partner watches on YouTube would call a "run killer", so I need to either announce that I'm doing it or find a time when I'm unlikely to be interrupted. The practical upshot is that if I want semi-randomized sprints to be a viable writing practice, I need to identify trigger events: circumstances where it makes sense to stop (or not start) doing something else and do a sprint before carrying on with my day.

Before we move on to a slightly more detailed statistical analysis and a couple more graphs, we need to talk about the parameters of this data set, just like if we were doing real science.

26 sprints (n = 26). Mean length of 5 minutes, 17.92 seconds. Mean word count of 71.92 words. Mean of 13.87 words per minute (WPM), with a standard deviation of 3.15 WPM. Sprints were sorted into "buckets" based on their length: Bucket 1 is 90 seconds to 2 minutes, Bucket 2 is 2 to 3 minutes, Bucket 3 is 3 to 4 minutes, and so on up to Bucket 9, which is 9 to 10 minutes. To illustrate the severe limitations of this data, here is an accounting of how many sprints are in each bucket:

Bucket 1: 2
Bucket 2: 5
Bucket 3: 4
Bucket 4: 2
Bucket 5: 5
Bucket 6: 0
Bucket 7: 3
Bucket 8: 4
Bucket 9: 1
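For anyone who wants to play along at home, here's a minimal Python sketch of the bucketing scheme. The function names and the sample durations are made up for illustration, and so is the rule for boundary cases: I'm assuming a sprint of exactly 2:00 goes in Bucket 2 and a full 10:00 sprint goes in Bucket 9. Counting every bucket, including the empty ones, is what makes a hole like Bucket 6 jump out.

```python
import math
from collections import Counter


def bucket_for(seconds: float) -> int:
    """Return the bucket (1-9) for a sprint lasting `seconds`.

    Assumed boundary rule (hypothetical): a sprint on an exact minute mark
    goes in the higher bucket, except a full 10:00 sprint, which stays in 9.
    """
    if not 90 <= seconds <= 600:
        raise ValueError("sprints run from 90 seconds to 10 minutes")
    # Whole minutes elapsed, clamped so 90-119 s -> 1 and 600 s -> 9.
    return min(9, max(1, math.floor(seconds / 60)))


def bucket_counts(durations: list[float]) -> dict[int, int]:
    """Count sprints per bucket, listing empty buckets as 0."""
    counts = Counter(bucket_for(s) for s in durations)
    return {b: counts.get(b, 0) for b in range(1, 10)}


# Hypothetical durations in seconds, not the real spreadsheet data.
example = [95, 176, 355, 358]
for bucket, n in bucket_counts(example).items():
    print(f"Bucket {bucket}: {n}")
```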

Yeah. We had no sprints in the 6-7 minute range. It stands to reason that there might not be as many in Bucket 1, since it's half the size of the others, but Bucket 4 also only had 2, and Bucket 9 only had one. This is not a good data set. There's not enough here, and there are some big holes. I've had a hard time determining how many sprints I'd need for really valid data, but it's at least a thousand. Going ahead and assessing it is largely an intellectual exercise, but we're gonna do it anyway, because I said I would post about it and because maybe we'll learn something about statistical analysis together. 
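One rough way to put a number on "enough" is the textbook sample-size formula for estimating a mean, n = (z·σ / E)², where σ is the standard deviation, E is how close you want the estimate to land, and z = 1.96 for about 95% confidence. The sketch below assumes each bucket is about as variable as the whole data set (σ = 3.15 WPM) and picks a margin of half a word per minute purely for illustration. It also treats sprints as independent, which my getting-faster-over-time graph suggests they aren't quite.

```python
import math


def sample_size(sd: float, margin: float, z: float = 1.96) -> int:
    """Sprints needed so a bucket's mean WPM is within +/- margin.

    Normal-approximation formula n = (z * sd / margin) ** 2, rounded up.
    z = 1.96 corresponds to roughly 95% confidence.
    """
    return math.ceil((z * sd / margin) ** 2)


SD_WPM = 3.15          # overall standard deviation from the data set
MARGIN_WPM = 0.5       # illustrative choice, not something I measured
NUM_BUCKETS = 9

per_bucket = sample_size(SD_WPM, MARGIN_WPM)
print(f"Per bucket: {per_bucket} sprints")
print(f"All {NUM_BUCKETS} buckets: {per_bucket * NUM_BUCKETS} sprints")
```

Under those assumptions it comes out to 153 sprints per bucket, or 1,377 across nine buckets, which lands in the same neighborhood as my "at least a thousand" guess.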

Fig. 1.2 Unadjusted Averages For Each Bucket in WPM

Behold Figure 1.2, I guess, the bar graph of the unadjusted averages for each bucket. One of the objectives of this experiment was to figure out the best sprint length, or at least start forming a notion of it, possibly to narrow down even further in subsequent experiments until I find the Ideal Number of Seconds for a timed writing sprint. Ambitious, I realize, but. As you can see here, the 3-4 and 4-5 minute ranges (Buckets 3 and 4) are the only ones with averages over 15, and the 2-3 and 8-9 minute ranges (Buckets 2 and 8) are the only ones below the overall average of 13.87. The 9-10 minute range sits close to that line and has only one data point, though, so we don't know whether there's a drop-off after 8 minutes or whether the 8-9 minute range is Especially Bad for some reason. Our four whole data points for that range are 14.64, 12.17, 12.31, and 9.04 WPM. That middle pair suggests that Bucket 8's being below average is...legitimate, even though the 9.04 is probably the result of someone interrupting me. We're gonna talk about the 2-3 range in a minute, but first we're gonna look at the other bar graph.
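The arithmetic behind that is easy to check. Here's a small Python sketch using the four Bucket 8 values and the overall mean of 13.87 WPM; the only thing I'm adding is a second average with the probably-interrupted 9.04 sprint left out, to see whether the bucket stays below the overall mean even then.

```python
from statistics import mean

OVERALL_MEAN_WPM = 13.87                 # mean across all 26 sprints
BUCKET_8 = [14.64, 12.17, 12.31, 9.04]   # the four 8-9 minute sprints, in WPM
LIKELY_INTERRUPTED = 9.04                # the sprint I suspect got interrupted

with_all = round(mean(BUCKET_8), 2)
without_interruption = round(
    mean([w for w in BUCKET_8 if w != LIKELY_INTERRUPTED]), 2
)

print(f"Bucket 8, all sprints:      {with_all} WPM")
print(f"Bucket 8, without the 9.04: {without_interruption} WPM")
print("Still below overall mean:", without_interruption < OVERALL_MEAN_WPM)
```

This gives 12.04 WPM with all four sprints and 13.04 WPM without the 9.04, so Bucket 8 stays below 13.87 either way.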

Fig. 1.3 Averages Without Outliers

The reason Bucket 2 just jumped up by more than a full word per minute is that Bucket 2 contains the only actual outlier in our data set: a 2 minute 56 second sprint in which I wrote only 36 words, coming out to 6.43 words per minute. It's still the only bucket other than 8 that comes out below the average, though. Currently. I tentatively predict that with additional data, Bucket 1 will start looking more like Bucket 2, but it's hard to know for sure. Bucket 5 is pretty middle of the road here: 4 buckets have higher averages, 3 have lower, and Bucket 6 doesn't exist, so it's exactly middle of the road. But it's also massively variable, containing both our highest WPM for a single sprint (19.44 WPM) and our lowest that isn't an outlier (8.55 WPM). These are, by the way, 5 minutes 55 seconds and 5 minutes 58 seconds long respectively, so it isn't a question of a substantive difference in length, nor is it a question of being very far apart; these are sprints 13 and 26 respectively. Now, if you look back at Figure 1.1, you can see that something happened after #13: there are no other major dips after that, and #14, at 18.56 WPM, is the second-highest words per minute in this experiment. I don't remember anything in particular happening around then. I didn't date these, and I probably should have, so I can't readily account for it, but it's there in the graph.
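How exactly that sprint got labeled an outlier isn't spelled out here, so take this as one possible rule rather than the one I used: a common rule of thumb is to flag any sprint whose z-score (its distance from the mean, measured in standard deviations) is bigger than 2 in either direction. Using the overall mean and standard deviation from above, a minimal Python sketch looks like this; the 2.0 cutoff is an assumption, and it's the part doing all the work.

```python
MEAN_WPM = 13.87   # overall mean from the data set
SD_WPM = 3.15      # overall standard deviation from the data set


def z_score(wpm: float) -> float:
    """How many standard deviations a sprint sits from the mean."""
    return (wpm - MEAN_WPM) / SD_WPM


def is_outlier(wpm: float, threshold: float = 2.0) -> bool:
    """Flag a sprint whose |z| exceeds the (assumed) threshold."""
    return abs(z_score(wpm)) > threshold


# The lowest sprint, the lowest non-outlier, and the highest sprint.
for wpm in (6.43, 8.55, 19.44):
    print(f"{wpm:5.2f} WPM  z = {z_score(wpm):+.2f}  outlier: {is_outlier(wpm)}")
```

Under that rule the 6.43 WPM sprint comes out at z ≈ -2.36 and gets flagged, while 8.55 (z ≈ -1.69) and 19.44 (z ≈ +1.77) stay in, which matches which sprint was left out of Figure 1.3.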

At this point there's very little to do other than keep doing as many sprints as I can and try to gather better data. If you're interested in following the progress of this experiment, you can view the spreadsheet here. You want the tab labeled "1.5-10". And if you want another update when we have more to talk about, let me know in the comments.

Next post will be Dresden Files, I promise. Until then, be Gay, do Crimes, and read All The Things!

