BlogPostStyling with LLMs

This project set out to improve the style and presentation of the Blog Posts on this website, and to test how different LLMs would tackle the problem.

Results

The winner was DeepCoder, but it had some user experience issues that I patched with the implementation provided later by Tulu3:70b. The conversations leading to the final mergeable code can be found in the folder with this ReadMe.

Rankings

More information about how the judging went can be found in this GitHub branch for the Rankings and related LLM Reasons for Rank.

LLM Working Sessions

A video series of the working sessions with the LLMs is available at LLM Sessions by Mind of a Fighting Lion Enthusiast.

There is also a collection of Pull Requests that reflect the changes offered by each LLM, given the timebox and context I could provide.

Phi4

It did not go very well, but part of that could have been me still working out what I wanted from the experiments.

DeepCoder

This one was the winner, so it went pretty well.

Gemma3

It was fast to respond, but it often had the wrong answer the first time around. The code did not fare well in the Rankings either.

Nemotron

It did pretty well, but it could not figure out how to fix the Text Flow issue at the end, so I borrowed the answer from another LLM.

Qwen3

It got things wrong frequently, and it did not do well in the Rankings.

Tulu3

It got everything done on the first go without many mistakes. The session felt good and productive.

Llama 3.3

It was relatively quick, but I ran into several issues that needed to be fixed before getting to the right output.

DeepSeek

This one did work with Roo, although it encountered bugs that forced me to switch over to Continue. It got through all the experiment stages in one session, although it was stuck on the Back button persistence issue for a long time.

Final Thoughts

The LLMs

It is good to have a mix of LLMs to query. I liked working with Tulu3 the most, but I think several LLMs have merit. In future experiments, I will probably drop Phi4, Qwen3, Nemotron, and Gemma3 because they did not feel good to work with.

My Scientific Process

My scientific process was not very consistent from one iteration to the next. In future experiments, I could spend more time in the Preparation stage to get a more consistent test, where I give less input in each loop.