Joy and headaches building a well-structured multi-platform Flutter app with Copilot

July 06, 2025

You only understand something when you are forced to do it yourself. I've been looking for a way to force myself to build and test a multi-platform application. A lousy game night scoring experience demanded an NIH (not-invented-here) solution. So I built a multi-platform Flutter application using only Copilot agent prompts. It was a learning experience where I had to cheat, relying on my own knowledge to drive Copilot where I wanted it to go.

The end result is a multi-platform Flutter app, fs_game_score, that can be found on GitHub. It is a generic game scoring app created almost entirely using VS Code's Copilot agent mode, with virtually no hand coding. There were lots of AI agent prompts with many undo/redo attempts. The application has been tested on Android, iOS, Chrome, macOS, and Windows 11.

At least a dozen lessons learned

Coding

It helps if you know what libraries you want to use. That lets you direct Copilot to the most up-to-date or best-practice libraries. I knew I wanted to use Riverpod 3.x and that I needed something beyond the default Flutter table. I prompted the LLM to add the libraries I wanted and then built on top of that, because the pubspec.yaml was then in context for my prompts. Copilot only knows the code it has seen, and there is more code out there using older libraries than code using newer ones.

I fought a lot with the data table behavior for the score sheet. There were cases where the content was too big for the cell, or the scrolling behavior wasn't right. Finding the generally recommended library and having Copilot add it cut out days of troubleshooting.

The LLM agent tends to generate long functions or methods when building the UI layout. I wanted smaller, discrete components. I needed to either specifically prompt it to build the components or create prompts telling the agent to extract the code from the scoring table or other layouts into their own components.

The agent LLM creates large amounts of semi-organized code. I had to provide guidance for code organization: which code got its own files, and the folder organization for code files.

I was deliberate about model classes and what they contained. I wanted explicit scope management because some pieces of data had to retain their state while others were reset. There was a fair amount of trial and error when adding features like the "new game" panel to make sure things didn't get completely erased when I added the column locking controls or reset player names.

Partitioning the reactive pieces took work. I used Riverpod, which provides an opinionated model for state and for reactive widget updates.

The LLM created long pieces of code. I sometimes needed several iterations to break that code apart so it was more testable and maintainable. This is similar to the point above about long UI layout methods.

Riverpod reactive-style code was finicky because the generated notifiers, scopes, or data objects didn't handle the corner cases. Describing the broken behavior to Copilot was enough to fix the problem about 50% of the time.

Sometimes I got blocked because I accepted code that appeared to be working, only to realize later that I wanted a different structure for the next round of changes.

I wanted the code to be testable and the Flutter widgets to be findable by ID. I created prompts to get IDs added everywhere and found that I had to make decisions that required an understanding of Flutter testing in order to get the best answer. Most of my widgets were wrappers for the actual field or text that the component represented. This created tension over where the key should be bound. Some of the components were generated with a key passed in to the wrapper. Others were generated with a FieldKey that was actually set on the wrapped component. I ended up standardizing on a single key for the custom component. This meant the actual text or field had to be found in the test by searching the descendants of the ID I knew. The alternatives were to pass in two keys or to support only the field key.
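The key-on-the-wrapper choice described above can be sketched as a small widget test. This is a minimal illustration, not code from the project: `PlayerNameField` and the key name `player_name_0` are hypothetical, and the wrapper is assumed to contain exactly one `TextField`.

```dart
import 'package:flutter/material.dart';
import 'package:flutter_test/flutter_test.dart';

/// Hypothetical wrapper component: the caller's key lands on the wrapper,
/// not on the TextField inside it.
class PlayerNameField extends StatelessWidget {
  const PlayerNameField({super.key, required this.controller});

  final TextEditingController controller;

  @override
  Widget build(BuildContext context) {
    return Padding(
      padding: const EdgeInsets.all(8),
      child: TextField(controller: controller),
    );
  }
}

void main() {
  testWidgets('enters text through the wrapper key', (tester) async {
    final controller = TextEditingController();
    addTearDown(controller.dispose);

    await tester.pumpWidget(
      MaterialApp(
        home: Scaffold(
          body: PlayerNameField(
            key: const ValueKey('player_name_0'), // hypothetical key name
            controller: controller,
          ),
        ),
      ),
    );

    // Start from the known wrapper key, then narrow to the wrapped field.
    final field = find.descendant(
      of: find.byKey(const ValueKey('player_name_0')),
      matching: find.byType(TextField),
    );
    expect(field, findsOneWidget);

    await tester.enterText(field, 'Alice');
    expect(controller.text, 'Alice');
  });
}
```

The trade-off is that every test pays for one extra `find.descendant` call, but each component exposes only one key to callers.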

There were a lot of iterations around state management to get the lifespan correct for the various Riverpod notifiers and to use 'ref.watch' and 'ref.read' in the right places.
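As a reminder of the distinction, here is a minimal sketch of the usual Riverpod convention: 'ref.watch' inside build so the widget rebuilds on change, and 'ref.read' inside callbacks for a one-off lookup. `ScoreNotifier`, `scoreProvider`, and `ScoreTile` are hypothetical names for illustration and are not taken from the project.

```dart
import 'package:flutter/material.dart';
import 'package:flutter_riverpod/flutter_riverpod.dart';

/// Hypothetical notifier holding one player's running total.
class ScoreNotifier extends Notifier<int> {
  @override
  int build() => 0; // initial state

  void add(int points) => state = state + points;
}

final scoreProvider = NotifierProvider<ScoreNotifier, int>(ScoreNotifier.new);

class ScoreTile extends ConsumerWidget {
  const ScoreTile({super.key});

  @override
  Widget build(BuildContext context, WidgetRef ref) {
    // watch: subscribe, so this widget rebuilds whenever the score changes.
    final score = ref.watch(scoreProvider);

    return ListTile(
      title: Text('Score: $score'),
      trailing: IconButton(
        icon: const Icon(Icons.add),
        // read: one-off access inside a callback, no subscription.
        onPressed: () => ref.read(scoreProvider.notifier).add(1),
      ),
    );
  }
}

void main() {
  runApp(
    const ProviderScope(
      child: MaterialApp(home: Scaffold(body: ScoreTile())),
    ),
  );
}
```

How long the provider's state should live (for example, whether it should be auto-disposed) is a separate decision that this sketch does not cover.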

Adding accessibility support required fiddling with the prompts to get the Semantics widgets I wanted. There were a couple of cases where providing the same prompt twice in a row solved my problem. The first pass did most of the work, and the second fixed the broken part.
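For context, this is roughly the kind of widget such prompts aim for: a minimal sketch that wraps a score cell in Flutter's `Semantics` widget. `ScoreCell`, its fields, and the label wording are assumptions made for illustration, not the project's actual code.

```dart
import 'package:flutter/material.dart';

/// Hypothetical score cell with a descriptive label for screen readers.
class ScoreCell extends StatelessWidget {
  const ScoreCell({
    super.key,
    required this.player,
    required this.round,
    required this.score,
  });

  final String player;
  final int round;
  final int score;

  @override
  Widget build(BuildContext context) {
    return Semantics(
      // A bare number means little without its row and column context.
      label: 'Round $round score for $player: $score',
      // Drop the child's own semantics so the number is not announced twice.
      excludeSemantics: true,
      child: Text('$score'),
    );
  }
}

void main() {
  runApp(
    const MaterialApp(
      home: Scaffold(
        body: Center(child: ScoreCell(player: 'Alice', round: 1, score: 12)),
      ),
    ),
  );
}
```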

Integration Tests

Long dropdowns are not fully visible when opened if they contain too many options. Copilot never offered to scroll to find the item I wanted, and getting it to do so was painful. It never did generate exactly the code I would have wanted. The code at this time is a bit of a hack that scrolls by some large amount to force the other end of the list to become visible.

Copilot mostly got the field keys right when using 'byKey' finders. Sometimes it completely lost the plot when iterating on fixes and hallucinated key names, especially when the keys were generated.

My custom components had `keys`, but the actual Flutter widget the test needed to enter data into or validate against was a wrapped child. I had trouble prompting for a solution.
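One alternative to scrolling by a fixed large amount is flutter_test's `scrollUntilVisible`, which drags in small steps until the target finder matches. The sketch below uses a plain `ListView` as a simplified stand-in for a long dropdown menu, which is an assumption for illustration. In a real dropdown the `scrollable` finder would have to point at the open menu's own scrollable, and this is not the code the project uses.

```dart
import 'package:flutter/material.dart';
import 'package:flutter_test/flutter_test.dart';

void main() {
  testWidgets('scrolls a long option list until the target is built',
      (tester) async {
    await tester.pumpWidget(
      MaterialApp(
        home: Scaffold(
          // Stand-in for a long menu: items are built lazily, so
          // off-screen options do not exist in the widget tree yet.
          body: ListView.builder(
            itemCount: 100,
            itemBuilder: (context, index) =>
                ListTile(title: Text('Option $index')),
          ),
        ),
      ),
    );

    final target = find.text('Option 80');
    expect(target, findsNothing); // not built yet

    // Drag 200 logical pixels at a time until the target appears.
    await tester.scrollUntilVisible(
      target,
      200.0,
      scrollable: find.byType(Scrollable).first,
    );

    expect(target, findsOneWidget);
  });
}
```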

Working with component navigation in tests was a lot easier to prompt for once we had a project example. Then it was almost automatic.

Video

This article in video

Sample code on GitHub

https://github.com/freemansoft/fs-game-score

Revision History

Created 2025/07

Joe Freeman's blog