Abstract Wikipedia:Abstract article architectures

From Abstract Wikipedia

This page discusses the possible architectures for organizing the generation of an Abstract Wikipedia article.

The main issues are how to handle context, and how to keep execution times reasonable.

Please feel free to improve this page, to comment below the architecture sections, and to add your own alternative proposals.

Current architecture

This is the architecture currently in use: an Abstract Wikipedia article is a collection of fragments, each inserted in the article as a single function call. Each type of sentence is encoded as a call to a different function, with the content encoded in the function parameters. Each fragment is independent of the others. Currently, fragment-generating functions output monolingual text, which then needs to be converted into HTML; it is also possible to define fragment-generating functions that output HTML directly, in order to include formatting, wikilinks, etc.

Pros:

  • It is the current architecture, so it is already supported.
  • A different function is called for each "fragment", which keeps individual functions fast and allows the generation of an Abstract article to be parallelised.
    • Note that, with the current implementation, this advantage disappears when multiple fragments are grouped in paragraphs or other text-organizing functions.
  • It is somewhat intuitive both for Abstract Wikipedia editors and for Functioneers.

Cons:

  • Each type of sentence requires its own function. Since even simple sentences come in many combinations, the number of functions needed would undergo a combinatorial explosion.
    • Also note that each function needs to be implemented separately in each language, with little possibility of code reuse. It would be almost impossible for smaller languages to keep up with an ever-growing number of fragments.
  • Each fragment is generated without any knowledge of the global context. Since natural languages rely heavily on context, generating grammatical and reasonably natural-looking text would become impossible for articles longer than 2-3 sentences.
  • Currently, the fragment-generating functions are very vaguely defined: they are essentially defined through the English translation of the fragment, which introduces a heavy bias towards English and Indo-European languages. For example, f:Z26039 and f:Z26095 are different functions, while f:Z26570 does not define in which sense the entity is in the location (is it in the geographical area? physically inside the building? near the location?).
    • Note that this con can be resolved by defining the fragments more precisely. That would still lead to hyper-specific fragment functions, with the consequent loss of intuitiveness and the combinatorial explosion already mentioned.
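
The fragment-per-call organisation described in this section can be pictured with a minimal sketch. Everything in it is an assumption made for illustration: the `Fragment` variants, the English wording, and the function names are hypothetical stand-ins and do not correspond to actual Wikifunctions functions. It only shows that each fragment is rendered from its own parameters, then wrapped into HTML, with nothing shared between calls.

```rust
/// Hypothetical stand-in for one fragment call in an abstract article.
#[derive(Debug, Clone, PartialEq)]
enum Fragment {
    /// "<subject> is a <class>."
    InstanceOf { subject: String, class: String },
    /// "<subject> is in <location>."
    LocatedIn { subject: String, location: String },
}

/// Renders one fragment to English text. It sees only its own parameters.
fn render_en(fragment: &Fragment) -> String {
    match fragment {
        Fragment::InstanceOf { subject, class } => format!("{subject} is a {class}."),
        Fragment::LocatedIn { subject, location } => format!("{subject} is in {location}."),
    }
}

/// Escapes the characters that are significant in HTML text.
fn escape_html(text: &str) -> String {
    text.replace('&', "&amp;").replace('<', "&lt;").replace('>', "&gt;")
}

/// Renders every fragment independently and wraps each result in a paragraph.
fn render_article(fragments: &[Fragment]) -> String {
    fragments
        .iter()
        .map(|fragment| format!("<p>{}</p>", escape_html(&render_en(fragment))))
        .collect::<Vec<_>>()
        .join("\n")
}
```

With two fragments about the same subject, the second sentence repeats the subject's name in full, because its function cannot know that the subject was just mentioned. Every new sentence shape also needs a new variant and a new match arm per language.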

Comments

Modest improvement to current architecture using spans

tl;dr: Do all linguistic processing on plain text (Z11/Monolingual text), but keep track of which span(s) in the string correspond to each Item from the outer call site. The outer Function can then format and linkify the final string by iterating over those spans in order of their start index.
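
The final formatting step summarised above can be sketched in plain Rust. This is only an illustration under stated assumptions: `Span`, `Transform`, `apply_formatting`, the string item keys, and the example.org link targets are all hypothetical, spans are taken to be non-overlapping byte ranges, and nothing here reflects an existing Wikifunctions API.

```rust
use std::collections::HashMap;

/// Half-open byte range into the plain-text sentence (hypothetical layout).
#[derive(Debug, Clone, Copy, PartialEq)]
struct Span {
    start: usize,
    end: usize,
}

/// Formatting to apply to the text covered by an item's span.
#[derive(Debug, Clone, PartialEq)]
enum Transform {
    Strong,
    Link(String),
}

/// Escapes the characters that are significant in HTML text.
fn escape_html(text: &str) -> String {
    text.replace('&', "&amp;").replace('<', "&lt;").replace('>', "&gt;")
}

/// Walks the spans in order of start index, escaping the text between
/// them and wrapping each spanned substring according to its item's
/// transform. Assumes spans do not overlap and lie on char boundaries.
fn apply_formatting(
    text: &str,
    spans: &[(String, Span)],
    transforms: &HashMap<String, Transform>,
) -> String {
    let mut ordered: Vec<&(String, Span)> = spans.iter().collect();
    ordered.sort_by_key(|entry| entry.1.start);

    let mut html = String::new();
    let mut cursor = 0;
    for (item, span) in ordered {
        html.push_str(&escape_html(&text[cursor..span.start]));
        let inner = escape_html(&text[span.start..span.end]);
        match transforms.get(item) {
            Some(Transform::Strong) => html.push_str(&format!("<strong>{inner}</strong>")),
            Some(Transform::Link(target)) => {
                html.push_str(&format!("<a href=\"{target}\">{inner}</a>"))
            }
            // Items without a transform are emitted as plain escaped text.
            None => html.push_str(&inner),
        }
        cursor = span.end;
    }
    html.push_str(&escape_html(&text[cursor..]));
    html
}
```

Because the language-specific code only ever produces plain text plus spans, escaping and markup stay in one place. A real implementation would need to decide how spans are counted (bytes, code points, or something else) and what to do with overlapping spans; this sketch sidesteps both.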

(Existing types, for completeness)
type WFString = Z6;
type MonolingualText = Z11; // = Pair<Langcode, WFString>
type Langcode = Z60;
type HTMLFragment = Z89;
type List<E> = Z881<E>;
type Pair<F, S> = Z882<F, S>;
type Map<K, V> = Z883<K, V>; // = List<Pair<K, V>> w/ key uniqueness invariant
type WDItem = Z6001;
type WDItemRef = Z6091;
type Natural = Z13518;
type Span = Pair<Natural, Natural>;
type Multimap<K, V> = Map<K, List<V>>; // = List<Pair<K, V>> w/o key uniqueness invariant
type NLGOutput = Pair<MonolingualText, Multimap<WDItemRef, Span>>;
type RichTextTransform = fn(HTMLFragment) -> HTMLFragment;

type SuperlativeInLocationDefiningSentence<R> = fn(
	subject: WDItemRef,
	class: WDItemRef,
	location: WDItemRef,
	characteristic: WDItemRef,
	displayLang: Langcode,
) -> R;
pub static superlativeInLocationDefiningSentence: SuperlativeInLocationDefiningSentence<HTMLFragment>
	= |subj, class, loc, charc, lang| applyFormatting(
		toSentenceCaseAndAddPunctuation(
			(match lang {
				Langcode("en", _) => enSuperlativeInLocationDefiningSentence,
				_ => todo!(),
			})(subj, class, loc, charc, lang),
			lang
		),
		Multimap([
			createArticleSubjectEmphasisTransformation(subj),
			createLinkifyTransformation(class, awArticleLinkFor(class)),
			createLinkifyTransformation(loc, awArticleLinkFor(loc)),
		])
	);

pub static toSentenceCaseAndAddPunctuation: (NLGOutput, Langcode) -> NLGOutput
	= todo!();
pub static createArticleSubjectEmphasisTransformation: (WDItemRef) -> Pair<WDItemRef, RichTextTransform>
	= |item| Pair(item, |innerHTML| HTMLFragment(format!("<strong>{innerHTML}</strong>")));
pub static awArticleLinkFor: (WDItemRef) -> WFString
	= |item| format!("https://abstract.wikipedia.org/wiki/{item.qid}");
pub static createLinkifyTransformation: (WDItemRef, WFString) -> Pair<WDItemRef, RichTextTransform>
	= |item, linkTarget| Pair(item, |innerHTML| HTMLFragment(format!("<a href=\"{linkTarget}\">{innerHTML}</a>")));
pub static applyFormatting: (NLGOutput, Multimap<WDItemRef, RichTextTransform>) -> HTMLFragment
	= todo!(); // handwaving, but it would split the Monolingual text in the 1st arg's first item based on the spans from the 1st arg's second item, convert them to HTML fragments, then apply each of the transformations from the 2nd arg
(The implementation enSuperlativeInLocationDefiningSentence, as an example)
pub static enSuperlativeInLocationDefiningSentence: SuperlativeInLocationDefiningSentence<NLGOutput>
	= |subj, class, loc, charc, lang| {
		let words = [
			labelFor(subj, lang),
			enCopulaFor(subj),
			MonolingualText(lang, "the"),
			enSuperlativeFor(charc),
			labelFor(class, lang),
			enPrepositionForLoc(loc),
			labelFor(loc, lang),
		];
		NLGOutput(
			MonolingualText(lang, words.join(" ")),
			[
				// Rust is 0-indexed, but there's probably a bunch of errors here anyway, so don't copy it
				Span(subj, 0..words[0].length),
				Span(class, (4 + &words[0..=3].iter().map(|&text| text.length).sum())
					..(4 + &words[0..=4].iter().map(|&text| text.length).sum())),
				Span(loc, (6 + &words[0..=5].iter().map(|&text| text.length).sum())
					..(6 + &words[0..=6].iter().map(|&text| text.length).sum())),
				Span(charc, (3 + &words[0..=2].iter().map(|&text| text.length).sum())
					..(3 + &words[0..=3].iter().map(|&text| text.length).sum())),
			]
		)
	};
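
The manual offset arithmetic in the example implementation can be avoided by recording each span while the sentence is being joined. The following is a minimal runnable sketch of that idea, not part of the proposal itself: `join_with_spans`, the `Span` struct, and the string item keys are hypothetical, and offsets are byte offsets.

```rust
/// Half-open byte range into the joined sentence.
#[derive(Debug, Clone, Copy, PartialEq)]
struct Span {
    start: usize,
    end: usize,
}

/// Joins words with single spaces and records a span for every word
/// that carries an item identifier.
fn join_with_spans(words: &[(&str, Option<&str>)]) -> (String, Vec<(String, Span)>) {
    let mut text = String::new();
    let mut spans = Vec::new();
    for (index, (word, item)) in words.iter().enumerate() {
        if index > 0 {
            text.push(' ');
        }
        // The word starts wherever the text built so far ends.
        let start = text.len();
        text.push_str(word);
        if let Some(item) = item {
            spans.push((item.to_string(), Span { start, end: text.len() }));
        }
    }
    (text, spans)
}
```

Each language implementation then only lists its words together with the item each one came from, and the spans stay correct when words are added, removed, or reordered.
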
Pros:

  • The existing corpus of NLG Functions can be either kept as-is (e.g. inflection helpers) or migrated piecemeal (sentence-generating Functions).
  • This text+spans structure puts to rest the question of which Type to use for the outputs of sentence-generating Functions: Z11, not Z89 (and definitely not Z6).

Cons:

  • Inherits the m × n problem of the status quo: each sentence-generating Function still needs an Implementation in every language. There's no new affordance for constructing grammatically correct sentences from smaller fragments.
  • Inherits the other problems of the status quo, covered in the Current architecture section above.

Comments

(With apologies to non-Rust programmers for its use here as a statically-typed lingua franca, and with apologies to Rust programmers for weird or broken syntax. I know pub fn <identifier> exists.)
This outlines what I foresee as the most evolved form of the current architecture, and is an expansion of my offhand comment here.
I'm only offering this as a "baseline", I can't advocate its adoption (except as a stop-gap, since as mentioned above it does solve the return type problem). Personally, I want to see an architecture which can attain prose, or something close to it, in most of the world's languages.
YoshiRulz (talk) 21:26, 11 May 2026 (UTC)

Semantic unit architecture

This is the architecture explained in the Wikifunctions type proposal for Semantic units by user Mahir256 (see the example to better understand how abstract content and context would be encoded). The architecture explained here is mostly taken from the article , with some minor tweaks to accommodate the actual needs of Abstract Wikipedia.

The entire Abstract article would be generated in a single function call. The input would be the entire abstract content enclosed in a "context", which would encode all the meta-information necessary for generating the article (such as references to the real-life objects that do not have a QID that uniquely identifies them).

The (really simplified) rendering process would be as follows:

  1. The abstract article is converted by rendering functions into a syntactic tree, i.e. a tree encoding the lexemes present in the sentence and the relations between them. Each rendering function acts on a single semantic unit (i.e., a node of the abstract article tree) and recursively calls the rendering functions that act on the child semantic units.
  2. The syntactic tree is rendered into the final HTML article. This rendering comprises various steps, which generally require a complete traversal of the entire syntactic tree or of an internal partial representation.
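
The two steps above can be illustrated with a deliberately tiny sketch. All names in it (`SemanticUnit`, `Context`, `SyntaxNode`, `to_syntax`, `to_html`) are hypothetical and are not taken from the Semantic units proposal; the context is reduced to a label table, the grammar is hard-coded English, and HTML escaping is omitted.

```rust
use std::collections::HashMap;

/// Hypothetical abstract content: a tree of semantic units.
enum SemanticUnit {
    Entity(String),
    InstanceOf(Box<SemanticUnit>, Box<SemanticUnit>),
    Sequence(Vec<SemanticUnit>),
}

/// Hypothetical shared context, reduced here to labels for identifiers.
struct Context {
    labels: HashMap<String, String>,
}

/// Hypothetical syntactic tree.
enum SyntaxNode {
    Word(String),
    Clause(Vec<SyntaxNode>),
    Text(Vec<SyntaxNode>),
}

/// Step 1: each unit is rendered by recursively rendering its children.
fn to_syntax(unit: &SemanticUnit, ctx: &Context) -> SyntaxNode {
    match unit {
        SemanticUnit::Entity(id) => {
            // Fall back to the raw identifier when the context has no label.
            let label = ctx.labels.get(id).cloned().unwrap_or_else(|| id.clone());
            SyntaxNode::Word(label)
        }
        SemanticUnit::InstanceOf(subject, class) => SyntaxNode::Clause(vec![
            to_syntax(subject, ctx),
            SyntaxNode::Word("is".to_string()),
            SyntaxNode::Word("a".to_string()),
            to_syntax(class, ctx),
        ]),
        SemanticUnit::Sequence(units) => {
            SyntaxNode::Text(units.iter().map(|unit| to_syntax(unit, ctx)).collect())
        }
    }
}

/// Step 2: a full traversal of the syntactic tree produces the HTML.
fn to_html(node: &SyntaxNode) -> String {
    match node {
        SyntaxNode::Word(word) => word.clone(),
        SyntaxNode::Clause(parts) => {
            let words: Vec<String> = parts.iter().map(to_html).collect();
            format!("{}.", words.join(" "))
        }
        SyntaxNode::Text(clauses) => {
            let sentences: Vec<String> = clauses.iter().map(to_html).collect();
            format!("<p>{}</p>", sentences.join(" "))
        }
    }
}
```

Because new content is expressed by nesting existing node types rather than by adding sentence-specific functions, the number of renderers grows with the number of node types, not with the number of sentence shapes.
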
Pros:

  • It allows the creation of a cohesive text, thanks to the support of a shared context. It also gives the language-specific renderers more freedom to organize the text in the way that makes the most sense for the specific language.
  • Since the abstract content is encoded as a tree, it prevents the combinatorial explosion of constructor types and of needed implementations.

Cons:

  • Generating an entire article with a single function call is currently not possible, due to prohibitively high execution times. This architecture does not allow the generation of an article to be parallelised, or even the article to be generated progressively.
  • The learning curve would initially be very steep both for Abstract Wikipedia editors and for Functioneers. After the initial difficulties, this encoding would not pose any particular difficulty, and it would probably be easier to use for encoding complex abstract content than hyper-specific fragments.

Comments

Stream semantic units

This architecture is based on the semantic unit architecture, with the difference that the abstract content, instead of being provided in full in a single function call, is split into paragraphs.

Internally, each renderer would work very similarly to the renderers described in the previous section. The article is generated by concatenating the outputs of the various renderers.

Starting from the second renderer, each renderer would have two sources of input:

  • the input defined in the function call (i.e., in Abstract Wikipedia), consisting of the Semantic unit representing a part of the article (indicatively, a paragraph),
  • the input coming from the previous renderer, consisting of two parts:
    • the abstract content that the previous renderer decided not to render,
    • the context (initially created on Abstract Wikipedia and passed to the first renderer as an input), updated by all the renderers that have executed previously.

The renderer would then execute, providing three outputs:

  • the HTML containing the rendered abstract content that the renderer decided to render,
  • the abstract content (Semantic unit) that, for whatever reason, it decides not to render and passes on to the next renderer,
  • the context, updated by the renderer itself.

The first output would go directly into the Abstract Wikipedia article, while the other two outputs would be used as inputs by the next renderer.

At the end of the chain, a "Final renderer" would render any abstract content that has not been rendered yet.
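
The chain described above amounts to threading two values, the deferred content and the context, through a sequence of renderer calls while collecting the HTML. The sketch below shows that data flow only. Its types and its deferral rule (one subject per paragraph, "It" for a repeated subject) are invented for illustration, and an ordinary loop stands in for the scheduling that the proposal would need from the platform.

```rust
/// Hypothetical piece of abstract content.
#[derive(Debug, Clone, PartialEq)]
struct Unit {
    subject: String,
    predicate: String,
}

/// Hypothetical context threaded through the chain.
#[derive(Debug, Clone, Default, PartialEq)]
struct Context {
    last_subject: Option<String>,
}

/// The three outputs of one renderer.
struct Step {
    html: String,
    deferred: Vec<Unit>,
    context: Context,
}

/// Toy renderer: keeps to the subject of the first unit it sees, defers
/// units about other subjects, and says "It" for a repeated subject.
fn render_paragraph(own: Vec<Unit>, carried: Vec<Unit>, mut context: Context) -> Step {
    let mut units = carried;
    units.extend(own);
    let topic = units.first().map(|unit| unit.subject.clone());
    let mut sentences = Vec::new();
    let mut deferred = Vec::new();
    for unit in units {
        if Some(&unit.subject) != topic.as_ref() {
            deferred.push(unit);
            continue;
        }
        let name = if context.last_subject.as_ref() == Some(&unit.subject) {
            "It".to_string()
        } else {
            unit.subject.clone()
        };
        sentences.push(format!("{} {}.", name, unit.predicate));
        context.last_subject = Some(unit.subject);
    }
    let html = if sentences.is_empty() {
        String::new()
    } else {
        format!("<p>{}</p>", sentences.join(" "))
    };
    Step { html, deferred, context }
}

/// Runs the renderers in sequence, then a final pass over what is left.
fn render_stream(paragraphs: Vec<Vec<Unit>>, initial: Context) -> Vec<String> {
    let mut output = Vec::new();
    let mut carried = Vec::new();
    let mut context = initial;
    for own in paragraphs {
        let step = render_paragraph(own, carried, context);
        output.push(step.html);
        carried = step.deferred;
        context = step.context;
    }
    // "Final renderer": each pass renders at least the first deferred unit.
    while !carried.is_empty() {
        let step = render_paragraph(Vec::new(), carried, context);
        output.push(step.html);
        carried = step.deferred;
        context = step.context;
    }
    output
}
```

Each element of the returned list could be shown as soon as its renderer finishes, which is where the partial output would come from.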

While it is already technically possible to implement this chain on Wikifunctions, it would be a single function call and, like the previous proposal, it would need to generate the entire article in one go. The real advantage of this proposal would come if, instead of a regular function, the chain were implemented through a magic function that executes the renderers sequentially. Such a function would give each renderer the complete function execution timeout, collect the partial outputs (showing the partially generated output directly in the page), and pass the internal outputs to the next renderer alongside its new external input.

Pros:

  • It would have all the advantages of the Semantic unit architecture, and still give at least partial outputs in a reasonable time (limiting timeouts).
  • Although the renderers would not know all the abstract content that will eventually need to be rendered, they would still have an amount of foreknowledge that should be enough for practical purposes, while retaining all the necessary memory of the previous content. After all, this is more or less the same kind of knowledge humans have while producing language.

Cons:

  • It is not currently implementable, since it requires a magic function with special behavior.
  • It requires rendering partial abstract content in discrete sequential chunks, without the possibility of refining them once more abstract content is rendered. However, as mentioned above, this should realistically not be a problem, since the amount of foreknowledge that renderers have available (i.e., the content of the entire paragraph) should be enough for all practical purposes.

Comments
