Enable source packages - edit /etc/apt/sources.list and un-comment the lines starting with deb-src.

Install dependencies:

sudo apt update
sudo apt build-dep wesnoth
sudo apt install libcurl4-openssl-dev libsdl2-dev libsdl2-image-dev libsdl2-mixer-dev libsdl2-ttf-dev libboost-all-dev libvorbis-dev libcairo2-dev libpango1.0-dev libssl-dev libreadline-dev cmake make scons pkgconf

Download the source code from https://wiki.wesnoth.org/Download, then extract and compile it:

tar -xf wesnoth-1.18.0.tar.bz2
cd wesnoth-1.18.0
scons wesnoth

To play the game you don’t need to install it; simply run it from the same directory:

./wesnoth

Enjoy!


This blog post describes how to use the Azure libraries (SDK) for Python to list Samba file shares and to read files and folders on a Samba file share on Azure.

An overview of the Azure SDKs can be found here.

Bicep to create Samba File share

To create an SMB file share using bicep, use the Microsoft.Storage storageAccounts/fileServices/shares resource as described here.

The following bicep code creates a file share in an existing storage account:

targetScope = 'resourceGroup'

param storageAccountName string
param fileShareName string


resource storageAccount 'Microsoft.Storage/storageAccounts@2023-01-01' existing = {
  name: storageAccountName
  scope: resourceGroup()
}

resource fileService 'Microsoft.Storage/storageAccounts/fileServices@2021-09-01' = {
  name: 'default'
  parent: storageAccount
}

resource fileShare 'Microsoft.Storage/storageAccounts/fileServices/shares@2023-01-01' = {
  name: fileShareName
  parent: fileService
  properties: {
    enabledProtocols: 'SMB'
  }
}

It can be invoked from a parent bicep file as follows:


param fileShareName string = 'samba-file-share'
param storageResourceGroupName string = 'rg-storage'
param storageAccountName string = 'sambastorageaccount'

resource storageResourceGroup 'Microsoft.Resources/resourceGroups@2021-04-01' existing = {
  name: storageResourceGroupName
  scope: subscription()
}

module fileShares 'file_service_share.bicep' = {
  name: fileShareName
  scope: storageResourceGroup
  params: {
    storageAccountName: storageAccountName
    fileShareName: fileShareName
  }
}

List files and folders in SMB File Share

Authorize requests to Azure Storage

We are going to use role-based access control (RBAC) with Microsoft Entra ID. Azure Storage accepts OAuth 2.0 access tokens from the Microsoft Entra tenant associated with the subscription that contains the storage account. To do so, we have to assign an appropriate role to the Service Principal.

The list of permissions for File service operations can be found here.

Most of the ‘read’ operations require the following permissions:

Microsoft.Storage/storageAccounts/fileServices/fileShares/files/read
Microsoft.Storage/storageAccounts/fileServices/readFileBackupSemantics/action

After reviewing the list of Azure built-in roles, it turns out that the Storage File Data Privileged Reader role is what we need. It has the following permissions:

"permissions": [
    {
      "actions": [],
      "notActions": [],
      "dataActions": [
        "Microsoft.Storage/storageAccounts/fileServices/fileshares/files/read",
        "Microsoft.Storage/storageAccounts/fileServices/readFileBackupSemantics/action"
      ],
      "notDataActions": []
    }
]

To assign the role to the Service Principal, use the following bicep code:

targetScope = 'resourceGroup'

param fileShareName string = 'samba-file-share'
param storageResourceGroupName string = 'rg-storage'
param storageAccountName string = 'sambastorageaccount'
param principalId string = '...' // object id of samba-file-share-sp

var storageFileDataPrivilegedReaderRole = subscriptionResourceId('Microsoft.Authorization/roleDefinitions', 'b8eda974-7b85-4f76-af95-65846b26df6d')

resource storageAccount 'Microsoft.Storage/storageAccounts@2022-09-01' existing = {
  name: storageAccountName
  scope: resourceGroup()
}

resource roleAssignment 'Microsoft.Authorization/roleAssignments@2020-10-01-preview' = {
  name: guid(storageAccountName, principalId, storageFileDataPrivilegedReaderRole)
  scope: storageAccount
  properties: {
    roleDefinitionId: storageFileDataPrivilegedReaderRole
    principalId: principalId
  }
}

Python SDK to access files and folders in File Share

Azure Storage File Share client library for Python used in the code below can be found here.

To run the code, install the necessary Azure libraries for Python:

pip install azure-identity
pip install azure-storage-file-share

The code below uses ClientSecretCredential to authenticate with Azure AD and ShareClient to access the file share:

from azure.identity import ClientSecretCredential
from azure.storage.fileshare import ShareClient

# Set the storage resource group, account, and share names
storage_account = "sambastorageaccount"
share_name = "samba-file-share"

# Set the Azure AD tenant ID, client ID, and client secret for 'samba-file-share-sp'
tenant_id = "..."
client_id = "..."
client_secret = "..."

# Set the account URL
account_url = f"https://{storage_account}.file.core.windows.net"

# Create a client secret credential
credential = ClientSecretCredential(tenant_id=tenant_id, client_id=client_id, client_secret=client_secret)

# Create a share client
share_client = ShareClient(account_url, share_name, credential=credential, token_intent="backup")

# List directories and files
directories_and_files = list(share_client.list_directories_and_files())
print(f"directories_and_files: {directories_and_files}")

# Get directory properties
directory_client = share_client.get_directory_client("smb_folder")
directory_properties = directory_client.get_directory_properties()
print(f"directory_properties: {directory_properties}")

A sample response is presented below:

directories_and_files: [{'name': 'smb_folder', 'last_modified': None, 'etag': None, 'server_encrypted': None, 'metadata': None, 'change_time': None, 'creation_time': None, 'last_write_time': None, 'last_access_time': None, 'file_attributes': None, 'permission_key': None, 'file_id': '13835128424026341376', 'parent_id': None, 'is_directory': True}]
directory_properties: {'name': None, 'last_modified': datetime.datetime(2024, 3, 8, 15, 12, 7, tzinfo=datetime.timezone.utc), 'etag': '"0x8DC3F821EBF3806"', 'server_encrypted': True, 'metadata': {}, 'change_time': datetime.datetime(2024, 3, 8, 15, 12, 7, 391437), 'creation_time': datetime.datetime(2024, 3, 8, 15, 12, 7, 391437), 'last_write_time': datetime.datetime(2024, 3, 8, 15, 12, 7, 391437), 'last_access_time': None, 'file_attributes': 'Directory', 'permission_key': '13873210695300244643*12131762314841879338', 'file_id': '13835128424026341376', 'parent_id': '0', 'is_directory': True}

Details regarding the ShareClient class can be found here.
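
To actually read a file from the share, we can reuse the credential and share_client from above. Below is a minimal sketch; the path smb_folder/example.txt is a hypothetical placeholder - substitute a file that actually exists on your share.

# Get a client for a single file on the share (hypothetical path - adjust to your share)
file_client = share_client.get_file_client("smb_folder/example.txt")

# Download the file and read its contents into memory
stream = file_client.download_file()
content = stream.readall()
print(f"file content: {content.decode('utf-8')}")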

List SMB File Shares

Authorize requests to Azure Storage

We are going to use role-based access control (RBAC) with Microsoft Entra ID. Azure Storage accepts OAuth 2.0 access tokens from the Microsoft Entra tenant associated with the subscription that contains the storage account. To do so, we have to assign an appropriate role to the Service Principal.

To list shares we need the following permission:

Microsoft.Storage/storageAccounts/fileServices/shares/read

After reviewing the list of Azure built-in roles, it turns out that the Reader role is what we need. It has the following permissions:

"permissions": [
    {
      "actions": [
        "*/read"
      ],
      "notActions": [],
      "dataActions": [],
      "notDataActions": []
    }
]

To assign the role to the Service Principal, use the following bicep code:

targetScope = 'resourceGroup'

param fileShareName string = 'samba-file-share'
param storageResourceGroupName string = 'rg-storage'
param storageAccountName string = 'sambastorageaccount'
param principalId string = '...' // object id of samba-file-share-sp

var readerRole = subscriptionResourceId('Microsoft.Authorization/roleDefinitions', 'acdd72a7-3385-48ef-bd42-f606fba81ae7')

resource storageAccount 'Microsoft.Storage/storageAccounts@2022-09-01' existing = {
  name: storageAccountName
  scope: resourceGroup()
}

resource roleAssignment 'Microsoft.Authorization/roleAssignments@2020-10-01-preview' = {
  name: guid(storageAccountName, principalId, readerRole)
  scope: storageAccount
  properties: {
    roleDefinitionId: readerRole
    principalId: principalId
  }
}

Python SDK to list File Shares

The StorageManagementClient library for Python used in the code below can be found here.

To run the code, install the necessary Azure libraries for Python:

pip install azure-identity
pip install azure-mgmt-storage

The code below uses ClientSecretCredential to authenticate with Azure AD and StorageManagementClient to list file shares:

from azure.identity import ClientSecretCredential
from azure.mgmt.storage import StorageManagementClient

# Subscription Id
subscription_id = "..."

# Set the storage resource group, account, and share names
storage_resource_group = "rg-storage"
storage_account = "sambastorageaccount"

# Set the Azure AD tenant ID, client ID, and client secret for 'samba-file-share-sp'
tenant_id = "..."
client_id = "..."
client_secret = "..."

# Create a client secret credential
credential = ClientSecretCredential(tenant_id=tenant_id, client_id=client_id, client_secret=client_secret)

# Create a storage management client
client = StorageManagementClient(credential=credential, subscription_id=subscription_id)

# List file shares
response = client.file_shares.list(resource_group_name=storage_resource_group, account_name=storage_account)
for item in response:
    print(item)

A sample response is presented below:

{'additional_properties': {}, 'id': '/subscriptions/.../resourceGroups/rg-storage/providers/Microsoft.Storage/storageAccounts/sambastorageaccount/fileServices/default/shares/samba-file-share', 'name': 'samba-file-share', 'type': 'Microsoft.Storage/storageAccounts/fileServices/shares', 'etag': '"0x8DC4088F239CCBE"', 'last_modified_time': datetime.datetime(2024, 3, 9, 22, 33, 30, tzinfo=<isodate.tzinfo.Utc object at 0x7f4380077490>), 'metadata': None, 'share_quota': 5120, 'enabled_protocols': 'SMB', 'root_squash': None, 'version': None, 'deleted': None, 'deleted_time': None, 'remaining_retention_days': None, 'access_tier': 'TransactionOptimized', 'access_tier_change_time': datetime.datetime(2024, 3, 8, 10, 45, 36, tzinfo=<isodate.tzinfo.Utc object at 0x7f4380077490>), 'access_tier_status': None, 'share_usage_bytes': None, 'lease_status': 'unlocked', 'lease_state': 'available', 'lease_duration': None, 'signed_identifiers': None, 'snapshot_time': None}
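
As the sample response shows, each returned item carries an enabled_protocols attribute, so narrowing the listing down to SMB shares is straightforward. A small sketch continuing the code above:

# List again and keep only the shares exposed over SMB
smb_shares = [
    share.name
    for share in client.file_shares.list(resource_group_name=storage_resource_group, account_name=storage_account)
    if share.enabled_protocols == 'SMB'
]
print(f"SMB shares: {smb_shares}")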

A very nice, short course (you can finish it in about an hour) by DeepLearning.AI & Sharon Zhou - How Diffusion Models Work.

The sample code for training your own model is built using PyTorch. It includes Sharon demonstrating the DDIM (Denoising Diffusion Implicit Models) sampler, which is much faster than DDPM (Denoising Diffusion Probabilistic Models) sampling.


Harvard handbook cover

What works:

  • Learning requires effort.
  • Retrieval practice - flashcards, quizzes, tests.
  • Reflection (What happened? What did I do? How did I solve it? What would I do differently next time?).
  • Feedback on incorrect answers should be given with a delay.
  • Essay or short-answer questions are better than multiple-choice tests.
  • Practice spaced out in time, with gaps long enough for forgetting to set in - at least 1 day.
  • Trying to tackle a problem before being taught its solution - generation.
  • Interleaved and varied practice. Switching topics before fully practicing and mastering them.
  • Learning new material by building on what is already known.
  • Placing knowledge in a broader context.
  • Extracting the key ideas from new material and organizing them.
  • Believing that intellectual abilities can be developed. A growth mindset.
  • Believing that making mistakes is an inherent part of the learning process.
  • Having the support of parents, teachers, and mentors who believe in us.
  • Learning rules rather than examples.
  • Praise for effort and hard work.
  • Courage, curiosity, perseverance.
  • Mnemonics, the memory palace.

What doesn't work:

  • Massed practice, cramming.
  • Blocked practice.
  • Rereading a text multiple times in a short period is a waste of time.
  • Studying according to one's preferred learning style (e.g. auditory vs. visual learners).
  • Fear of failure and of making mistakes. We clutter working memory with fretting over how difficult the task at hand is.
  • We are poor judges of when our learning is going well and when it is going badly.
  • Believing that intellectual abilities are fixed and inherited. Attributing failures to a lack of ability - "I'm not intelligent".
  • Focusing on achieving a result (instead of on acquiring skills and on growth).
  • Undesirable difficulties - ones we are unable to overcome.
  • Praise for natural talent, e.g. quick-wittedness.

The learning process:

  • Encoding
  • Consolidation
  • Retrieval

Footnotes, notes:

Types of intelligence according to Howard Gardner:

  • spatial intelligence
  • linguistic
  • kinesthetic
  • musical
  • interpersonal
  • intrapersonal
  • naturalistic

Thanks to Mozilla’s llamafile it is now super easy to run an LLM locally.

Install and register ape as per https://github.com/Mozilla-Ocho/llamafile#gotchas.

sudo wget -O /usr/bin/ape https://cosmo.zip/pub/cosmos/bin/ape-$(uname -m).elf
sudo chmod +x /usr/bin/ape
sudo sh -c "echo ':APE:M::MZqFpD::/usr/bin/ape:' >/proc/sys/fs/binfmt_misc/register"
sudo sh -c "echo ':APE-jart:M::jartsr::/usr/bin/ape:' >/proc/sys/fs/binfmt_misc/register"

Download the model from Hugging Face, make it executable and run it:

curl -LO https://huggingface.co/jartine/llava-v1.5-7B-GGUF/resolve/main/llava-v1.5-7b-q4-server.llamafile
chmod +x llava-v1.5-7b-q4-server.llamafile
./llava-v1.5-7b-q4-server.llamafile

Open http://127.0.0.1:8080.
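
Besides the web UI, the llamafile binary embeds the llama.cpp HTTP server, so you can also query it programmatically. A minimal Python sketch, assuming the default llama.cpp /completion endpoint and response format (double-check these against your llamafile version):

import json
import urllib.request

# Send a completion request to the local llamafile server
payload = json.dumps({"prompt": "Why is the sky blue?", "n_predict": 64}).encode("utf-8")
request = urllib.request.Request(
    "http://127.0.0.1:8080/completion",
    data=payload,
    headers={"Content-Type": "application/json"},
)
with urllib.request.urlopen(request) as response:
    # The llama.cpp server returns JSON with the generated text in "content"
    print(json.loads(response.read())["content"])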



AI w strategii - book cover

A company without a clear vision for using AI will not survive on the market. What is needed is no longer a strategy for AI, but a strategy that includes AI (according to Strategy For and With AI).

The problem is that working out AI's place in the strategy cannot be fully outsourced. AI wizards can get the technological part swinging, but at the end of the day the biggest challenge is deciding what lever to put under AI in order to gain a competitive advantage in your market segment and strategic domain (according to What’s Your AI Strategy?).

[…] as Boston Consulting Group reports, companies that have gained leverage from AI at scale spend 10% of their AI budget on algorithms, 20% on technology, and as much as 70% on making sure AI is well integrated with business processes. Seeing artificial intelligence mainly as an investment in hardware or software is a misguided vision - most of the work lies in getting the implementation right.

A closing remark: there is no universal approach to embedding AI in a strategy, because the key to a successful adoption is being aware of the company's character and needs.


Just finished a nice course available on Coursera. The course was created by DeepLearning.AI & Amazon Web Services and is taught by AWS employees: Antje Barth, Shelbee Eigenbrode, Mike Chambers, and Chris Fregly.

The course is technical enough to give you an overall understanding of how LLMs work and how one can integrate them into a production application.

The lecture notes were made available under a Creative Commons license - thanks, DeepLearning.AI!

Here they are:

  • Generative AI with LLMs Lecture Notes - week 1
  • Generative AI with LLMs Lecture Notes - week 2
  • Generative AI with LLMs Lecture Notes - week 3

We want to run a process in the background and continue script execution. At some point we want to stop and wait for that process to finish, and we also want to capture its exit code. Once we have the PID of the process we put in the background, the wait builtin is all we need. It returns the exit code of the process it waited for, even if the process finished before wait was called.

#!/usr/bin/env bash

# start sleep process in background
echo Starting sleep
(sleep 5; exit 3) &
# get the pid of the last process run
pid=$!

echo Processing...
sleep 10
echo "sleep must be done by now"
ps aux | grep sleep

# wait for the process to finish
echo Waiting for sleep to finish
wait $pid
echo "Sleep finished with exit code $?"
echo done

Stanford researchers have done a great job evaluating some foundation models’ compliance with the proposed EU law on AI (the AI Act).

The results are summarized in the llm_vs_ai_act_results figure.

The rubrics used and a short description of the 12 requirements taken into account:

llm_vs_ai_act_table

1. Data sources (additive)

- +1: Very generic/vague description (e.g. “Internet data”)
- +1: Description of stages involved (e.g. training, instruction-tuning)
- +1: Sizes (relative or absolute) of different data sources
- +1: Fine-grained sourcing (e.g. specific URLs like Wikipedia, Reddit)

2. Data governance

- 0 points: No discussion of data governance
- 1 point: Vague mention of governance with no concreteness
- 2-3 points: Some grounded discussion or specific protocols around governance related to suitability and/or bias of data sources
- 4 points: Explicit constraint on requiring governance measures to include data

3. Copyrighted data

- 0 points: No description.
- 1 point: Very generic/vague acknowledgement of copyright (e.g. tied to “Internet data”)
- 2-3 points: Some grounded discussion of specific copyrighted materials
- 4 points: Fine-grained separation of copyrighted vs. non-copyrighted data

4. Compute (additive)

- +1: model size
- +1: training time as well as number and type of hardware units (e.g. number of A100s)
- +1: training FLOPs
- +1: Broader context (e.g. compute provider, how FLOPs are measured)

5. Energy (additive)

- +1: energy usage
- +1: emissions
- +1: discussion of measurement strategy (e.g. cluster location and related details)
- +1: discussion of mitigations to reduce energy usage/emissions

6. Capabilities and limitations

- 0 points: No description.
- 1 point: Very generic/vague description
- 2-3 points: Some grounded discussion of specific capabilities and limitations
- 4 points: Fine-grained discussion grounded in evaluations/specific examples

7. Risks and mitigations (additive)

- +1: list of risks
- +1: list of mitigations
- +1: description of the extent to which mitigations successfully reduce risk
- +1: justification for why non-mitigated risks cannot be mitigated

8. Evaluations (additive)

- +1: measurement of accuracy on multiple benchmarks
- +1: measurement of unintentional harms (e.g. bias)
- +1: measurement of intentional harms (e.g. malicious use)
- +1: measurement of other factors (e.g. robustness, calibration, user experience)

9. Testing (additive)

- +1 or +2: disclosure of results and process of (significant) internal testing
- +1: external evaluations due to external access (e.g. HELM)
- +1: external red-teaming or adversarial evaluation/stress-testing (e.g. ARC)

10. Machine-generated content

- +1-3 points: Disclosure that content is machine-generated within direct purview of the foundation model provider (e.g. when using OpenAI API).
- +1 point: Disclosed mechanism to ensure content is identifiable as machine-generated even beyond direct purview of foundation model provider (e.g. watermarking).

11. Member states

- 0 points: No description of deployment practices in relation to the EU.
- 2 points: Disclosure of explicitly permitted/prohibited EU member states at the organization operation level.
- 4 points: Fine-grained discussion of practice per state, including any discrepancies in how the foundation model is placed on the market or put into service that differ across EU member states.

12. Downstream documentation

- 0 points: No description of any informational obligations or documentation.
- 1 point: Generic acknowledgement that information should be provided downstream.
- 2 points: Existence of relevant documentation, including in public reports, though mechanism for supplying to downstream developers is unclear.
- 3-4 points: (Fairly) clear mechanism for ensuring foundation model provider provides appropriate documentation to downstream providers.

Full list of requirements found by the researchers in AI Act draft:

  • Registry [Article 39, item 69, page 8 as well as Article 28b, paragraph 2g, page 40]. In order to facilitate the work of the Commission and the Member States in the artificial intelligence field as well as to increase the transparency towards the public, providers of high-risk AI systems other than those related to products falling within the scope of relevant existing Union harmonization legislation, should be required to register their high-risk AI system and foundation models in a EU database, to be established and managed by the Commission. This database should be freely and publicly accessible, easily understandable and machine-readable. The database should also be user friendly and easily navigable, with search functionalities at minimum allowing the general public to search the database for specific high-risk systems, locations, categories of risk under Annex IV and keywords. Deployers who are public authorities or European Union institutions, bodies, offices and agencies or deployers
  • Provider name [Annex VIII, Section C, page 24]. Name, address and contact details of the provider.
  • Model name [Annex VIII, Section C, page 24]. Trade name and any additional unambiguous reference allowing the identification of the foundation model.
  • Data sources [Annex VIII, Section C, page 24]. Description of the data sources used in the development of the foundation model.
  • Capabilities and limitations [Annex VIII, Section C, page 24]. Description of the capabilities and limitations of the foundation model.
  • Risks and mitigations [Annex VIII, Section C, page 24 and Article 28b, paragraph 2a, page 39]. The reasonably foreseeable risks and the measures that have been taken to mitigate them as well as remaining non-mitigated risks with an explanation on the reason why they cannot be mitigated.
  • Compute [Annex VIII, Section C, page 24]. Description of the training resources used by the foundation model including computing power required, training time, and other relevant information related to the size and power of the model.
  • Evaluations [Annex VIII, Section C, page 24 as well as Article 28b, paragraph 2c, page 39]. Description of the model’s performance, including on public benchmarks or state of the art industry benchmarks.
  • Testing [Annex VIII, Section C, page 24 as well as Article 28b, paragraph 2c, page 39]. Description of the results of relevant internal and external testing and optimisation of the model.
  • Member states [Annex VIII, Section C, page 24]. Member States in which the foundation model is or has been placed on the market, put into service or made available in the Union.
  • Downstream documentation [Annex VIII, 60g, page 29 as well as Article 28b, paragraph 2e, page 40]. Also, foundation models should have information obligations and prepare all necessary technical documentation for potential downstream providers to be able to comply with their obligations under this Regulation.
  • Machine-generated content [Annex VIII, 60g, page 29]. Generative foundation models should ensure transparency about the fact the content is generated by an AI system, not by humans.
  • Pre-market compliance [Article 28b, paragraph 1, page 39]. A provider of a foundation model shall, prior to making it available on the market or putting it into service, ensure that it is compliant with the requirements set out in this Article, regardless of whether it is provided as a standalone model or embedded in an AI system or a product, or provided under free and open source licenses, as a service, as well as other distribution channels.
  • Data governance [Article 28b, paragraph 2b, page 39]. Process and incorporate only datasets that are subject to appropriate data governance measures for foundation models, in particular measures to examine the suitability of the data sources and possible biases and appropriate mitigation.
  • Energy [Article 28b, paragraph 2d, page 40]. Design and develop the foundation model, making use of applicable standards to reduce energy use, resource use and waste, as well as to increase energy efficiency, and the overall efficiency of the system. This shall be without prejudice to relevant existing Union and national law and this obligation shall not apply before the standards referred to in Article 40 are published. They shall be designed with capabilities enabling the measurement and logging of the consumption of energy and resources, and, where technically feasible, other environmental impact the deployment and use of the systems may have over their entire lifecycle.
  • Quality management [Article 28b, paragraph 2f, page 40]. Establish a quality management system to ensure and document compliance with this Article, with the possibility to experiment in fulfilling this requirement.
  • Upkeep [Article 28b, paragraph 3, page 40]. Providers of foundation models shall, for a period ending 10 years after their foundation models have been placed on the market or put into service, keep the technical documentation referred to in paragraph 1(c) at the disposal of the national competent authorities.
  • Law-abiding generated content [Article 28b, paragraph 4b, page 40]. Train, and where applicable, design and develop the foundation model in such a way as to ensure adequate safeguards against the generation of content in breach of Union law in line with the generally acknowledged state of the art, and without prejudice to fundamental rights, including the freedom of expression.
  • Training on copyrighted data [Article 28b, paragraph 4c, page 40]. Without prejudice to national or Union legislation on copyright, document and make publicly available a sufficiently detailed summary of the use of training data protected under copyright law.
  • Adherence to general principles [Article 4a, paragraph 1, page 142-3]. All operators falling under this Regulation shall make their best efforts to develop and use AI systems or foundation models in accordance with the following general principles establishing a high-level framework that promotes a coherent humancentric European approach to ethical and trustworthy Artificial Intelligence, which is fully in line with the Charter as well as the values on which the Union is founded: a) ‘human agency and oversight’ means that AI systems shall be developed and used as a tool that serves people, respects human dignity and personal autonomy, and that is functioning in a way that can be appropriately controlled and overseen by humans. b) ‘technical robustness and safety’ means that AI systems shall be developed and used in a way to minimize unintended and unexpected harm as well as being robust in case of unintended problems and being resilient against attempts to alter the use or performance of the AI system so as to allow unlawful use by malicious third parties. c) ‘privacy and data governance’ means that AI systems shall be developed and used in compliance with existing privacy and data protection rules, while processing data that meets high standards in terms of quality and integrity. d) ‘transparency’ means that AI systems shall be developed and used in a way that allows appropriate traceability and explainability, while making humans aware that they communicate or interact with an AI system as well as duly informing users of the capabilities and limitations of that AI system and affected persons about their rights. e) ‘diversity, non-discrimination and fairness’ means that AI systems shall be developed and used in a way that includes diverse actors and promotes equal access, gender equality and cultural diversity, while avoiding discriminatory impacts and unfair biases that are prohibited by Union or national law. f) ‘social and environmental well-being’ means that AI systems shall be developed and used in a sustainable and environmentally friendly manner as well as in a way to benefit all human beings, while monitoring and assessing the long-term impacts on the individual, society and democracy. For foundation models, the general principles are translated into and complied with by providers by means of the requirements set out in Articles 28 to 28b.
  • System is designed so users know its an AI [Article 52(1) Paragraph 1 - not in the Compromise text, but invoked in 28(b), paragraph 4a, page 40]. Providers shall ensure that AI systems intended to interact with natural persons are designed and developed in such a way that natural persons are informed that they are interacting with an AI system, unless this is obvious from the circumstances and the context of use. This obligation shall not apply to AI systems authorised by law to detect, prevent, investigate and prosecute criminal offences, unless those systems are available for the public to report a criminal offence.
  • Appropriate levels [Article 28b, paragraph 2c, page 39]. design and develop the foundation model in order to achieve throughout its lifecycle appropriate levels of performance, predictability, interpretability, corrigibility, safety and cybersecurity assessed through appropriate methods such as model evaluation with the involvement of independent experts, documented analysis, and extensive testing during conceptualisation, design, and development

Source


The paper that kick-started LLM development is “Attention Is All You Need” from Google, published in June 2017. It introduced the Transformer architecture and the “self-attention” mechanism.