The backend stores data in a typical SQL database. In Python, we used the sqlite3 module. SQLite is a lightweight, fast and efficient database built for small-to-medium scale applications such as this one. As I am paying for the server out of pocket, efficient use of storage was a necessity. SQLite fulfilled these requirements without depriving me of any modern features that keep the code clean.
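A minimal sketch of how such a store might be initialised with sqlite3 (the table and column names here are assumptions for illustration, not the app's actual schema):

```python
import sqlite3

# ":memory:" is used here for illustration; the real app would point
# at a database file on the server's disk
conn = sqlite3.connect(":memory:")

# A hypothetical students table; the real schema is not shown in this document
conn.execute("""
    CREATE TABLE IF NOT EXISTS students (
        id INTEGER PRIMARY KEY,
        name TEXT NOT NULL,
        email TEXT UNIQUE NOT NULL,
        password_hash TEXT NOT NULL
    )
""")
conn.execute(
    "INSERT INTO students (name, email, password_hash) VALUES (?, ?, ?)",
    ("Example Student", "example@example.com", "<bcrypt hash goes here>"),
)
row = conn.execute(
    "SELECT name FROM students WHERE email = ?", ("example@example.com",)
).fetchone()
```

SQLite runs in-process with zero configuration, which is what makes it cheap to host compared with a separate database server.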
To run the application, we used an ASGI (Asynchronous Server Gateway Interface) server to host the API; specifically, we used Uvicorn. Again, it is lightweight and efficient, and it can handle asynchronous method calls, allowing for a faster overall user experience.
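For reference, such an app is typically launched with a command along these lines (the module and app names are assumptions, as they are not stated in this document):

```shell
# Assuming the FastAPI instance is named `app` inside main.py
uvicorn main:app --host 0.0.0.0 --port 8000
```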
Data Structures
All data from the frontend is sent as JSON. To allow our backend to interpret this, we used custom BaseModels for each POST & GET call. For example, the frontend might call a login function such as:
this.http.post(this.URL + "/check_student_login", {email: "example@example.com", username: "example", password: "example"}).subscribe((res: any) => {
Where email, username and password are the JSON parameters. As a result, our backend model would look like:
from pydantic import BaseModel # Module to create the models

class PostLoginCheckStudentModel(BaseModel):
    email: str
    username: str
    password: str
And the API would receive the call like so:
@app.post("/api/check_student_login")
async def check_student_login_post(user: PostLoginCheckStudentModel):
    return check_student_login(user.email, user.password) # Calls a function on the backend
These data structures are repeated throughout the API and are paramount to our data-manipulation methods.
Encryption
Firstly, all passwords are hashed before being stored in the backend. They are hashed using Passlib's CryptContext class with 'bcrypt' as the scheme. The password is hashed with the following code:
from passlib.context import CryptContext # Import module

pwd_context = CryptContext(schemes=["bcrypt"], deprecated="auto") # Identify the scheme used

def hash_password(password): # Take a plaintext password & convert it to a hash
    return pwd_context.hash(password)
This hashes our password. It is extremely secure as each hash is non-reversible. The same input string also produces a different output hash each time, because the random 'salt' used by the algorithm differs on every call. For example, an input password of 1234 could result in:
- $2b$12$zProG7RawgHpdXZ9RfxYZ.pcwZy4N29C6mRzBb8yAFvoHc5TSSyZC
- $2b$12$0DPNbWZDAX5fJl4pyBHnIuWiLk.2w/rxU/MnuKFznmjouUMjFxW5.
- $2b$12$U6PDFiGrqooLxsnHJ7X9gehpzH3tLiI./BH8oj3HejMSyZvGHPneC
- $2b$12$YJK4L0S8pcFKokWUYqf1VuDtjxy.KptvvBN9VTJIvTPTuwc36KwAe
- $2b$12$MWwHq8PHskJqF2EoGOI60..J92SVyjD5FHVc4Ugbv7ZnD10Wso6yO
- $2b$12$r5qvt.4ik//TSFaVWxWZ3uFGdmUXJL35.FXeeb2uDiWSQSZYQWv0i
So the same password can map to a vast number of possible hashes, depending on the starting 'salt'.
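The project uses Passlib's bcrypt for this; purely to illustrate the salting principle using only the standard library, a comparable sketch with PBKDF2 might look like:

```python
import hashlib
import hmac
import os

def hash_password(password: str) -> tuple[bytes, bytes]:
    salt = os.urandom(16)  # a fresh random salt for every hash
    digest = hashlib.pbkdf2_hmac("sha256", password.encode(), salt, 100_000)
    return salt, digest

def verify_password(password: str, salt: bytes, digest: bytes) -> bool:
    # Re-derive the hash with the stored salt and compare in constant time
    candidate = hashlib.pbkdf2_hmac("sha256", password.encode(), salt, 100_000)
    return hmac.compare_digest(candidate, digest)

salt_a, hash_a = hash_password("1234")
salt_b, hash_b = hash_password("1234")
# Same password, different salts, therefore different hashes
```

Both hashes still verify against "1234" because the salt is stored alongside the digest, mirroring how bcrypt embeds the salt inside its output string.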
To verify a password, we do not reverse the hash; reversing the hash is impossible. Instead, we use a built-in function that compares the plaintext against the stored hash. This functionality is used when authenticating a user during login and is done as such:
from passlib.context import CryptContext # Import module

pwd_context = CryptContext(schemes=["bcrypt"], deprecated="auto") # Identify the scheme used

def verify_password(unhashed, hashed):
    return pwd_context.verify(unhashed, hashed) # Returns a boolean
This function extracts the original 'salt' and 'work factor' (the number of hashing iterations) from the stored hash and calculates whether the hash could have been derived from the input. If the algorithm deems it possible, a boolean value of True is returned; otherwise the function returns False.
Tokens
When the frontend makes any request to the backend, it needs to provide a token. This token is valid for 7 days and stores the name, email and id of the user (student) that is currently logged in.
When a user attempts to log in, the details they input are checked against the database. If the email and password entered match an email and password in the database, a token is created and sent to the frontend. The token is created using the code:
from datetime import datetime, timedelta
import jwt # PyJWT

def get_user_token(student: Student):
    to_encode = {
        'details': {'name': student.name, 'email': student.email, 'id': student.id},
        'expiry': str(datetime.utcnow() + timedelta(minutes=ACCESS_TOKEN_EXPIRE_MINUTES))
    }
    return jwt.encode(to_encode, SECRET_KEY, algorithm=ALGORITHM)
Where SECRET_KEY is a 256-bit secret used to sign the token, and the algorithm is HS256. This token is then stored in the user's browser data, so the user does not have to log in on every visit to the website. The frontend injects this token into every request, as shown by the code:
@Injectable()
export class CookieInterceptor implements HttpInterceptor {
  private platformId = inject(PLATFORM_ID);

  intercept(request: HttpRequest<unknown>, next: HttpHandler): Observable<HttpEvent<unknown>> { // Intercepts an outgoing request
    let token: string | undefined;
    if (isPlatformBrowser(this.platformId)) {
      token = document.cookie.split("; ").find((row) => row.startsWith("token="))?.split("=")[1];
    } // Extracts the token from the web-browser
    const modifiedRequest = request.clone({
      withCredentials: true,
      setHeaders: token ? { 'token': token } : {} // Adds the token to the request
    });
    return next.handle(modifiedRequest); // Sends the request to the API
  }
}
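Under the hood, jwt.encode with HS256 base64url-encodes a header and the payload, then signs both with HMAC-SHA256 keyed on SECRET_KEY. A standard-library-only sketch of that signing step (illustrative only, not PyJWT's actual implementation; the sample payload and secret are made up):

```python
import base64
import hashlib
import hmac
import json

def b64url(data: bytes) -> str:
    # JWTs use unpadded base64url encoding
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode()

def sign_hs256(payload: dict, secret: str) -> str:
    header = b64url(json.dumps({"alg": "HS256", "typ": "JWT"}).encode())
    body = b64url(json.dumps(payload).encode())
    signing_input = f"{header}.{body}".encode()
    signature = hmac.new(secret.encode(), signing_input, hashlib.sha256).digest()
    return f"{header}.{body}.{b64url(signature)}"

token = sign_hs256({"details": {"name": "Ada", "email": "ada@example.com", "id": 1}},
                   "a-256-bit-secret-goes-here")
```

Note that only the signature depends on SECRET_KEY: anyone can read the payload, and the key merely prevents forgery. This is why the token carries only the name, email and id rather than anything sensitive.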
Once the backend receives this token, it verifies it in every function before executing the intended request. This is done through the code:
@app.post("/api/function_name")
async def function_name(request: Request):
    token_res = validate_student(request.headers.get('token')) # Validates the token by extracting the user details and checking its expiry
    if token_res == False:
        return JSONResponse(status_code=401, content={"message": "Invalid token"}) # Prevents any further work if the token is invalid
    else:
        ... # Execute the intended code here
To validate the token, the code first calls the function validate_student():
def validate_student(token):
    try:
        res = get_student_from_token(token)
        if res == "Token Expired": # Checks if the token is past its expiry date
            return False
        else:
            return [res['name'], res['email'], res['id']] # Returns the details in the form of a list
    except InvalidSignatureError: # Raised if the user has attempted token injection; caught before its parent class InvalidTokenError
        return False
    except InvalidTokenError: # If the token cannot be decoded at all, return False
        return False
To get the res variable, the code first decodes the token using the get_student_from_token(token: str) function:
def get_student_from_token(token):
    payload = jwt.decode(token, SECRET_KEY, algorithms=[ALGORITHM])
    expiry = payload.get('expiry')
    if datetime.utcnow() >= datetime.strptime(expiry, '%Y-%m-%d %H:%M:%S.%f'): # Compares the expiry date to the current UTC time
        return "Token Expired"
    else:
        return payload.get('details') # Returns the details as a dictionary of the name, email and UID
This means that an expired or invalid token is immediately rejected, while a valid token is decoded. The app then performs all of its functions based on this data, for example extracting the user's notes based on the OWNER_EMAIL property of each file. This ensures the privacy of user data and maintains the user experience.
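The per-user filtering described above can be sketched with sqlite3 as follows (the table and column names are assumptions for illustration):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE notes (filename TEXT, owner_email TEXT)")
conn.executemany("INSERT INTO notes VALUES (?, ?)", [
    ("biology.pdf", "alice@example.com"),
    ("history.pdf", "bob@example.com"),
])

def notes_for(email: str) -> list[str]:
    # Only rows whose owner_email matches the email from the token are returned
    rows = conn.execute("SELECT filename FROM notes WHERE owner_email = ?", (email,))
    return [filename for (filename,) in rows]
```

Because the email comes from the verified token rather than from user input, one user cannot query another user's files simply by changing a request parameter.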
If the app returns a 401 error due to a bad token, the frontend intercepts this error and redirects the user to a login/sign-up page, as shown in the code:
@Injectable()
export class ErrorInterceptor implements HttpInterceptor {
  constructor(private router: Router, private route: ActivatedRoute) {}

  getChildRoute(route: ActivatedRoute): ActivatedRoute {
    while (route.firstChild) {
      route = route.firstChild;
    }
    return route;
  }

  intercept(request: HttpRequest<any>, next: HttpHandler): Observable<HttpEvent<any>> {
    const modifiedRequest = request.clone({ // Clones any incoming requests
      headers: request.headers.set('X-Requested-With', 'XMLHttpRequest'),
      withCredentials: true
    });
    return next.handle(modifiedRequest).pipe(tap(() => {}, (err: any) => {
      if (err instanceof HttpErrorResponse) { // Checks if we have received an error
        if (err.status != 401) {
          return;
        }
        this.router.navigate(['/login']); // Prevents the request from going through and redirects to the login page instead
      }
    }));
  }
}
This all ensures that the user can use the app easily, as they never have to specify their ID when using the app's functions, whilst also preventing bad actors from accessing user data.
There is one exception to the token requirement: the cloud_check() function, which I use to check whether the API is responding. It is a simple GET call:
@app.get("/api/cloud_check")
async def cloud_check():
    return True
It was initially used by my service provider to notify me if the API goes down. This is the only potential vulnerability evident to me, and its security depends more on my provider's (Linode) protection against unauthorised IPv6 & IPv4 calls to the server. When called, the method simply returns True.
Storing user files
When a user uploads a file, it is stored in our backend as a raw PDF file. These files are not encrypted and are stored as their raw content, so their security depends on the security of our service provider, Linode. The only feasible point of interception is while the file is being transmitted from the frontend to the backend.
Before even allowing the data to be uploaded, we first verify its size. Gemini, like other commercial AI models, converts input data and prompts into tokens that it can decode and understand. Input files are likewise converted to tokens, and there is a limit to the number of tokens the model can handle. To ensure that files aren't too large (i.e. don't produce too many tokens), we use the count_tokens function of the API. This counts the tokens with the code:
uploaded = client.files.upload(file=file_path, config={'display_name': 'test_data'})
token_no = client.models.count_tokens(model=model_name, contents=uploaded).total_tokens
We then compare it to the limit, returning a boolean value depending on the result.
However, we also wrap this call in a try...except block: if the request itself fails, we assume the file is too large and automatically reject it.
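Put together, the size check might look like the following sketch, where count_tokens_for stands in for the Gemini call above and TOKEN_LIMIT is an assumed threshold, not a value taken from the app:

```python
TOKEN_LIMIT = 1_000_000  # assumed limit; the real value depends on the model

def file_within_limit(count_tokens_for, file_path: str, limit: int = TOKEN_LIMIT) -> bool:
    try:
        return count_tokens_for(file_path) <= limit
    except Exception:
        # If counting itself fails, assume the file is too large and reject it
        return False
```

Passing the counting function in as a parameter also makes the limit check easy to test without hitting the real API.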
NOTE: Handwritten files have a lower effective capacity, as they require more tokens and often cannot be decoded by the AI at all. To circumvent this, I am working on implementing a form of vision AI to convert handwritten text to PDF format. However, this also has its limitations.
Volatile Data
Current Notes
The note that the user currently has selected is stored in a Python list, as this allows for quick read/write times. This data does not need to be persistent, because the user can simply reselect the notes they are using.
Caching
We also have caching in place. Previously, it would take anywhere from 30s - 2min for the flashcard & question-answer functionality to respond. To avoid this, any generated flashcards are stored in an array. Before generating them, the code checks if flashcards for the given notes already exist. If they do, then those flashcards are returned. If not, flashcards are generated and stored.
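A minimal sketch of that check-before-generate pattern (the function and cache names are hypothetical, and the slow AI call is stubbed out):

```python
flashcard_cache: dict[str, list[str]] = {}

def get_flashcards(note_id: str, generate) -> list[str]:
    # Only call the slow AI generation function on a cache miss
    if note_id not in flashcard_cache:
        flashcard_cache[note_id] = generate(note_id)
    return flashcard_cache[note_id]

calls = []  # records how many times the (stubbed) AI is invoked

def fake_generate(note_id: str) -> list[str]:
    calls.append(note_id)
    return ["What is a cell?", "Define osmosis"]

first = get_flashcards("bio-notes", fake_generate)
second = get_flashcards("bio-notes", fake_generate)
```

The second call returns instantly from the cache, which is what removes the 30 s to 2 min wait on repeat requests.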
Similarly, the code generates 10 questions at a time and stores them in an array tied to the file. When a question is requested, one is removed from this list and sent. Once the number of cached questions drops below 3, a new set of questions is requested from the AI. This reduces the wait time for the user and is also more cost-effective on my part, as I pay for fewer requests, albeit with a larger size per request.
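The refill behaviour can be sketched as follows, with the thresholds taken from the description above and the AI request stubbed out (the names are hypothetical):

```python
question_cache: dict[str, list[str]] = {}
REFILL_THRESHOLD = 3   # refill once fewer than 3 cached questions remain
BATCH_SIZE = 10        # the AI generates 10 questions per request

def next_question(note_id: str, generate_batch) -> str:
    queue = question_cache.setdefault(note_id, [])
    if len(queue) < REFILL_THRESHOLD:
        queue.extend(generate_batch(note_id, BATCH_SIZE))  # top the queue up
    return queue.pop(0)

batches = []  # records each (stubbed) AI request

def fake_batch(note_id: str, n: int) -> list[str]:
    batches.append(n)
    return [f"question {len(batches)}-{i}" for i in range(n)]

first = next_question("bio-notes", fake_batch)
```

Refilling before the queue is fully empty means the user rarely waits on a live AI call; at worst one request in ten pays the generation cost.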
Non-volatile data
Notes
The notes are non-volatile, as they are stored on disk as PDF data.
Login Data
User information such as emails, password hashes and UIDs is stored in a sqlite3 database on the SSD of our server.
Tokens
Tokens are stored in the browser and sent to the backend on a per-request basis. The backend simply verifies their validity, so they are considered non-volatile.