Nucleotide_Essentials.jl

Data Types

Nucleotide_Essentials.FastqRecord — Type

Nucleotide_Essentials.FastqRecord

Components

ID: The unique sequence identifier associated with that entry
sequence: The nucleotide sequence of that entry
quality: The quality scores of that entry
filename: The original file name

Nucleotide_Essentials.FastaRecord — Type

Nucleotide_Essentials.FastaRecord

Components

ID: The unique sequence identifier associated with that entry
sequence: The nucleotide sequence of that entry
filename: The original file name

Functions

Nucleotide_Essentials.readFastq — Function

Nucleotide_Essentials.readFastq
readFastq(Path::String)

.fastq file => readFastq(Path) => FastqRecord(ID, sequence, quality, filename)

Supported keyword arguments include:

Path::String: The full or relative path to a .fastq file

Example:

# Supply the path to a .fastq file that you would like to import
myfastq = readFastq("myfastq.fastq")
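To illustrate the four-line record layout that a .fastq reader has to handle, here is a minimal sketch. It is not the package's implementation: `SketchFastq` and `parse_fastq` are hypothetical names, the field types are assumptions, and each record is assumed to occupy exactly four lines (header, sequence, `+` separator, quality).

```julia
# Hypothetical stand-in for FastqRecord; the field types are assumptions.
struct SketchFastq
    ID::String
    sequence::String
    quality::String
    filename::String
end

# Parse four-line FASTQ records from any IO stream.
function parse_fastq(io::IO, filename::AbstractString)
    records = SketchFastq[]
    while !eof(io)
        header = readline(io)
        isempty(header) && continue            # tolerate blank lines between records
        startswith(header, "@") || error("expected '@' header, got: $header")
        sequence = readline(io)
        readline(io)                           # '+' separator line, ignored
        quality = readline(io)
        length(sequence) == length(quality) ||
            error("sequence/quality length mismatch in $header")
        push!(records, SketchFastq(header[2:end], sequence, quality, filename))
    end
    return records
end

# An in-memory buffer stands in for a file on disk.
io = IOBuffer("@read1\nACGT\n+\nIIII\n@read2\nGGCC\n+\n5555\n")
records = parse_fastq(io, "example.fastq")
```

Each element of `records` carries the same four components listed for FastqRecord above.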

Nucleotide_Essentials.readFasta — Function

Nucleotide_Essentials.readFasta

Imports a .fasta file into Julia

readFasta(Path::String)

.fasta file => readFasta(Path) => FastaRecord(ID, sequence, filename)

Supported keyword arguments include:

Path::String: The full or relative path to a .fasta file

Example:

# Supply the path to a .fasta file that you would like to import. Ending the command with `;` is recommended so that potentially large .fasta files are not printed in the REPL
myfasta = readFasta("myfasta.fasta");

Nucleotide_Essentials.writeFasta — Function

Nucleotide_Essentials.writeFasta

writeFasta(input_fasta::FastaRecord, out::String, compressed::Bool)

FastaRecord => writeFasta(input_fasta, out, compressed) => .fasta file/.fasta.gz file

Takes a FastaRecord with a single entry or multiple entries and writes either a .fasta or a compressed .fasta.gz file to the desired directory

Supported keyword arguments include:

input_fasta::FastaRecord: A FastaRecord with either a single entry or multiple entries
out::String: The full or relative path to the directory where files should be written to
compressed::Bool: Whether or not to write the .fasta files as compressed files.
- If true, files will be written as .fasta.gz files
- If false, files will be written as .fasta files

Example:

# .fasta files can be written from an already imported FastaRecord in Julia
myfasta = readFasta("myfasta.fasta");
writeFasta(myfasta, "example/output/directory", false)

# .fasta files can be written as .fasta.gz from an already imported FastaRecord in Julia
myfasta = readFasta("myfasta.fasta");
writeFasta(myfasta, "example/output/directory", true)

# .fasta files with multiple sequences can be read and written as individual .fasta or .fasta.gz files in the same step
writeFasta(readFasta("myfasta.fasta"), "example/output/directory", true);

Nucleotide_Essentials.FastqtoFasta — Function

Nucleotide_Essentials.FastqtoFasta

Converts a FastqRecord to a FastaRecord. Can also input and convert a .fastq file to a FastaRecord in the same function.

FastqtoFasta(Fastq::Union{String, FastqRecord})

.fastq file => FastqtoFasta(Fastq) => FastaRecord(ID, sequence, filename)

FastqRecord(ID, sequence, quality, filename) => FastqtoFasta(Fastq) => FastaRecord(ID, sequence, filename)

Supported keyword arguments include:

Fastq::Union{String, FastqRecord}:
- The full or relative path to a .fastq file
- A FastqRecord

Example:

# Supply the path to a .fastq file that you would like to convert to a FastaRecord
myfasta = FastqtoFasta("myfastq.fastq");

# Alternatively, a FastqRecord can be used as the input 
myfasta = FastqtoFasta(myFastqRecord);

Nucleotide_Essentials.FilterQuality_se — Function

Nucleotide_Essentials.FilterQuality_se

Filters an input .fastq file based upon the encoded Phred+33 or Phred+64 quality scores. The encoding of the reads is automatically determined by looking for characters that are unique to Phred+33 or Phred+64. Phred+64 encoding is identified by searching for ^, a, ], and f.
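The detection rule described above can be sketched as follows. This is only an illustration under stated assumptions, not the package's code: `PHRED64_MARKERS` and `guess_phred_offset` are hypothetical names, and Phred+33 is assumed whenever none of the four marker characters appears. The markers have ASCII codes 93 to 102, which would correspond to Phred+33 scores of 60 or more.

```julia
# Characters the description above names as Phred+64 markers.
const PHRED64_MARKERS = ('^', 'a', ']', 'f')

# Hypothetical helper: guess the ASCII offset (33 or 64) from a collection of quality strings.
function guess_phred_offset(qualities)
    for q in qualities
        # Any marker character in any read is taken as evidence of Phred+64.
        any(in(PHRED64_MARKERS), q) && return 64
    end
    return 33   # assumption: fall back to Phred+33 when no marker is seen
end

offset_a = guess_phred_offset(["IIII", "5555"])   # no marker characters present
offset_b = guess_phred_offset(["hhhh", "a^]f"])   # marker characters present
```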

Reads are filtered by their number of expected errors ($\mathrm{E}$), which is the sum of the per-base error probabilities implied by the quality scores:

$\mathrm{E} = \sum_{i} p_i = \sum_{i} 10^{\frac{-Q_i}{10}}$

Stringent filtering (maxEE = 1) is used by default but can be adjusted by the user.

Reads that pass the filtering parameters are output to a file ending in _FilteredReads.fastq in the user-determined directory, as indicated by out.
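As a worked example of the expected-error formula above, the following sketch computes $\mathrm{E}$ for a single quality string. `expected_errors` and `passes_filter` are hypothetical helper names, and treating a read as passing when $\mathrm{E} \le$ maxEE (rather than strictly less) is an assumption.

```julia
# Sum of per-base error probabilities for one quality string.
function expected_errors(quality::AbstractString; offset::Integer = 33)
    total = 0.0
    for c in quality
        q = Int(c) - offset          # Phred score Q_i
        total += 10.0^(-q / 10)      # error probability p_i = 10^(-Q_i / 10)
    end
    return total
end

# Assumption: a read passes when its expected errors do not exceed maxEE.
passes_filter(quality; maxEE = 1, offset = 33) =
    expected_errors(quality; offset = offset) <= maxEE

ee_high_quality = expected_errors("IIII")   # four bases at Q40
ee_mid_quality  = expected_errors("5555")   # four bases at Q20
```

With Phred+33 encoding, `I` is Q40 (p = 0.0001) and `5` is Q20 (p = 0.01), so the two reads have about 0.0004 and 0.04 expected errors respectively.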

Supported keyword arguments include:

read1::String: Path to the reads to undergo quality filtering
out::String: Path to the directory where reads that pass the quality filtering should be written
maxEE::Int64 (optional): The max number of expected errors a read can include as the filtering parameter (default: maxEE = 1)
verbose::Bool (optional): Whether or not to show some intermediary feedback on the progress of the function (default = false)

Example:

FilterQuality_se("forward_R1.fastq", "/outdirectory")

Nucleotide_Essentials.FilterQuality_pe — Function

Nucleotide_Essentials.FilterQuality_pe

Reads are filtered by their number of expected errors ($\mathrm{E}$), which is the sum of the per-base error probabilities implied by the quality scores:

$\mathrm{E} = \sum_{i} p_i = \sum_{i} 10^{\frac{-Q_i}{10}}$

Stringent filtering (maxEE = 1) is used by default but can be adjusted by the user.

Output Files:

If both the forward and reverse reads pass the filtering parameters:
- Forward reads are output to a file ending in R1_Paired_filtered.fastq in the user-determined directory, as indicated by out
- Reverse reads are output to a file ending in R2_Paired_filtered.fastq in the user-determined directory, as indicated by out
If only the forward read passes the filtering parameters:
- Forward reads are output to a file ending in R1_Unpaired_filtered.fastq in the user-determined directory, as indicated by out
- Reverse reads are not written to a file
If only the reverse read passes the filtering parameters:
- Reverse reads are output to a file ending in R2_Unpaired_filtered.fastq in the user-determined directory, as indicated by out
- Forward reads are not written to a file
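The output rules above amount to a small decision table, sketched here. `route_pair` is a hypothetical helper, not part of the package, and it assumes a read passes when its expected errors are less than or equal to maxEE.

```julia
# Return the output file suffixes a read pair would be written to,
# given the expected errors of the forward and reverse read.
function route_pair(ee_forward::Real, ee_reverse::Real; maxEE::Real = 1)
    forward_ok = ee_forward <= maxEE   # assumption: <= rather than <
    reverse_ok = ee_reverse <= maxEE
    if forward_ok && reverse_ok
        return ["R1_Paired_filtered.fastq", "R2_Paired_filtered.fastq"]
    elseif forward_ok
        return ["R1_Unpaired_filtered.fastq"]
    elseif reverse_ok
        return ["R2_Unpaired_filtered.fastq"]
    else
        return String[]                # neither read is written
    end
end

both_pass    = route_pair(0.2, 0.5)
forward_only = route_pair(0.2, 3.0)
reverse_only = route_pair(3.0, 0.5)
neither      = route_pair(3.0, 3.0)
```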

Supported keyword arguments include:

read1::String: Path to the forward reads to undergo quality filtering
read2::String: Path to the reverse reads to undergo quality filtering
out::String: Path to the directory where reads that pass the quality filtering should be written
maxEE::Int64 (optional): The max number of expected errors a read can include as the filtering parameter (default: maxEE = 1)
verbose::Bool (optional): Whether or not to show some intermediary feedback on the progress of the function (default = false)

Example:

FilterQuality_pe("forward_R1.fastq", "reverse_R2.fastq", "/outdirectory")

# changing the filtering parameters
FilterQuality_pe("forward_R1.fastq", "reverse_R2.fastq", "/outdirectory", 2, true)

Nucleotide_Essentials.PlotQuality — Function

Nucleotide_Essentials.PlotQuality

Returns a plot of the quality profile of a .fastq or .fastq.gz file

This function plots a visual summary of the distribution of quality scores (automatically detects Phred+33 or Phred+64 encoding) as a function of sequence position for the input fastq file(s).

The plotted lines show summary statistics at each sequence position:

green is the mean
dashed red lines are the 25th and 75th quantiles
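The per-position statistics behind such a plot can be sketched as below. This only illustrates the summary values, not the plotting itself or the package's code: `position_summary` is a hypothetical helper, Phred+33 is assumed by default, and quality strings are assumed to be ASCII.

```julia
using Statistics  # standard library: mean, quantile

# Mean, 25th and 75th quantile of the Phred scores at each sequence position.
function position_summary(qualities; offset::Integer = 33)
    longest = maximum(length, qualities)
    means, q25, q75 = Float64[], Float64[], Float64[]
    for pos in 1:longest
        # Only reads long enough to reach this position contribute.
        scores = [Int(q[pos]) - offset for q in qualities if length(q) >= pos]
        push!(means, mean(scores))
        push!(q25, quantile(scores, 0.25))
        push!(q75, quantile(scores, 0.75))
    end
    return (mean = means, q25 = q25, q75 = q75)
end

# Two reads: one at Q40 throughout, one at Q20 throughout.
profile = position_summary(["IIII", "5555"])
```

For this input every position has a mean of 30, with quantiles of 25 and 35 under Julia's default linear interpolation.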

Supported keyword arguments include:

Input::FastqRecord: The name of a FastqRecord for plotting
verbose::Bool (optional): Whether or not to show some intermediary feedback on the progress of the function (default = false)
outputfigure::Bool (optional): Whether or not to output a .png file with the created QualityPlot (default = false)
figurepath::String (optional): If outputting a .png figure to file, specify the path to a directory where the file should be written to (default = pwd())

Example:

# A quality profile can be created by supplying the path to a .fastq or .fastq.gz file
PlotQuality("path/to/my/file.fastq")

Nucleotide_Essentials.potential_mismatches — Function

Nucleotide_Essentials.potential_mismatches

Returns a Vector{Any} of potential barcodes with a single nucleotide change, including both deletions and substitutions

Supported keyword arguments include:

Path::String: The barcode sequence for which potential mismatches should be generated
mismatch::Int64: The number of altered nucleotides to include (1 is only supported at this time)

Example:

potential_mismatches("GCGT", 1)
17-element Vector{Any}:
"GCGT"
"CCGT" 
"ACGT" 
"TCGT" 
"GGGT" 
"GAGT" 
"GTGT" 
"GCCT" 
"GCAT" 
"GCTT" 
"GCGG" 
"GCGC" 
"GCGA" 
"CGT"
"GGT"
"GCT" 
"GCG"
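One way to enumerate such variants is sketched below. `single_mismatches` is a hypothetical function, not the package's implementation, and the `GCAT` alphabet order is an assumption chosen so that the result lines up with the listing above. It returns a Vector{String} rather than a Vector{Any} and does not remove duplicates, which can occur when adjacent bases are identical.

```julia
# Enumerate the barcode itself, every single-base substitution, and every single-base deletion.
function single_mismatches(barcode::AbstractString; alphabet = ('G', 'C', 'A', 'T'))
    bases = collect(barcode)
    variants = String[String(barcode)]
    # Substitutions: replace each position with every other base.
    for i in eachindex(bases)
        for b in alphabet
            b == bases[i] && continue
            candidate = copy(bases)
            candidate[i] = b
            push!(variants, String(candidate))
        end
    end
    # Deletions: drop each position in turn.
    for i in eachindex(bases)
        push!(variants, String(deleteat!(copy(bases), i)))
    end
    return variants
end

variants = single_mismatches("GCGT")
```

A barcode of length n yields 1 + 3n + n entries, which gives 17 for a four-base barcode.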

Nucleotide_Essentials.reverse_complement — Function

Nucleotide_Essentials.reverse_complement

Takes a string of nucleotide bases and returns the reverse complement of that string. Accepts inputs of String and SubString{String} (input from a FastqRecord)

Supported keyword arguments include:

sequence::Union{String, SubString{String}}: A string sequence of nucleotide bases or sequence entry from a FastqRecord

Example:

reverse_complement("ATCGT")
"ACGAT"
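For readers who want to see the operation spelled out, here is a minimal sketch of a reverse complement. `COMPLEMENT` and `revcomp` are hypothetical names rather than the package's code, and only uppercase A, C, G, T and N are handled; any other character raises a KeyError.

```julia
# Complement lookup for uppercase bases; N is left as N.
const COMPLEMENT = Dict('A' => 'T', 'T' => 'A', 'C' => 'G', 'G' => 'C', 'N' => 'N')

# Reverse the sequence, then complement each base.
revcomp(seq::AbstractString) = join(COMPLEMENT[b] for b in reverse(seq))

rc_string    = revcomp("ATCGT")
rc_substring = revcomp(SubString("AATCGT", 2))   # works for SubString{String} as well
```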

Nucleotide_Essentials.demultiplex_se — Function

Nucleotide_Essentials.demultiplex_se

Compares a list of provided barcodes with the provided multiplexed reads and separates the reads into individual .fastq files. If a barcode is found within the read, the barcode is removed from the sequence. The quality data of the reads is preserved and written to the output .fastq file. If a barcode is not found, the sequence and quality are written to the unassigned .fastq file unchanged.

The mapping file must be either a .csv or .txt file with two columns. The first column heading must be SampleID and the second column heading must be BarcodeSequence.

EXAMPLE MAPPING FILE:

SampleID	BarcodeSequence
Sample1	Barcode1
Sample2	Barcode2
Sample3	Barcode3
Sample4	Barcode4
Sample5	Barcode5
Sample6	Barcode6
Sample7	Barcode7
Sample8	Barcode8

Supported keyword arguments include:

R1::String: Path to multiplexed reads
Map::String: Path to the mapping file
mismatch::Int64=0 (optional): Number of allowed mismatches in barcode. Potential options include 0 or 1. If 1 mismatch, computation time will significantly increase. Default is to allow for 0 mismatches (exact matches only).
debug::Bool=false (optional): If true, a log file will be created and debugging data will be printed while the function is running (default is false).

Example:

demultiplex_se("multiplexreads.fastq", "mapping_file.txt")
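The exact-match case (mismatch = 0) can be sketched for a single read as follows. `assign_read` is a hypothetical helper and not the package's implementation. Several details are assumptions: only the first occurrence of a barcode is removed, the matching quality characters are removed along with it, barcodes are tried in the order given, and sequences are ASCII.

```julia
# Assign one read to a sample by exact barcode match and trim the barcode.
# `barcodes` is a collection of SampleID => BarcodeSequence pairs.
function assign_read(sequence::AbstractString, quality::AbstractString, barcodes)
    for (sample, barcode) in barcodes
        hit = findfirst(barcode, sequence)     # index range of the match, or nothing
        hit === nothing && continue
        # Remove the matched range from the sequence and (by assumption) from the quality string.
        trimmed_seq  = sequence[1:(first(hit) - 1)] * sequence[(last(hit) + 1):end]
        trimmed_qual = quality[1:(first(hit) - 1)] * quality[(last(hit) + 1):end]
        return (sample, trimmed_seq, trimmed_qual)
    end
    return ("unassigned", sequence, quality)   # no barcode found: read is left unchanged
end

mapping = ["Sample1" => "ACGT", "Sample2" => "GGCC"]
assigned   = assign_read("ACGTTTGA", "IIIIHHHH", mapping)
unassigned = assign_read("TTTTTTGA", "IIIIHHHH", mapping)
```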

Nucleotide_Essentials.demultiplex_pe — Function

Nucleotide_Essentials.demultiplex_pe

Compares a list of provided barcodes with the provided paired-end multiplexed reads and separates the reads into individual .fastq files. R1 and R2 reads are handled in the same way: if a barcode is found within a read, the barcode is removed from the sequence, and the quality data of the read is preserved and written to the output .fastq file. If a barcode is not found, the sequence and quality are written unchanged to the unassigned .fastq file for that read direction (R1 or R2).

Dual-indexed reads are not yet supported.

The mapping file must be either a .csv or .txt file with two columns. The first column heading must be SampleID and the second column heading must be BarcodeSequence.

EXAMPLE MAPPING FILE:

SampleID	BarcodeSequence
Sample1	Barcode1
Sample2	Barcode2
Sample3	Barcode3
Sample4	Barcode4
Sample5	Barcode5
Sample6	Barcode6
Sample7	Barcode7
Sample8	Barcode8

Supported keyword arguments include:

R1::String: Path to forward multiplexed reads
R2::String: Path to reverse multiplexed reads
Map::String: Path to the mapping file
mismatch::Int64=0 (optional): Number of allowed mismatches in barcode. Potential options include 0 or 1. If 1 mismatch, computation time will significantly increase. Default is to allow for 0 mismatches (exact matches only).
debug::Bool=false (optional): If true, a log file will be created and debugging data will be printed while the function is running (default is false).

Example:

demultiplex_pe("forward_multiplexreads.fastq", "reverse_multiplexreads.fastq", "mapping_file.txt")

Index

Nucleotide_Essentials.FastaRecord
Nucleotide_Essentials.FastqRecord
Nucleotide_Essentials.FastqtoFasta
Nucleotide_Essentials.FilterQuality_pe
Nucleotide_Essentials.FilterQuality_se
Nucleotide_Essentials.PlotQuality
Nucleotide_Essentials.demultiplex_pe
Nucleotide_Essentials.demultiplex_se
Nucleotide_Essentials.potential_mismatches
Nucleotide_Essentials.readFasta
Nucleotide_Essentials.readFastq
Nucleotide_Essentials.reverse_complement
Nucleotide_Essentials.writeFasta

Change Log

Nucleotide_Essentials v0.2.0

Added support for quality filtering of .fastq reads
Added support for Gzip compressed files
Performance improvements in PlotQuality() and added support for exporting quality plots
Added support for automatic quality profile encoding detection (Phred+64 and Phred+33 encoding)
Minor documentation updates